Re: Support for non-BMP characters from Szelp, A. Sz. on 2012-04-25 (Unicode Mail List Archive)

From: Szelp, A. Sz. <a.sz.szelp_at_gmail.com>
Date: Wed, 25 Apr 2012 12:16:50 +0200

I'm really not a technical expert, but what you write rather sounds to me
as if JavaScript's UCS-2 implementation were broken...
Thanks for the linked document.
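
To make the code-unit issue quoted below concrete, here is a minimal sketch (TypeScript, using only the long-standing length and charCodeAt calls) of decoding surrogate pairs by hand. The helper names combineSurrogates and toCodePoints are hypothetical, made up for this illustration, and not part of any library; the sketch passes unpaired surrogates through unchanged, which is only one of several possible policies.

```typescript
// Hypothetical helpers for illustration only; not part of any standard API.

function isHighSurrogate(unit: number): boolean {
  return unit >= 0xd800 && unit <= 0xdbff;
}

function isLowSurrogate(unit: number): boolean {
  return unit >= 0xdc00 && unit <= 0xdfff;
}

// Combine a high/low surrogate pair into the supplementary code point it encodes.
function combineSurrogates(high: number, low: number): number {
  return (high - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;
}

// Decode a string (a sequence of UTF-16 code units) into code points.
// Unpaired surrogates are passed through unchanged (an assumption of this sketch).
function toCodePoints(s: string): number[] {
  const result: number[] = [];
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (
      isHighSurrogate(unit) &&
      i + 1 < s.length &&
      isLowSurrogate(s.charCodeAt(i + 1))
    ) {
      result.push(combineSurrogates(unit, s.charCodeAt(i + 1)));
      i++; // skip the low surrogate that was just consumed
    } else {
      result.push(unit);
    }
  }
  return result;
}

const scriptA = "\uD835\uDC9C"; // U+1D49C MATHEMATICAL SCRIPT CAPITAL A

console.log(scriptA.length); // 2 (code units, not characters)
console.log(toCodePoints("a" + scriptA + "b").length); // 3 (code points)
```

The string itself stores the pair without loss; it is length, charCodeAt, and character classes in regexes that operate on the two halves separately.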

On Wed, Apr 25, 2012 at 11:41, Marc Durdin <marc.durdin_at_tavultesoft.com> wrote:

> Yes, but this means that regexes with SMP characters don’t work (e.g.
> [𝒜-𝒵]), character counts return code units, etc. So you have to
> reimplement string.length, string.charCodeAt, etc., if you don’t want to
> deal with surrogate pairs (I reckon you’ve got better things to be spending
> your time on).
>
> http://dheeb.files.wordpress.com/2011/07/gbu.pdf “Unicode Support
> Shootout - The Good, the Bad & the (mostly) Ugly” by Tom Christiansen has
> a great summary of some of the issues with relying on JavaScript’s internal
> string manipulation (unfortunately can’t find a better working link at
> present – the official training.perl.com site seems to be down).
> Actually, that presentation is a fantastic place to start for understanding
> many of the limitations of various programming languages’ support for
> Unicode – if you haven’t read it, I’d urge you to go read it now.
>
> Marc
>
> *From:* Szelp, A. Sz. [mailto:a.sz.szelp_at_gmail.com]
> *Sent:* Wednesday, 25 April 2012 7:28 PM
> *To:* Marc Durdin
> *Cc:* David Starner; Unicode Mailing List
> *Subject:* Re: Support for non-BMP characters
>
> Shouldn't it be technically possible to store Supplementary Plane
> characters in UTF-16 / UCS-2 as well? Isn't that what Surrogate Pairs are
> for?
>
> Sz
>
> On Wed, Apr 25, 2012 at 11:09, Marc Durdin <marc.durdin_at_tavultesoft.com>
> wrote:
>
> Probably the most egregious example I know of is JavaScript. As far as I
> know, JavaScript still only groks UCS-2. I'd love to be wrong.
>
> Marc
>
>
> -----Original Message-----
> From: unicode-bounce_at_unicode.org [mailto:unicode-bounce_at_unicode.org] On
> Behalf Of David Starner
> Sent: Wednesday, 25 April 2012 6:32 PM
> To: Unicode Mailing List
> Subject: Support for non-BMP characters
>
> It's been ten years since the first non-BMP characters were encoded.
> How are they working in your neck of the woods? There's a lot of places
> where they're working just fine, but I was facing MySQL's support. It has
> had support for UCS-2 and UTF-8 limited to the BMP for a long time; now in
> MySQL 5.5 there's utf16, utf32 and utf8mb4. (MySQL
> 5.1 and 5.5 are the current stable releases.) But there's enough warnings
> about incompatibilities with utf8mb4 to make me pause before switching my
> private database to it, and I think the net will see MySQL databases with
> utf8 instead of utf8mb4 as long as MySQL exists, unless they decide to push
> people over to it.
>
> (Ada's an issue too, though not one most people will have to deal with.
> While Ada 2005 added a UTF-32 string type, it left the UCS-2 string type as
> is. Again, I suspect a lot of nominally Unicode Ada programs are going to
> be BMP-only. Of course, UTF-8 as an ASCII superset is used, stuffed into
> strings labeled Latin-1; it's technically not conformant with the Ada
> standard but it works so long as you don't need much string processing.)
>
> In any case, is the use of non-BMP characters still problematic in your
> corner of the computing world or is everything looking fine from where you
> are?
>
> --
> Kie ekzistas vivo, ekzistas espero.
Received on Wed Apr 25 2012 - 05:21:25 CDT
