RE: Surrogate pairs and UTF-8

From: Peter Constable (petercon@microsoft.com)
Date: Mon Jun 26 2006 - 10:31:49 CDT

Next message: Richard Wordingham: "Re: Finnegans Wake, was Re: comment on L2/06-215"

Previous message: Erkki Kolehmainen: "Re: Finnegans Wake, was Re: comment on L2/06-215"
In reply to: Otto Stolz: "Re: Surrogate pairs and UTF-8"
Next in thread: Rick Cameron: "RE: Surrogate pairs and UTF-8"

> From: Otto Stolz [mailto:Otto.Stolz@uni-konstanz.de]

> > UTF-16 Surrogate Pairs are basically doing the same
> > thing that multi-byte sequences in UTF-8 do
> ...
> > They mainly differ only in details.
>
> One essential detail being that UTF-16 surrogates are excluded
> from the valid Unicode codepoints, while UTF-8 "surrogates"
> have binary values that are also valid Unicode codepoints.

I almost added that but held back because it seemed to me that that's
not really a difference in these encoding forms but rather is just a
fact about the coded character set. But then, IIRC UTF-16 is not able to
represent code points U+D800..U+DFFF while UTF-8 is.
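
A rough Python sketch of that last point, offered as an illustration
only. The helper names below are hypothetical and not taken from any
library. It shows that the generic 3-byte UTF-8 bit layout can
mechanically carry U+D800, whereas in UTF-16 the code units D800 DC00
can only be read as one supplementary code point. Note that a strict
UTF-8 decoder (Python's built-in codec is used here) treats the
surrogate byte sequence as ill-formed, so "able to represent" holds at
the level of the bit pattern, not of conformant UTF-8.

```python
def utf8_three_byte_pattern(cp: int) -> bytes:
    """Apply the generic 3-byte layout 1110xxxx 10xxxxxx 10xxxxxx.

    Hypothetical helper: it deliberately does NOT exclude surrogates,
    so it is a bit-layout demo, not a conformant UTF-8 encoder.
    """
    if not 0x0800 <= cp <= 0xFFFF:
        raise ValueError("3-byte layout covers U+0800..U+FFFF only")
    return bytes([
        0xE0 | (cp >> 12),           # lead byte: top 4 bits
        0x80 | ((cp >> 6) & 0x3F),   # continuation: middle 6 bits
        0x80 | (cp & 0x3F),          # continuation: low 6 bits
    ])


def strict_utf8_accepts(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def utf16_pair_to_code_point(high: int, low: int) -> int:
    """Combine a high and a low surrogate code unit into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    raw = utf8_three_byte_pattern(0xD800)
    print(raw.hex())                  # ed a0 80 as a bit pattern
    print(strict_utf8_accepts(raw))   # strict decoders reject it
    print(hex(utf16_pair_to_code_point(0xD800, 0xDC00)))
```

So the UTF-16 sequence D800 DC00 always means U+10000 and never the
two code points U+D800, U+DC00, while the UTF-8 byte layout has room
for each of them separately even though well-formed UTF-8 excludes
them.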

Peter Constable

