[Sorry -- hit "Send" again too soon]
The 6-byte sequence is either one code point (lenient parser) or an error
(strict parser). It is never two.
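To make the two behaviors concrete, here is a minimal Python sketch. It is not taken from any real library; the names decode_utf8 and _decode_one are made up for illustration, and the lenient branch is just one plausible way to do it. The input is ED A0 80 ED B0 80, i.e. the surrogate pair D800 DC00 written as two 3-byte sequences.

```python
def _decode_one(data, i):
    """Decode one 1- to 4-byte sequence starting at offset i.

    Returns (value, length). Surrogate values are NOT rejected here;
    the caller decides what to do with them.
    """
    b0 = data[i]
    if b0 < 0x80:
        return b0, 1
    if 0xC2 <= b0 <= 0xDF:
        n, cp, lowest = 2, b0 & 0x1F, 0x80
    elif 0xE0 <= b0 <= 0xEF:
        n, cp, lowest = 3, b0 & 0x0F, 0x800
    elif 0xF0 <= b0 <= 0xF4:
        n, cp, lowest = 4, b0 & 0x07, 0x10000
    else:
        raise ValueError(f"invalid lead byte at offset {i}")
    if i + n > len(data):
        raise ValueError(f"truncated sequence at offset {i}")
    for b in data[i + 1:i + n]:
        if b & 0xC0 != 0x80:
            raise ValueError(f"bad continuation byte after offset {i}")
        cp = (cp << 6) | (b & 0x3F)
    if cp < lowest or cp > 0x10FFFF:
        raise ValueError(f"overlong or out-of-range sequence at offset {i}")
    return cp, n


def decode_utf8(data, lenient=False):
    """Return the list of code points encoded in data (hypothetical helper).

    strict  (lenient=False): any encoded surrogate is an error.
    lenient (lenient=True):  a high+low surrogate pair, 6 bytes in total,
                             is combined into ONE supplementary code point.
    """
    out, i = [], 0
    while i < len(data):
        cp, n = _decode_one(data, i)
        if 0xD800 <= cp <= 0xDFFF:
            if not lenient:
                raise ValueError(f"surrogate encoded in UTF-8 at offset {i}")
            if cp > 0xDBFF or i + n >= len(data):
                raise ValueError(f"unpaired surrogate at offset {i}")
            low, m = _decode_one(data, i + n)
            if not 0xDC00 <= low <= 0xDFFF:
                raise ValueError(f"unpaired surrogate at offset {i}")
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00)
            n += m
        out.append(cp)
        i += n
    return out


six_bytes = bytes([0xED, 0xA0, 0x80, 0xED, 0xB0, 0x80])
print([hex(cp) for cp in decode_utf8(six_bytes, lenient=True)])
try:
    decode_utf8(six_bytes)
except ValueError as err:
    print("strict parser:", err)
```

In neither mode does the caller ever see the two surrogates D800 and DC00 as separate code points. Python's own built-in decoder is of the strict kind: six_bytes.decode("utf-8") raises UnicodeDecodeError.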
I put samples at:
http://www.macchiato.com/utc/samples_of_utf8.htm
Mark
----- Original Message -----
From: "Marco Cimarosti" <marco.cimarosti@essetre.it>
To: <unicode@unicode.org>
Cc: "'Mark Davis'" <mark@macchiato.com>
Sent: Tuesday, June 05, 2001 05:03
Subject: RE: UTF-8S (was: Re: ISO vs Unicode UTF-8)
> Mark Davis wrote:
> > - I am well aware that one can accept 6-byte supplementary characters
> > on input in UTF-8. (Did you really think I wasn't?)
>
> (O, no, I know you knew!)
>
> But how should this 6-byte sequence be interpreted by a standard UTF-8
> decoder? Does it become one or two code points?
>
> _ Marco
>
>