Re: UTF-8 validation rules

From: Misha.Wolf@reuters.com
Date: Mon Sep 10 2001 - 14:50:29 EDT

Previous message: Ayers, Mike: "RE: The trouble with text-sorting algorithms"
Maybe in reply to: Carl W. Brown: "UTF-8 validation rules"
Next in thread: Carl W. Brown: "RE: UTF-8 validation rules"
Reply: Carl W. Brown: "RE: UTF-8 validation rules"

Carl,

You seem to be using the word "character" in some places where
you (probably) mean "byte", e.g.:

> All UTF-8 characters must be followed by the proper number of valid
> continuation characters, if any.

Misha

On 10/09/2001 18:21:48 Carl W. Brown wrote:
> I am checking out my UTF-8 validation rules to see if they are correct.
>
> Check each character to be a valid UTF-8 initial character.
>
> \x00 to \x7f or \xC2 to \xF4
>
> Allow invalid forms such as \xC0 & \xC1 to decode but consider them invalid.
>
> A first byte of \xE0 or \xF0 with a second byte less than \xA0 is also an
> invalid form.
>
> \xED followed by anything >= \xA0 is an encoded surrogate and not a valid
> character.
>
> \xEF\xBF\xBE and \xEF\xBF\xBF are invalid Unicode characters.
>
> Anything greater than \xF4\x80\xBF\xBF is beyond the Unicode range.
>
> All UTF-8 characters must be followed by the proper number of valid
> continuation characters, if any.
>
> Carl
>
>
>
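
The quoted rules can be sketched as a byte-level check. This is only an
illustrative sketch, not Carl's code: the function name is_valid_utf8 is
made up here, and the byte ranges follow the well-formed UTF-8 table in
the Unicode Standard as I understand it. That table differs from the
quoted rules in three places: after a lead byte of \xF0 the second byte
must be at least \x90 (not \xA0), the highest valid sequence is
\xF4\x8F\xBF\xBF (U+10FFFF), and \xEF\xBF\xBE / \xEF\xBF\xBF are treated
as well-formed byte sequences even though they encode noncharacters.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is a well-formed UTF-8 byte sequence.

    Hypothetical helper for illustration; noncharacters such as U+FFFE
    are accepted because they are well-formed at the byte level.
    """
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead <= 0x7F:               # single-byte (ASCII) sequence
            i += 1
            continue
        # count = number of continuation bytes expected after the lead byte
        # lo..hi = allowed range for the FIRST continuation byte
        if 0xC2 <= lead <= 0xDF:
            count, lo, hi = 1, 0x80, 0xBF
        elif lead == 0xE0:
            count, lo, hi = 2, 0xA0, 0xBF   # below 0xA0 would be overlong
        elif lead == 0xED:
            count, lo, hi = 2, 0x80, 0x9F   # 0xA0 and up encodes surrogates
        elif 0xE1 <= lead <= 0xEF:
            count, lo, hi = 2, 0x80, 0xBF
        elif lead == 0xF0:
            count, lo, hi = 3, 0x90, 0xBF   # below 0x90 would be overlong
        elif 0xF1 <= lead <= 0xF3:
            count, lo, hi = 3, 0x80, 0xBF
        elif lead == 0xF4:
            count, lo, hi = 3, 0x80, 0x8F   # above 0x8F exceeds U+10FFFF
        else:
            # 0x80..0xC1 (stray continuation, overlong 2-byte lead)
            # or 0xF5..0xFF (beyond the Unicode range)
            return False
        if i + count >= n:             # sequence is truncated
            return False
        if not lo <= data[i + 1] <= hi:
            return False
        # Remaining continuation bytes must all be in 0x80..0xBF
        for k in range(2, count + 1):
            if not 0x80 <= data[i + k] <= 0xBF:
                return False
        i += count + 1
    return True
```

For example, is_valid_utf8(b"\xed\xa0\x80") is False (an encoded
surrogate), while is_valid_utf8(b"\xed\x9f\xbf") is True. The comments
say "lead byte" and "continuation byte" throughout, in line with Misha's
point about "byte" versus "character".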

This archive was generated by hypermail 2.1.2 : Mon Sep 10 2001 - 15:58:48 EDT