RE: Unicode 3.2 BETA

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Dec 12 2001 - 13:50:51 EST


Marco asked:

> > The tables of standardized variants are listed in the Unicode
> > Character Database in the file StandardizedVariants.html
>
> Is this file publicly available? Where?

This one will be arriving shortly, along with SpecialCasing.txt,
which is still missing in action, and the various .html documentation
files for the Unicode Character Database.

> > Combining Grapheme Joiner (U+034F)
> > This new character is used to request that the two adjacent
> > characters not be placed in separate grapheme clusters. (Note: the
> > term "grapheme" has been replaced by "grapheme cluster" in the
> > Unicode Standard.)
>
> Which scripts will use this?

Initially probably just Latin -- it is a special-case hack for
distinguishing things like digraphs from otherwise identical
sequences of two characters, when needed for processing.
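
As a rough illustration of the kind of processing meant here, the sketch
below marks a digraph by placing U+034F between its two letters and then
tells the marked sequence apart from the plain one. The helper names
(mark_digraph, is_marked_digraph) and the choice of "ch" are hypothetical,
made up for this example; only the code point and its character name come
from the standard.

```python
import unicodedata

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER


def mark_digraph(first: str, second: str) -> str:
    """Join two characters with CGJ to flag them as one unit (hypothetical helper)."""
    return first + CGJ + second


def is_marked_digraph(text: str) -> bool:
    """Return True if text is exactly <char, CGJ, char> (hypothetical helper)."""
    return len(text) == 3 and text[1] == CGJ


plain = "ch"                     # ordinary sequence of two letters
marked = mark_digraph("c", "h")  # same letters, flagged as a digraph

print(unicodedata.name(CGJ))
print(is_marked_digraph(plain), is_marked_digraph(marked))
# Stripping the CGJ recovers the otherwise identical plain sequence.
print(marked.replace(CGJ, "") == plain)
```

The marker only matters to processes that look for it; a process that does
not care about the distinction can strip it, as the last line shows.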

> Will CGJ replace ZWJ for Indic scripts?

No.

>
> > - Grapheme_Base, Grapheme_Extend, Grapheme_Link
> > - IDS_Binary_Operator, IDS_Trinary_Operator, Radical, Unified_Ideograph
> > - Default_Ignorable_Code_Point
> > - Deprecated
>
> Are the file(s) containing these new properties publicly available? Where?

See PropList.txt and DerivedCoreProperties.txt, which contain these.

Finding them will be easier once the documentation files for the
Unicode Character Database have been updated.
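
Until then, the files can be read directly. The sketch below assumes the
usual layout of these files: a code point or a first..last range, a
semicolon, the property name, and an optional "#" comment. The SAMPLE
lines are written in that style for illustration and are not a verbatim
copy of either file; the function names are hypothetical.

```python
from collections import defaultdict

# Illustrative lines in the style of PropList.txt / DerivedCoreProperties.txt.
# They are an assumption for this example; check values against the real files.
SAMPLE = """\
# A comment line
034F          ; Default_Ignorable_Code_Point # COMBINING GRAPHEME JOINER
2FF0..2FF1    ; IDS_Binary_Operator # [2]
2FF2..2FF3    ; IDS_Trinary_Operator # [2]
2FF4..2FFB    ; IDS_Binary_Operator # [8]
"""


def parse_properties(text):
    """Map each property name to a list of inclusive (start, end) ranges."""
    props = defaultdict(list)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        code_field, name = (part.strip() for part in line.split(";", 1))
        first, _, last = code_field.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start  # single code point
        props[name].append((start, end))
    return props


def has_property(props, name, code_point):
    """Return True if code_point falls in any range listed for name."""
    return any(start <= code_point <= end for start, end in props.get(name, []))


props = parse_properties(SAMPLE)
print(has_property(props, "IDS_Binary_Operator", 0x2FF5))
print(has_property(props, "IDS_Trinary_Operator", 0x2FF5))
```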

>
> > Most notable is a further tightening of the definition of UTF-8, to
> > eliminate irregular UTF-8.
>
> Is a public draft available? Where?

Not yet. The editorial committee is working on UAX #28, Unicode 3.2,
in which all will be revealed. ;-)
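
A minimal sketch of what the tightening rules out, assuming "irregular
UTF-8" means a supplementary character encoded as a pair of three-byte
surrogate sequences (six bytes) rather than as one four-byte sequence.
No draft text is quoted here; the check relies only on Python's built-in
strict UTF-8 decoder, and is_strict_utf8 is a hypothetical helper name.

```python
def is_strict_utf8(data: bytes) -> bool:
    """Return True if data decodes under Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# U+10000 in its regular four-byte form.
regular = "\U00010000".encode("utf-8")

# The same character as surrogates D800 and DC00, each encoded as its own
# three-byte sequence: the six-byte form the tightening is assumed to exclude.
irregular = b"\xed\xa0\x80\xed\xb0\x80"

print(regular.hex(), is_strict_utf8(regular))
print(irregular.hex(), is_strict_utf8(irregular))
```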

--Ken

>
> Thanks in advance.
> _ Marco