Re: Normalization Form KC for Linux

From: Martin J. Duerst (duerst@w3.org)
Date: Thu Aug 19 1999 - 02:06:59 EDT


At 13:22 99/08/18 -0700, Francois Yergeau wrote:
> At 12:20 1999-08-18 -0700, Kenneth Whistler wrote:
> >Markus Kuhn wrote:
> >> encoding text in Unicode under Linux should be Normalization Form KC as
> >> defined in Unicode Technical Report #15
> >> <http://www.unicode.org/unicode/reports/tr15/>.
> >
> >My only concern is that Normalization Form C (rather than KC) might
> >be more appropriate.
>
> Form C is in fact the form chosen by the W3C "Character Model for the WWW"
> (http://www.w3.org/TR/WD-charmod). This is not final (still a WD - working
> draft) but is likely to stick, IMHO. I think that Linux should have good
> reasons to choose KC instead of C.

Thanks to Francois for bringing this up. The W3C "Character Model for
the WWW" also says that compatibility characters (i.e. characters that
are replaced by other characters when compatibility decomposition is
applied) are discouraged. Something similar is probably appropriate for
Linux. There are various ways and places to 'implement' this
discouragement.
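To make the C/KC distinction concrete, here is a minimal sketch in Python (not part of the W3C draft or of any Linux tool) using the standard `unicodedata` module. It treats a character as a "compatibility character" when NFKC changes it but NFC does not, which is a heuristic assumption rather than a formal definition. The function names `is_compatibility_char` and `find_compatibility_chars` are hypothetical. Reporting such characters, rather than silently rewriting them, is one way to "discourage" them without forbidding them.

```python
import unicodedata

def is_compatibility_char(ch: str) -> bool:
    """Heuristic: True if NFKC changes ch in a way that NFC does not.

    Such characters carry a compatibility decomposition
    (superscripts, ligatures, fullwidth forms, ...).
    """
    return unicodedata.normalize("NFKC", ch) != unicodedata.normalize("NFC", ch)

def find_compatibility_chars(text: str) -> list[tuple[int, str]]:
    # Report (index, character) pairs instead of rewriting them:
    # the text is flagged ("discouraged"), not changed.
    return [(i, ch) for i, ch in enumerate(text) if is_compatibility_char(ch)]

if __name__ == "__main__":
    # Decomposed accents, an "fi" ligature, and a superscript two.
    s = "re\u0301sume\u0301 \ufb01le x\u00b2"
    # NFC composes the accents but keeps the ligature and the superscript.
    print(unicodedata.normalize("NFC", s))
    # NFKC additionally folds the ligature to "fi" and the superscript to "2".
    print(unicodedata.normalize("NFKC", s))
    print(find_compatibility_chars(s))
```

The example shows why Form C is the more conservative choice: it only unifies canonically equivalent spellings, while Form KC also discards distinctions (such as x² versus x2) that a writer may have intended.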

In charlint [http://www.w3.org/International/charlint/Overview.html],
a Perl program to do normalization and other checks, my plan is to
make KC available, but also to allow various variants, e.g. only
normalize away superscripts/subscripts, and so on.
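A "partial KC" variant of the kind described above might look roughly like the following Python sketch. This is not charlint's code (charlint is a Perl program) and the function name `fold_super_sub` is hypothetical. The assumption is that a character counts as a superscript or subscript when its Unicode decomposition mapping, as returned by `unicodedata.decomposition`, carries the `<super>` or `<sub>` tag. Only those characters get compatibility-decomposed, and the result is then put into Form C.

```python
import unicodedata

def fold_super_sub(text: str) -> str:
    """Apply compatibility decomposition only to <super>/<sub> characters,
    then normalize the whole string to NFC (hypothetical helper)."""
    out = []
    for ch in text:
        # e.g. U+00B2 -> "<super> 0032", U+2082 -> "<sub> 0032"
        tag = unicodedata.decomposition(ch).split(" ", 1)[0]
        if tag in ("<super>", "<sub>"):
            # Fully decompose just this character (NFKD).
            out.append(unicodedata.normalize("NFKD", ch))
        else:
            # Other characters, including ligatures, are left alone.
            out.append(ch)
    # Canonical composition over the whole result yields Form C.
    return unicodedata.normalize("NFC", "".join(out))

if __name__ == "__main__":
    print(fold_super_sub("x\u00b2 + H\u2082O, \ufb01le"))
```

Here the superscript and subscript digits become plain "2" while the "fi" ligature survives, which full NFKC would have folded as well.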

Regards, Martin.

#-#-# Martin J. Du"rst, World Wide Web Consortium
#-#-# mailto:duerst@w3.org http://www.w3.org



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:51 EDT