From: Phillips, Addison (addison@lab126.com)
Date: Mon May 23 2011 - 10:17:22 CDT
>
> > So please use NFD for internal processing if you think that helps you,
> > but please use NFC for all cases where it may be seen by other programs.
>
>
> You imply that some programs have problems with decomposed characters –
> exactly my point, they must not have.
NFC does not remove all combining marks from a string; it only composes those sequences for which a pre-composed form exists. Martin's point is that you generally should not emit NFD "into the wild" (since naïve programs expect certain sequences to be pre-composed), but that is not the same as saying a string should contain no combining marks at all. Many scripts, in fact, cannot be written without combining marks (cf. many South Asian scripts).
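A quick illustrative sketch in Python (using the standard unicodedata module) of that distinction: "n" + COMBINING TILDE has a pre-composed form and composes under NFC, while "q" + COMBINING TILDE does not, so its combining mark survives normalization:

    import unicodedata

    # "n" + COMBINING TILDE composes to U+00F1; "q" + COMBINING TILDE has
    # no pre-composed form, so NFC leaves the combining mark in place.
    s = "n\u0303 q\u0303"
    nfc = unicodedata.normalize("NFC", s)
    print(["U+%04X" % ord(c) for c in nfc])
    # ['U+00F1', 'U+0020', 'U+0071', 'U+0303']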
>
> >> It would be cool if there was an ASCII-compatible encoding with variable
> >> length like UTF-8 that supported only NFD (…) and was optimized for a
> >> small storage footprint,
> >
> > We don't need any more character encodings.
>
> I phrased that badly. I’m fine with the existing UTFs, I just think it would have
> been cool for usability if the most prevalent of them, i.e. UTF-8, was an
> encoding like that, because font and software developers would think
> differently about characters then. Here and now many still consider
> precomposed ones the norm and combining diacritics an exotic oddity.
For certain scripts, such as the Latin script, combining marks are generally rare and "an oddity". If you work primarily in one of these scripts, you may not encounter combining marks often or at all. Forcing developers to deal with combining marks ("because it is good for them") probably would have made support for the encoding rare. Which would have been bad.
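As another small Python sketch (same caveats as above): an ordinary Latin-script word carries no combining marks in its usual NFC form, but normalizing it to NFD, which is what an NFD-only encoding would have imposed on everyone, introduces one:

    import unicodedata

    word = "caf\u00e9"                        # "café" with pre-composed U+00E9
    nfd = unicodedata.normalize("NFD", word)
    print(["U+%04X" % ord(c) for c in word])  # no combining marks
    # ['U+0063', 'U+0061', 'U+0066', 'U+00E9']
    print(["U+%04X" % ord(c) for c in nfd])   # combining acute U+0301 appears
    # ['U+0063', 'U+0061', 'U+0066', 'U+0065', 'U+0301']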
Addison
Addison Phillips
Globalization Architect (Lab126)
Chair (W3C I18N WG)
Internationalization is not a feature.
It is an architecture.