Re: Normalization in panlingual application

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Thu Sep 20 2007 - 11:41:36 CDT

Next message: Mike: "Re: New Public Review Issue: Proposed Update UTS #18"

Previous message: Ed Trager: "Re: Normalization in panlingual application"
In reply to: John D. Burger: "Re: Normalization in panlingual application"
Next in thread: John D. Burger: "Re: Normalization in panlingual application"
Reply: John D. Burger: "Re: Normalization in panlingual application"

On 9/20/2007 6:02 AM, John D. Burger wrote:
>>> It should at best have been just a non-mandatory recommendation,
>>> allowing tailoring (even IDN no longer refers to it directly, and
>>> needed to redefine its own foldings).
>>
>> That's because IDN is morphing beyond simple identifiers as
>> traditionally understood for programming languages and the like. IDN
>> is attempting to be closer to ordinary language, and that's why the
>> limitations of NFKD/NFKC become apparent.
>
> I'm not that familiar with IDN - do the foldings specified by IDN
> constitute a useful "sweet spot" for normalization/folding, somewhere
> in between NFD and NFKD? That is, might there be broad classes of
> applications (such as the original poster's) for which "IDN
> normalization" is a good solution? I understand that any particular
> application would ideally pick and choose from the possibilities in
> UTR 30, but it'd be great if I could say "start with IDN" when people
> ask me about these issues.
IDN still operates on a restricted domain of characters; many characters
that are part of ordinary text are disallowed from the get-go (I haven't
checked recently where that subset stands, but that's the general idea).
At a minimum, the transformations designed into IDN would need to be
modified or extended to handle such characters. For that reason alone,
the normalization and folding aspect of IDN is unlikely to be suitable
for general text. There are likely additional issues.

If you suggest that any scheme in which you can't represent the word
"can't" is suitable for the class of applications that the original
poster represents, then I fail to follow you.
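
To make the "can't" point concrete, here is a minimal sketch. It assumes
only the traditional hostname rule that the ASCII characters in a label
are limited to letters, digits and hyphen; it is not the full IDN
repertoire check, and the helper name disallowed_ascii is made up for
illustration.

```python
import string

# Assumption: ASCII characters in a label are restricted to letters,
# digits and hyphen (the classic hostname rule). Non-ASCII characters
# are deliberately ignored; the real IDN tables are far more involved.
LDH = set(string.ascii_letters + string.digits + "-")

def disallowed_ascii(label):
    """Return the ASCII characters in `label` that fall outside LDH."""
    return [ch for ch in label if ord(ch) < 128 and ch not in LDH]

print(disallowed_ascii("can't"))  # reports the apostrophe
print(disallowed_ascii("cant"))   # nothing to report
```

An ordinary English contraction already falls outside the permitted set,
before any question of normalization or folding even comes up.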

Also, in the case of foldings, there's not necessarily a single
continuum. Yes, UTR #30 does point out that the compatibility mappings
can be separated into several types of foldings, but there are other
foldings that cut across the spectrum in different ways, for example
case folding. Finally, compatibility mappings are immutable and are
assigned rather mechanically to new characters added to the standard
(mostly by analogy with existing, similar characters). A well-defined
folding, however, may include or exclude a slightly different set of
characters, or it may act on a string rather than on an isolated
character. Therefore foldings are not like little Lego blocks that you
add one by one until you get from NFD to NFKD.
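
A small sketch of how case folding cuts across the NFD/NFKD axis, using
Python's standard unicodedata module and str.casefold(). The four sample
characters are my own picks for illustration, and the helper names nfd
and nfkd are just local shorthands.

```python
import unicodedata

def nfd(s):
    return unicodedata.normalize("NFD", s)

def nfkd(s):
    return unicodedata.normalize("NFKD", s)

# U+0041 LATIN CAPITAL LETTER A, U+2460 CIRCLED DIGIT ONE,
# U+FB01 LATIN SMALL LIGATURE FI, U+00DF LATIN SMALL LETTER SHARP S
samples = ["\u0041", "\u2460", "\ufb01", "\u00df"]

for ch in samples:
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        "NFD:", ascii(nfd(ch)),
        "NFKD:", ascii(nfkd(ch)),
        "casefold:", ascii(ch.casefold()),
    )
```

NFD leaves all four alone. NFKD changes the circled digit and the
ligature but not "A" or sharp s; case folding changes "A", the ligature
and sharp s (which becomes two characters) but not the circled digit.
Neither set of changes contains the other, so there is no single line
along which to stack them.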

A./


This archive was generated by hypermail 2.1.5 : Thu Sep 20 2007 - 11:45:16 CDT