From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Fri Jul 09 2004 - 21:34:29 CDT
At 08:33 PM 7/9/2004, John Cowan wrote:
> > I have just reviewed this list and found it odd that Hebrew presentation
> > forms are included but Arabic ones are not.
>
>The specification actually called only for Latin, Greek, and Cyrillic;
>I added Hebrew pour la lagniappe. If someone wants to add Arabic, I
>encourage them to do so.
>
> > the Hebrew presentation forms but also most of the precomposed
> > characters are redundant in this list.
>
>True; however, the current list indicates the scope of what actually
>happens, even if it is overlong.
I have taken the file from the server today and massaged it to be in a form
suitable for inclusion in the next draft of TR#30, which will be issued in
time for the UTC to review it in August.
Once the review issue opens for this draft, please comment on the review
form, so that the UTC has formal input to evaluate.
My understanding of the folding is that it would be more aggressive in
removing diacritics than some languages would call for, so that it is
useful in cross-language searching. For example, it should allow English
users to search for words with accented characters in them by supplying
the equivalent word spelled in base letters only.
'i' has a dot, but doesn't have a base letter that's more 'basic' than
itself, since dotless-i, while theoretically there, is more specialized and
not universally accessible from input devices.
o-slash can be analyzed as o and slash, even though that's not done
canonically in Unicode. Allowing users outside Scandinavia to perform
fuzzy searches for words with this character is useful.
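
To make the idea concrete, here is a rough sketch in Python of what such a
fold might look like. It is only an illustration, not the data or algorithm
in the TR#30 draft: the function name fold_diacritics and the EXTRA_FOLDS
table are made up for this example, and the table covers only o-slash.

```python
import unicodedata

# Hypothetical supplement for letters that have no canonical decomposition
# in Unicode. Only o-slash is listed; a real table would be much longer.
EXTRA_FOLDS = {
    "\u00f8": "o",  # LATIN SMALL LETTER O WITH STROKE
    "\u00d8": "O",  # LATIN CAPITAL LETTER O WITH STROKE
}


def fold_diacritics(text: str) -> str:
    """Fold accented letters to base letters for fuzzy, cross-language search."""
    # Step 1: canonical decomposition splits e.g. e-acute into e + combining acute.
    decomposed = unicodedata.normalize("NFD", text)
    # Step 2: drop the nonspacing combining marks that decomposition exposed.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Step 3: apply the supplement for letters NFD leaves untouched.
    return "".join(EXTRA_FOLDS.get(ch, ch) for ch in stripped)


if __name__ == "__main__":
    for word in ("r\u00e9sum\u00e9", "S\u00f8ren", "i", "\u0131"):
        print(word, "->", fold_diacritics(word))
```

Plain 'i' and dotless-i both pass through unchanged in this sketch, in line
with the point above that 'i' has no more basic base letter.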
In this view of folding, language-specific fuzzy searches would be tailored
(usually by being based on collation information, rather than on generic
diacritic folding).
A./