From: Otto Stolz (Otto.Stolz@uni-konstanz.de)
Date: Fri Apr 11 2008 - 07:18:02 CDT
Hello Henrik,
you have written:
> When writing a character name recognition algorithm, I would like to
> let the user be as concise as possible, yet without violating Unicode
> rules, and without being in potential conflict with upcoming versions
> of Unicode.
Here is an idea for a different approach that may be used
independently from, or in addition to, yours.
You could allow your users to provide a context for
the search. If your algorithm knows that the user is
looking for, e.g., a Khmer character, you could
- consider only characters used in Khmer;
- supply the common name-constituents “KHMER LETTER”,
“KHMER SYMBOL”, “KHMER INDEPENDENT VOWEL”, etc.,
and try all of them, concatenated with the user’s input.
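
A minimal sketch of this prefix idea, in Python, might look as follows.
It assumes the standard unicodedata module as the name database; the
table CONTEXT_PREFIXES and the function lookup_in_context are names
made up for this illustration, and the prefix list is not meant to be
exhaustive.

```python
import unicodedata

# Hypothetical table: common name constituents per script context.
CONTEXT_PREFIXES = {
    "khmer": [
        "KHMER LETTER",
        "KHMER SYMBOL",
        "KHMER INDEPENDENT VOWEL",
        "KHMER VOWEL SIGN",
        "KHMER SIGN",
        "KHMER DIGIT",
    ],
}

def lookup_in_context(context, user_input):
    """Try every known prefix of the context, concatenated with the input.

    Returns a list of (full_name, text) pairs, one per prefix that
    yields an existing name.
    """
    # Normalise case and inner whitespace of the user's fragment.
    fragment = " ".join(user_input.upper().split())
    hits = []
    for prefix in CONTEXT_PREFIXES[context.lower()]:
        candidate = f"{prefix} {fragment}"
        try:
            text = unicodedata.lookup(candidate)  # KeyError if unknown
        except KeyError:
            continue
        hits.append((candidate, text))
    return hits
```

With this, lookup_in_context("khmer", "ka") tries “KHMER LETTER KA”,
“KHMER SYMBOL KA”, and so on, and keeps only the names that exist.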
If the algorithm is to be used interactively, you could
run a pattern match against all character names that are
eligible in the given context (see above), and let the user
choose among the hits.
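
As a rough sketch of such an interactive search, again in Python: the
code below treats the code points of the Khmer block (U+1780..U+17FF)
and the Khmer Symbols block (U+19E0..U+19FF) as the eligible set. That
choice, and the names KHMER_RANGES, eligible_names and find_hits, are
assumptions made for the example only.

```python
import re
import unicodedata

# Assumed context: the Khmer and Khmer Symbols blocks.
KHMER_RANGES = [(0x1780, 0x17FF), (0x19E0, 0x19FF)]

def eligible_names(ranges):
    """Yield (name, character) for every named code point in the ranges."""
    for first, last in ranges:
        for cp in range(first, last + 1):
            name = unicodedata.name(chr(cp), None)  # None if unassigned
            if name is not None:
                yield name, chr(cp)

def find_hits(pattern, ranges=KHMER_RANGES):
    """Return all eligible names matching the regular expression."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [(name, ch) for name, ch in eligible_names(ranges)
            if rx.search(name)]

if __name__ == "__main__":
    # Present the hits and let the user pick one.
    for number, (name, ch) in enumerate(find_hits("digit"), start=1):
        print(f"{number:2d}. U+{ord(ch):04X} {name}")
```

The user would then choose from the numbered list, so an ambiguous
pattern does no harm here.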
If, however, your algorithm is meant as an API (to be used
from scripts or programs), this pattern-matching approach is
less suitable, as a pattern that selects a unique name
today may become ambiguous with a future version of the
standard.
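
To illustrate the problem, here is a small Python sketch. The two name
lists are invented for the example (they are not real Unicode names),
and resolve_unique and PatternNotUnique are hypothetical helpers: the
same pattern resolves against the older list, but fails once a
similar name has been added.

```python
import re

class PatternNotUnique(LookupError):
    """Raised when a pattern does not select exactly one name."""

def resolve_unique(pattern, names):
    """Return the single name matching pattern, or raise."""
    rx = re.compile(pattern)
    hits = [name for name in names if rx.search(name)]
    if len(hits) != 1:
        raise PatternNotUnique(
            f"{pattern!r} matched {len(hits)} names: {hits}")
    return hits[0]

# Invented names standing in for two versions of a name list.
NAMES_V1 = ["EXAMPLE LETTER KA", "EXAMPLE LETTER GA"]
NAMES_V2 = NAMES_V1 + ["EXAMPLE LETTER KAA"]

if __name__ == "__main__":
    print(resolve_unique("LETTER K", NAMES_V1))  # unique in the old list
    try:
        resolve_unique("LETTER K", NAMES_V2)     # ambiguous in the new one
    except PatternNotUnique as err:
        print("no longer unique:", err)
```

A script that passes the complete name is not affected by such an
addition, since names, once assigned, are not changed.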
Best wishes,
Otto Stolz