Rick wrote:
Oh, sigh... I went to: http://wwwold.dkuug.dk/cultreg/ looking for the Item
115 that Jony Rosenne mentioned. I found it. Those weird mnemonic names
are used in the 1st column, which provides machine-readable tokens for
interpreting the locale tables, and refers back to item 1, the repertoire map
(which is, of course, not complete with respect to 10646).
The "REAL" names are in the last column. For example, here's an entry:
<////> /x5C <U005C> REVERSE SOLIDUS
That entry is certainly readable, and English.
In the context of the syntax in the registry, the last column is
simply a comment. The first column is the symbolic name assigned
to the character.
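
To make the column roles concrete, here is a minimal sketch in Python that splits a charmap entry like the one quoted above. It assumes each entry has exactly four whitespace-separated columns (symbolic name, encoding, UCS code, comment) and ignores escape handling inside the symbolic name; `CharmapEntry` and `parse_charmap_line` are names invented for this illustration, not part of any registry tool.

```python
from typing import NamedTuple


class CharmapEntry(NamedTuple):
    symbol: str    # first column: the symbolic name, e.g. "<////>"
    encoding: str  # second column: the encoded value, e.g. "/x5C"
    ucs: str       # third column: the UCS code, e.g. "<U005C>"
    comment: str   # rest of the line: the descriptive comment


def parse_charmap_line(line: str) -> CharmapEntry:
    # Assumption: the first three columns contain no whitespace, and
    # everything after them is the comment.
    symbol, encoding, ucs, comment = line.split(None, 3)
    return CharmapEntry(symbol, encoding, ucs, comment.strip())


entry = parse_charmap_line("<////> /x5C <U005C> REVERSE SOLIDUS")
print(entry.symbol, "->", entry.comment)
```
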
The weird mnemonics are not
the ONLY mnemonics; but they're used in the actual locale tables like this:
LC_TIME
abday "<s><u><n>";"<m><a'><n>";/
"<t><y'><s>";"<m><i><k>";/
"<h><o'><s>";"<f><r><i'>";/
"<l><e><y>"
which is pedantic,. . .
That's not the issue. The symbolic names are not so bad if you
have the descriptive comment with them. But there are few comments
in the actual locale tables. The example you show here makes these
names look relatively easy, but take a look elsewhere in a locale:
<0.> IGNORE;IGNORE;IGNORE;<0.>
<02> IGNORE;IGNORE;IGNORE;<02>
<-T> IGNORE;IGNORE;IGNORE;<-T>
<.P> IGNORE;IGNORE;IGNORE;<.P>
<:3> IGNORE;IGNORE;IGNORE;<:3>
<Eh> IGNORE;IGNORE;IGNORE;<Eh>
<<7> IGNORE;IGNORE;IGNORE;<<7>
. . .
<H'> <H'>;<H'>;IGNORE;IGNORE
<aM> <aM>;<aM>;IGNORE;IGNORE
<aM.> <aM>;<aM.>;IGNORE;IGNORE
<aH> <H'>;<aH>;IGNORE;IGNORE
<aH.> <H'>;<aH.>;IGNORE;IGNORE
<wH> <H'>;<wH>;IGNORE;IGNORE
What are these? Keld probably can say immediately, because he
is the one who created these names, but I'll bet virtually no
one else can decipher these without the original charmap.
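
As a rough illustration of that dependency, the Python sketch below resolves a string of symbolic names through a lookup table. The table here is a hypothetical three-entry fragment, and the reading of <a'> as a with acute is an assumption; a real tool would load the full charmap. Names missing from the table are left in symbolic form, which is exactly what a reader without the charmap is stuck with. Escape sequences inside names are not handled.

```python
import re

# Matches one symbolic name such as <m> or <a'>.
SYMBOL = re.compile(r"<(.+?)>")

# Hypothetical charmap fragment; the meaning of a' is an assumption.
charmap = {"m": "m", "a'": "\u00e1", "n": "n"}


def decode(symbolic, table):
    """Replace each <name> with its character, or keep it if unknown."""
    out = []
    for name in SYMBOL.findall(symbolic):
        out.append(table.get(name, "<" + name + ">"))
    return "".join(out)


print(decode("<m><a'><n>", charmap))  # resolved through the table
print(decode("<aH.>", charmap))       # unknown name stays symbolic
```
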
. . .
What Sandra said might be true, but it's inconsistently applied. Yes, some
of the mnemonics are weird strings, but in other places they're listed as
words. For example, in the Hebrew listing (115) that Jony mentioned we see
THREE entries for the backslash:
<////> /x5C <U005C> REVERSE SOLIDUS
<backslash> /x5C <U005C> REVERSE SOLIDUS
<reverse-solidus> /x5C <U005C> REVERSE SOLIDUS
Why on earth are there three entries for the same thing?
Because POSIX mandates the names <backslash> and <reverse-solidus>.
Where there are POSIX-mandated names, Keld includes them, but only
as duplicates to his names. The POSIX names never get used in
localedefs. I don't object to using names other than POSIX's; I just
object to names that are difficult for humans to read.
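
For anyone who wants to see which names are duplicates of one another, a small Python sketch can group the symbolic names by their UCS column. It uses only the three backslash entries quoted above and assumes the same four-column layout; it is an illustration, not a description of how localedef processes a charmap.

```python
from collections import defaultdict

lines = [
    "<////> /x5C <U005C> REVERSE SOLIDUS",
    "<backslash> /x5C <U005C> REVERSE SOLIDUS",
    "<reverse-solidus> /x5C <U005C> REVERSE SOLIDUS",
]

# Map each UCS code to every symbolic name that refers to it.
aliases = defaultdict(list)
for line in lines:
    symbol, encoding, ucs, comment = line.split(None, 3)
    aliases[ucs].append(symbol)

for ucs, names in aliases.items():
    print(ucs, "has", len(names), "names:", ", ".join(names))
```
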
Two things make me feel sad about all this...
(1) That people are still doing laborious, wasteful things like this
instead of expressing everything simply in terms of Unicode plain-text
and/or ISO10646 codepoints;
Since not everyone has the ability to process Unicode plain-text,
I don't think it's reasonable to expect that everything be expressed
that way. Using the Unicode/10646 codepoints is preferable, IMO, to
these symbolic names because it's easy to map to the Unicode std.
However, symbolic names were created to name characters according
to what they are as opposed to how they're encoded, and I still
have the dying hope that they can be used that way *and* be
human-readable.
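
To show how directly a code point maps to the standard, here is a short Python sketch that turns a `<Uxxxx>` token into its Unicode character name using the standard `unicodedata` module. The helper name `ucs_name` and the accepted token pattern (4 to 8 hex digits) are assumptions made for this example.

```python
import re
import unicodedata

UCS_TOKEN = re.compile(r"<U([0-9A-Fa-f]{4,8})>")


def ucs_name(token):
    """Return the Unicode name for a token such as <U005C>."""
    match = UCS_TOKEN.fullmatch(token)
    if match is None:
        raise ValueError("not a <Uxxxx> token: " + token)
    char = chr(int(match.group(1), 16))
    # Some code points (e.g. control characters) have no name.
    return unicodedata.name(char, "<unnamed>")


print(ucs_name("<U005C>"))
```
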
(2) That anyone would "object" to the actual machine-readable mnemonics
as if they were a cultural affront.
I'm not sure what you're saying here.
Anyway... Does this cultural registry MEAN anything?. . .
I don't know. Many of these charmaps and locales have been
available since 1993 or 1994. Who is using them?
-- Sandra
-----------------------
Sandra Martin O'Donnell
odonnell@zk3.dec.com
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:41 EDT