Re: Cultural registry as international standard

From: odonnell@zk3.dec.com
Date: Tue Sep 22 1998 - 14:25:33 EDT


   Rick wrote:
   Oh, sigh... I went to: http://wwwold.dkuug.dk/cultreg/ looking for the Item
   115 that Jony Rosenne mentioned. I found it. Those weird mnemonic names
   are used in the 1st column, which provides machine-readable tokens for
   interpreting the locale tables, and refers back to item 1, the repertoire map
   (which is, of course, not complete with respect to 10646)
   
   The "REAL" names are in the last column. For example, here's an entry:
   
           <////> /x5C <U005C> REVERSE SOLIDUS
   
   That entry is certainly readable, and English.

In the context of the syntax in the registry, the last column is
simply a comment. The first column is the symbolic name assigned
to the character.
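
To make the column layout concrete, here is a minimal Python sketch that splits one registry entry into its parts. The function name parse_charmap_line is hypothetical, and the sketch assumes that "/" is the escape character (suggested by the /x5C notation) and that everything after the encoding is comment text.

```python
def parse_charmap_line(line, escape="/"):
    """Split one charmap entry into (symbolic_name, encoding, comment).

    Assumption: `escape` is the charmap's escape character, so a '>'
    inside a name would be written as escape + '>'.
    """
    line = line.strip()
    if not line.startswith("<"):
        raise ValueError("entry must start with a symbolic name")
    # Scan to the first unescaped '>'; an escape character consumes
    # the character that follows it.
    i = 1
    while i < len(line) and line[i] != ">":
        i += 2 if line[i] == escape else 1
    if i >= len(line):
        raise ValueError("unterminated symbolic name")
    name = line[:i + 1]
    fields = line[i + 1:].split(None, 1)
    if not fields:
        raise ValueError("entry has no encoding column")
    encoding = fields[0]
    comment = fields[1] if len(fields) > 1 else ""
    return name, encoding, comment


name, encoding, comment = parse_charmap_line(
    "<////> /x5C <U005C> REVERSE SOLIDUS")
```

Only the first two fields carry meaning for a program reading the table; the rest is there for the human reader.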

   The weird mnemonics are not
   the ONLY mnemonics; but they're used in the actual locale tables like this:

        LC_TIME
        abday "<s><u><n>";"<m><a'><n>";/
                "<t><y'><s>";"<m><i><k>";/
                "<h><o'><s>";"<f><r><i'>";/
                "<l><e><y>"

   which is pedantic,. . .

That's not the issue. The symbolic names are not so bad if you
have the descriptive comment with them. But there are few comments
in the actual locale tables. The example you show here makes these
names look relatively easy, but take a look elsewhere in a locale:

<0.> IGNORE;IGNORE;IGNORE;<0.>
<02> IGNORE;IGNORE;IGNORE;<02>
<-T> IGNORE;IGNORE;IGNORE;<-T>
<.P> IGNORE;IGNORE;IGNORE;<.P>
<:3> IGNORE;IGNORE;IGNORE;<:3>
<Eh> IGNORE;IGNORE;IGNORE;<Eh>
<<7> IGNORE;IGNORE;IGNORE;<<7>
. . .
<H'> <H'>;<H'>;IGNORE;IGNORE
<aM> <aM>;<aM>;IGNORE;IGNORE
<aM.> <aM>;<aM.>;IGNORE;IGNORE
<aH> <H'>;<aH>;IGNORE;IGNORE
<aH.> <H'>;<aH.>;IGNORE;IGNORE
<wH> <H'>;<wH>;IGNORE;IGNORE

What are these? Keld probably can say immediately, because he
is the one who created these names, but I'll bet virtually no
one else can decipher these without the original charmap.
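
The dependence on the charmap can be sketched in a few lines of Python. This is only an illustration: the names split_symbolic_names, decode and CHARMAP are hypothetical, "/" is assumed to be the escape character, and the tiny mapping below is an assumed excerpt (with <a'> taken to mean a with acute), not the registry's actual charmap.

```python
def split_symbolic_names(text, escape="/"):
    """Split a string such as "<m><a'><n>" into its symbolic names."""
    names = []
    i = 0
    while i < len(text):
        if text[i] != "<":
            raise ValueError(f"expected '<' at position {i}")
        # Find the closing '>' of this name, skipping escaped characters.
        j = i + 1
        while j < len(text) and text[j] != ">":
            j += 2 if text[j] == escape else 1
        if j >= len(text):
            raise ValueError("unterminated symbolic name")
        names.append(text[i:j + 1])
        i = j + 1
    return names


# Assumed excerpt of a charmap, for illustration only.
CHARMAP = {"<s>": "s", "<u>": "u", "<n>": "n", "<m>": "m", "<a'>": "\u00e1"}


def decode(text, charmap):
    """Replace each known symbolic name by its character; keep unknown names."""
    return "".join(charmap.get(n, n) for n in split_symbolic_names(text))


print(decode("<s><u><n>", CHARMAP))   # names found in the mapping are decoded
print(decode("<aM.>", CHARMAP))       # unknown name stays as opaque as before
```

With the mapping at hand the day names decode mechanically; without an entry for it, a name such as <aM.> tells the reader nothing.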

   . . .
   What Sandra said might be true, but it's inconsistently applied. Yes, some
   of the mnemonics are weird strings, but in other places they're listed as
   words. For example, in the Hebrew listing (115) that Jony mentioned we see
   THREE entries for the backslash:
   
           <////> /x5C <U005C> REVERSE SOLIDUS
           <backslash> /x5C <U005C> REVERSE SOLIDUS
           <reverse-solidus> /x5C <U005C> REVERSE SOLIDUS
   
   Why on earth are there three entries for the same thing?

Because POSIX mandates the names <backslash> and <reverse-solidus>.
Where there are POSIX-mandated names, Keld includes them, but only
as duplicates of his own names. The POSIX names never get used in
localedefs. I don't object to using names other than POSIX's; I just
object to names that are difficult for humans to read.
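
The duplication is easy to see mechanically. The following Python sketch groups the three entries quoted above by their encoding; names_by_encoding is a hypothetical helper, and it assumes whitespace-separated columns with no whitespace inside a symbolic name.

```python
from collections import defaultdict

# The three entries quoted from the Hebrew listing.
ENTRIES = """\
<////> /x5C <U005C> REVERSE SOLIDUS
<backslash> /x5C <U005C> REVERSE SOLIDUS
<reverse-solidus> /x5C <U005C> REVERSE SOLIDUS
"""


def names_by_encoding(text):
    """Map each encoding to the list of symbolic names that share it."""
    groups = defaultdict(list)
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue  # blank or incomplete line
        name, encoding = parts[0], parts[1]
        groups[encoding].append(name)
    return dict(groups)


for encoding, names in names_by_encoding(ENTRIES).items():
    print(encoding, names)
```

All three names land under the single encoding /x5C, so any one of them could be used in a localedef without changing what the locale means.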

    Two things make me feel sad about all this...

        (1) That people are still doing laborious, wasteful things like this
        instead of expressing everything simply in terms of Unicode plain-text
        and/or ISO10646 codepoints;

Since not everyone has the ability to process Unicode plain-text,
I don't think it's reasonable to expect that everything be expressed
that way. Using the Unicode/10646 codepoints is preferable, IMO, to
these symbolic names because it's easy to map to the Unicode std.
However, symbolic names were created to name characters according
to what they are as opposed to how they're encoded, and I still
have the dying hope that they can be used that way *and* be
human-readable.
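
As a rough illustration of why the codepoint form maps easily to the Unicode standard, here is a small Python sketch. The helper name ucs_name_to_char is hypothetical, and it assumes the <Uxxxx> form shown in the entry above, with four to eight hexadecimal digits.

```python
import re
import unicodedata

# Assumed shape of a codepoint-style name: <U005C>, <U05D0>, ...
UCS_NAME = re.compile(r"<U([0-9A-Fa-f]{4,8})>")


def ucs_name_to_char(name):
    """Convert a codepoint-style name such as '<U005C>' to its character."""
    match = UCS_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a codepoint-style name: {name!r}")
    return chr(int(match.group(1), 16))


ch = ucs_name_to_char("<U005C>")
print(ch, unicodedata.name(ch))
```

No lookup table is needed for this direction, which is the attraction. The cost is that <U005C> says how the character is encoded in 10646, not what it is.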

        (2) That anyone would "object" to the actual machine-readable mnemonics
        as if they were a cultural affront.

I'm not sure what you're saying here.

    Anyway... Does this cultural registry MEAN anything?. . .

I don't know. Many of these charmaps and locales have been
available since 1993 or 1994. Who is using them?

                -- Sandra
-----------------------
Sandra Martin O'Donnell
odonnell@zk3.dec.com


