Re: Cultural registry as international standard

From: Rick McGowan (rmcgowan@apple.com)
Date: Thu Sep 17 1998 - 17:34:21 EDT


Jony said --

> I object to the strange pseudo names attached to the Hebrew characters and
> some other characters, for example in item 115 "ISO-8859-8 ISO-IR-138
> ISO_8859-8:1988 ISO_8859-8 HEBREW".

Those things are just supposed to be machine-readable, not meaningful. I'd
rather have any meaningful names than the mnemonics, which are not only
gobbledygook but are laboriously enclosed in myriad layers of "<" and
">"... but they're just supposed to be input to some machine. Who cares?

Sandra said --

> Keld and others felt the symbolic names should not be English-based
> because that gave an unfair advantage to English speakers.

Oh, sigh... I went to: http://wwwold.dkuug.dk/cultreg/ looking for the Item
115 that Jony Rosenne mentioned. I found it. Those weird mnemonic names
are used in the 1st column, which provides machine-readable tokens for
interpreting the locale tables, and refers back to item 1, the repertoire map
(which is, of course, not complete with respect to 10646).

The "REAL" names are in the last column. For example, here's an entry:

        <////> /x5C <U005C> REVERSE SOLIDUS

That entry is certainly readable, and English. The weird mnemonics are not
the ONLY mnemonics; but they're used in the actual locale tables like this:

        LC_TIME
        abday "<s><u><n>";"<m><a'><n>";/
                "<t><y'><s>";"<m><i><k>";/
                "<h><o'><s>";"<f><r><i'>";/
                "<l><e><y>"

which is pedantic, but if you only have ASCII to express strings of
characters from a richer set, that's the kind of tiresome act you have to
play. They could have played the game where the repertoire map contains
"mnemonic tokens" that are even more arbitrary, like
        <938670> /x5C <U005C> REVERSE SOLIDUS

and probably nobody would have objected.

What Sandra said might be true, but it's inconsistently applied. Yes, some
of the mnemonics are weird strings, but in other places they're listed as
words. For example, in the Hebrew listing (115) that Jony mentioned we see
THREE entries for the backslash:

        <////> /x5C <U005C> REVERSE SOLIDUS
        <backslash> /x5C <U005C> REVERSE SOLIDUS
        <reverse-solidus> /x5C <U005C> REVERSE SOLIDUS

Why on earth are there three entries for the same thing?
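
To see how widespread such aliases are in a given charmap, one could group the tokens by code point. The following is only a sketch: it assumes every entry follows the "<token> /xHH <UXXXX> NAME" layout quoted above, and the helper name aliases_by_code_point is made up for illustration.

```python
import re
from collections import defaultdict

# Assumed entry layout: <token> /xHH <UXXXX> NAME
ENTRY_RE = re.compile(r"^\s*<([^>]+)>\s+\S+\s+<U([0-9A-Fa-f]{4,8})>")

def aliases_by_code_point(lines):
    """Return {code point: [tokens]} for code points with more than one token."""
    groups = defaultdict(list)
    for line in lines:
        m = ENTRY_RE.match(line)
        if m:
            groups[int(m.group(2), 16)].append(m.group(1))
    return {cp: toks for cp, toks in groups.items() if len(toks) > 1}

# The three backslash entries quoted from item 115.
entries = [
    "<////> /x5C <U005C> REVERSE SOLIDUS",
    "<backslash> /x5C <U005C> REVERSE SOLIDUS",
    "<reverse-solidus> /x5C <U005C> REVERSE SOLIDUS",
]
dupes = aliases_by_code_point(entries)
for cp, toks in dupes.items():
    print(f"U+{cp:04X}: {', '.join(toks)}")
```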

For my purposes, the specifications would be useless as-is and would have to
be translated into Unicode strings.
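
That translation is mechanical. A minimal sketch, assuming the repertoire map uses the "<token> /xHH <UXXXX> NAME" layout shown above: only the REVERSE SOLIDUS lines below come from the registry listing quoted in this message; the other sample entries, and the function names parse_repertoire and expand, are assumptions made for the example.

```python
import re

# Sample repertoire-map lines. Only the REVERSE SOLIDUS entries are quoted
# from the registry; the letter entries are assumed for illustration.
SAMPLE_MAP = """\
<////> /x5C <U005C> REVERSE SOLIDUS
<backslash> /x5C <U005C> REVERSE SOLIDUS
<s> /x73 <U0073> LATIN SMALL LETTER S
<u> /x75 <U0075> LATIN SMALL LETTER U
<n> /x6E <U006E> LATIN SMALL LETTER N
<m> /x6D <U006D> LATIN SMALL LETTER M
<a'> /xE1 <U00E1> LATIN SMALL LETTER A WITH ACUTE
"""

LINE_RE = re.compile(r"^\s*<(?P<token>[^>]+)>\s+\S+\s+<U(?P<hex>[0-9A-Fa-f]{4,8})>")
TOKEN_RE = re.compile(r"<([^>]+)>")

def parse_repertoire(text):
    """Map each mnemonic token (without angle brackets) to its character."""
    table = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            table[m.group("token")] = chr(int(m.group("hex"), 16))
    return table

def expand(mnemonic_string, table):
    """Replace every <token> with its character; unknown tokens raise KeyError."""
    return TOKEN_RE.sub(lambda m: table[m.group(1)], mnemonic_string)

table = parse_repertoire(SAMPLE_MAP)
print(expand("<s><u><n>", table))
print(expand("<m><a'><n>", table))
```

Tokens are treated here as opaque strings between "<" and ">", so the sketch does not interpret the "/" escape character inside a token such as <////>.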

Two things make me feel sad about all this...

        (1) That people are still doing laborious, wasteful things like this
        instead of expressing everything simply in terms of Unicode plain-text
        and/or ISO10646 codepoints;

        (2) That anyone would "object" to the actual machine-readable mnemonics
        as if they were a cultural affront.

Anyway... Does this cultural registry MEAN anything? Will it be USEFUL to
anyone? It apparently has not been touched since 1997-12-20. It contains
nothing but
        (A) char maps for a large number of standards (which is a great
                feat, I suppose, and very useful for interconversion);
        (B) three locales; and
        (C) four narrative locale specs.

        Rick


