Re: ISO 10646 compliance and EU law

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Jan 05 2005 - 13:17:02 CST

    > On Wednesday, January 5th, 2005 14:18Z Philipp Reichmuth wrote:
    >
    > > I wouldn't rule this out entirely. For example, I know one attempt
    > > to implement a Tibetan font where the underlying representation was
    > > Latin (Wylie), and the Tibetan glyphs were generated from the Latin
    > > transliteration using OpenType rules

    I presume Philipp Reichmuth was talking about:

    http://www.nitartha.org/wylieandconverter.html

    >
    > Either I did not understand you, or this has nothing to do with Unicode, much
    > less Unicode/10646 conformance.
    >
    > The Tibetan characters are _never_ encoded using Unicode in this process,
    > are they?
    > Looks like a clear case of nonconformance to me.

    Not at all. It is simply another example of someone using
    a transliteration as a kind of betacode for text, and then
    adapting a font technology to display it.

    If an application clearly states what it is doing, it can
    do this conformantly in Unicode.

    If I state that data is entered and stored as "bsgribs" or
    "g.yon" or "sat+t+wa" or whatever and that I then provide
    an OpenType font with rules that will map those into the
    appropriate display sequences of Tibetan stacks, anybody else who
    wants to share those conventions (or use the application) is
    free to do so.

    The Unicode *conformance* issue there is whether the Latin
    letter "b" used in the Wylie transliteration is correctly
    represented as U+0062, and whether, if using UTF-16, that
    shows up in stored data and strings as a 16-bit code unit,
    0x0062, or if using UTF-8, that shows up in stored data
    and strings as an 8-bit code unit, 0x62, and so on.
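
    To make that check concrete, here is a minimal Python sketch. The
    variable names are mine and purely illustrative; the only thing it
    assumes is the Wylie string "bsgribs" quoted above.

```python
import unicodedata

# Wylie transliteration string, stored as ordinary Latin letters.
text = "bsgribs"
first = text[0]

# The character identity: U+0062 LATIN SMALL LETTER B.
code_point = ord(first)
name = unicodedata.name(first)

# UTF-8: one 8-bit code unit per ASCII-range character.
utf8_units = list(first.encode("utf-8"))

# UTF-16 (big-endian, no BOM): one 16-bit code unit for this character.
raw16 = first.encode("utf-16-be")
utf16_units = [int.from_bytes(raw16[i:i + 2], "big") for i in range(0, len(raw16), 2)]

print(f"U+{code_point:04X} {name}")
print("UTF-8 code units: ", [f"0x{u:02X}" for u in utf8_units])
print("UTF-16 code units:", [f"0x{u:04X}" for u in utf16_units])
```

    Nothing in that check involves Tibetan at all, which is the point:
    the conformance question stops at the Latin letter.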

    If somebody wants to build conventions on top of that basic
    identity of the character, and use the "b" to represent
    a Tibetan letter or a Semitic letter or a type of atomic
    bond or a chocolate chip cookie recipe, Unicode neither
    knows nor cares -- it is up to the users of such conventions
    to define what they are doing for themselves.
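
    As a toy illustration of such a layering (not a real Wylie
    converter, and not how the OpenType font in question works), the
    hypothetical table and function below treat "b" as standing for a
    Tibetan letter. The stored data stays U+0062 throughout; only the
    privately agreed interpretation differs.

```python
import unicodedata

# Hypothetical private convention: one Latin letter mapped to one Tibetan letter.
# A real transliteration scheme needs far more than a one-entry table.
TOY_CONVENTION = {"b": "\u0F56"}

def interpret(stored: str, convention: dict) -> str:
    """Apply a higher-level convention on top of unchanged stored characters."""
    return "".join(convention.get(ch, ch) for ch in stored)

stored = "b"                                  # what is actually in the data
shown = interpret(stored, TOY_CONVENTION)     # what users of the convention read
plain = interpret(stored, {})                 # what everyone else reads

print(f"stored: U+{ord(stored):04X} {unicodedata.name(stored)}")
print(f"shown:  U+{ord(shown):04X} {unicodedata.name(shown)}")
```

    Anyone who does not share the table simply sees a Latin "b", which
    is exactly what the standard says it is.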

    Sheesh, folks, I don't know why this is so difficult to
    comprehend.

    The Unicode Consortium does not, and never has tried to
    constrain what conventions and higher-level protocols people
    can apply on top of Unicode characters. The standard does
    not and never has tried to legislate how any particular
    piece of textual data *must* be represented.

    At the risk of verging into wacko territory again, if I
    want to represent the Hebrew text of Genesis 1.1 in a Yi
    cipher, I can.

    All the Unicode Standard does is define a bunch of characters
    and rules for how to use them interoperably for *plain text*
    representation. If I get some stream of Unicode Tibetan
    characters identified as Unicode plain text, then I can
    presume to know what the content of that text is, by use
    of the standard to interpret it. If I get some stream of Unicode
    Tibetan characters, but my communicant has indicated to me
    that they are using a protocol for the representation of
    Elvish poetry in Tibetan characters, well, then I know it
    isn't Tibetan plain text, and I can either use the protocol
    to enjoy the Elvish poetry or dump the Tibetan characters
    as uninterpretable.

    Capiche?

    --Ken


