Re: IJ joint in spaced lettering

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Jan 09 2006 - 20:50:31 CST


    > > <0069, 006A> --> 103C.1054.0020.0020.0002.0002
    > > <0133>       --> 103C.1054.0020.0020.0004.0004
    > > <0049, 004A> --> 103C.1054.0020.0020.0008.0008
    > > <0132>       --> 103C.1054.0020.0020.000A.000A
    > >                  ^^^^^^^^^ ^^^^^^^^^ ^^^^^^^^^
    > >                  primary   secondary tertiary
    >

    Philippe asked:

    > Shouldn't it be instead (leading zeroes suppressed only for clarity, avoiding
    > line breaking in emails)?
    >
    > <0069, 006A>
    > --> 103C.1054.0.20.20.0.2.2.0.69.6A
    > <0133>
    > --> 103C.1054.0.20.20.0.4.4.0.133
    > <0049, 004A>
    > --> 103C.1054.0.20.20.0.8.8.0.49.4A
    > <0132>
    > --> 103C.1054.0.20.20.0.A.A.0.132
    >
    > (note the addition of .0. to separate collation levels, to allow
    > binary sort order, and the addition of the trailing collation
    > level for the default codepoint ordering with unlimited collation keys)

    Not necessary. The UCA generally assumes a maximum level of 3, to
    simplify discussion, and because that is usually all that is
    needed. The 4th level values in the DUCET table are just there
    to make further distinctions if people need them in certain cases.

    Furthermore, because the DUCET values are constructed with all
    primary weights > all secondary weights > all tertiary weights,
    I make use of the implementation technique discussed in 6.1.1,
    Eliminating Level Separators. Level separators aren't needed in
    constructing examples from DUCET, if no table tailoring has been
    applied and no other compression techniques are used.
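
    A minimal sketch of that separator-free construction, in Python. The
    collation element weights are the ones quoted in this message; the names
    I_LOWER, J_LOWER, sort_key and format_key are hypothetical helpers, not
    part of any UCA implementation.

```python
# Collation elements as (primary, secondary, tertiary) tuples, using the
# weights quoted in this message for <0069, 006A>.
I_LOWER = (0x103C, 0x0020, 0x0002)
J_LOWER = (0x1054, 0x0020, 0x0002)


def sort_key(elements):
    """Append all primaries, then all secondaries, then all tertiaries.

    No level separator is emitted. This relies on the assumption stated in
    the message: every primary weight is greater than every secondary
    weight, which is greater than every tertiary weight.
    """
    key = []
    for level in range(3):
        # Zero weights (ignorable at this level) are skipped.
        key.extend(ce[level] for ce in elements if ce[level] != 0)
    return key


def format_key(key):
    """Render a key in the dotted hexadecimal notation used in this thread."""
    return ".".join(f"{weight:04X}" for weight in key)


print(format_key(sort_key([I_LOWER, J_LOWER])))
# prints 103C.1054.0020.0020.0002.0002
```

    With this weight layout, a shorter string still sorts before its
    extensions: the key for "i" alone continues with 0020, which is lower
    than the primary weight 1054 that follows in the key for "ij".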

    Note that the constructed keys I posted will sort in the exact
    same relative order as those you posted. The 4th level differences
    are irrelevant and are swamped by the tertiary differences.
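
    The claim can be checked mechanically. The sketch below transcribes both
    sets of keys from this thread as lists of integer weights and compares
    the resulting orderings; the dictionary names are illustrative only.

```python
# Keys without level separators, as posted earlier in the thread.
keys_without_separators = {
    "<0069, 006A>": [0x103C, 0x1054, 0x0020, 0x0020, 0x0002, 0x0002],
    "<0133>":       [0x103C, 0x1054, 0x0020, 0x0020, 0x0004, 0x0004],
    "<0049, 004A>": [0x103C, 0x1054, 0x0020, 0x0020, 0x0008, 0x0008],
    "<0132>":       [0x103C, 0x1054, 0x0020, 0x0020, 0x000A, 0x000A],
}

# Keys with 0 as level separator and a trailing code point level,
# as proposed in the quoted question.
keys_with_separators = {
    "<0069, 006A>": [0x103C, 0x1054, 0, 0x20, 0x20, 0, 0x2, 0x2, 0, 0x69, 0x6A],
    "<0133>":       [0x103C, 0x1054, 0, 0x20, 0x20, 0, 0x4, 0x4, 0, 0x133],
    "<0049, 004A>": [0x103C, 0x1054, 0, 0x20, 0x20, 0, 0x8, 0x8, 0, 0x49, 0x4A],
    "<0132>":       [0x103C, 0x1054, 0, 0x20, 0x20, 0, 0xA, 0xA, 0, 0x132],
}

# Python compares lists element by element, which matches a binary
# comparison of fixed-width weights.
order_a = sorted(keys_without_separators, key=keys_without_separators.get)
order_b = sorted(keys_with_separators, key=keys_with_separators.get)

print(order_a == order_b)
print(order_a)
```

    In both cases the first difference between any two keys is at the
    tertiary weights (2 < 4 < 8 < A), so the trailing level is never reached.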

    > Another related question: Why isn't there a standard 16-bit UTF
    > that preserves the binary ordering of codepoints?
    > (I mean for example UTF-16 modified simply by moving all
    > code units or code points in E000..FFFF down to D800..F7FF
    > and moving surrogate code units in D800..DFFF up to F800..FFFF).

    Huh? Because it would confuse the hell out of everybody and lead
    to problems, just like any other putative fixes by proliferation
    of UTFs.

    Sorting UTF-16 in binary order is easy. See "UTF-16 in UTF-8 Order",
    p. 136 of TUS 4.0.
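
    As an illustration of the general idea (not a copy of the code in the
    standard), UTF-16 code units can be adjusted at comparison time so that
    they order like code points, without defining a new encoding form. The
    function names below are hypothetical, and the sketch assumes
    well-formed input with no unpaired surrogates.

```python
def utf16_units(text):
    """Return the UTF-16 code units of a well-formed string as integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def fixup(unit):
    """Rotate the top of the 16-bit range so surrogates sort last.

    E000..FFFF moves down to D800..F7FF and D800..DFFF moves up to
    F800..FFFF. Units below D800 are unchanged.
    """
    if unit >= 0xE000:
        return unit - 0x0800
    if unit >= 0xD800:
        return unit + 0x2000
    return unit


def code_point_order_key(text):
    """Sort key that orders UTF-16 code unit sequences by code point."""
    return [fixup(unit) for unit in utf16_units(text)]


words = ["\U00010000", "\uFF5E", "a", "\uE000"]

raw_order = sorted(words, key=utf16_units)           # plain code unit order
fixed_order = sorted(words, key=code_point_order_key)

print([f"U+{ord(w):04X}" for w in raw_order])
print([f"U+{ord(w):04X}" for w in fixed_order])
```

    The adjustment is applied only while comparing; the stored data stays
    ordinary UTF-16.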

    --Ken


