Re: IJ joint in spaced lettering

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Jan 09 2006 - 20:50:31 CST

Next message: Doug Ewell: "Re: IJ joint in spaced lettering"

Previous message: Philippe Verdy: "Re: IJ joint in spaced lettering"
Maybe in reply to: António Martins-Tuválkin: "IJ joint in spaced lettering"
Next in thread: Doug Ewell: "Re: IJ joint in spaced lettering"
Reply: Doug Ewell: "Re: IJ joint in spaced lettering"
Reply: Philippe Verdy: "Re: IJ joint in spaced lettering"

> > <0069, 006A> --> 103C.1054.0020.0020.0002.0002
> > <0133>       --> 103C.1054.0020.0020.0004.0004
> > <0049, 004A> --> 103C.1054.0020.0020.0008.0008
> > <0132>       --> 103C.1054.0020.0020.000A.000A
> >                  ^^^^^^^^^ ^^^^^^^^^ ^^^^^^^^^
> >                   primary  secondary tertiary
>

Philippe asked:

> Shouldn't it be instead (leading zeroes suppressed only for clarity, avoiding
> line breaking in emails)?:
>
> <0069, 006A>
> --> 103C.1054.0.20.20.0.2.2.0.69.6A
> <0133>
> --> 103C.1054.0.20.20.0.4.4.0.133
> <0049, 004A>
> --> 103C.1054.0.20.20.0.8.8.0.49.4A
> <0132>
> --> 103C.1054.0.20.20.0.A.A.0.132
>
> (note the addition of .0. to separate collation levels, to allow
> binary sort order, and the addition of the trailing collation
> level for the default codepoint ordering with unlimited collation keys)

Not necessary. The UCA generally assumes a maximum level of 3, to
simplify discussion, and because that is usually all that is
needed. The 4th level values in the DUCET table are just there
to make further distinctions if people need them in certain cases.

Furthermore, because the DUCET values are constructed with all
primary weights > all secondary weights > all tertiary weights,
I make use of the implementation technique discussed in 6.1.1,
Eliminating Level Separators. Level separators aren't needed in
constructing examples from DUCET, if no table tailoring has been
applied and no other compression techniques are used.

Note that the constructed keys I posted will sort in the exact
same relative order as those you posted. The 4th level differences
are irrelevant and are swamped by the tertiary differences.
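
As a minimal sketch of the point above, the following Python builds
both forms of the keys from the weights quoted in this thread and
checks that they sort identically. The helper names (compact_key,
separated_key) are made up for illustration, and the code assumes
untailored DUCET-style weights where every primary weight is greater
than every secondary, which in turn is greater than every tertiary.

```python
# Weights copied from the example in this thread:
# (label, code points, primary, secondary, tertiary)
ENTRIES = [
    ("<0132>",       [0x0132],         [0x103C, 0x1054], [0x20, 0x20], [0x0A, 0x0A]),
    ("<0049, 004A>", [0x0049, 0x004A], [0x103C, 0x1054], [0x20, 0x20], [0x08, 0x08]),
    ("<0133>",       [0x0133],         [0x103C, 0x1054], [0x20, 0x20], [0x04, 0x04]),
    ("<0069, 006A>", [0x0069, 0x006A], [0x103C, 0x1054], [0x20, 0x20], [0x02, 0x02]),
]

def compact_key(primary, secondary, tertiary):
    # Three levels concatenated with no level separators.
    return primary + secondary + tertiary

def separated_key(code_points, primary, secondary, tertiary):
    # Levels separated by 0000, plus a trailing code point level.
    return primary + [0] + secondary + [0] + tertiary + [0] + code_points

# Python compares lists of integers lexicographically, which stands in
# for a binary comparison of the sort keys.
by_compact = [e[0] for e in sorted(ENTRIES, key=lambda e: compact_key(*e[2:]))]
by_separated = [e[0] for e in sorted(ENTRIES, key=lambda e: separated_key(*e[1:]))]

print(by_compact)
print(by_separated)
```

Both lists come out in the same order, because the first difference
between any two keys is at the tertiary level in either form.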

> Another related question: Why isn't there a standard 16-bit UTF
> that preserves the binary ordering of codepoints?
> (I mean for example UTF-16 modified simply by moving all
> code units or code points in E000..FFFF down to D800..F7FF
> and moving surrogate code units in D800..DFFF up to F800..FFFF).

Huh? Because it would confuse the hell out of everybody and lead
to problems, just like any other putative fixes by proliferation
of UTFs.

Sorting UTF-16 in binary order is easy. See "UTF-16 in UTF-8 Order",
p. 136 of TUS 4.0.
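
A rough sketch of that kind of comparison-time fixup, in Python: the
code units stay ordinary UTF-16 in storage, and only the comparison
remaps them so that surrogates sort above E000..FFFF. The function
names here are hypothetical, and this is an illustration of the idea
rather than the exact code given in the standard.

```python
def utf16_units(s):
    # Encode to big-endian UTF-16 and split into 16-bit code units.
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

def fixup(unit):
    # Remap one code unit for comparison purposes only.
    if unit >= 0xE000:
        return unit - 0x0800   # E000..FFFF -> D800..F7FF
    if unit >= 0xD800:
        return unit + 0x2000   # D800..DFFF (surrogates) -> F800..FFFF
    return unit

def code_point_order_key(s):
    # Sort key giving code point (= UTF-8 binary) order for UTF-16 data.
    return [fixup(u) for u in utf16_units(s)]

words = ["\U00010000", "\uFF5E"]

raw_order = sorted(words, key=utf16_units)            # plain UTF-16 binary order
fixed_order = sorted(words, key=code_point_order_key) # code point order
utf8_order = sorted(words, key=lambda s: s.encode("utf-8"))

print([hex(ord(w)) for w in raw_order])
print([hex(ord(w)) for w in fixed_order])
```

In plain UTF-16 binary order U+10000 sorts before U+FF5E, because its
lead surrogate D800 is less than FF5E. With the fixup applied, the
order matches the UTF-8 binary order.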

--Ken


This archive was generated by hypermail 2.1.5 : Mon Jan 09 2006 - 20:51:59 CST