UCA question / Produce Collation Element Arrays
Mark Davis ☕️ via CLDR-Users
cldr-users at unicode.org
Sun Dec 3 06:36:57 CST 2017
The algorithm is predicated on any input table being well formed.
(Tibetan is a documented exception in the DUCET, but the same documentation
also explains how to fix it.)
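
As a rough illustration of one thing "well formed" implies (a sketch only, not taken from any implementation): as I read the well-formedness conditions in UTS #10, a contraction of more than two code points that ends in a non-starter needs its one-shorter prefix in the table as well. The function name `missing_prefixes` and the toy table with placeholder values are made up for this example.

```python
import unicodedata

def missing_prefixes(table):
    """Return prefixes that a prefix-closed table would need but lacks.

    `table` maps tuples of code points to collation elements; the values
    are placeholders and are not inspected here.
    """
    missing = set()
    for seq in table:
        # Only contractions longer than two code points that end in a
        # non-starter (canonical combining class != 0) are checked.
        if len(seq) > 2 and unicodedata.combining(chr(seq[-1])) != 0:
            prefix = seq[:-1]
            if prefix not in table:
                missing.add(prefix)
    return missing

# Toy table modelled on the Tibetan case discussed in this thread.
toy_table = {
    (0x0FB2,): "ce-a",
    (0x0F71,): "ce-b",
    (0x0F80,): "ce-c",
    (0x0FB2, 0x0F71, 0x0F80): "ce-d",
}
gaps = missing_prefixes(toy_table)
```

Here `gaps` contains the single prefix `(0x0FB2, 0x0F71)`; adding an entry for it makes the toy table pass this particular check.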
On Sat, Dec 2, 2017 at 8:52 PM, Richard Wordingham via CLDR-Users <
cldr-users at unicode.org> wrote:
> On Sat, 2 Dec 2017 16:25:30 +0100
> Mark Davis ☕️ via CLDR-Users <cldr-users at unicode.org> wrote:
> > Suppose that you have the following, where S are starters and n are
> > non-starters. | represents the current position.
> > | S1 S2 S3 n1 n2 n3 n4 S4
> > S1 S2 isn't in the CET, so you emit and logically change the input.
> > I'll represent that as:
> > w(S1) | S2 S3 n1 n2 n3 n4 S4
> One subtle nitpick here. One also has to eliminate <S1 S2 S3>, <S1 S2
> S3 n1>, ... and <S1 S2 S3 n1 n2 n3 n4 S4> before one can conclude that
> the relevant collating element is <S1>. I do this by recording whether
> each collating element and prefix of a collating element is the prefix
> of a collating element. This sort of tagging is not logically
> necessary, but is practically very useful.
> The simplest example of this issue in the DUCET is <U+0FB2 U+0F71
> U+0F80>. Or is a conformant implementation of the UCA allowed to reject
> DUCET even if one can find a way to specify that it be used? There's
> no explicit concession that a CET has to be well-formed.
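
The prefix tagging described in the quoted message could be sketched roughly as follows. This is only an illustration under assumptions: the names `build_prefix_flags` and `longest_contiguous_match` are hypothetical, collating elements are modelled as tuples of symbols, and only contiguous matching is covered (the discontiguous handling of non-starters is left out).

```python
def build_prefix_flags(elements):
    """Tag every collating element and every proper prefix of one.

    Returns a dict mapping a sequence to (is_element, is_prefix_of_longer).
    """
    flags = {}
    for seq in elements:
        _, is_pref = flags.get(seq, (False, False))
        flags[seq] = (True, is_pref)
        for i in range(1, len(seq)):
            prefix = seq[:i]
            is_elem, _ = flags.get(prefix, (False, False))
            flags[prefix] = (is_elem, True)
    return flags

def longest_contiguous_match(text, pos, flags):
    """Return the longest collating element starting at `pos`, or None."""
    best = None
    end = pos
    while end < len(text):
        candidate = tuple(text[pos:end + 1])
        entry = flags.get(candidate)
        if entry is None:
            break                  # neither an element nor a prefix of one
        is_elem, is_pref = entry
        if is_elem:
            best = candidate
        if not is_pref:
            break                  # nothing longer can match
        end += 1
    return best

# <S1 S2> is absent, but <S1 S2 S3> is present, so the scan must look
# past S2 before settling on <S1>.
elements = {("S1",), ("S2",), ("S1", "S2", "S3")}
flags = build_prefix_flags(elements)
short = longest_contiguous_match(["S1", "S2", "S4"], 0, flags)
long_ = longest_contiguous_match(["S1", "S2", "S3", "n1"], 0, flags)
```

With these inputs, `short` is `("S1",)` and `long_` is `("S1", "S2", "S3")`. The flag on `("S1", "S2")` is what tells the scan to keep going even though that sequence is not itself a collating element.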