Common Locale Data Repository: Bug Tracking

CLDR Ticket #10098 (accepted data)

Opened 15 months ago

Last modified 3 months ago

Lao collation is not linguistically correct

Reported by: mark
Owned by: markus
Component: collation
Data Locale:
Phase: dsub
Review:
Weeks:
Data Xpath:


[Filed on behalf of Richard Wordingham]

I notice a very similar file lo.xml. When did Laos haul up the white
flag and more or less adopt the modern Thai collation order for Lao?

As there has been no answer to this question, I presume the surrender
has not happened. As my ticket submission was rejected as spam, would
someone kindly file a ticket along these lines:

==Lao collation is not linguistically correct==

The file collation/lo.xml contains the reckless falsehood "The root
collation order is valid for this language".

If phonetic Lao syllables were represented by single characters, Lao
collation would be a simple lexicographic order. It is therefore unable
to use anything but primary weights.

A Lao syllable may be considered to be composed of onset + vowel + coda
+ tone; the onset and vowel may be interleaved (as in Thai), and the
tone is represented by a mark following the onset and no later than
immediately after the vowel. There are two basic ordering schemes for
single syllables:

1) <onset-weight><coda-weight><vowel-weight><tone-weight>
2) <onset-weight><vowel-weight><coda-weight><tone-weight>

The first is the one most commonly used; the second is closer to the
CLDR default.
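
The two schemes differ only in whether the coda or the vowel is compared
first. A minimal sketch, assuming each syllable has already been parsed
into four primary weights (the Syllable type and the numeric weights are
hypothetical and are not taken from any CLDR data):

```python
from typing import NamedTuple


class Syllable(NamedTuple):
    """Hypothetical pre-parsed syllable; every field is a primary weight."""
    onset: int
    vowel: int
    coda: int  # 0 stands for "no coda" in this sketch
    tone: int  # 0 stands for "no tone mark" in this sketch


def key_scheme_1(s: Syllable) -> tuple:
    # Scheme 1: onset, then coda, then vowel, then tone.
    return (s.onset, s.coda, s.vowel, s.tone)


def key_scheme_2(s: Syllable) -> tuple:
    # Scheme 2: onset, then vowel, then coda, then tone.
    return (s.onset, s.vowel, s.coda, s.tone)


# Made-up weights, chosen so that the two schemes disagree.
a = Syllable(onset=1, vowel=2, coda=1, tone=0)
b = Syllable(onset=1, vowel=1, coda=2, tone=0)

by_scheme_1 = sorted([a, b], key=key_scheme_1)  # coda decides: a, then b
by_scheme_2 = sorted([a, b], key=key_scheme_2)  # vowel decides: b, then a
```

Both keys use the same four weights, so switching schemes is only a
matter of reordering the tuple; the hard part, not shown here, is
segmenting text into syllables in the first place.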

Unlike Thai, the vowel weighting for compound vowel symbols is not
composed from the individual vowels. For example, part of the correct
ordering is:

ເກະ < ເກ < ໂກະ < ໂກ < ເກາະ

However, the current collation yields:
ເກ < ເກະ < ເກາະ < ໂກ < ໂກະ

This ordering is manifestly wrong.
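
To illustrate what weighting a compound vowel symbol as a unit means, here
is a rough sketch that reproduces the five-item fragment above. The rank
table covers only that fragment, and split_syllable is a hypothetical
helper that assumes one prefixed vowel sign, a single-consonant onset, no
coda and no tone mark; it is not a general Lao syllable parser.

```python
# Each vowel "frame" is (sign before the onset, signs after the onset).
# The ranks follow the fragment quoted in this ticket and nothing more.
VOWEL_RANK = {
    ("\u0ec0", "\u0eb0"): 0,        # ເ-ະ
    ("\u0ec0", ""): 1,              # ເ-
    ("\u0ec2", "\u0eb0"): 2,        # ໂ-ະ
    ("\u0ec2", ""): 3,              # ໂ-
    ("\u0ec0", "\u0eb2\u0eb0"): 4,  # ເ-າະ
}


def split_syllable(text: str):
    """Hypothetical helper: prefixed vowel sign + one onset consonant + rest."""
    pre, onset, post = text[0], text[1], text[2:]
    return onset, (pre, post)


def sort_key(text: str):
    onset, frame = split_syllable(text)
    # The whole frame gets one weight instead of one weight per vowel sign.
    # Comparing onsets by code point is a simplification for this sketch.
    return (onset, VOWEL_RANK[frame])


# The five syllables, listed in the order the ticket reports for the
# current collation.
words = [
    "\u0ec0\u0e81",
    "\u0ec0\u0e81\u0eb0",
    "\u0ec0\u0e81\u0eb2\u0eb0",
    "\u0ec2\u0e81",
    "\u0ec2\u0e81\u0eb0",
]

ordered = sorted(words, key=sort_key)
```

With this table, ordered matches the first sequence given above. A full
solution would need one entry per vowel frame and per coda combination,
which is why the tables become large.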

I suggest that the reckless comment be amended to something like, "The
root collation is of some utility in sorting this language; accurate
collation appears to require large tables".

Yours faithfully,

Richard Wordingham.


Change History

comment:1 Changed 9 months ago by mark

  • Owner changed from anybody to markus
  • Status changed from new to accepted
  • Type changed from unknown to data
  • Milestone changed from UNSCH to 32

comment:2 Changed 8 months ago by markus

  • Keywords punt32 added

comment:3 Changed 8 months ago by markus

  • Milestone changed from 32 to 33

comment:4 Changed 3 months ago by mark

  • Component changed from unknown to collation

comment:5 Changed 3 months ago by markus

  • Keywords punt33 added
  • Milestone changed from 33 to 34
