CLDR Ticket #8838 (accepted data)
Collation details for the Māori Language (mi)
Reported by: graham_oliver@… | Owned by: markus
I have been researching this for a while now, and I have:
a) Produced an academic poster summarising the development of collation in the Māori language since it was first written down: https://www.academia.edu/8917175/Orthography-collation-go
b) Written code and test cases in Python to reproduce the sorting scheme used by the Māori Language Commission.
c) Corresponded with the I.T. person who implemented the sorting scheme for the Māori Language Commission.
What follows are my best efforts at defining the minimal rules (with explanation) as described in http://cldr.unicode.org/index/cldr-spec/collation-guidelines
At Level 1
There are two digraphs, 'ng' and 'wh'. Each sorts as a single letter immediately after its first component:
n < ng
w < wh
At Level 2
The macronised vowels are sorted *after* the non-macronised vowels.
My understanding is that DUCET already does this, so no rule is necessary.
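As a small illustration of why no Level 2 rule should be needed: each macronised vowel canonically decomposes into its base vowel plus U+0304 COMBINING MACRON, so the macron is an accent on the same base letter rather than a separate letter. The sketch below only shows that decomposition with Python's standard `unicodedata` module; it does not consult the DUCET weights themselves, and `split_macron` is a hypothetical helper name.

```python
import unicodedata

MACRON = "\u0304"  # COMBINING MACRON


def split_macron(ch):
    """Return (base letter, has_macron) for a single character.

    Hypothetical helper for illustration only.
    """
    decomposed = unicodedata.normalize("NFD", ch)
    # Keep only the non-combining characters, i.e. the base letter.
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base, MACRON in decomposed


for vowel in "āēīōūĀĒĪŌŪ":
    print(vowel, split_macron(vowel))
```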
At Level 3
Upper case sorts before lower case
Ā <<< ā
Ē <<< ē
Ī <<< ī
Ō <<< ō
Ū <<< ū
NG <<< Ng <<< ng
WH <<< Wh <<< wh
Punctuation (essentially dashes and spaces) is removed before sorting.
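The three levels and the punctuation rule above can be sketched as a single Python sort key. This is a minimal illustration written for this description, not the Māori Language Commission's implementation and not the stripped-down test code mentioned below. The name `maori_sort_key` is hypothetical, and two choices are assumptions: every non-letter character is dropped (not just dashes and spaces), and mixed-case forms such as "nG", which the rules do not cover, are ranked by their count of lower-case letters.

```python
import unicodedata

DIGRAPHS = {"ng", "wh"}
MACRON = "\u0304"  # COMBINING MACRON


def maori_sort_key(word):
    """Build a (primary, secondary, tertiary) key for one word (hypothetical helper)."""
    # Assumption: drop every non-letter, which covers dashes and spaces.
    text = unicodedata.normalize("NFC", word)
    letters = [ch for ch in text if ch.isalpha()]

    primary, secondary, tertiary = [], [], []
    i = 0
    while i < len(letters):
        # Greedily match the digraphs 'ng' and 'wh', ignoring case.
        pair = "".join(letters[i:i + 2])
        is_digraph = pair.lower() in DIGRAPHS
        unit = pair if is_digraph else letters[i]
        i += len(unit)

        decomposed = unicodedata.normalize("NFD", unit)
        base = "".join(c for c in decomposed if not unicodedata.combining(c))

        # Level 1: base letter, with a digraph sorting after its first letter.
        primary.append((base[0].lower(), is_digraph))
        # Level 2: macronised vowels after plain vowels.
        secondary.append(MACRON in decomposed)
        # Level 3: upper case first (NG=0, Ng=1, ng=2; A=0, a=1).
        tertiary.append(sum(c.islower() for c in base))

    # Tuples compare all of level 1, then all of level 2, then all of level 3.
    return primary, secondary, tertiary


words = ["whā", "wa", "nga", "na", "nu", "Nga", "NGA", "o", "wā", "Wa"]
print(sorted(words, key=maori_sort_key))
# → ['na', 'nu', 'NGA', 'Nga', 'nga', 'o', 'Wa', 'wa', 'wā', 'whā']
```

Because punctuation is stripped before the digraphs are matched, "te-reo" and "tereo" get identical keys. The same step would also join a "w" and an "h" that were separated only by a dash.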
I have included a stripped down version of the code I have used to test the above.
There is no English reference to point to. The best I could do is to scan some pages from the normative reference dictionary (He Pātaka Kupu). They are all in Māori, however.
Let me know if you need any more information.
btw - Thanks for a great project!
- Status changed from new to accepted
- Priority changed from assess to medium
- Phase changed from dsub to rc
- Milestone changed from UNSCH to 29
- Owner changed from anybody to markus
- Type changed from unknown to data