Unicode Common Locale Data Repository: Bug Tracking

CLDR Ticket #8838 (accepted data)

Opened 3 years ago

Last modified 3 months ago

Collation details for the Māori Language (mi)

Reported by: graham_oliver@…
Owned by: markus
Component: collation
Data Locale: mi
Phase: rc


I have been researching this for a while now, and I have:

a) Produced an academic poster summarising the development of collation in the Māori language since it was first written down https://www.academia.edu/8917175/Orthography-collation-go
b) Written code and test cases in Python to reproduce the sorting scheme used by the Māori Language Commission.
c) Corresponded with the IT person who implemented the sorting scheme for the Māori Language Commission.

What follows is my best effort at defining the minimal rules (with explanations) as described in the CLDR collation guidelines: http://cldr.unicode.org/index/cldr-spec/collation-guidelines

At Level 1
There are two digraphs, 'ng' and 'wh', each of which sorts as a separate letter immediately after its first letter:
n < ng
w < wh
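The Level 1 rules can be illustrated with a short Python sketch. This is not the attached test file; `LETTERS`, `PRIMARY`, `tokens` and `primary_key` are hypothetical names, and the sketch assumes a plain a to z alphabet with 'ng' inserted after 'n' and 'wh' after 'w'. Case and macrons are stripped first, because they only matter at lower levels.

```python
import string
import unicodedata

# Hypothetical primary alphabet: Latin a-z with each digraph inserted
# directly after its first letter, per the rules "n < ng" and "w < wh".
LETTERS = []
for ch in string.ascii_lowercase:
    LETTERS.append(ch)
    if ch == "n":
        LETTERS.append("ng")
    elif ch == "w":
        LETTERS.append("wh")
PRIMARY = {letter: rank for rank, letter in enumerate(LETTERS)}


def tokens(word):
    """Split a lower-case word into letters, treating 'ng' and 'wh' as units.

    Greedy left-to-right matching is an assumption; it suits native Māori
    words, where n+g and w+h always form the digraphs.
    """
    out, i = [], 0
    while i < len(word):
        if word[i:i + 2] in ("ng", "wh"):
            out.append(word[i:i + 2])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out


def primary_key(word):
    """Level 1 key: lower-case and drop combining marks (e.g. macrons)."""
    base = "".join(c for c in unicodedata.normalize("NFD", word.lower())
                   if not unicodedata.combining(c))
    # Characters outside a-z raise KeyError: a limitation of this sketch.
    return [PRIMARY[t] for t in tokens(base)]


print(sorted(["whare", "ngaro", "ora", "wai", "nui"], key=primary_key))
# → ['nui', 'ngaro', 'ora', 'wai', 'whare']
```

All 'n' words come before all 'ng' words, which in turn come before 'o'; likewise for 'w' and 'wh'.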

At Level 2
Macronised vowels sort *after* their non-macronised counterparts.
My understanding is that DUCET already does this, so no rule is necessary.
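Why DUCET needs no rule here can be seen from Unicode normalization: under NFD, 'ā' decomposes into 'a' plus U+0304 COMBINING MACRON, so it shares the primary weight of 'a' and differs only at the secondary (accent) level. The sketch below is a simplified model, assuming one secondary weight per base letter (0 plain, 1 macronised) rather than DUCET's actual weights; `secondary_key` is a hypothetical name.

```python
import unicodedata

MACRON = "\u0304"  # COMBINING MACRON


def secondary_key(word):
    """Level 2 key: one weight per base letter, 0 = plain, 1 = macronised."""
    weights = []
    for c in unicodedata.normalize("NFD", word.lower()):
        if c == MACRON and weights:
            weights[-1] = 1      # the macron belongs to the preceding letter
        elif not unicodedata.combining(c):
            weights.append(0)    # a new base letter
    return weights


print(unicodedata.normalize("NFD", "ā") == "a" + MACRON)  # → True
# These words share the same primary weights, so only level 2 decides.
print(sorted(["kākā", "kaka", "kāka", "kakā"], key=secondary_key))
# → ['kaka', 'kakā', 'kāka', 'kākā']
```

Secondary weights are compared left to right, so a macron on an earlier vowel outweighs one on a later vowel.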

At Level 3
UPPER CASE sorts before lower case:
Ā <<< ā
Ē <<< ē
Ī <<< ī
Ō <<< ō
Ū <<< ū
NG <<< Ng <<< ng
WH <<< Wh <<< wh
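A minimal sketch of this tertiary ordering, assuming three case weights per letter: 0 for upper case (Ā, NG, WH), 1 for title-case digraphs (Ng, Wh) and 2 for lower case. The names `tokens`, `case_weight` and `tertiary_key` are hypothetical. The example words share both primary and secondary weights, so case alone decides.

```python
import unicodedata


def tokens(word):
    """Split into letters, matching the digraphs 'ng'/'wh' in any case."""
    out, i = [], 0
    while i < len(word):
        if word[i:i + 2].lower() in ("ng", "wh"):
            out.append(word[i:i + 2])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out


def case_weight(token):
    if token.isupper():
        return 0   # Ā, NG, WH
    if token.islower():
        return 2   # ā, ng, wh
    return 1       # title-case digraphs: Ng, Wh


def tertiary_key(word):
    """Level 3 key: upper case, then title case, then lower case."""
    return [case_weight(t) for t in tokens(unicodedata.normalize("NFC", word))]


print(sorted(["ngā", "NGĀ", "Ngā"], key=tertiary_key))  # → ['NGĀ', 'Ngā', 'ngā']
print(sorted(["ā", "Ā"], key=tertiary_key))            # → ['Ā', 'ā']
```

CLDR rule syntax also has a `[caseFirst upper]` setting; it is untested here whether that setting alone would reproduce the NG <<< Ng <<< ng order for the digraphs.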

Punctuation and spaces (in practice, mainly dashes and spaces) are removed before sorting.
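Putting the three levels together with punctuation removal gives a complete sort key. This is a hypothetical reconstruction, not the attached maori-collation-tests-for-cldr.py. It assumes that every character in Unicode categories P* (punctuation) and Z* (separators such as spaces) can simply be dropped. That is roughly what the UCA does to variable characters under the "shifted" option, although "shifted" still distinguishes them at a fourth level. The key is a tuple `(primary, secondary, tertiary)`, so all primary differences are compared before any secondary ones, and so on.

```python
import string
import unicodedata

MACRON = "\u0304"  # COMBINING MACRON

# Hypothetical primary alphabet: a-z with "ng" after n and "wh" after w.
LETTERS = []
for ch in string.ascii_lowercase:
    LETTERS.append(ch)
    if ch == "n":
        LETTERS.append("ng")
    elif ch == "w":
        LETTERS.append("wh")
PRIMARY = {letter: rank for rank, letter in enumerate(LETTERS)}


def strip_punctuation(text):
    """Drop punctuation (categories P*) and separators such as spaces (Z*)."""
    return "".join(c for c in text
                   if not unicodedata.category(c).startswith(("P", "Z")))


def units(text):
    """Return (letter, has_macron) pairs, with 'ng'/'wh' merged in any case."""
    chars = []  # [base character, has_macron]
    for c in unicodedata.normalize("NFD", strip_punctuation(text)):
        if c == MACRON and chars:
            chars[-1][1] = True
        elif not unicodedata.combining(c):
            chars.append([c, False])
    out, i = [], 0
    while i < len(chars):
        pair = chars[i][0] + (chars[i + 1][0] if i + 1 < len(chars) else "")
        if pair.lower() in ("ng", "wh"):
            out.append((pair, False))
            i += 2
        else:
            out.append((chars[i][0], chars[i][1]))
            i += 1
    return out


def case_weight(letter):
    if letter.isupper():
        return 0
    if letter.islower():
        return 2
    return 1  # mixed-case digraph: Ng, Wh


def sort_key(text):
    """(primary, secondary, tertiary) key; tuples compare level by level."""
    us = units(text)
    # Letters outside a-z raise KeyError: a limitation of this sketch.
    return ([PRIMARY[u.lower()] for u, _ in us],
            [1 if macron else 0 for _, macron in us],
            [case_weight(u) for u, _ in us])


print(sorted(["ngā", "nga", "Nga", "NGA"], key=sort_key))
# → ['NGA', 'Nga', 'nga', 'ngā']
print(sort_key("kai-tiaki") == sort_key("kai tiaki") == sort_key("kaitiaki"))
# → True
```

The first example shows that a secondary difference (the macron) outranks any case difference, as the level ordering requires.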

I have attached a stripped-down version of the code I used to test the above.

There is no English reference to point to. The best I could do was to scan some pages from the normative reference dictionary (He Pātaka Kupu). These pages are all in Māori, however.

Let me know if you need any more information.

Graham Oliver

BTW, thanks for a great project!


maori-collation-tests-for-cldr.py (2.8 KB) - added by graham_oliver@… 3 years ago.

Change History

Changed 3 years ago by graham_oliver@…

comment:1 Changed 3 years ago by emmons

  • Status changed from new to accepted
  • Priority changed from assess to medium
  • Phase changed from dsub to rc
  • Milestone changed from UNSCH to 29
  • Owner changed from anybody to markus
  • Type changed from unknown to data

comment:2 Changed 3 years ago by emmons

  • Milestone changed from 29 to upcoming

comment:3 Changed 20 months ago by graham_oliver@…

Hi there
Is this going to be included in release 30?

comment:4 Changed 20 months ago by markus (follow-up: comment:5)

  • Milestone changed from upcoming to 31

No, sorry, CLDR 30 is done, this is not in there.

comment:5 Changed 19 months ago by graham_oliver@… (in reply to comment:4)

Replying to markus:

> No, sorry, CLDR 30 is done, this is not in there.

OK, thanks. Hopefully 31, then.

comment:6 Changed 15 months ago by markus

  • Milestone changed from 31 to 32

comment:7 Changed 8 months ago by markus

  • Keywords punt32 added

comment:8 Changed 8 months ago by markus

  • Milestone changed from 32 to 33

comment:9 Changed 3 months ago by markus

  • Keywords punt33 added
  • Milestone changed from 33 to 34
