[Unicode] Common Locale Data Repository: Bug Tracking
 

CLDR Ticket #7031 (closed enhancement: fixed)

Opened 3 years ago

Last modified 3 years ago

Update collation/ml.xml with simplified rule for AU sign and marker

Reported by: cibu
Owned by: markus
Component: collation
Data Locale: ml
Phase: rc
Review: emmons
Weeks: 0.1
Data Xpath:
Xref:

Description

The current rule is:


# Archaic and modern AU-Signs are different only by tertiary.
#
&ോ<ൗ<<<ൌ


That needs to get replaced with:


# Vowel sign AU ( ൌ) and AU length mark ( ൗ) need to differ only by secondary.
&\u0D4C<<\u0D57


Reasoning:

  1. The order of these two signs is not important. The only requirement is that they differ only by secondary.
  2. Whether the difference is secondary or tertiary is debatable. In the user's mind, this difference is more or less parallel to the difference between Latin long s and short s. They differ by secondary. Since the vowel sign has an additional separate symbol, it could be viewed as a different combining mark. Combining marks differ by a secondary key. This tilts the choice slightly toward a secondary difference between these two signs. However, I don't have any strong opinions on this.

Change History

comment:1 Changed 3 years ago by markus

  • Cc markus added
  • Priority changed from assess to medium
  • Weeks set to 0.1
  • Component changed from data to data-collation

comment:2 Changed 3 years ago by mark

  • Status changed from new to closed
  • Resolution set to needs-more-info

The old line mentions the following:

U+0D4B ( ോ ) MALAYALAM VOWEL SIGN OO
U+0D4C ( ൌ ) MALAYALAM VOWEL SIGN AU
U+0D57 ( ൗ ) MALAYALAM AU LENGTH MARK

So we need to know what the base is.

comment:3 Changed 3 years ago by markus

  • Cc mark added
  • Status changed from closed to new
  • Resolution needs-more-info deleted
  • Data Xpath cldr/common/collation/ml.xml ldml/collations/collation/cr deleted
  • Reporter changed from cibu@… to cibu

source:trunk/common/collation/ml.xml

FractionalUCA.txt has

0D47 0D3E; [6D 91, 05, 05]
0D4B; [6D 91, 05, 05]

0D46 0D57; [6D 93, 05, 05]
0D4C; [6D 93, 05, 05]

0D57; [6D 95, 05, 05]
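The paired entries above appear to follow from canonical equivalence: U+0D4B and U+0D4C each have a two-part canonical decomposition, so a precomposed sign and its decomposed sequence get the same collation elements. A minimal sketch for checking those decompositions, assuming Python's standard unicodedata module (the helper name nfd_codepoints is made up for illustration and is not part of CLDR or ICU):

```python
import unicodedata


def nfd_codepoints(ch):
    """Return the NFD form of ch as a list of 4-digit hex code points."""
    return [f"{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]


oo_parts = nfd_codepoints("\u0D4B")    # MALAYALAM VOWEL SIGN OO
au_parts = nfd_codepoints("\u0D4C")    # MALAYALAM VOWEL SIGN AU
mark_parts = nfd_codepoints("\u0D57")  # MALAYALAM AU LENGTH MARK

print("0D4B ->", " ".join(oo_parts))
print("0D4C ->", " ".join(au_parts))
print("0D57 ->", " ".join(mark_parts))
```

The AU length mark has no decomposition of its own, which is consistent with it having a separate entry in the listing.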

The current collation/ml line is &\u0D4B < \u0D57 <<< \u0D4C

It reorders both 0D57 and 0D4C after 0D4B, but in the root collation they already follow 0D4B. What Cibu said is that they should have a secondary difference, not tertiary (current data) nor primary (root), and that it does not matter which one of the two sorts first. Therefore, the minimal tailoring here should be &\u0D4C << \u0D57.

I think Cibu originally tried to do this but with 0D57 secondary-before 0D4C, and ran into IcuBug:6328. I recently confirmed that that would work too now with ICU 53.
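The effect of the minimal tailoring can be illustrated with a toy model. This is only a sketch: the weights are copied from the FractionalUCA lines quoted above, the helper names are hypothetical, and bumping the secondary weight by one is an assumption for illustration, not how ICU actually allocates weights.

```python
# Toy (primary, secondary, tertiary) weights taken from the quoted
# FractionalUCA lines. This is an illustration, not ICU's data model.
ROOT = {
    "\u0D4B": ((0x6D, 0x91), 0x05, 0x05),  # VOWEL SIGN OO
    "\u0D4C": ((0x6D, 0x93), 0x05, 0x05),  # VOWEL SIGN AU
    "\u0D57": ((0x6D, 0x95), 0x05, 0x05),  # AU LENGTH MARK
}


def tailor_secondary_after(table, anchor, target):
    """Model '&anchor << target': target takes the anchor's primary and
    tertiary weights and a larger secondary weight (assumed: +1)."""
    primary, secondary, tertiary = table[anchor]
    tailored = dict(table)
    tailored[target] = (primary, secondary + 1, tertiary)
    return tailored


def first_difference(table, a, b):
    """Name the first level at which a and b differ in this table."""
    levels = ("primary", "secondary", "tertiary")
    for level, wa, wb in zip(levels, table[a], table[b]):
        if wa != wb:
            return level
    return "identical"


TAILORED = tailor_secondary_after(ROOT, "\u0D4C", "\u0D57")
root_level = first_difference(ROOT, "\u0D4C", "\u0D57")
tailored_level = first_difference(TAILORED, "\u0D4C", "\u0D57")
tailored_order = sorted(TAILORED, key=TAILORED.get)

print(root_level, tailored_level)
```

In this model the two signs differ at the primary level in the root data and only at the secondary level after the tailoring, while both still sort after U+0D4B without any explicit reordering.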

Last edited 3 years ago by markus

comment:4 Changed 3 years ago by cibu@…

Agree with Markus.

The bottom line is: \u0D57 and \u0D4C should not have a primary difference, which is what the DUCET currently gives them:

0D4C ; [.2242.0020.0002] # MALAYALAM VOWEL SIGN AU
0D46 0D57 ; [.2242.0020.0002] # MALAYALAM VOWEL SIGN AU
0D57 ; [.2243.0020.0002] # MALAYALAM AU LENGTH MARK

Whether they should differ in secondary or tertiary is debatable, as I have described in the bug report. Probably, that does not matter much.
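The DUCET lines quoted in this comment can be compared mechanically. A minimal sketch, assuming only the single-collation-element line shape shown above (the regular expression and the parse_ducet helper are hypothetical and do not handle the full allkeys.txt syntax):

```python
import re

# The three DUCET lines quoted in this comment.
DUCET_LINES = """\
0D4C ; [.2242.0020.0002] # MALAYALAM VOWEL SIGN AU
0D46 0D57 ; [.2242.0020.0002] # MALAYALAM VOWEL SIGN AU
0D57 ; [.2243.0020.0002] # MALAYALAM AU LENGTH MARK
"""

# Matches "<code points> ; [.PPPP.SSSS.TTTT]" with exactly one element.
LINE_RE = re.compile(
    r"^([0-9A-F ]+?)\s*;\s*"
    r"\[\.([0-9A-F]{4})\.([0-9A-F]{4})\.([0-9A-F]{4})\]"
)


def parse_ducet(text):
    """Map each code point sequence to (primary, secondary, tertiary)."""
    table = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            weights = tuple(int(g, 16) for g in match.groups()[1:])
            table[match.group(1)] = weights
    return table


weights = parse_ducet(DUCET_LINES)
same_as_decomposed = weights["0D4C"] == weights["0D46 0D57"]
primary_differs = weights["0D4C"][0] != weights["0D57"][0]

print(same_as_decomposed, primary_differs)
```

On the quoted data, the vowel sign and its decomposed sequence carry identical weights, and the length mark differs from them only in the primary weight.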

comment:5 Changed 3 years ago by emmons

  • Owner changed from anybody to markus
  • Status changed from new to assigned
  • Milestone changed from UNSCH to 26rc

comment:6 Changed 3 years ago by markus

  • Cc fredrik, pedberg added
  • Status changed from assigned to reviewing
  • Review set to emmons

comment:7 Changed 3 years ago by emmons

  • Status changed from reviewing to closed
  • Resolution set to fixed

comment:8 Changed 3 years ago by markus

  • Phase set to rc
  • Milestone changed from 26rc to 26