[Unicode] Common Locale Data Repository: Bug Tracking

CLDR Ticket #11112 (new data)

Opened 7 days ago

Strings starting with the letter alif (ا) should not be indexed under hamza (ء) in the ur and ar locales

Reported by: vichang@…
Owned by: anybody
Component: collation
Data Locale: ur, ar
Phase: dsub
Review:
Weeks:
Data Xpath:
Xref:

Description

Hamza (ء) is used in Arabic and Urdu, but a string that starts with the letter alif (ا) should not be indexed under hamza (ء). It should be indexed under alif (ا).

This may be an ICU bug, but it looks more like a locale data issue than an ICU issue, so I am reporting it here.

Disclaimer: I am not a native speaker of Arabic or Urdu, but alif (ا) is apparently common in Arabic.

The Arabic collator in ICU puts alif (ا) and hamza (ء) into the same bucket, but the Urdu collator does not. If hamza (ء) should have its own index bucket, this could be a bug in the Arabic collation data. The following code reproduces the issue.

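===================================
import com.ibm.icu.text.Collator;
import com.ibm.icu.util.ULocale;

// Assumed definitions: the original snippet left `arabic` and `urdu` undeclared.
ULocale arabic = new ULocale("ar");
ULocale urdu = new ULocale("ur");

Collator collator = Collator.getInstance(arabic);
collator.setStrength(Collator.PRIMARY); // the strength level used for AlphabeticIndex
System.out.println(collator.compare("\u0621", "\u0627"));
// Reported output: 0 (same bucket for AlphabeticIndex)

collator = Collator.getInstance(urdu);
collator.setStrength(Collator.PRIMARY); // the strength level used for AlphabeticIndex
System.out.println(collator.compare("\u0621", "\u0627"));
// Reported output: 1 (different buckets for AlphabeticIndex)
===================================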
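
To make the "same bucket" reading of a `0` result concrete, here is a minimal sketch that groups strings a comparator treats as equal. It is not how AlphabeticIndex is implemented; it only illustrates the primary-equality idea from the description. The class name `BucketSketch` and the method `groupByEquality` are hypothetical, and the demo uses `String.CASE_INSENSITIVE_ORDER` as a stand-in comparator so that it runs without ICU on the classpath.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BucketSketch {
    // Hypothetical helper: sorts the items, then puts neighbours that the
    // comparator reports as equal (compare == 0) into the same group.
    static List<List<String>> groupByEquality(Comparator<? super String> cmp,
                                              List<String> items) {
        List<String> sorted = new ArrayList<>(items);
        sorted.sort(cmp); // List.sort is stable, so ties keep input order
        List<List<String>> groups = new ArrayList<>();
        for (String s : sorted) {
            if (!groups.isEmpty()) {
                List<String> last = groups.get(groups.size() - 1);
                if (cmp.compare(last.get(0), s) == 0) {
                    last.add(s); // equal to the current group: same bucket
                    continue;
                }
            }
            List<String> fresh = new ArrayList<>();
            fresh.add(s);
            groups.add(fresh);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Stand-in comparator (an assumption for the demo, not ICU collation).
        List<List<String>> groups = groupByEquality(
                String.CASE_INSENSITIVE_ORDER, List.of("b", "A", "a", "B"));
        System.out.println(groups); // prints [[A, a], [b, B]]
    }
}
```

An ICU `Collator` implements `Comparator<Object>`, so it can be passed to the same helper. Going by the results reported above, the Arabic collator at PRIMARY strength would place "\u0621" and "\u0627" in one group, while the Urdu collator would place them in two.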

GoogleIssue:31034811
