Unicode Common Locale Data Repository: Bug Tracking

CLDR Ticket #6782(closed: fixed)

Opened 5 years ago

Last modified 10 days ago

Kashmiri exemplar and data needs fixes

Reported by: mark Owned by: roozbeh
Component: exemplars-etc Data Locale: ks
Phase: dsub Review: pedberg
Weeks: Data Xpath:
Xref:

Description (last modified by mark)

Our Kashmiri data is using the wrong character in exemplars and data. Example:

اٮ۪لجیرِیا

Uses U+066E ( ‎ٮ‎ ) ARABIC LETTER DOTLESS BEH + U+06EA ( ۪ ) ARABIC EMPTY CENTRE LOW STOP
when it should be using U+067E ( ‎پ‎ ) ARABIC LETTER PEH
See http://www.loc.gov/catdir/cpso/romanization/kashmiri.pdf, others.

<yesstr>اۭں</yesstr> only one instance
Uses U+06ED ( ۭ ) ARABIC SMALL LOW MEEM, when I'm guessing what is meant is U+0625 ( ‎إ‎ ) ARABIC LETTER ALEF WITH HAMZA BELOW.

Similarly, the following seem to be incorrect:

[ೞෟ់-៑]

http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[ೞෟ់-៑]&abb=on&g=sc+gc+subhead
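
To check which code points a suspect string actually contains, a short script can list each character with its Unicode name. This is only a minimal sketch using Python's standard `unicodedata` module; the helper name `describe` is hypothetical, and the sample is the two-character sequence named in the description, written with escapes so it does not depend on font rendering.

```python
import unicodedata

def describe(text):
    """Return (code point, name) pairs for every character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]

# The sequence from the description: DOTLESS BEH followed by EMPTY CENTRE LOW STOP.
sample = "\u066e\u06ea"
for code_point, name in describe(sample):
    print(code_point, name)
```

The same helper can be pointed at any string copied out of the ks locale file to see whether it contains marks from an unexpected block.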

Change History

comment:1 Changed 5 years ago by mark

  • Description modified

comment:2 Changed 5 years ago by emmons

  • Status changed from new to assigned
  • Component changed from unknown to data
  • Priority changed from assess to major
  • Milestone changed from UNSCH to 25M1
  • Owner changed from anybody to roozbeh
  • type changed from unknown to enhancement

comment:3 Changed 5 years ago by roozbeh

The Kashmiri data looks terrible in its choice of characters. The exemplars are also pretty wrong. I assume somebody using a hacked font submitted them.

At least these should be removed from the exemplar:
U+06EA
U+06ED
U+0655 (fairly confident, but not 100%; to be replaced by U+065F)
U+065A
U+065B

I don't know where the range with the Kannada, Sinhala, and Khmer characters comes from. Is it from the Kashmiri file?
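
One way to find where the characters listed in this comment occur in the data is to scan each string and report the position of every flagged character. The sketch below is an assumption about how such a check could look, not a tool used on this ticket; `FLAGGED` and `find_flagged` are hypothetical names, and U+0655 is included only because it appears in the list above, where it is marked as the least certain entry.

```python
import unicodedata

# Characters named for removal in this comment (U+0655 is the uncertain one).
FLAGGED = {"\u06ea", "\u06ed", "\u0655", "\u065a", "\u065b"}

def find_flagged(text):
    """Return (index, code point, name) for each flagged character in text."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ch in FLAGGED
    ]

# Example: ALEF, SMALL LOW MEEM, NOON GHUNNA (built from escapes for illustration).
for hit in find_flagged("\u0627\u06ed\u06ba"):
    print(hit)
```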

comment:4 Changed 5 years ago by emmons

  • Milestone changed from 25M1 to 25rc

Moving Roozbeh's 25M1 to 25rc

comment:5 Changed 5 years ago by emmons

  • Milestone changed from 25rc to 26rc

comment:6 Changed 4 years ago by mark

  • Milestone changed from 26rc to 27dsub

comment:7 Changed 4 years ago by markus

  • Phase set to dsub
  • Milestone changed from 27dsub to 27

comment:8 Changed 4 years ago by roozbeh

  • Milestone changed from 27 to 28

comment:9 Changed 4 years ago by roozbeh

  • Owner changed from roozbeh to shervin

comment:10 Changed 4 years ago by markus

  • type changed from enhancement to data

comment:11 Changed 4 years ago by srl

  • Status changed from assigned to accepted

comment:12 Changed 4 years ago by shervin

  • Milestone changed from 28 to 29

comment:13 Changed 3 years ago by emmons

  • Milestone changed from 29 to upcoming

Automatic move of all 29 -> upcoming

comment:14 Changed 5 weeks ago by roozbeh

  • Cc roozbeh added

comment:15 Changed 4 weeks ago by pedberg

  • Milestone changed from upcoming to UNSCH

CLDR 34 BRS closing item, move all upcoming → UNSCH

comment:16 Changed 3 weeks ago by mark

  • Component changed from main to other

comment:17 Changed 3 weeks ago by mark

  • Component changed from other to exemplars-etc

comment:18 Changed 2 weeks ago by roozbeh

  • Cc shervin added; roozbeh removed
  • Data Locale set to ks
  • Owner changed from shervin to roozbeh
  • Milestone changed from UNSCH to 35

comment:19 Changed 2 weeks ago by roozbeh

  • Summary changed from Other exemplar & data fixes to Kashmiri exemplar and data needs fixes

comment:20 Changed 13 days ago by roozbeh

Using this to clean up Kashmiri exemplars and characters. Here's the list of things to do:

  1. Kashmiri uses the arabext numbering system, but the arabext digits are not included in exemplarCharacters. I'll add them.
  2. I'll remove U+066E ARABIC LETTER DOTLESS BEH from the exemplar set. This is an archaic letter incorrectly used.
  3. I'll remove U+06EA ARABIC EMPTY CENTRE LOW STOP and U+06ED ARABIC SMALL LOW MEEM from the exemplar set. These are Quranic marks.
  4. I'll remove U+065A ARABIC VOWEL SIGN SMALL V ABOVE and U+065B ARABIC VOWEL SIGN INVERTED SMALL V ABOVE. These are vowels used for African languages. In Kashmiri, precomposed letters should be used.
  5. I'll move tashkil/harakat to auxiliary set.
  6. I'll add Kashmiri-specific characters U+0620 ARABIC LETTER KASHMIRI YEH and U+065F ARABIC WAVY HAMZA BELOW.
  7. It appears that U+06EA ARABIC EMPTY CENTRE LOW STOP has been used as a hack to create U+0620 ARABIC LETTER KASHMIRI YEH. Replacing all sequences of <U+066E ARABIC LETTER DOTLESS BEH, U+06EA ARABIC EMPTY CENTRE LOW STOP> with U+0620 ARABIC LETTER KASHMIRI YEH.
  8. It appears that the single use of U+06ED ARABIC SMALL LOW MEEM is in <yesstr>, اۭں, where it should be replaced with U+065F ARABIC WAVY HAMZA BELOW. Replacing.
  9. U+065B ARABIC VOWEL SIGN INVERTED SMALL V ABOVE is used over a few characters, especially Reh, Yeh, and Noon. There's not much information about what that may mean, except at https://en.wikipedia.org/wiki/Help:IPA/Kashmiri where it says it's used for nasalization (which explains Noon, but not Reh and Yeh). Since UTC hasn't sanctioned this usage for the character and I'm not sure which other character should be used, I'll drop all occurrences.
  10. U+065A ARABIC VOWEL SIGN SMALL V ABOVE is used over Waw and Yeh.
    • Over Waw, it's a well-known letter of the Kashmiri alphabet for the /o/ sound, as seen in the sources below. I replaced it with U+06C6 ARABIC LETTER OE and added it to the exemplars.
    • Over Yeh, there's contradictory information about what it is. Wikipedia has a V-like shape over Yeh for the /e/ sound, while LOC uses a breve-like sign. Omniglot has neither. Without more information, it's best to drop the mark.
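
The text replacements in items 7 to 10 can be sketched as an ordered list of string substitutions. This is an illustration only, not the script used for the actual change: `REPLACEMENTS` and `clean_kashmiri` are hypothetical names, Waw is assumed to be U+0648 ARABIC LETTER WAW, and the U+06ED rule is applied globally here even though the ticket reports a single occurrence.

```python
# Ordered (old, new) pairs. Multi-character sequences come first so that the
# single-mark rules further down do not break them up.
REPLACEMENTS = [
    ("\u066e\u06ea", "\u0620"),  # DOTLESS BEH + EMPTY CENTRE LOW STOP -> KASHMIRI YEH (item 7)
    ("\u0648\u065a", "\u06c6"),  # WAW (assumed U+0648) + SMALL V ABOVE -> OE (item 10)
    ("\u06ed", "\u065f"),        # SMALL LOW MEEM -> WAVY HAMZA BELOW (item 8)
    ("\u065b", ""),              # INVERTED SMALL V ABOVE dropped (item 9)
    ("\u065a", ""),              # any remaining SMALL V ABOVE, e.g. over Yeh, dropped (item 10)
]

def clean_kashmiri(text):
    """Apply the replacements in order and return the cleaned string."""
    for old, new in REPLACEMENTS:
        text = text.replace(old, new)
    return text

# Example: WAW + SMALL V ABOVE becomes the single letter OE.
print(clean_kashmiri("\u0648\u065a") == "\u06c6")
```

A plain substring match like this misses a sequence if another combining mark sits between the base letter and the mark, and it leaves a stray U+066E or U+06EA untouched, so any real cleanup would still need a manual review of the results.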

Here are some sources I used:

comment:21 Changed 13 days ago by roozbeh

Regarding U+0655 in comment 3, the source seems to distinguish normal hamza below and wavy hamza below, so that doesn't seem to be a mistake.

comment:22 Changed 13 days ago by roozbeh

  • Status changed from accepted to reviewing
  • Review set to pedberg

comment:23 Changed 10 days ago by pedberg

  • Status changed from reviewing to closed
  • Resolution set to fixed

A lot of investigation and work here, thanks!
