
CLDR Ticket #10092 (closed data: fixed)

Opened 8 months ago

Last modified 12 days ago

Transliteration rules from German to ASCII

Reported by: ausi <martin@…>
Owned by: sascha
Component: translit
Data Locale: de
Phase: rc
Review: mark
Weeks:
Data Xpath:
Xref: ticket:10438

Description

To convert German text into ASCII (e.g. for generating URLs), the only transform currently available is “Latin-ASCII”, which converts the umlauts ä, ö, and ü to a, o, and u respectively.

For German, however, there are transliteration conventions under which ä, ö, and ü should be transliterated as ae, oe, and ue respectively. I found the following references for this rule:

en.wikipedia.org/wiki/German_orthography#Umlaut_diacritic_usage

When it is not possible to use the umlauts (for example, when using a restricted character set) the characters Ä, Ö, Ü, ä, ö, ü should be transcribed as Ae, Oe, Ue, ae, oe, ue respectively

en.wikipedia.org/wiki/Diaeresis_(diacritic)

When replacing umlaut characters with plain ASCII, use ae, oe, etc. for German language

en.wikipedia.org/wiki/German_passport#Different_spellings_of_the_same_name_within_the_same_document

German names containing umlauts (ä, ö, ü) and/or ß are spelled in the correct way in the non-machine-readable zone of the passport, but with simple vowel + E and/or SS in the machine-readable zone

en.wikipedia.org/wiki/Austrian_passport#Different_spellings_of_the_same_name_within_the_same_document

The Austrian passport can (but does not always) contain a note in German, English, and French that AE / OE/ UE / SS are the common transcriptions for Ä / Ö / Ü / ß.

I created a new XML file, transforms/de-ASCII.xml, which is attached to this ticket.
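
To illustrate the requested behavior, here is a minimal Python sketch of the mapping described in the references above. It is not the content of the attached de-ASCII.xml; the table `GERMAN_TO_ASCII` and the function `german_to_ascii` are hypothetical names, and the sketch assumes precomposed (NFC) input and the simple title-case digraphs (Ae, Oe, Ue) from the Wikipedia quote.

```python
# Hypothetical mapping table for illustration only; the attached
# de-ASCII.xml is not reproduced here.
GERMAN_TO_ASCII = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}


def german_to_ascii(text: str) -> str:
    """Replace precomposed German umlauts and ß with ASCII digraphs."""
    # str.maketrans accepts single-character keys mapped to longer strings.
    return text.translate(str.maketrans(GERMAN_TO_ASCII))
```

With this sketch, "Müller" would become "Mueller", whereas a plain diacritic-stripping transform such as Latin-ASCII yields "Muller".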

Attachments

de-ASCII.xml (1.3 KB) - added by ausi <martin@…> 8 months ago.

Change History

Changed 8 months ago by ausi <martin@…>

  • Attachment de-ASCII.xml added

comment:1 Changed 4 months ago by mark

  • Cc pedberg added
  • Owner changed from anybody to sascha
  • Status changed from new to accepted
  • Phase changed from dsub to rc
  • Milestone changed from UNSCH to 32

comment:2 Changed 4 months ago by sascha

  • Status changed from accepted to reviewing
  • Review set to mark

Hi Martin,

Thanks for your contribution! Would you like to be listed on http://cldr.unicode.org/index/acknowledgments ?

I’ve made a few changes to your rules. They now support input in normalization form NFD; transcribe Ä, Ö, and Ü to an all-uppercase string when they come before or after an uppercase letter (so the Ä in both “HÄ” and “ÄH” becomes “AE” instead of “Ae”); and handle U+1E9E ẞ LATIN CAPITAL LETTER SHARP S. I’ve also added a couple of test cases. See the “Review 1 commits” link at the top of this bug.

— Sascha
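
As a rough illustration of the NFD point in the comment above: in NFD, “ü” arrives as “u” followed by U+0308 COMBINING DIAERESIS, so a rule that only matches the precomposed character misses it. The Python sketch below handles both forms by composing to NFC first. This is only one possible approach and an assumption on my part; the committed rules may match the decomposed sequences directly instead. The function name `umlauts_to_digraphs` is hypothetical, and uppercase umlauts are left out here to keep the example short.

```python
import unicodedata

# Hypothetical table: lowercase umlauts, ß, and U+1E9E (capital sharp s).
_TABLE = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "ß": "ss", "\u1e9e": "SS",
})


def umlauts_to_digraphs(text: str) -> str:
    """Accept NFC or NFD input by composing before the lookup."""
    # NFC turns "u" + U+0308 into the single code point "ü".
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(_TABLE)
```

Both `umlauts_to_digraphs("M\u00fcller")` and `umlauts_to_digraphs("Mu\u0308ller")` then give the same result.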

comment:3 Changed 4 months ago by ausi <martin@…>

Hi Sascha,

Thank you for working on this ticket!

Yes, it would be great to be on the Acknowledgments list. My name is “Martin Auswöger”.

Regarding input normalization, I used the NFC() transform, as is done in the Latin-ASCII rules. But I see that supporting both forms might be better :)

“HÄ” and “ÄH” should already have worked, because my rules produce “Ae” only when the umlaut is followed by [:Lowercase:] and “AE” otherwise. With your change, a single “Ä” is transformed to “Ae”, but I think it should be “AE” instead.

ß and ẞ are already handled by Latin-ASCII, so I don’t think they are needed in the special transforms for DE.

The tests look great, but I think “Ä Ö Ü” should get transformed to “AE OE UE”.

At the end of your transform you used ::Any-ASCII instead of ::Latin-ASCII. Does this work correctly? When I tried to create a Transliterator for Any-ASCII with the ICU library, it reported that no rules are available for Any-ASCII, only for Latin-ASCII.

— Martin
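
The casing rule described in the comment above (use “Ae” only when a lowercase letter follows, “AE” otherwise) can be sketched in Python as follows. This is an approximation, not the actual CLDR rule syntax: `str.islower()` stands in for the [:Lowercase:] property, and the names `transliterate_umlauts` and `_replace` are hypothetical.

```python
import re
import unicodedata

UPPER = {"Ä": "A", "Ö": "O", "Ü": "U"}
LOWER = {"ä": "ae", "ö": "oe", "ü": "ue"}


def _replace(match: re.Match) -> str:
    ch = match.group(0)
    if ch in LOWER:
        return LOWER[ch]
    # Look at the character right after the umlaut ("" at end of input).
    following = match.string[match.end():match.end() + 1]
    # "e" only if a lowercase letter follows; otherwise "E".
    return UPPER[ch] + ("e" if following.islower() else "E")


def transliterate_umlauts(text: str) -> str:
    """Umlauts to digraphs, choosing the case from the following character."""
    text = unicodedata.normalize("NFC", text)  # also accept NFD input
    return re.sub("[ÄÖÜäöü]", _replace, text)
```

Under this sketch a standalone “Ä” becomes “AE”, “Ä Ö Ü” becomes “AE OE UE”, and “Äpfel” becomes “Aepfel”, matching the expectations stated in the comment.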

comment:4 Changed 4 months ago by sascha

Thank you, Martin! Done.

comment:5 Changed 4 months ago by sascha

  • Xref set to 10438

comment:6 Changed 12 days ago by mark

  • Status changed from reviewing to closed
  • Resolution set to fixed