Common Locale Data Repository: Bug Tracking

CLDR Ticket #10092 (closed data: fixed)

Opened 16 months ago

Last modified 8 months ago

Transliteration rules from German to ASCII

Reported by: ausi <martin@…>
Owned by: sascha
Component: translit
Data Locale: de
Phase: rc
Review: mark



To convert German text into ASCII (e.g. for generating URLs), the only transform currently available is “Latin-ASCII”, which converts the vowels with diaeresis (umlaut) ä, ö and ü to a, o and u respectively.

For German, there are transliteration rules stating that ä, ö and ü should be transliterated as ae, oe and ue respectively. I found the following references for this rule:


When it is not possible to use the umlauts (for example, when using a restricted character set) the characters Ä, Ö, Ü, ä, ö, ü should be transcribed as Ae, Oe, Ue, ae, oe, ue respectively


When replacing umlaut characters with plain ASCII, use ae, oe, etc. for German language


German names containing umlauts (ä, ö, ü) and/or ß are spelled in the correct way in the non-machine-readable zone of the passport, but with simple vowel + E and/or SS in the machine-readable zone


The Austrian passport can (but does not always) contain a note in German, English, and French that AE / OE / UE / SS are the common transcriptions for Ä / Ö / Ü / ß.

I created a new XML file, transforms/de-ASCII.xml, which is attached to this ticket.
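The difference between the two conventions can be sketched in Python. This is only an illustration under stated assumptions, not the attached de-ASCII.xml or ICU's implementation: `latin_ascii_approx` is a hypothetical, crude stand-in for Latin-ASCII (decompose to NFD, drop combining marks, map ß and ẞ to ss and SS), and `german_ascii_simple` applies the vowel + e convention first. Capitals always get the mixed-case form (Ä → Ae) from the first quotation; the comments below refine that.

```python
import unicodedata

# German convention from the first quotation: Ä Ö Ü ä ö ü -> Ae Oe Ue ae oe ue.
GERMAN_UMLAUTS = {
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ä": "ae", "ö": "oe", "ü": "ue",
}

# Assumption: the few Latin-ASCII mappings this sketch needs beyond
# "strip the accents". The real Latin-ASCII transform has many more rules.
SPECIAL = {"ß": "ss", "ẞ": "SS"}


def latin_ascii_approx(text: str) -> str:
    """Hypothetical stand-in for Latin-ASCII: drop combining marks after NFD."""
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            continue  # e.g. U+0308 COMBINING DIAERESIS is simply dropped
        out.append(SPECIAL.get(ch, ch))
    return "".join(out)


def german_ascii_simple(text: str) -> str:
    """Hypothetical: German umlaut rule first, then the Latin fallback."""
    # Compose first so that "u" + U+0308 is treated the same as "ü".
    composed = unicodedata.normalize("NFC", text)
    replaced = "".join(GERMAN_UMLAUTS.get(ch, ch) for ch in composed)
    return latin_ascii_approx(replaced)


print(latin_ascii_approx("Müller"))   # Muller
print(german_ascii_simple("Müller"))  # Mueller
```

Applying the German rule before the generic fallback mirrors the layering discussed in this ticket: only the umlauts need German-specific handling, and everything else (including ß) can be left to the Latin transform.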


de-ASCII.xml (1.3 KB) - added by ausi <martin@…> 16 months ago.

Change History

Changed 16 months ago by ausi <martin@…>


comment:1 Changed 12 months ago by mark

  • Cc pedberg added
  • Owner changed from anybody to sascha
  • Status changed from new to accepted
  • Phase changed from dsub to rc
  • Milestone changed from UNSCH to 32

comment:2 Changed 12 months ago by sascha

  • Status changed from accepted to reviewing
  • Review set to mark

Hi Martin,

thanks for your contribution! Would you like to be listed on the acknowledgments page (http://cldr.unicode.org/index/acknowledgments)?

I’ve made a few changes to your rules: support input in normalization form NFD; transcribe Ä, Ö and Ü as an all-uppercase string when an uppercase letter comes before or after them (so the Ä in both “HÄ” and “ÄH” becomes “AE” instead of “Ae”); and handle U+1E9E ẞ LATIN CAPITAL LETTER SHARP S. I’ve also added a couple of test cases. See the “Review 1 commits” link at the top of this ticket.

— Sascha
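The context rule described in this comment could look roughly like the following Python sketch. It is not the committed ICU rule file: the function name is hypothetical, and Python's `str.isupper()` stands in for ICU's uppercase property. Normalizing to NFC first is just one simple way to accept NFD input such as “A” followed by U+0308; it is not necessarily how the rules do it. Note that a capital umlaut with no uppercase neighbour still comes out in mixed case.

```python
import unicodedata

CAPITAL = {"Ä": "AE", "Ö": "OE", "Ü": "UE"}
SMALL = {"ä": "ae", "ö": "oe", "ü": "ue"}


def umlauts_adjacent_upper(text: str) -> str:
    """Hypothetical: capital umlaut -> all caps if a neighbour is uppercase."""
    chars = unicodedata.normalize("NFC", text)  # accept NFD input as well
    out = []
    for i, ch in enumerate(chars):
        if ch in SMALL:
            out.append(SMALL[ch])
        elif ch in CAPITAL:
            prev_upper = i > 0 and chars[i - 1].isupper()
            next_upper = i + 1 < len(chars) and chars[i + 1].isupper()
            full = CAPITAL[ch]
            # "AE" next to an uppercase letter, "Ae" otherwise.
            out.append(full if prev_upper or next_upper else full.capitalize())
        else:
            out.append(ch)
    return "".join(out)


print(umlauts_adjacent_upper("HÄ"))     # HAE
print(umlauts_adjacent_upper("ÄH"))     # AEH
print(umlauts_adjacent_upper("Ä Ö Ü"))  # Ae Oe Ue
```

The last example shows the consequence of looking in both directions only for uppercase letters: isolated capitals, with spaces on either side, fall through to the mixed-case form.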

comment:3 Changed 12 months ago by ausi <martin@…>

Hi Sascha,

Thank you for working on this ticket!

Yes, it would be great to be on the Acknowledgments list. My name is “Martin Auswöger”.

Regarding input normalization, I used the NFC() transform, as the Latin-ASCII rules do. But I see that supporting both forms might be better :)

“HÄ” and “ÄH” should already have worked, because my rules produce “Ae” only when a [:Lowercase:] character follows, and “AE” otherwise. With your change, a single “Ä” is transformed to “Ae”, but I think it should be “AE” instead.

I think ß and ẞ are already handled by Latin-ASCII, so they are not needed in the German-specific transform.

The tests look great, but I think “Ä Ö Ü” should be transformed to “AE OE UE”.

At the end of your transform you used ::Any-ASCII instead of ::Latin-ASCII; does this work correctly? I tried to create a Transliterator for Any-ASCII with the ICU library, but it reported that no rules are available for Any-ASCII, only for Latin-ASCII.

— Martin
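For comparison, here is a Python sketch of the alternative Martin describes: a capital umlaut becomes mixed case (“Ae”) only when a lowercase letter follows it, and all caps (“AE”) otherwise, so a lone “Ä” gives “AE” and “Ä Ö Ü” gives “AE OE UE”. The function name is hypothetical, `str.islower()` only approximates ICU's [:Lowercase:] property, and this is not the rule file that was finally committed.

```python
import unicodedata

CAPITAL = {"Ä": "AE", "Ö": "OE", "Ü": "UE"}
SMALL = {"ä": "ae", "ö": "oe", "ü": "ue"}


def umlauts_lowercase_follows(text: str) -> str:
    """Hypothetical: capital umlaut -> "Ae" before lowercase, else "AE"."""
    chars = unicodedata.normalize("NFC", text)  # accept NFD input as well
    out = []
    for i, ch in enumerate(chars):
        if ch in SMALL:
            out.append(SMALL[ch])
        elif ch in CAPITAL:
            next_lower = i + 1 < len(chars) and chars[i + 1].islower()
            full = CAPITAL[ch]
            # Only a following lowercase letter selects the mixed-case form.
            out.append(full.capitalize() if next_lower else full)
        else:
            out.append(ch)
    return "".join(out)


print(umlauts_lowercase_follows("Ä Ö Ü"))  # AE OE UE
print(umlauts_lowercase_follows("Ärger"))  # Aerger
```

Because only the following character matters, “HÄ” and “ÄH” both yield “AE” for the umlaut without any look-behind, which is the point made in the comment above.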

comment:4 Changed 12 months ago by sascha

Thank you, Martin! Done.

comment:5 Changed 12 months ago by sascha

  • Xref set to 10438

comment:6 Changed 8 months ago by mark

  • Status changed from reviewing to closed
  • Resolution set to fixed
