[Unicode] Common Locale Data Repository: Bug Tracking
 
CLDR Ticket #10752 (new unknown)

Opened 3 months ago

Last modified 3 weeks ago

i18n string concatenation

Reported by: sascha
Owned by: anybody
Component: unknown
Data Locale:
Phase: dsub
Review:
Weeks:
Data Xpath:
Xref:

Description

It would be nice if CLDR could define rules and supporting data for i18n string concatenation. A first set of rules could look like this:

  • Katakana plus Katakana gets joined by U+30FB KATAKANA MIDDLE DOT;
  • Katakana plus {Hiragana,Ideographs} gets joined by the empty string;
  • Hiragana plus {Katakana,Hiragana,Ideographs} gets joined by the empty string;
  • Ideographs plus {Katakana,Hiragana,Ideographs} gets joined by the empty string;
  • anything else gets joined by U+0020 SPACE.

For example:

  • アンゲラ ⊕ メルケル ⟶ アンゲラ・メルケル
  • 田中 ⊕ アンゲラ ⟶ 田中アンゲラ
  • 田中 ⊕ 一斉 ⟶ 田中一斉
  • အင်ဂျလာ ⊕ မာကယ် ⟶ အင်ဂျလာ မာကယ်
  • John ⊕ Doe ⟶ John Doe
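To make the proposed rules concrete, here is a minimal Python sketch that looks only at the last character of the left string and the first character of the right string. The function names (`script_class`, `joiner`, `concat`) are hypothetical, and the hard-coded code point ranges are an assumption standing in for a proper Script_Extensions lookup.

```python
# Approximate code point ranges for the scripts named in the rules.
# Assumption: a production version would query the Unicode Script_Extensions
# property (e.g. through ICU) instead of relying on hard-coded ranges.
KATAKANA = ((0x30A0, 0x30FF), (0x31F0, 0x31FF), (0xFF66, 0xFF9F))
HIRAGANA = ((0x3040, 0x309F),)
IDEOGRAPHS = ((0x3400, 0x4DBF), (0x4E00, 0x9FFF), (0xF900, 0xFAFF), (0x20000, 0x3134F))
JAPANESE = {"katakana", "hiragana", "ideograph"}

KATAKANA_MIDDLE_DOT = "\u30FB"


def script_class(ch: str) -> str:
    """Classify one character as 'katakana', 'hiragana', 'ideograph' or 'other'."""
    cp = ord(ch)
    for name, ranges in (("katakana", KATAKANA), ("hiragana", HIRAGANA), ("ideograph", IDEOGRAPHS)):
        if any(lo <= cp <= hi for lo, hi in ranges):
            return name
    return "other"


def joiner(left: str, right: str) -> str:
    """Pick the separator from the boundary characters of two non-empty strings."""
    a, b = script_class(left[-1]), script_class(right[0])
    if a == "katakana" and b == "katakana":
        return KATAKANA_MIDDLE_DOT
    if a in JAPANESE and b in JAPANESE:
        return ""  # Katakana/Hiragana/Ideographs combinations join directly
    return " "  # anything else gets a space


def concat(*parts: str) -> str:
    """Join non-empty parts left to right, choosing each joiner at its boundary."""
    result = ""
    for part in parts:
        if not part:
            continue
        result = part if not result else result + joiner(result, part) + part
    return result
```

With this sketch, `concat("アンゲラ", "メルケル")` gives アンゲラ・メルケル and `concat("John", "Doe")` gives John Doe. Because the joiner is chosen separately at each boundary, the same function also handles more than two parts, e.g. `concat("Semi-Bold", "Condensed", "Italic")` gives Semi-Bold Condensed Italic.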

This would be useful for systems that do automatic name transliteration, and also for any other system that needs to concatenate strings. For example, a text processor might look up translations for style names such as “Semi-Bold”, “Condensed”, and “Italic” and then concatenate them into a single string such as “Semi-Bold Condensed Italic”. The joining character (a space, a Katakana middle dot, or nothing) depends on the neighboring characters.

Change History

comment:1 Changed 3 weeks ago by mark

It needs a bit of fleshing out.

I think this might more properly be termed "word concatenation" than "string concatenation". For string concatenation, for example, we have had requests for Hebrew: when an ending is attached, a hyphen (-) is inserted if the boundary falls between a Hebrew and a non-Hebrew letter, and nothing is inserted otherwise. For word concatenation, the joiner would be a space.
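The difference between the two kinds of concatenation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the Hebrew block range U+0590..U+05FF stands in for a real Script=Hebrew test. The check looks only at the two boundary characters, so it works whichever side the Hebrew text is on.

```python
def is_hebrew(ch: str) -> bool:
    # Assumption: the Hebrew block range approximates Script=Hebrew.
    return 0x0590 <= ord(ch) <= 0x05FF


def string_concat(left: str, right: str) -> str:
    """Hypothetical 'string' concatenation: hyphen only across a Hebrew/non-Hebrew letter boundary."""
    if not left or not right:
        return left + right
    a, b = left[-1], right[0]
    if a.isalpha() and b.isalpha() and is_hebrew(a) != is_hebrew(b):
        return left + "-" + right
    return left + right  # otherwise nothing is inserted


def word_concat(left: str, right: str) -> str:
    """Hypothetical 'word' concatenation: always a space."""
    return left + " " + right
```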

In the general case, word concatenation can get more complicated, e.g. "le" ⊕ "apostrophe" ⟶ l’apostrophe. But if we specified this properly as basic concatenation, that would probably be fine to start with.

This would need a fleshed-out structure, such as:

    <concatenations>
      <concatenation type='string' before="[:scx=kana:]" after="[:scx=kana:]">・</concatenation>
      ...
    </concatenations>

These would need to be ordered. Can you supply a draft DTD?

Most rules would live in main/root.xml, but they could be overridden by, say, fr.xml. The override mechanism would place the child locale's elements before the parent's, so that they are encountered first during evaluation.
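The ordering and override ideas can be illustrated with a small Python sketch: each rule pairs a test on the character before the boundary with a test on the character after it, rules are tried in order with the first match winning, and a child locale's rules are simply prepended to the root rules. Everything here is hypothetical: the `Rule` type, the code point ranges approximating `[:scx=kana:]`, and the placeholder locale "xx" with its override exist only to show the mechanism, not as proposed CLDR data.

```python
from typing import Callable, NamedTuple

CharTest = Callable[[str], bool]


def in_ranges(*ranges: tuple[int, int]) -> CharTest:
    """Character test from inclusive code point ranges (approximates a UnicodeSet)."""
    return lambda ch: any(lo <= ord(ch) <= hi for lo, hi in ranges)


class Rule(NamedTuple):
    before: CharTest  # applied to the last character of the left string
    after: CharTest   # applied to the first character of the right string
    separator: str


def any_char(ch: str) -> bool:
    return True


is_kana = in_ranges((0x30A0, 0x30FF), (0x31F0, 0x31FF))
is_japanese = in_ranges((0x3040, 0x30FF), (0x31F0, 0x31FF), (0x3400, 0x4DBF), (0x4E00, 0x9FFF))

# Hypothetical root rules in evaluation order: the first matching rule wins.
ROOT_RULES = [
    Rule(is_kana, is_kana, "\u30FB"),  # KATAKANA MIDDLE DOT
    Rule(is_japanese, is_japanese, ""),
    Rule(any_char, any_char, " "),     # catch-all
]

# Hypothetical child-locale overrides; "xx" is a placeholder, not a real locale.
LOCALE_RULES = {
    "xx": [Rule(is_kana, is_kana, "\u30A0")],  # KATAKANA-HIRAGANA DOUBLE HYPHEN
}


def resolved_rules(locale: str) -> list[Rule]:
    """Child rules come first, so they are encountered before the parent's."""
    return LOCALE_RULES.get(locale, []) + ROOT_RULES


def concatenate(left: str, right: str, locale: str = "root") -> str:
    if not left or not right:
        return left + right
    a, b = left[-1], right[0]
    for rule in resolved_rules(locale):
        if rule.before(a) and rule.after(b):
            return left + rule.separator + right
    return left + right  # not reached while the catch-all rule is present
```

Prepending keeps lookup a simple linear scan: a child locale only lists the rules it changes, and everything else falls through to the inherited root rules.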
