Technical Notes | |
Version | 1 |
Authors | Ken Whistler |
Date | 2005-07-15 |
This Version | http://www.unicode.org/notes/tn24/tn24-1.html |
Previous Version | n/a |
Latest Version | http://www.unicode.org/notes/tn24/ |
This technical note provides an example of how the Unicode character names list [Names] for Unicode, Version 4.1.0, may be translated into other languages. This translation is an American English translation.
This document is a Unicode Technical Note. Sole responsibility for its contents rests with the author(s). Publication does not imply any endorsement by the Unicode Consortium.
For information on Unicode Technical Notes, including criteria for acceptance, see https://www.unicode.org/notes/.
The translated names list in the accompanying data files is provided only for informational purposes, and is not part of the Unicode Standard. The author has no intention of updating or maintaining the translation to match future versions of the Unicode Standard, so people who use the file do so at their own risk.
The idea is to demonstrate, by example, how a translation of the names list can work: it provides an informative list of names for Unicode characters without having to match exactly the sometimes confusing normative Unicode character names in the standard. Such translations can be useful, for example, in discussions about characters or in a user interface, where the concern is more that the person using the name is clear about the identity of the character as they know it than that the name exactly match the normative character name in the standard.
The American English "translation" systematically converts Anglicisms such as FULL STOP and SOLIDUS to the more recognizable American English terms PERIOD and SLASH. It also changes character-standard oddities such as CARON into the more recognizable term HACEK. Various corrections for known misspellings or other errors in the normative names are also applied, in the interest of providing American English terms that make as much sense as possible. Of course, many Unicode character names are for highly technical symbols:
U+22C9 LEFT NORMAL FACTOR SEMIDIRECT PRODUCT
or for characters in scripts that English speakers are typically not familiar with, named using terms from other languages:
U+1939 LIMBU SIGN KEMPHRENG
No attempt is made to provide explanatory rewordings of such character names or to translate such script-specific usage into some analogous phrase in English, as it is unlikely that such rewordings or translations would actually help in identifying the characters.
Instead, the translation simply culls away irrelevant distractions for American English speakers that result from Anglicisms, standardese, and miscellaneous naming mistakes.
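The term-level substitution described above can be illustrated with a minimal Python sketch. It is not the procedure used to produce the accompanying data file: the table holds only the three substitutions named in this note, and the names SUBSTITUTIONS and americanize are hypothetical.

```python
import re

# Illustrative table only: these are the three substitutions mentioned in
# this note. The actual translation covers many more terms and also applies
# spelling corrections that are not modeled here.
SUBSTITUTIONS = {
    "FULL STOP": "PERIOD",
    "SOLIDUS": "SLASH",
    "CARON": "HACEK",
}

# Match whole words only, trying longer phrases first so that a multi-word
# term such as FULL STOP is replaced as a unit.
_PATTERN = re.compile(
    r"\b("
    + "|".join(
        re.escape(term)
        for term in sorted(SUBSTITUTIONS, key=len, reverse=True)
    )
    + r")\b"
)


def americanize(name: str) -> str:
    """Return the name with each listed term replaced; other text is kept."""
    return _PATTERN.sub(lambda m: SUBSTITUTIONS[m.group(1)], name)


if __name__ == "__main__":
    for normative in ("FULL STOP", "SOLIDUS", "CARON", "LIMBU SIGN KEMPHRENG"):
        print(normative, "->", americanize(normative))
```

Names containing none of the listed terms, such as LIMBU SIGN KEMPHRENG, pass through unchanged, which matches the approach of leaving technical and script-specific names alone.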
The accompanying text file contains the actual translated names list.
American English Translated Names List
The text file uses the same format and syntax conventions as [Names]. See [Format]. This means that, if desired, the translated names list can be manipulated with the same unibook utility program that can be used to view the untranslated names list.
Note that this text file is a plain text file, but for technical reasons is encoded in ISO/IEC 8859-1, Latin-1, rather than in UTF-8.
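For readers who want to process the file directly rather than through unibook, the following Python sketch shows one way to read it with the Latin-1 encoding noted above and collect the character names. It assumes that a name line consists of a hexadecimal code point, a tab, and the name, and that every other kind of line (headers beginning with "@", annotations beginning with a tab) can be skipped; see [Format] for the authoritative syntax. The function names and the path argument are hypothetical.

```python
import re
from typing import Dict, Iterable

# Assumed shape of a name line: hex code point, TAB, character name.
# Header lines (starting with "@") and annotation lines (starting with TAB)
# do not match this pattern and are ignored.
NAME_LINE = re.compile(r"^([0-9A-F]{4,6})\t(.+)$")


def parse_name_lines(lines: Iterable[str]) -> Dict[int, str]:
    """Map each code point to its name, skipping all non-name lines."""
    names: Dict[int, str] = {}
    for line in lines:
        match = NAME_LINE.match(line.rstrip("\r\n"))
        if match:
            names[int(match.group(1), 16)] = match.group(2)
    return names


def load_names(path: str) -> Dict[int, str]:
    """Read a names list file that is encoded in ISO/IEC 8859-1 (Latin-1)."""
    # Opening the file as UTF-8 would fail or garble text on any byte
    # above 0x7F, so the encoding is given explicitly.
    with open(path, encoding="latin-1") as handle:
        return parse_name_lines(handle)
```

Calling load_names with the location of the downloaded text file returns a dictionary keyed by code point, which can then be used for lookups in a user interface or in discussion tooling.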
[FAQ] | Unicode Frequently Asked Questions http://www.unicode.org/faq/ For answers to common questions on technical issues. |
[Format] | Unicode 4.1.0 Names List documentation http://www.unicode.org/Public/4.1.0/ucd/NamesList.html |
[Names] | Unicode 4.1.0 Names List http://www.unicode.org/Public/4.1.0/ucd/NamesList.txt The Unicode 4.1.0 Names List file from which this translation is derived. |
[Versions] | Versions of the Unicode Standard http://www.unicode.org/standard/versions/ For details on the precise contents of each version of the Unicode Standard, and how to cite them. |
The following summarizes modifications from the previous version of this document.
1 | Initial version |
© 2005 Ken Whistler. This publication is protected by copyright, and permission must be obtained from the author and Unicode, Inc. prior to any reproduction, modification, or other use not permitted by the Terms of Use.
Use of this publication is governed by the Unicode Terms of Use. The authors, contributors, and publishers have taken care in the preparation of this publication, but make no express or implied representation or warranty of any kind and assume no responsibility or liability for errors or omissions or for consequential or incidental damages that may arise therefrom. This publication is provided “AS-IS” without charge as a convenience to users.
Unicode and the Unicode Logo are registered trademarks of Unicode, Inc., in the United States and other countries.