CLDR Ticket #11343 (new docs)

Opened 2 months ago

Last modified 2 months ago

Corrections to documentation: Exemplar Characters

Reported by: Marcel Schneider <charupdate@…>
Owned by: anybody
Component: translation-guide
Data Locale:
Phase: dsub
Review:
Weeks:
Data Xpath:
Xref:

Description

Given that CLDR is used to create user interfaces, as opposed to other uses such as input checking or IDN validation, the documentation should clearly focus on that design goal. At present it does not, which is misleading and causes problems with data submission. These problems can be eliminated by rewording part of the documentation. The following example shows how.

Information Hub for Linguists > Characters > Exemplar Characters

http://cldr.unicode.org/translation/characters#TOC-Exemplar-Characters

Category: standard
English Example: a b c d e f g h i j k l m n o p q r s t u v w x y z
Meaning:
The minimal characters required for your language (other than punctuation).

The test to see whether or not a letter belongs in the main set is based on whether it is acceptable in your language to always use spellings that avoid that character. […]
Rewording:
The non-punctuation characters used specifically in your language.

The test to see whether or not a character belongs in the main set is based on whether it is acceptable in publishing in your language to always use spellings that avoid that character. […]
Comment:
“Minimal” and “acceptable” are ambiguous and depend on opinion. E.g., in French, the letter Œ œ can be written as OE oe in draft style on a typewriter not featuring half advance, or on a computer limited to Latin-1. But in real writing, publishing, and user interfaces, Œ œ is mandatory.

Category: punctuation
English Example: ‐ – — , ; : ! ? . … ‘ ' ’ ′ ″ “ " ” ( ) [ ] / @ & # § † ‡ *
Meaning:
The punctuation characters customarily used with your language.

For example, compared to the English list, Arabic might remove ; , ? /, and add ؟ \ ، ؛.

Don't include purely math symbols such as +, =, ±, and so on.
Rewording:
The preferred punctuation used in publishing in your language.

For example, compared to the English list, Arabic might remove ; , ? /, and add ؟ \ ، ؛.

Don't include punctuation or symbols on an ASCII or math usage basis, such as @, #, _, +, =, ±, and so on.
Comment:
The concept of “customary use” is again misleading and ill-focused, because it includes draft-style and typewriter-like orthographies that remain in current use only because keyboard layouts have not been updated. E.g., the single and double generic quotation marks APOSTROPHE and QUOTATION MARK are not preferred. They should be removed from the English example. Further, despite their use in e-mail addresses, AT SIGN and LOW LINE are not language-specific and do not belong in the expected set. NUMBER SIGN should be included only if its use is accepted in publishing in natural language in your locale. That is the case in English, for example, but not in French.
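
To make the proposed test concrete, here is a minimal Python sketch of how a contributor might check a sample of published text against candidate exemplar sets. Everything in it is an assumption for illustration: the names `MAIN_EN`, `PUNCT_EN`, `uncovered`, and `describe` are hypothetical, and the sets are typed in by hand rather than read from CLDR data. The punctuation set is the English example with APOSTROPHE, QUOTATION MARK, and AT SIGN removed, as suggested in the comment above.

```python
import unicodedata

# Hypothetical exemplar sets, entered by hand for illustration only;
# they are not loaded from CLDR.
MAIN_EN = set("abcdefghijklmnopqrstuvwxyz")
# English punctuation example from the table, minus ' " and @.
PUNCT_EN = set("‐–—,;:!?.…‘’′″“”()[]/&#§†‡*")


def uncovered(text, main, punctuation):
    """Return the characters of `text` that neither exemplar set covers."""
    missing = set()
    for ch in text:
        if ch.isspace() or ch.isdigit():
            continue  # whitespace and digits are out of scope for both sets
        if ch.lower() in main or ch in punctuation:
            continue
        missing.add(ch)
    return missing


def describe(chars):
    """Format characters as 'U+XXXX NAME' strings, sorted for stable output."""
    return sorted(
        f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}" for c in chars
    )


print(describe(uncovered("Hors d'œuvre", MAIN_EN, PUNCT_EN)))
# ['U+0027 APOSTROPHE', 'U+0153 LATIN SMALL LIGATURE OE']
```

Run over text taken from real publishing in a locale, a report like this would show which characters the candidate sets fail to cover. Whether a flagged character belongs in the set remains the editorial judgment the ticket is about; the sketch only surfaces the candidates.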

Change History

comment:1 Changed 2 months ago by asmus@…

The use of # and @ should be considered separately: these are nowadays syntax characters that leak into actual publishing through concepts such as e-mail addresses and hashtags. Where such characters are used by non-programmers and in general documents and news, they are clearly needed in localized environments, even though they are not language-specific.

Perhaps a new category is needed.

comment:2 Changed 2 months ago by Marcel Schneider <charupdate@…>

Xref: ticket:11378 ASCII in CLDR exemplar punctuation: Quotation marks
