Unicode Common Locale Data Repository: Bug Tracking

CLDR Ticket #10825(closed: fixed)

Opened 11 months ago

Last modified 9 months ago

Change the ar locale to default to ASCII digits

Reported by: markus
Owned by: mark
Component: numbers
Data Locale: ar
Phase: rc
Review: markus
Weeks: 0.2
Data Xpath:
Xref: ticket:9839

Description

Change ar to default to ASCII digits. While many Arabic-speaking users prefer native digits, all understand ASCII digits: They are in widespread usage even in countries that prefer native digits. This would maximize understanding when we don’t know a user’s country, or a user selects Arabic but declines to select a regional variant.

Terminology

“ASCII digits” refers to 0123456789 (U+0030..U+0039) for this discussion. These are called “European digits” in the Unicode Standard (or “latn” in CLDR), although they are colloquially referred to as “Arabic digits” because they are derived from Arabic.

“Native digits” refers to ٠١٢٣٤٥٦٧٨٩ (U+0660..U+0669) for this discussion, called “Arabic-Indic digits” in the Unicode Standard (or “arab” in CLDR).

Note that there are many other sets of digits used with other languages; for example, the Eastern Arabic-Indic digits are common in Persian, Urdu, etc. This document is only about Arabic-language locales (not other languages or locales written in Arabic script).
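Because both digit sets occupy ten consecutive code points in the same order, converting one to the other is a fixed code-point offset. The Java sketch below (class and method names are hypothetical, chosen for illustration) shows only that mapping. It does not touch decimal or grouping separators, which CLDR defines per numbering system, so real number formatting should still go through a locale-aware formatter such as ICU's.

```java
/** Hypothetical helper: maps digits between CLDR "latn" (ASCII) and "arab" (Arabic-Indic). */
final class DigitShaper {
    private static final int LATN_ZERO = 0x0030; // DIGIT ZERO
    private static final int ARAB_ZERO = 0x0660; // ARABIC-INDIC DIGIT ZERO

    /** Replaces each ASCII digit 0-9 with the Arabic-Indic digit of the same value. */
    static String latnToArab(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            if (c >= '0' && c <= '9') {
                sb.append((char) (ARAB_ZERO + (c - LATN_ZERO)));
            } else {
                sb.append(c); // separators and letters pass through unchanged
            }
        }
        return sb.toString();
    }

    /** Inverse mapping: Arabic-Indic digits U+0660..U+0669 back to ASCII. */
    static String arabToLatn(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            if (c >= ARAB_ZERO && c <= ARAB_ZERO + 9) {
                sb.append((char) (LATN_ZERO + (c - ARAB_ZERO)));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(latnToArab("2018-01-10"));   // prints ٢٠١٨-٠١-١٠
        System.out.println(arabToLatn("\u0663\u0663")); // prints 33
    }
}
```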

Status quo

Most Arabic-language locales default to Native digits: <defaultNumberingSystem>arab</defaultNumberingSystem>

A few Arabic-language locales in the Maghreb/North Africa (ar-DZ [Algeria], ar-EH [Western Sahara], ar-LY [Libya], ar-MA [Morocco], and ar-TN [Tunisia]) use ASCII digits in CLDR.

The default content locale for ar is ar-001. In likelySubtags, ar expands to ar-Arab-EG. Since Egypt customarily uses native digits, ar itself has <defaultNumberingSystem>arab</defaultNumberingSystem>.

Proposal

Pivot the numbering system: change ar to use ASCII digits, and set the sublocales so that their resolved locale data (after inheritance) remains the same as now.

After the pivot, change the numbering system of ar-001 to ASCII.

We should alert people to this change in the migration section of the CLDR draft release notes. Note that while in most cases the default content locale for a language matches the likely-subtags value, CLDR already (exceptionally) disconnects ar-001 from ar-EG. Any matching system that wants to support the current differences between ar and ar-EG already needs to match ar, ar-001, and ar-EG correctly.

In other words, no change in resolved data for explicit regional variants.
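The pivot can be checked mechanically: for every sublocale that receives an explicit value, its resolved value after the pivot must equal its resolved value before. The toy Java model below is a sketch under simplifying assumptions that are not part of this ticket: parents are found by truncating the last subtag (ignoring CLDR's parentLocales data), root is assumed to supply latn, and the list of pinned sublocales is an illustrative subset. ar-001 is deliberately left unpinned so that it follows ar.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of inheritance for <defaultNumberingSystem>; not CLDR's real resolver. */
final class PivotModel {
    /** Returns the first explicit value on the chain locale -> ... -> "root". */
    static String resolve(Map<String, String> explicit, String locale) {
        String current = locale;
        while (true) {
            String value = explicit.get(current);
            if (value != null) {
                return value;
            }
            if (current.equals("root")) {
                return null;
            }
            int dash = current.lastIndexOf('-');
            current = (dash < 0) ? "root" : current.substring(0, dash); // truncation fallback
        }
    }

    /** Status quo per this ticket: ar is native, five Maghreb locales are ASCII. */
    static Map<String, String> statusQuo() {
        Map<String, String> m = new HashMap<>();
        m.put("root", "latn"); // assumed root value
        m.put("ar", "arab");
        for (String region : List.of("DZ", "EH", "LY", "MA", "TN")) {
            m.put("ar-" + region, "latn");
        }
        return m;
    }

    /** Pins each listed sublocale to its current resolved value, then flips ar to latn. */
    static Map<String, String> pivot(Map<String, String> before, List<String> sublocales) {
        Map<String, String> after = new HashMap<>(before);
        for (String loc : sublocales) {
            after.put(loc, resolve(before, loc));
        }
        after.put("ar", "latn");
        return after;
    }

    public static void main(String[] args) {
        Map<String, String> before = statusQuo();
        // Illustrative subset of sublocales; ar-001 is intentionally absent.
        List<String> pinned = List.of("ar-EG", "ar-SA", "ar-AE", "ar-DZ", "ar-MA");
        Map<String, String> after = pivot(before, pinned);
        for (String loc : List.of("ar-EG", "ar-DZ", "ar-001", "ar-US")) {
            System.out.println(loc + ": " + resolve(before, loc) + " -> " + resolve(after, loc));
        }
    }
}
```

Running main prints ar-EG: arab -> arab, ar-DZ: latn -> latn, ar-001: arab -> latn and ar-US: arab -> latn. The last line illustrates the side effect for locales without their own data (such as ar-US) that comment:3 discusses.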

Rationale

1. Millions of Arabic speakers are familiar with ASCII digits but are not familiar with Native digits

Maghreb countries (Morocco/Western Sahara, Algeria, Tunisia, Libya) represent ~90M speakers, roughly 1/3 of the Arabic-speaking world.

An informal survey of Arabic-speaking Google employees who grew up in Morocco indicated that only 25% could “easily read Eastern Arabic numbers ١٢٣٤٥٦٧٨”. While this survey was small (N=16) and unscientific, we anticipate that the fluency of the general population with native digits may be lower: many Googlers who claimed to easily read native digits cited international studies or work as the reason for their fluency (e.g. after growing up in Morocco, they later moved to Dubai, where native digits are more common).

2. ASCII digits are well-understood and commonly used throughout the Arabic-speaking world

While it’s clear that many users outside of (roughly) the Maghreb still prefer and use Native digits, various data (including surveys of printed newspapers, analysis of Google searches, and PDFs on various websites) suggest that ASCII digits are still very common across the Arabic-speaking world. It’s typical for a newspaper to have some of its content with Native digits, and some in ASCII digits (e.g. the date and sports scores might be Native but page numbers and numbers in news articles might be ASCII). Thus, showing ASCII digits when we don’t know a user’s full locale might annoy some users, but there’s no evidence that such users wouldn’t be able to understand those digits.

(By contrast, nearly all Maghrebi printed documents we’ve found seem to use exclusively ASCII digits.)

3. There may be a shift from Native digits to ASCII digits across the Arabic-speaking world

This is harder to measure precisely, but anecdotal evidence suggests that people across the Arabic-speaking world might be moving from Native to ASCII digits. For example, Bahrain switched its coins from native to ASCII in 1992, and Qatar did the same in 2016. Google Trends also provides some more anecdotal evidence of this.

FYI

On the Web, it is sometimes difficult to discern which style of digits the author intended, since Windows will often display ASCII digits as if they were Native digits. Thus, ASCII digits in desktop-centric web content especially may have been “intended” to be Native. For this reason, the data above tends to focus on printed material, PDFs, and other documents for which we are more confident of what the writer intended. (This effect diminishes over time as more users use mobile devices, which display ASCII digits as themselves.)

Note that many manufacturers allow users to override the default for their specific locale — for example, Apple’s iOS allows ar-EG users to explicitly request ASCII digits, or ar-DZ users to explicitly request Native digits. This proposal does not affect such overrides.
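A per-user override like this sits on top of the locale's default rather than replacing it. One standard way to carry such an override is the BCP 47 Unicode extension keyword nu (for example ar-EG-u-nu-latn); this ticket does not say which mechanism any particular OS uses. The sketch below uses the real java.util.Locale methods forLanguageTag and getUnicodeLocaleType, but the DEFAULTS table and the class name are hypothetical stand-ins for CLDR data.

```java
import java.util.Locale;
import java.util.Map;

/** Sketch: an explicit "nu" keyword wins over the locale's default numbering system. */
final class NumberingOverride {
    // Hypothetical hard-coded defaults (status-quo values from this ticket), not read from CLDR.
    private static final Map<String, String> DEFAULTS = Map.of(
            "ar", "arab",
            "ar-EG", "arab",
            "ar-DZ", "latn");

    static String numberingSystem(String languageTag) {
        Locale locale = Locale.forLanguageTag(languageTag);
        String nu = locale.getUnicodeLocaleType("nu"); // null when there is no -u-nu-... keyword
        if (nu != null) {
            return nu; // explicit user override
        }
        String regional = DEFAULTS.get(locale.getLanguage() + "-" + locale.getCountry());
        if (regional != null) {
            return regional;
        }
        return DEFAULTS.getOrDefault(locale.getLanguage(), "latn"); // language-level default
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"ar-EG", "ar-EG-u-nu-latn", "ar-DZ", "ar-DZ-u-nu-arab"}) {
            System.out.println(tag + " -> " + numberingSystem(tag));
        }
    }
}
```

Because the override is resolved before any locale default is consulted, changing the default for ar (or any sublocale) leaves explicitly tagged requests untouched, which matches the statement above that the proposal does not affect such overrides.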


Change History

comment:1 Changed 11 months ago by srl

  • Xref set to 9839

comment:2 Changed 11 months ago by pedberg

  • Cc pedberg, fredrik added

comment:3 Changed 10 months ago by markus

FYI: Google overrides the numbering system to ASCII digits for phone numbers, financial data, and certain products. These overrides are mostly specific to Arabic; other languages keep their default native digits.

Responding to concerns from 2018-jan-10 CLDR meeting:

Changes to locales such as ar-US and ar-TR

Our proposal is designed so that all current Arabic locales (e.g. ar-EG and ar-DZ) retain their current digit settings (e.g. ar-EG remains native, ar-DZ remains ASCII). However, someone pointed out that some operating systems allow selection of locales such as ar-US or ar-TR, which would default to ar-001 and thus change from the status-quo native digits to ASCII.

We believe that for most locales, this is the right thing to do: for example, Arabic speakers in the US come from a wide variety of countries, some of which are not familiar with native digits. Thus the arguments for changing ar-001 also apply to the US: it is "least bad" to show ASCII to such speakers, since everyone understands ASCII but those from the Maghreb won't understand native. Furthermore, Arabic speakers in the US (and other non-Arabic majority countries) are even more likely to be comfortable with (and likely even prefer) ASCII.

That being said, there are certainly some locales—the most obvious being ar-IR and ar-PK—where the non-Arabic speakers in the surrounding country customarily use Arabic native digits, and thus we should add a digit preference for native for such locales. (The specific style of digits in Iran & Pakistan is different, but it's closer to the Arabic native standard than ASCII.)

Evidence of government or customary use of ASCII digits outside the Maghreb

Here are some links that show examples of ASCII digits being used by both governments and other groups outside the Maghreb (we have tried to show PDF/print examples to eliminate the possibility of Windows' ASCII digits that look like native digits, but non-PDF/print examples abound as well):

To be clear: we are not suggesting that ASCII is *more* common (or preferred) than native digits outside the Maghreb, just that it is common enough that effectively all Arabic speakers are *familiar* with ASCII digits.

comment:4 Changed 10 months ago by markus

Consensus from today:

  • We have specific settings in all ar-* locales. (match the current resolved settings)
    • add <defaultNumberingSystem>arab</…> to ar-EG, etc.
  • In ar, we have <defaultNumberingSystem>arab</…> (same as today)
  • And add: <defaultNumberingSystem alt="latndigi">latn</…> → Google (and other companies like Netflix), which want to default non-country-specific Arabic to ASCII, can use this. We can then test it out, see how it works, and based on that experience decide whether to swap the default and alt values.
  • Publicize in release: make clear that we are considering change in the future, and ask people to start testing.
  • Stock ICU uses the “standard” variant for ar.xml (arab); Google etc. will filter the data so that ar.xml gets <defaultNumberingSystem>latn</…>
  • Ensure that cldrModify doesn’t remove sublocale settings when minimizing. Mark to file ticket. Kristi to add comments if needed.
  • Make test code in ICU handle both cases. IcuBug:13567
  • Will swap if agreed, in near-future release
  • AGREED for this release (33).
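The consensus keeps one ar.xml carrying both values and lets each consumer pick: stock ICU reads the element without alt, while a filtering consumer prefers the alt variant. The sketch below parses a simplified fragment (the real file's surrounding <ldml> structure is omitted) with the JDK's DOM parser. The class name, method name and selection rule are assumptions for illustration, not actual CLDR or ICU tooling. It uses alt="latn", the value actually checked in (comment:6), rather than the latndigi from the minutes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical consumer-side choice between the standard and alt="latn" values. */
final class AltSelector {
    // Simplified fragment modeled on the consensus for ar.xml.
    static final String AR_FRAGMENT = "<numbers>"
            + "<defaultNumberingSystem>arab</defaultNumberingSystem>"
            + "<defaultNumberingSystem alt=\"latn\">latn</defaultNumberingSystem>"
            + "</numbers>";

    /**
     * Returns the text of the element whose alt equals preferredAlt, or of the
     * element without alt if there is no such match (or preferredAlt is null).
     */
    static String select(String xml, String preferredAlt) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("unparseable XML", e);
        }
        NodeList nodes = doc.getElementsByTagName("defaultNumberingSystem");
        String standard = null;
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            String alt = e.getAttribute("alt"); // DOM returns "" when the attribute is absent
            if (preferredAlt != null && preferredAlt.equals(alt)) {
                return e.getTextContent().trim();
            }
            if (alt.isEmpty()) {
                standard = e.getTextContent().trim();
            }
        }
        return standard;
    }

    public static void main(String[] args) {
        System.out.println("stock ICU: " + select(AR_FRAGMENT, null));   // arab
        System.out.println("filtering: " + select(AR_FRAGMENT, "latn")); // latn
    }
}
```

Keeping both values in one file means that swapping the default later only requires exchanging the two element contents; consumers that already select by alt keep working unchanged.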

comment:5 Changed 9 months ago by mark

  • Owner changed from anybody to mark
  • Phase changed from dvet to rc
  • Status changed from new to accepted
  • Milestone changed from UNSCH to 33

comment:6 Changed 9 months ago by mark

Note: used 'latn' instead of 'latndigi', since the former is the one in number.xml.

comment:7 Changed 9 months ago by mark

  • Status changed from accepted to reviewing
  • Review set to markus

Added explicit default numbering systems to all the Arabic locales and the alt value to ar.xml, and fixed CLDRModify so that it doesn't nuke the new values.

comment:8 follow-up: ↓ 9 Changed 9 months ago by markus

  • Status changed from reviewing to reviewfeedback

Please change <defaultNumberingSystem alt='latn'>arab</defaultNumberingSystem> to <defaultNumberingSystem alt='latn'>latn</defaultNumberingSystem> -- so that the alt value is actually different...
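The slip caught here (an alt variant whose value equals the unqualified element, so the alt changes nothing) is easy to detect automatically. The following Java sketch is a hypothetical lint, not part of CLDR's actual test suite; it flags every alt value of <defaultNumberingSystem> whose text matches the standard element, using the two snippets from this review as input.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical check for alt variants that duplicate the standard value. */
final class AltLint {
    /** Returns alt values of <defaultNumberingSystem> whose text equals the unqualified element's text. */
    static List<String> redundantAlts(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("unparseable XML", e);
        }
        NodeList nodes = doc.getElementsByTagName("defaultNumberingSystem");
        String standard = null;
        List<Element> alts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            if (e.getAttribute("alt").isEmpty()) {
                standard = e.getTextContent().trim();
            } else {
                alts.add(e);
            }
        }
        List<String> redundant = new ArrayList<>();
        for (Element e : alts) {
            if (e.getTextContent().trim().equals(standard)) {
                redundant.add(e.getAttribute("alt"));
            }
        }
        return redundant;
    }

    public static void main(String[] args) {
        String buggy = "<numbers><defaultNumberingSystem>arab</defaultNumberingSystem>"
                + "<defaultNumberingSystem alt='latn'>arab</defaultNumberingSystem></numbers>";
        String fixed = buggy.replace("alt='latn'>arab", "alt='latn'>latn");
        System.out.println(redundantAlts(buggy)); // [latn]
        System.out.println(redundantAlts(fixed)); // []
    }
}
```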

You wrote alt='latndigi' in the meeting minutes. I assume that the alt-attribute value is arbitrary. alt='latn' works for me. (What's not arbitrary is the element value, of course.)

You accidentally checked in debug code in TestLdml2ICU.java; please revert that.

ldml2icu_locale.txt: Since ICU won't read the /NumberElements/default_latn value, we don't need to get it converted -- but it's harmless, so I don't feel strongly.

comment:9 in reply to: ↑ 8 Changed 9 months ago by mark

  • Status changed from reviewfeedback to reviewing

Replying to markus:

> Please change <defaultNumberingSystem alt='latn'>arab</defaultNumberingSystem> to <defaultNumberingSystem alt='latn'>latn</defaultNumberingSystem> -- so that the alt value is actually different...

Doooh. Fixed

> You wrote alt='latndigi' in the meeting minutes. I assume that the alt-attribute value is arbitrary. alt='latn' works for me. (What's not arbitrary is the element value, of course.)

Right. I changed that because we had intended to match the BCP47 value (but didn't get it right in the meeting).

> You accidentally checked in debug code in TestLdml2ICU.java; please revert that.

Changed to:

private static final boolean DEBUG = false;

> ldml2icu_locale.txt: Since ICU won't read the /NumberElements/default_latn value, we don't need to get it converted -- but it's harmless, so I don't feel strongly.

right.

comment:10 Changed 9 months ago by markus

  • Status changed from reviewing to closed
  • Resolution set to fixed