Unicode Common Locale Data Repository: Bug Tracking

CLDR Ticket #10825 (new data)

Opened 5 weeks ago

Last modified 3 days ago

Change the ar locale to default to ASCII digits

Reported by: markus
Owned by: anybody
Component: numbers
Data Locale: ar
Phase: dvet
Weeks: 0.2



Change ar to default to ASCII digits. While many Arabic-speaking users prefer native digits, all understand ASCII digits: they are in widespread use even in countries that prefer native digits. Defaulting ar to ASCII digits would maximize understanding when we don’t know a user’s country, or when a user selects Arabic but declines to select a regional variant.


For this discussion, “ASCII digits” refers to 0123456789 (U+0030..U+0039), called “European digits” in the Unicode Standard (numbering system “latn” in CLDR), although colloquially they are referred to as “Arabic digits” because they are derived from Arabic.

For this discussion, “Native digits” refers to ٠١٢٣٤٥٦٧٨٩ (U+0660..U+0669), called “Arabic-Indic digits” in the Unicode Standard (numbering system “arab” in CLDR).

Note that many other sets of digits are used with other languages, for example the Eastern Arabic-Indic digits common in Persian, Urdu, etc. This document covers only Arabic-language locales (not other languages or locales written in Arabic script).
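Because both digit sets occupy ten consecutive code points in value order, converting a digit string between them is a fixed code-point offset. The sketch below relies only on the two ranges defined above; the helper names are hypothetical, and it converts digits only (real CLDR number formatting can also change the decimal and grouping separators, which this ignores).

```python
# Code-point ranges from the definitions above.
ASCII_ZERO = 0x0030   # DIGIT ZERO; ASCII digits are U+0030..U+0039
NATIVE_ZERO = 0x0660  # ARABIC-INDIC DIGIT ZERO; native digits are U+0660..U+0669


def shift_digits(text: str, src_zero: int, dst_zero: int) -> str:
    """Replace each digit in [src_zero, src_zero + 9] with the digit of the
    same value in [dst_zero, dst_zero + 9]; other characters pass through."""
    out = []
    for ch in text:
        cp = ord(ch)
        if src_zero <= cp <= src_zero + 9:
            out.append(chr(dst_zero + (cp - src_zero)))
        else:
            out.append(ch)
    return "".join(out)


def to_native(text: str) -> str:
    return shift_digits(text, ASCII_ZERO, NATIVE_ZERO)


def to_ascii(text: str) -> str:
    return shift_digits(text, NATIVE_ZERO, ASCII_ZERO)


print(to_native("2018"))  # ٢٠١٨
print(to_ascii("٢٠١٨"))   # 2018
```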

Status quo

Most Arabic-language locales default to Native digits: <defaultNumberingSystem>arab</defaultNumberingSystem>

A few Arabic-language locales in the Maghreb/North Africa (ar-DZ [Algeria], ar-EH [Western Sahara], ar-LY [Libya], ar-MA [Morocco], and ar-TN [Tunisia]) use ASCII digits in CLDR.

The default content locale for ar is ar-001. In likelySubtags, ar expands to ar-Arab-EG. Since Egypt customarily uses native digits, ar itself has <defaultNumberingSystem>arab</defaultNumberingSystem>.
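The status quo can be modelled as ordinary inheritance: a locale that does not set <defaultNumberingSystem> takes its parent’s value. This is a simplified sketch, not CLDR’s actual resolution code: it assumes plain truncation fallback (ignoring CLDR’s parentLocales data), assumes the root default is latn, lists only the explicit values named above, and drops the script subtag from the likely-subtags entry.

```python
from typing import Dict, Optional

# Simplified model of the status quo: only explicit values are listed.
STATUS_QUO: Dict[str, str] = {
    "ar": "arab",
    "ar-DZ": "latn",
    "ar-EH": "latn",
    "ar-LY": "latn",
    "ar-MA": "latn",
    "ar-TN": "latn",
}

# Simplified likely-subtags entry (the ticket gives ar -> ar-Arab-EG).
LIKELY: Dict[str, str] = {"ar": "ar-EG"}


def parent(locale: str) -> Optional[str]:
    """Truncation fallback: ar-EG -> ar -> None (root)."""
    return locale.rsplit("-", 1)[0] if "-" in locale else None


def resolve(locale: str, data: Dict[str, str], root: str = "latn") -> str:
    """Walk up the inheritance chain until some locale sets a value."""
    current: Optional[str] = locale
    while current is not None:
        if current in data:
            return data[current]
        current = parent(current)
    return root  # assumed root-locale default


for loc in ["ar", "ar-001", "ar-EG", "ar-MA"]:
    print(loc, resolve(loc, STATUS_QUO))
print("ar via likely subtags:", resolve(LIKELY["ar"], STATUS_QUO))
```

Today ar-001 (inheriting from ar) and ar-EG (the likely-subtags expansion of ar) both resolve to arab, so the two paths agree.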


Pivot the numbering system: change ar to use ASCII digits, and set the sublocales so that their resolved locale data (after inheritance) remains the same as now.

After the pivot, change the numbering system of ar-001 to ASCII.

We should alert people to this change in the migration section of the CLDR draft release notes. Note that while in most cases the default content locale for a language matches its likely-subtags value, CLDR already (exceptionally) disconnects ar-001 from ar-EG. Any locale-matching system that wants to support the current differences between ar and ar-EG must already match ar, ar-001, and ar-EG correctly.

In other words, the resolved data for explicit regional variants does not change.
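The two steps above can be sketched on a simplified inheritance model (truncation fallback, assumed root default latn, only the locales named in this ticket). The pivot helper is hypothetical, not CLDR tooling: it sets ar to latn, pins arab on any sublocale whose resolved value would otherwise change, and drops explicit values that have become redundant. The last line illustrates the ar-001 / ar-EG disconnect: after the change, plain inheritance from ar and the likely-subtags expansion (simplified here to ar -> ar-EG) give different answers.

```python
from typing import Dict, Iterable, Optional


def parent(locale: str) -> Optional[str]:
    # Simplified truncation fallback; ignores CLDR parentLocales.
    return locale.rsplit("-", 1)[0] if "-" in locale else None


def resolve(locale: str, data: Dict[str, str], root: str = "latn") -> str:
    current: Optional[str] = locale
    while current is not None:
        if current in data:
            return data[current]
        current = parent(current)
    return root


def pivot(data: Dict[str, str], sublocales: Iterable[str],
          new_value: str) -> Dict[str, str]:
    """Change ar to new_value while keeping each listed sublocale's resolved
    value. Assumes every sublocale is a direct child of ar."""
    new = dict(data)
    new["ar"] = new_value
    for loc in sublocales:
        before = resolve(loc, data)
        new.pop(loc, None)           # try plain inheritance first
        if resolve(loc, new) != before:
            new[loc] = before        # inheritance would change it: pin old value
    return new


STATUS_QUO = {"ar": "arab", "ar-DZ": "latn", "ar-EH": "latn",
              "ar-LY": "latn", "ar-MA": "latn", "ar-TN": "latn"}
SUBLOCALES = ["ar-001", "ar-DZ", "ar-EG", "ar-EH", "ar-LY", "ar-MA", "ar-TN"]

# Step 1: pivot. Every listed sublocale still resolves as before.
pivoted = pivot(STATUS_QUO, SUBLOCALES, "latn")
assert all(resolve(loc, pivoted) == resolve(loc, STATUS_QUO) for loc in SUBLOCALES)
print(pivoted)  # {'ar': 'latn', 'ar-001': 'arab', 'ar-EG': 'arab'}

# Step 2: drop ar-001's pinned value so it inherits ASCII digits from ar.
final = {k: v for k, v in pivoted.items() if k != "ar-001"}
print(resolve("ar-001", final), resolve("ar-EG", final), resolve("ar-MA", final))

# Inheritance (ar) vs. likely subtags (ar -> ar-EG) now disagree.
print(resolve("ar", final), resolve("ar-EG", final))
```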



Millions of Arabic speakers are familiar with ASCII digits but are not familiar with Native digits

Maghreb countries (Morocco/Western Sahara, Algeria, Tunisia, Libya) represent ~90M speakers, roughly 1/3 of the Arabic-speaking world.

An informal survey of Arabic-speaking Google employees who grew up in Morocco indicated that only 25% could “easily read Eastern Arabic numbers ١٢٣٤٥٦٧٨”. While this survey was small (N=16) and unscientific, we expect that fluency with native digits in the general population may be even lower: many Googlers who said they could easily read native digits attributed that fluency to international study or work (e.g. after growing up in Morocco, they later moved to Dubai, where native digits are more common).
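To put the sample size in perspective: 25% of 16 respondents is 4 people. A Wilson score interval (a standard confidence interval for a proportion; applying it here is our illustration, not part of the original survey analysis) shows how loosely such a sample constrains the true rate: at about 95% confidence it spans roughly 10% to 50%.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# 25% of N=16 respondents said they could easily read native digits: 4 people.
low, high = wilson_interval(4, 16)
print(f"95% interval: {low:.1%} to {high:.1%}")
```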


ASCII digits are well-understood and commonly used throughout the Arabic-speaking world

While it’s clear that many users outside (roughly) the Maghreb still prefer and use Native digits, various data (including surveys of printed newspapers, analysis of Google searches, and PDFs on various websites) suggest that ASCII digits are still very common across the Arabic-speaking world. It’s typical for a newspaper to print some of its content in Native digits and some in ASCII digits (e.g. the date and sports scores might be Native, but page numbers and numbers in news articles might be ASCII). Thus, showing ASCII digits when we don’t know a user’s full locale might annoy some users, but there’s no evidence that such users wouldn’t be able to understand those digits.

(By contrast, nearly all Maghrebi printed documents we’ve found seem to use exclusively ASCII digits.)
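Mixed usage like this is easy to quantify once text is available as code points. The following is a hypothetical helper, not the method behind the surveys mentioned above: it counts the digits in a string by numbering system, using the ASCII and native ranges plus the Extended Arabic-Indic range (U+06F0..U+06F9, CLDR arabext) so that Persian/Urdu-style digits are not miscounted. The sample text is invented.

```python
from collections import Counter

# Digit ranges keyed by CLDR numbering-system id.
DIGIT_RANGES = {
    "latn": (0x0030, 0x0039),     # ASCII digits
    "arab": (0x0660, 0x0669),     # Arabic-Indic ("native") digits
    "arabext": (0x06F0, 0x06F9),  # Extended Arabic-Indic digits (Persian, Urdu, ...)
}


def count_digit_systems(text: str) -> Counter:
    """Return how many digits of each numbering system appear in text."""
    counts: Counter = Counter()
    for ch in text:
        cp = ord(ch)
        for system, (first, last) in DIGIT_RANGES.items():
            if first <= cp <= last:
                counts[system] += 1
                break
    return counts


# Invented sample: a native-digit date followed by an ASCII page number.
sample = "١٠/١/٢٠١٨ صفحة 12"
print(count_digit_systems(sample))  # Counter({'arab': 7, 'latn': 2})
```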


There may be a shift from Native digits to ASCII digits across the Arabic-speaking world

This is harder to measure precisely, but anecdotal evidence suggests that people across the Arabic-speaking world might be moving from Native to ASCII digits. For example, Bahrain switched its coins from native to ASCII in 1992, and Qatar did the same in 2016. Google Trends also provides some more anecdotal evidence of this.


On the Web, it is sometimes difficult to tell which style of digits an author intended, because Windows often displays ASCII digits as if they were Native digits. Web content that contains ASCII digits, especially desktop-centric content, may therefore have been “intended” to be Native. For this reason, the data above focuses on printed material, PDFs, and other documents for which we are more confident of what the writer intended. (This effect diminishes over time as more users use mobile devices, which display ASCII digits as themselves.)
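The ambiguity comes from the gap between stored code points and displayed glyphs. The sketch uses a hypothetical render function that loosely imitates display-time digit substitution (it is not the Windows API): the stored string is identical whether or not substitution is on, so analyzing code points alone cannot reveal which digits the author saw on screen.

```python
def render(stored: str, substitute_native: bool) -> str:
    """Hypothetical display step: optionally show ASCII digits (U+0030..U+0039)
    as native glyphs (U+0660..U+0669) without changing the stored text."""
    if not substitute_native:
        return stored
    return "".join(
        chr(ord(ch) - 0x0030 + 0x0660) if "0" <= ch <= "9" else ch
        for ch in stored
    )


stored = "صفحة 12"  # what a crawler or search index sees: ASCII digits
desktop_view = render(stored, substitute_native=True)   # author may have seen native digits
mobile_view = render(stored, substitute_native=False)   # reader sees ASCII digits

print(desktop_view)
print(mobile_view)
```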

Note that many manufacturers allow users to override the default for their specific locale — for example, Apple’s iOS allows ar-EG users to explicitly request ASCII digits, or ar-DZ users to explicitly request Native digits. This proposal does not affect such overrides.
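One standard way to express such a per-user override in a locale identifier is the BCP 47 Unicode extension key nu, as in ar-EG-u-nu-latn or ar-DZ-u-nu-arab; we are not claiming that any particular vendor stores its setting this way. The minimal sketch below (hypothetical helpers, simplified truncation inheritance, status-quo data only) shows why the proposal leaves overrides alone: an explicit nu value is consulted before the locale’s default, so changing the default never reaches it.

```python
from typing import Dict, Optional, Tuple

# Simplified status-quo defaults (explicit values only; others inherit).
DEFAULTS: Dict[str, str] = {"ar": "arab", "ar-DZ": "latn", "ar-EH": "latn",
                            "ar-LY": "latn", "ar-MA": "latn", "ar-TN": "latn"}


def resolve_default(locale: str, root: str = "latn") -> str:
    """Truncation fallback through DEFAULTS (ignores CLDR parentLocales)."""
    current: Optional[str] = locale
    while current is not None:
        if current in DEFAULTS:
            return DEFAULTS[current]
        current = current.rsplit("-", 1)[0] if "-" in current else None
    return root


def split_nu(tag: str) -> Tuple[str, Optional[str]]:
    """Split 'ar-EG-u-nu-latn' into ('ar-EG', 'latn').
    Minimal: assumes the only extension present is the -u- one."""
    parts = tag.split("-")
    lowered = [p.lower() for p in parts]
    if "u" not in lowered:
        return tag, None
    u = lowered.index("u")
    base = "-".join(parts[:u])
    ext = lowered[u + 1:]
    for i, subtag in enumerate(ext[:-1]):
        if subtag == "nu":
            return base, ext[i + 1]
    return base, None


def numbering_system(tag: str) -> str:
    """An explicit nu override wins; otherwise use the locale default."""
    base, override = split_nu(tag)
    return override if override is not None else resolve_default(base)


for tag in ["ar-EG", "ar-EG-u-nu-latn", "ar-DZ", "ar-DZ-u-nu-arab"]:
    print(tag, numbering_system(tag))
```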


Change History

comment:1 Changed 4 weeks ago by srl

  • Xref set to 9839

comment:2 Changed 2 weeks ago by pedberg

  • Cc pedberg, fredrik added

comment:3 Changed 3 days ago by markus

FYI: Google overrides the numbering system to use ASCII digits for phone numbers, financial data, and certain products. These overrides are mostly specific to Arabic; other languages keep their default native digits.

Responding to concerns from the 2018-jan-10 CLDR meeting:

Changes to locales such as ar-US and ar-TR

Our proposal is designed so that all current Arabic locales (e.g. ar-EG and ar-DZ) keep their current digit settings (e.g. ar-EG remains native, ar-DZ remains ASCII). However, someone pointed out that some operating systems allow selection of locales such as ar-US or ar-TR, which fall back to ar-001 and would thus change from the status quo (native) to ASCII.

We believe that for most locales, this is the right thing to do: for example, Arabic speakers in the US come from a wide variety of countries, including some where native digits are unfamiliar. Thus the arguments for changing ar-001 also apply to the US: it is "least bad" to show ASCII digits to such speakers, since everyone understands ASCII digits but many speakers from the Maghreb won't understand native digits. Furthermore, Arabic speakers in the US (and in other countries without an Arabic-speaking majority) are even more likely to be comfortable with (and likely even prefer) ASCII digits.

That being said, there are certainly some locales, most obviously ar-IR and ar-PK, where speakers of the country's other (non-Arabic) languages customarily use native digits in Arabic script, and thus we should add a preference for native digits for such locales. (The specific style of digits used in Iran and Pakistan is different, but it is closer to the Arabic native digits than to ASCII.)
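Under the proposal, locales without their own data, such as ar-US and ar-TR, would simply inherit ASCII digits from ar, while the suggested exceptions get an explicit native-digit value. The sketch below uses a simplified model (truncation fallback, assumed root default latn); the ar-IR and ar-PK entries are the proposed additions, not existing CLDR data, and other currently-native locales that would be pinned by the pivot are omitted.

```python
from typing import Dict, Optional

# Hypothetical explicit values after the proposed change: ar switches to
# latn, ar-EG keeps native digits via a pinned value, and the suggested
# ar-IR / ar-PK exceptions are added. Other ar-* locales inherit from ar.
PROPOSED: Dict[str, str] = {
    "ar": "latn",
    "ar-EG": "arab",
    "ar-IR": "arab",  # proposed addition
    "ar-PK": "arab",  # proposed addition
}


def resolve(locale: str, data: Dict[str, str], root: str = "latn") -> str:
    """Walk up the truncation chain (ignores CLDR parentLocales)."""
    current: Optional[str] = locale
    while current is not None:
        if current in data:
            return data[current]
        current = current.rsplit("-", 1)[0] if "-" in current else None
    return root


for loc in ["ar-US", "ar-TR", "ar-IR", "ar-PK", "ar-EG", "ar-MA"]:
    print(loc, resolve(loc, PROPOSED))
```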

Evidence of government or customary use of ASCII digits outside the Maghreb

Here are some links that show examples of ASCII digits being used by both governments and other groups outside the Maghreb (we have tried to show PDF/print examples to eliminate the possibility of Windows' ASCII digits that look like native digits, but non-PDF/print examples abound as well):

To be clear: we are not suggesting that ASCII is *more* common (or preferred) than native digits outside the Maghreb, just that it is common enough that effectively all Arabic speakers are *familiar* with ASCII digits.

