L2/00-134
ISO/IEC JTC1/SC2/WG2 N_____
DATE: 2000-04-07
DOC TYPE: Expert contribution
TITLE: Proposal to Encode Urdu Numbers to Remove Ambiguity in Current Standard
SOURCE: Paul Nelson (Redmond, WA, USA), Ashhar Farhan (Hyderabad, India), Arif Hisam (Karachi, Pakistan), John Clews (UK)
PROJECT:
STATUS: Proposal
ACTION ID: FYI
DUE DATE: --
DISTRIBUTION: Worldwide
MEDIUM: Paper and web
NO. OF PAGES: 3
A. Administrative |
|
1. Title |
Proposal to Encode Urdu
Numbers to Remove Ambiguity in Current Standard. |
2. Requester's name |
Paul Nelson (Redmond, WA,
USA), Ashhar Farhan (Hyderabad, India), Arif Hisam (Karachi, Pakistan), John
Clews (UK). |
3. Requester type |
Expert request. |
4. Submission date |
1998-11-06 |
5. Requester's reference |
|
6a. Completion |
This is a complete proposal. |
6b. More information to be
provided? |
Only as required for
clarification. |
B. Technical General |
|
1a. New script? Name? |
No. |
1b. Addition of characters to
existing block? Name? |
Yes. Arabic. |
2. Number of characters |
10. |
3. Proposed category |
|
4. Proposed level of
implementation and rationale |
|
5a. Character names included
in proposal? |
Yes. |
5b. Character names in
accordance with guidelines? |
Yes. |
5c. Character shapes
reviewable? |
Yes. |
6a. Who will provide
computerized font? |
Paul Nelson. |
6b. Font currently available? |
Paul Nelson. |
6c. Font format? |
TrueType. |
7a. Are references (to other
character sets, dictionaries, descriptive texts, etc.) provided? |
Yes. |
7b. Are published examples
(such as samples from newspapers, magazines, or other sources) of use of proposed
characters attached? |
Yes. |
8. Does the proposal address
other aspects of character data processing? |
|
C. Technical Justification |
|
1. Contact with the user
community? |
Yes. Farhan is Director of Computer
Corp, a leading Urdu software company for PCs. |
2. Information on the user
community? |
Native. |
3a. The context of use for
the proposed characters? |
The Urdu numerals are currently
assigned to the same code points as the Farsi numerals. In three cases (FOUR, SIX
and SEVEN), Farsi and Urdu numerals cannot be differentiated. |
3b. Reference |
|
4a. Proposed characters in
current use? |
Yes. |
4b. Where? |
Native speakers in Pakistan,
India and worldwide. |
5a. Characters should be
encoded entirely in BMP? |
Already in BMP and in accordance with the Roadmap. |
5b. Rationale |
|
6. Should characters be kept
in a continuous range? |
Yes. This greatly facilitates computational usage. |
7a. Can the characters be
considered a presentation form of an existing character or character
sequence? |
No. |
7b. Where? |
|
7c. Reference |
|
8a. Can any of the characters
be considered to be similar (in appearance or function) to an existing
character? |
Yes. However, this proposal's
goal is to remove the ambiguity from current Unicode assignments. |
8b. Where? |
EXTENDED ARABIC-INDIC
DIGITS [06F0-06F9]. Three of these digits are ambiguous between Farsi and Urdu. |
8c. Reference |
|
9a. Combining characters or
use of composite sequences included? |
N/A. |
9b. List of composite
sequences and their corresponding glyph images provided? |
N/A. |
10. Characters with any
special properties such as control function, etc. included? |
No. |
D. SC2/WG2 Administrative (To be completed by SC2/WG2) |
|
1. Relevant SC 2/WG 2
document numbers: |
|
2. Status (list of meeting
number and corresponding action or disposition) |
|
3. Additional contact to user
communities, liaison organizations etc. |
|
4. Assigned category and
assigned priority/time frame |
|
Other Comments |
|
The Unicode Standard currently
assigns Urdu the same digits as Farsi (06F0-06F9 EXTENDED ARABIC-INDIC DIGITS).
This creates an ambiguity when Farsi and Urdu are represented in plain text in
the same document. The current standard also makes it impossible to include both
the Farsi and the Urdu digit glyphs in the same font. The ambiguity is caused by
three characters whose glyph outlines differ between Urdu and Farsi: 06F4 (FOUR),
06F6 (SIX) and 06F7 (SEVEN). To resolve this problem, and to make Urdu number
handling computationally more efficient, we propose to encode the Urdu digits in
a contiguous range in the Arabic block. There are three open contiguous areas in
which the ten Urdu digits will fit: 0600-060B, 0610-061A, and 0656-065F.
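As an illustration of the ambiguity, the following Python sketch flags the digits in a string that a renderer cannot draw correctly without knowing whether the text is Farsi or Urdu. It uses only the standard `unicodedata` module; the helper name `ambiguous_digits` and the sample string are hypothetical, and the set of ambiguous code points is taken from the three characters named in the paragraph above.

```python
import unicodedata

# Code points whose Farsi and Urdu glyph outlines differ, per this proposal:
# 06F4 (FOUR), 06F6 (SIX), 06F7 (SEVEN).
AMBIGUOUS = {0x06F4, 0x06F6, 0x06F7}

def ambiguous_digits(text: str) -> list[str]:
    """Return the Unicode names of the characters in `text` whose correct
    glyph depends on whether the text is Farsi or Urdu. Plain text carries
    no language tag, so these cannot be resolved from the code points alone."""
    return [unicodedata.name(ch) for ch in text if ord(ch) in AMBIGUOUS]

# Hypothetical sample: the same four code points spell "1947" whether the
# author intended Farsi or Urdu digit shapes.
year = "\u06F1\u06F9\u06F4\u06F7"
print(ambiguous_digits(year))
# ['EXTENDED ARABIC-INDIC DIGIT FOUR', 'EXTENDED ARABIC-INDIC DIGIT SEVEN']
```

With a separate Urdu range, the code point itself would identify the intended shape, so no such check would be needed.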
The Urdu numbers should be
encoded with the glyphs shown below:
Proposed code point | Name |
0600 | URDU DIGIT ZERO |
0601 | URDU DIGIT ONE |
0602 | URDU DIGIT TWO |
0603 | URDU DIGIT THREE |
0604 | URDU DIGIT FOUR |
0605 | URDU DIGIT FIVE |
0606 | URDU DIGIT SIX |
0607 | URDU DIGIT SEVEN |
0608 | URDU DIGIT EIGHT |
0609 | URDU DIGIT NINE |
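Because the proposed code points are contiguous, a digit's value is simply its offset from URDU DIGIT ZERO. The Python sketch below assumes the assignments in the table above (0600 to 0609); the constant `URDU_DIGIT_ZERO` and the helpers `proposed_name` and `urdu_number_value` are hypothetical and model this proposal only. The code therefore computes values arithmetically instead of querying `unicodedata`, which does not know about these proposed characters.

```python
# Hypothetical base taken from this proposal's table (URDU DIGIT ZERO at 0600).
# This models the proposal only; it is not a property from any Unicode library.
URDU_DIGIT_ZERO = 0x0600
DIGIT_WORDS = ["ZERO", "ONE", "TWO", "THREE", "FOUR",
               "FIVE", "SIX", "SEVEN", "EIGHT", "NINE"]

def proposed_name(cp: int) -> str:
    """Name that the proposal assigns to code point `cp` (0600..0609)."""
    offset = cp - URDU_DIGIT_ZERO
    if not 0 <= offset <= 9:
        raise ValueError(f"U+{cp:04X} is outside the proposed Urdu digit range")
    return "URDU DIGIT " + DIGIT_WORDS[offset]

def urdu_number_value(text: str) -> int:
    """Parse a string of proposed Urdu digits into an integer.
    The contiguous range means each digit's value is a single subtraction."""
    value = 0
    for ch in text:
        d = ord(ch) - URDU_DIGIT_ZERO
        if not 0 <= d <= 9:
            raise ValueError(f"U+{ord(ch):04X} is not a proposed Urdu digit")
        value = value * 10 + d
    return value

print(proposed_name(0x0604))                          # URDU DIGIT FOUR
print(urdu_number_value("\u0601\u0609\u0604\u0607"))  # 1947
```

A non-contiguous assignment would instead require a lookup table for every digit, which is the computational cost the contiguous range avoids.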