L2/17-377
Proposal for the encoding of three Arabic tanween characters
Eric Muller — Amazon
October 15, 2017
Background
In L2/15-329, Mussa A. A. Abudena proposes six characters (numbered 1 through 6) for the vowel signs and tanween forms as they are typically rendered in Qalun’s transmission of Nafi’s reading of the Quran.
I agree with the UTC that characters 1 through 3 are just glyphic variants of damma, dammatan, and open dammatan. The UTC also suggested representing the remaining three characters by combining damma or fatha with U+06E2 ARABIC SMALL HIGH MEEM ISOLATED FORM, and kasra with U+06ED ARABIC SMALL LOW MEEM.
I would like to argue that the representation suggested by the UTC is inadequate, not only for the typical appearance of Qalun’s transmission of Nafi’s reading but for all calligraphic styles of the Quran. I therefore propose to encode three additional tanween characters, to be used in all transmissions.
Sources
[kfgqpc-h] edition of the Quran made available by the King Fahd Complex For The Printing Of The Holy Quran. It uses Hafs’ transmission of Asim’s reading, currently by far the most widely used transmission. The Unicode text and font are available at http://fonts.qurancomplex.gov.sa/?page_id=42 and images can be downloaded from http://dm.qurancomplex.gov.sa/download/.
[oman] Electronic Mushaf Muscat Calligraphy Project. It also uses Hafs’ transmission of Asim’s reading. A Unicode representation is available only for isolated passages. https://www.mushafmuscat.om/.
[kfgqpc-w] edition of the Quran made available by the King Fahd Complex For The Printing Of The Holy Quran. It uses Warsh’s transmission of Nafi’s reading. Available at https://archive.org/details/QuranWarshNarration as images only.
[tunisian] Tunisian edition of the Quran. It uses Warsh’s transmission of Nafi’s reading. Available at https://archive.org/details/QuranWarshAsbahani. Readers of Arabic can learn about the various tanween forms on pages 9-11 of the appendix.
[wics] edition of the Quran, published in 1989 by the World Islamic Call Society, Tripoli, Libya. It uses Qalun’s transmission of Nafi’s reading. Available at https://archive.org/details/Ms7FalGmaHRYaH as images only.
The Quran
Nasalization of the three short vowels fatha, kasra, and damma is ordinarily written using fathatan, kasratan, and dammatan (the tanween).
In the Quran, three distinct pronunciations of the tanween are written by different signs:
| | fatha | damma | kasra |
| ordinary vowel sign | [kfgqpc-h] 2:7, [tunisian] 2:7, [kfgqpc-w] 2:6, [wics] 2:6 | [kfgqpc-h] 2:6, [tunisian] 2:6, [kfgqpc-w] 2:5, [wics] 2:5 | [kfgqpc-h] 2:6, [tunisian] 2:6, [kfgqpc-w] 2:5, [wics] 2:5 |
| ordinary tanween | [kfgqpc-h] 2:182, [tunisian] 2:182, [kfgqpc-w] 2:181, [wics] 2:181 | [kfgqpc-h] 2:7, [tunisian] 2:7, [kfgqpc-w] 2:6, [wics] 2:6 | [kfgqpc-h] 2:36, [tunisian] 2:36, [kfgqpc-w] 2:35, [wics] 2:35 |
| open tanween | [kfgqpc-h] 2:182, [tunisian] 2:182, [kfgqpc-w] 2:181, [wics] 2:181 | [kfgqpc-h] 2:7, [tunisian] 2:7, [kfgqpc-w] 2:6, [wics] 2:6 | [kfgqpc-h] 2:36, [tunisian] 2:36, [kfgqpc-w] 2:35, [wics] 2:35 |
| meem tanween | [kfgqpc-h] 2:95, [tunisian] 2:95, [kfgqpc-w] 2:94, [wics] 2:94 | [kfgqpc-h] 2:10, [tunisian] 2:10, [kfgqpc-w] 2:9, [wics] 2:9 | [kfgqpc-h] 2:99, [tunisian] 2:99, [kfgqpc-w] 2:98, [wics] 2:98 |
The fragments in each cell show the same passage of the text. In [kfgqpc-w] and [wics], the muqatta'at at the beginning of the surah are not counted as a separate ayah, hence the shift by one in ayah numbers.
Also relevant to the present discussion is the small high meem isolated form, written above a noon to indicate that the noon is pronounced as a meem.
| [kfgqpc-h] 2:27 | [tunisian] 2:27 | [kfgqpc-w] 2:26 | [wics] 2:26 |
Number of occurrences in [kfgqpc-h]
| | fatha | damma | kasra |
| vowel sign | 123,274 | 37,334 | 46,769 |
| ordinary tanween | 734 | 578 | 606 |
| open tanween | 2,901 | 1,807 | 1,935 |
| meem tanween | 106 | 134 | 99 |
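The row totals of this table can be recomputed with a few lines of Python. This is only an arithmetic sketch: the counts are copied by hand from the table, and the names `COUNTS`, `row_totals` and `share_of_tanween` are illustrative.

```python
# Occurrence counts in [kfgqpc-h], copied from the table above.
COUNTS = {
    "vowel sign":       {"fatha": 123_274, "damma": 37_334, "kasra": 46_769},
    "ordinary tanween": {"fatha": 734,     "damma": 578,    "kasra": 606},
    "open tanween":     {"fatha": 2_901,   "damma": 1_807,  "kasra": 1_935},
    "meem tanween":     {"fatha": 106,     "damma": 134,    "kasra": 99},
}

TANWEEN_FORMS = ("ordinary tanween", "open tanween", "meem tanween")


def row_totals(counts):
    """Sum each row across the fatha, damma and kasra columns."""
    return {form: sum(cols.values()) for form, cols in counts.items()}


def share_of_tanween(counts, form):
    """Fraction of all tanween occurrences (of the three kinds) using `form`."""
    totals = row_totals(counts)
    all_tanween = sum(totals[f] for f in TANWEEN_FORMS)
    return totals[form] / all_tanween


if __name__ == "__main__":
    for form, total in row_totals(COUNTS).items():
        print(f"{form:16} {total:>7,}")
    print(f"meem tanween share: {share_of_tanween(COUNTS, 'meem tanween'):.1%}")
```

With these figures, meem tanween accounts for 339 occurrences in [kfgqpc-h], about 3.8% of the 8,900 tanween of all three kinds.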
Representation in Unicode
| | fatha | damma | kasra |
| vowel sign | U+064E ARABIC FATHA, ccc=30 | U+064F ARABIC DAMMA, ccc=31 | U+0650 ARABIC KASRA, ccc=32 |
| ordinary tanween | U+064B ARABIC FATHATAN, ccc=27 | U+064C ARABIC DAMMATAN, ccc=28 | U+064D ARABIC KASRATAN, ccc=29 |
| open tanween | U+08F0 ARABIC OPEN FATHATAN, ccc=27 | U+08F1 ARABIC OPEN DAMMATAN, ccc=28 | U+08F2 ARABIC OPEN KASRATAN, ccc=29 |
[kfgqpc-h] actually hijacks U+0657, U+065E and U+0656 to represent and display the open tanween, presumably because the text was created before the encoding of U+08F0..08F2.
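The names and canonical combining classes in the table can be checked against the Unicode Character Database bundled with Python’s standard `unicodedata` module. A minimal sketch, assuming a reasonably recent Python 3 whose database already includes U+08F0..U+08F2; the `EXPECTED_CCC` mapping and the `describe` helper are illustrative and simply restate the table.

```python
import unicodedata

# The nine atomically encoded characters from the table, with the
# canonical combining class (ccc) the table lists for each.
EXPECTED_CCC = {
    0x064E: 30, 0x064F: 31, 0x0650: 32,  # fatha, damma, kasra
    0x064B: 27, 0x064C: 28, 0x064D: 29,  # fathatan, dammatan, kasratan
    0x08F0: 27, 0x08F1: 28, 0x08F2: 29,  # open fathatan, dammatan, kasratan
}


def describe(cp):
    """Return (name, ccc) for a code point from the local Unicode database."""
    ch = chr(cp)
    return unicodedata.name(ch), unicodedata.combining(ch)


if __name__ == "__main__":
    print("Unicode database version:", unicodedata.unidata_version)
    for cp, ccc in EXPECTED_CCC.items():
        name, actual = describe(cp)
        status = "ok" if actual == ccc else f"expected {ccc}"
        print(f"U+{cp:04X} {name:<24} ccc={actual:<3} {status}")
```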
Representation of meem tanween
Both [kfgqpc-h] and [oman] use <U+064E ARABIC FATHA, U+06E2 ARABIC SMALL HIGH MEEM ISOLATED FORM> for meem fathatan, and similarly <U+064F ARABIC DAMMA, U+06E2 ARABIC SMALL HIGH MEEM ISOLATED FORM> for meem dammatan.
[kfgqpc-h] continues the pattern for meem kasratan: <U+0650 ARABIC KASRA, U+06E2 ARABIC SMALL HIGH MEEM ISOLATED FORM>; whereas [oman] uses <U+0650 ARABIC KASRA, U+06ED ARABIC SMALL LOW MEEM>.
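To make the difference between the two conventions concrete, the following Python sketch spells out both as plain strings and counts meem tanween by literal substring matching. The `MEEM_TANWEEN` table and the helper functions are hypothetical, written only for this illustration. The two spellings of meem kasratan are not canonically equivalent, so Unicode normalization alone does not reconcile them; a search across editions needs an explicit folding step such as `fold_to_kfgqpc_h`.

```python
import unicodedata

FATHA, DAMMA, KASRA = "\u064E", "\u064F", "\u0650"
SMALL_HIGH_MEEM = "\u06E2"  # ARABIC SMALL HIGH MEEM ISOLATED FORM, ccc=230
SMALL_LOW_MEEM = "\u06ED"   # ARABIC SMALL LOW MEEM, ccc=220

# Meem tanween sequences as used by each source (hypothetical table).
MEEM_TANWEEN = {
    "kfgqpc-h": {"fatha": FATHA + SMALL_HIGH_MEEM,
                 "damma": DAMMA + SMALL_HIGH_MEEM,
                 "kasra": KASRA + SMALL_HIGH_MEEM},
    "oman": {"fatha": FATHA + SMALL_HIGH_MEEM,
             "damma": DAMMA + SMALL_HIGH_MEEM,
             "kasra": KASRA + SMALL_LOW_MEEM},
}


def canonically_equivalent(a, b):
    """True if the two strings have the same NFD form."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def count_meem_tanween(text, source):
    """Count meem tanween in text, using the sequences of the given source.

    Assumes the vowel sign and the small meem are stored next to each other,
    in that order; occurrences split by another mark are not counted.
    """
    return {vowel: text.count(seq) for vowel, seq in MEEM_TANWEEN[source].items()}


def fold_to_kfgqpc_h(text):
    """Rewrite the [oman] spelling of meem kasratan as the [kfgqpc-h] one."""
    return text.replace(MEEM_TANWEEN["oman"]["kasra"],
                        MEEM_TANWEEN["kfgqpc-h"]["kasra"])


if __name__ == "__main__":
    a, b = MEEM_TANWEEN["kfgqpc-h"]["kasra"], MEEM_TANWEEN["oman"]["kasra"]
    print("canonically equivalent:", canonically_equivalent(a, b))  # False
```

For instance, counting [oman] text with the [kfgqpc-h] sequences misses its meem kasratan until the text has been passed through `fold_to_kfgqpc_h`.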
Those representations are not satisfactory.
- There is an asymmetry between the three forms of tanween: the ordinary and open forms are encoded atomically, while the meem tanween is not.
- The meem clearly modifies the vowel sign, not the base character, a relationship that a combining mark applied to the base character does not adequately represent.
- In the shapes typically used for Qalun’s transmission, the meem of the tanween is clearly distinct from the meem above a noon. It is therefore awkward to use the same character (U+06E2 ARABIC SMALL HIGH MEEM ISOLATED FORM) in both cases.
- While the systematic use of U+06E2 ARABIC SMALL HIGH MEEM ISOLATED FORM leads to a uniform pattern, it is problematic to use a character with ccc=230 (Above) when it is actually displayed below the base. The case of shadda + kasra, where the kasra can be displayed just below the shadda and above the base, is only a partial precedent, since shadda and, more importantly, kasra have fixed-position combining classes (33 and 32).
- Conversely, using different characters for the small meem depending on whether it attaches to fatha or damma on the one hand, or to kasra on the other, is cumbersome at best.
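The combining-class issue can be observed directly with Python’s `unicodedata`. This sketch uses a hypothetical helper, `marks_after_nfd`, on a bare beh. It shows that canonical ordering moves a kasra (ccc=32) ahead of U+06E2 (ccc=230) whichever order they were typed in. It also shows that when a shadda (ccc=33) is stored before the kasra, NFD places the shadda between the kasra and the meem. The relative order of the marks therefore cannot record which mark the small meem belongs to.

```python
import unicodedata

BEH = "\u0628"
KASRA, SHADDA = "\u0650", "\u0651"
SMALL_HIGH_MEEM = "\u06E2"  # ccc=230 (Above)


def marks_after_nfd(s):
    """Return (code point, ccc) for each character of s after NFD."""
    return [(f"U+{ord(c):04X}", unicodedata.combining(c))
            for c in unicodedata.normalize("NFD", s)]


if __name__ == "__main__":
    # Meem typed before the kasra: canonical ordering puts the kasra first,
    # so <meem, kasra> and <kasra, meem> are canonically equivalent.
    print(marks_after_nfd(BEH + SMALL_HIGH_MEEM + KASRA))
    # [('U+0628', 0), ('U+0650', 32), ('U+06E2', 230)]

    # Shadda stored first: the kasra moves in front of it, leaving the
    # shadda between the kasra and the meem.
    print(marks_after_nfd(BEH + SHADDA + KASRA + SMALL_HIGH_MEEM))
    # [('U+0628', 0), ('U+0650', 32), ('U+0651', 33), ('U+06E2', 230)]
```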
Proposal to encode atomic meem tanween
Consequently, and given the precedent of the open tanween characters, it seems appropriate to encode three new atomic characters for meem tanween, with properties similar to those of the open tanween characters.
08D0;ARABIC MEEM FATHATAN;Mn;27;NSM;;;;;N;;;;;
08D1;ARABIC MEEM DAMMATAN;Mn;28;NSM;;;;;N;;;;;
08D2;ARABIC MEEM KASRATAN;Mn;29;NSM;;;;;N;;;;;
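The proposed entries follow the UnicodeData.txt field layout, so they can be compared mechanically with the existing open tanween characters. A minimal Python sketch, assuming the local `unicodedata` module as the reference for U+08F0..U+08F2. It deliberately does not look up U+08D0..U+08D2 themselves, because the local database reflects whatever Unicode version Python ships rather than this proposal. The helper names are illustrative, and the parser extracts only the five fields compared here.

```python
import unicodedata

# Proposed entries, in UnicodeData.txt format (15 semicolon-separated fields).
PROPOSED = """\
08D0;ARABIC MEEM FATHATAN;Mn;27;NSM;;;;;N;;;;;
08D1;ARABIC MEEM DAMMATAN;Mn;28;NSM;;;;;N;;;;;
08D2;ARABIC MEEM KASRATAN;Mn;29;NSM;;;;;N;;;;;
"""

# The open tanween characters the proposed entries are modelled on.
OPEN_TANWEEN = (0x08F0, 0x08F1, 0x08F2)


def parse_unicodedata_line(line):
    """Split one UnicodeData.txt line and keep the fields compared here."""
    fields = line.split(";")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}: {line!r}")
    return {"cp": int(fields[0], 16), "name": fields[1],
            "gc": fields[2], "ccc": int(fields[3]), "bidi": fields[4]}


def matches_open_tanween(entry, open_cp):
    """Compare general category, ccc and bidi class with an open tanween."""
    ch = chr(open_cp)
    return (entry["gc"] == unicodedata.category(ch)
            and entry["ccc"] == unicodedata.combining(ch)
            and entry["bidi"] == unicodedata.bidirectional(ch))


if __name__ == "__main__":
    entries = [parse_unicodedata_line(line) for line in PROPOSED.splitlines()]
    for entry, open_cp in zip(entries, OPEN_TANWEEN):
        print(entry["name"], "vs", unicodedata.name(chr(open_cp)), "->",
              matches_open_tanween(entry, open_cp))
```

Each proposed character should report `True` against its open tanween counterpart; a mismatch would point to a typo in the proposed entries.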
For the representative glyphs, I suggest using shapes similar to those of [kfgqpc-h] (built using the existing font).
Proposal Summary Form
ISO/IEC JTC 1/SC 2/WG 2
PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS
FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC 10646
Please fill all the sections A, B and C below.
Please read Principles and Procedures Document (P & P) from
http://std.dkuug.dk/JTC1/SC2/WG2/docs/principles.html
for guidelines and details before filling this form.
Please ensure you are using the latest Form from
http://std.dkuug.dk/JTC1/SC2/WG2/docs/summaryform.html.
See also http://std.dkuug.dk/JTC1/SC2/WG2/docs/roadmaps.html
for latest Roadmaps.
Form number: N4502-F (Original 1994-10-14; Revised 1995-01, 1995-04, 1996-04, 1996-08, 1999-03, 2001-05, 2001-09, 2003-11, 2005-01, 2005-09, 2005-10, 2007-03, 2008-05, 2009-11, 2011-03, 2012-01)
A. Administrative
1. Title: Proposal for the encoding of three Arabic meem tanween characters
2. Requester's name: Eric Muller (emuller@amazon.com)
3. Requester type (Member body/Liaison/Individual contribution): Individual contribution
4. Submission date: October 15, 2017
5. Requester's reference (if applicable):
6. Choose one of the following:
This is a complete proposal: YES
(or) More information will be provided later:
B. Technical - General
1. Choose one of the following:
a. This proposal is for a new script (set of characters): NO
Proposed name of script:
b. The proposal is for addition of character(s) to an existing block: YES
Name of the existing block: Arabic Extended-A
2. Number of characters in proposal: 3
3. Proposed category (select one from below - see section 2.2 of P&P document):
A-Contemporary: X
B.1-Specialized (small collection): X
B.2-Specialized (large collection):
C-Major extinct:
D-Attested extinct:
E-Minor extinct:
F-Archaic Hieroglyphic or Ideographic:
G-Obscure or questionable usage symbols:
4. Is a repertoire including character names provided? YES
a. If YES, are the names in accordance with the "character naming guidelines"? YES
b. Are the character shapes attached in a legible form suitable for review? YES
5. Fonts related:
a. Who will provide the appropriate computerized font to the Project Editor of 10646 for publishing the standard? I suggest building the glyphs from the existing [kfgqpc-h] font, for consistency.
b. Identify the party granting a license for use of the font by the editors (include address, e-mail, ftp-site, etc.):
6. References:
a. Are references (to other character sets, dictionaries, descriptive texts etc.) provided? NO
b. Are published examples of use (such as samples from newspapers, magazines, or other sources) of proposed characters attached? YES
7. Special encoding issue:
Does the proposal address other aspects of character data processing (if applicable) such as input, presentation, sorting, searching, indexing, transliteration etc. (if yes please enclose information)?
8. Submitters are invited to provide any additional information about Properties of the proposed Character(s) or Script that will assist in correct understanding of and correct linguistic processing of the proposed character(s) or script. Examples of such properties are: Casing information, Numeric information, Currency information, Display behaviour information such as line breaks, widths etc., Combining behaviour, Spacing behaviour, Directional behaviour, Default Collation behaviour, relevance in Mark Up contexts, Compatibility equivalence and other Unicode normalization related information. See the Unicode standard at http://www.unicode.org for such information on other scripts. Also see UAX#44: http://www.unicode.org/reports/tr44/ and associated Unicode Technical Reports for information needed for consideration by the Unicode Technical Committee for inclusion in the Unicode Standard.
C. Technical - Justification
1. Has this proposal for addition of character(s) been submitted before? NO
If YES explain:
2. Has contact been made to members of the user community (for example: National Body, user groups of the script or characters, other experts, etc.)? NO
If YES, available relevant documents:
3. Information on the user community for the proposed characters (for example: size, demographics, information technology use, or publishing use) is included?
Reference:
4. The context of use for the proposed characters (type of use; common or rare): Common
Reference:
5. Are the proposed characters in current use by the user community? YES
If YES, where? Reference: Quran
6. After giving due considerations to the principles in the P&P document must the proposed characters be entirely in the BMP? YES
If YES, is a rationale provided? NO
If Yes, reference:
7. Should the proposed characters be kept together in a contiguous range (rather than being scattered)? YES
8. Can any of the proposed characters be considered a presentation form of an existing character or character sequence? YES
If YES, is a rationale for its inclusion provided? YES
If Yes, reference:
9. Can any of the proposed characters be encoded using a composed character sequence of either existing characters or other proposed characters? YES
If YES, is a rationale for its inclusion provided? YES
If Yes, reference:
10. Can any of the proposed character(s) be considered to be similar (in appearance or function) to, or could be confused with, an existing character? NO
If YES, is a rationale for its inclusion provided?
If Yes, reference:
11. Does the proposal include use of combining characters and/or use of composite sequences? YES
If YES, is a rationale for such use provided? YES
If Yes, reference:
Is a list of composite sequences and their corresponding glyph images (graphic symbols) provided?
If Yes, reference:
12. Does the proposal contain characters with any special properties such as control function or similar semantics? NO
If YES, describe in detail (include attachment if necessary):
13. Does the proposal contain any Ideographic compatibility characters? NO
If YES, are the equivalent corresponding unified ideographic characters identified?
If Yes, reference: