Proposal for the encoding of three Arabic meem tanween characters

In L2/15-329, Mussa A. A. Abudena proposes 6 characters (1 through 6) for the vowel signs and tanweens, as they are typically rendered in Qalun’s transmission of Nafi’s reading of the Quran.

I agree with the UTC that characters 1 through 3 are just glyphic variants of damma, dammatan, and open dammatan. The UTC also suggested that the remaining three characters be represented by the combination of damma or fatha with U+06E2 ARABIC SMALL HIGH MEEM ISOLATED FORM, and kasra with U+06ED ARABIC SMALL LOW MEEM.

I would like to argue that the representation suggested by the UTC is inadequate, and not just for the typical appearance of Qalun’s transmission of Nafi’s reading, but actually for all calligraphic styles of the Quran. I therefore propose to encode three additional tanween characters, to be used in all transmissions.

Sources

[kfgqpc-h] edition of the Quran made available by the King Fahd Complex For The Printing Of The Holy Quran. It uses Hafs’ transmission of Asim’s reading. This transmission is by far the most popular currently. Unicode text and font available at http://fonts.qurancomplex.gov.sa/?page_id=42 and images downloadable at http://dm.qurancomplex.gov.sa/download/.

[oman] Electronic Mushaf Muscat Calligraphy Project. Also uses Hafs’ transmission of Asim’s reading. A Unicode representation is available only in places. https://www.mushafmuscat.om/.

[kfgqpc-w] edition of the Quran made available by the King Fahd Complex For The Printing Of The Holy Quran. It uses Warsh’s transmission of Nafi’s reading. Available at https://archive.org/details/QuranWarshNarration as images only.

[tunisian] Tunisian edition of the Quran. It uses Warsh’s transmission of Nafi’s reading. Available at https://archive.org/details/QuranWarshAsbahani. If you read Arabic, you can learn about the various tanween forms on pages 9-11 of the appendix.

[wics] edition of the Quran, published in 1989 by the World Islamic Call Society - Tripoli - Libya. Uses Qalun’s transmission of Nafi’s reading. Available at https://archive.org/details/Ms7FalGmaHRYaH as images only.

The Quran

Nasalization of the three short vowels fatha, kasra, and damma is ordinarily written using fathatan, kasratan, and dammatan (the tanween).

In the Quran, three distinct pronunciations of the tanween are written by different signs:

The fragments shown above are the same parts of the text. Qalun’s and Warsh’s transmissions omit the muqatta'at at the beginning of the surah, hence the shift by one in ayah numbers.

Also relevant to the present discussion is the appearance of a small high meem isolated form used above a noon to indicate that it should be pronounced as a meem.

Number of occurrences in [kfgqpc-h]

Representation in Unicode

[kfgqpc-h] actually hijacks U+0657, U+065E and U+0656 to represent and display the open tanween, presumably because the text was created before the encoding of U+08F0..08F2.
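
As a rough illustration of what undoing that substitution could look like, the sketch below remaps the three hijacked code points to the open tanween characters. The pairing is an assumption: it is inferred from the order in which the code points are listed above (fatha, damma, kasra) and is not stated explicitly by the source. The names `HIJACKED_TO_OPEN_TANWEEN` and `remap_open_tanween` are hypothetical. The remapping is only safe on text known to come from [kfgqpc-h], since these code points have other legitimate uses elsewhere.

```python
# Assumed pairing, inferred from the listing order (fatha, damma, kasra);
# it is not spelled out in the source and should be checked against the font.
HIJACKED_TO_OPEN_TANWEEN = {
    0x0657: 0x08F0,  # ARABIC INVERTED DAMMA      -> ARABIC OPEN FATHATAN
    0x065E: 0x08F1,  # ARABIC FATHA WITH TWO DOTS -> ARABIC OPEN DAMMATAN
    0x0656: 0x08F2,  # ARABIC SUBSCRIPT ALEF      -> ARABIC OPEN KASRATAN
}


def remap_open_tanween(text: str) -> str:
    """Replace the hijacked code points with the open tanween characters.

    Every other character is left untouched.
    """
    return text.translate(HIJACKED_TO_OPEN_TANWEEN)
```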

Representation of meem tanween

Both [kfgqpc-h] and [oman] use <U+064E ARABIC FATHA, U+06E2 ARABIC SMALL HIGH MEEM ISOLATED FORM> for meem fathatan, and similarly <U+064F ARABIC DAMMA, U+06E2 ARABIC SMALL HIGH MEEM ISOLATED FORM> for meem dammatan.

[kfgqpc-h] continues the pattern for meem kasratan: <U+0650 ARABIC KASRA, U+06E2 ARABIC SMALL HIGH MEEM ISOLATED FORM>; whereas [oman] uses <U+0650 ARABIC KASRA, U+06ED ARABIC SMALL LOW MEEM>.
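
The sequences described above can be located in running text with a small matcher. The following is a minimal sketch, not a tool used by either source: the names `MEEM_TANWEEN` and `count_meem_tanween` are hypothetical, and the pattern accepts both spellings of meem kasratan because the two sources disagree on it.

```python
import re

# One named alternative per meem tanween, using the sequences quoted above.
# Meem kasratan accepts either U+06E2 ([kfgqpc-h]) or U+06ED ([oman]).
MEEM_TANWEEN = re.compile(
    "(?P<fathatan>\u064E\u06E2)"           # FATHA + SMALL HIGH MEEM ISOLATED FORM
    "|(?P<dammatan>\u064F\u06E2)"          # DAMMA + SMALL HIGH MEEM ISOLATED FORM
    "|(?P<kasratan>\u0650[\u06E2\u06ED])"  # KASRA + high or low small meem
)


def count_meem_tanween(text):
    """Count occurrences of each meem tanween sequence in text."""
    counts = {"fathatan": 0, "dammatan": 0, "kasratan": 0}
    for match in MEEM_TANWEEN.finditer(text):
        # lastgroup is the name of the alternative that matched.
        counts[match.lastgroup] += 1
    return counts
```

A small high meem that directly follows a base letter, such as the one written above a noon, is not preceded by a vowel sign and is therefore not counted by this pattern.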

Proposal to encode atomic meem tanween

Consequently, and given the precedent of the open tanween characters, it seems appropriate to encode three new atomic characters for meem tanween, with properties similar to those of the open tanween characters.

For the representative glyphs, I suggest using shapes similar to those of [kfgqpc-h] (built using the existing font).

Proposal Summary Form

ISO/IEC JTC 1/SC 2/WG 2
PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS
FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC 10646
Please fill all the sections A, B and C below.
Please read Principles and Procedures Document (P & P) from
http://std.dkuug.dk/JTC1/SC2/WG2/docs/principles.html for guidelines and details before filling this form.
Please ensure you are using the latest Form from
http://std.dkuug.dk/JTC1/SC2/WG2/docs/summaryform.html.
See also http://std.dkuug.dk/JTC1/SC2/WG2/docs/roadmaps.html for latest Roadmaps.
Form number: N4502-F ( Original 1994-10-14; Revised 1995-01, 1995-04, 1996-04, 1996-08, 1999-03, 2001-05, 2001-09, 2003-11, 2005-01, 2005-09, 2005-10, 2007-03, 2008-05, 2009-11, 2011-03, 2012-01)

A. Administrative

1. Title:

Proposal for the encoding of three Arabic meem tanween characters

2. Requester's name:

Eric Muller (emuller@amazon.com)

3. Requester type (Member body/Liaison/Individual contribution):

Individual contribution

4. Submission date:

October 15, 2017

5. Requester's reference (if applicable):

6. Choose one of the following:

This is a complete proposal:

YES

(or) More information will be provided later:

B. Technical - General

1. Choose one of the following:

a. This proposal is for a new script (set of characters):

Proposed name of script:

b. The proposal is for addition of character(s) to an existing block:

YES

Name of the existing block:

Arabic Extended-A

2. Number of characters in proposal:

3. Proposed category (select one from below - see section 2.2 of P&P document):

A-Contemporary

B.1-Specialized (small collection)

B.2-Specialized (large collection)

C-Major extinct

D-Attested extinct

E-Minor extinct

F-Archaic Hieroglyphic or Ideographic

G-Obscure or questionable usage symbols

4. Is a repertoire including character names provided?

YES

a. If YES, are the names in accordance with the "character naming guidelines"

YES

b. Are the character shapes attached in a legible form suitable for review?

YES

5. Fonts related:

a. Who will provide the appropriate computerized font to the Project Editor of 10646 for publishing the standard?

I suggest building the glyphs from the existing font, for consistency.

b. Identify the party granting a license for use of the font by the editors (include address, e-mail, ftp-site, etc.):

6. References:

a. Are references (to other character sets, dictionaries, descriptive texts etc.) provided?

b. Are published examples of use (such as samples from newspapers, magazines, or other sources)

of proposed characters attached?

YES

7. Special encoding issue

Does the proposal address other aspects of character data processing (if applicable) such as input,

presentation, sorting, searching, indexing, transliteration etc. (if yes please enclose information)?

8. Submitters are invited to provide any additional information about Properties of the proposed Character(s) or Script that will assist in correct understanding of and correct linguistic processing of the proposed character(s) or script. Examples of such properties are: Casing information, Numeric information, Currency information, Display behaviour information such as line breaks, widths etc., Combining behaviour, Spacing behaviour, Directional behaviour, Default Collation behaviour, relevance in Mark Up contexts, Compatibility equivalence and other Unicode normalization related information. See the Unicode standard at http://www.unicode.org for such information on other scripts. Also see UAX#44: http://www.unicode.org/reports/tr44/ and associated Unicode Technical Reports for information needed for consideration by the Unicode Technical Committee for inclusion in the Unicode Standard.

C. Technical - Justification

1. Has this proposal for addition of character(s) been submitted before?

If YES explain

2. Has contact been made to members of the user community (for example: National Body,

user groups of the script or characters, other experts, etc.)?

If YES, available relevant documents:

3. Information on the user community for the proposed characters (for example:

size, demographics, information technology use, or publishing use) is included?

Reference:

4. The context of use for the proposed characters (type of use; common or rare)

Common

Reference:

5. Are the proposed characters in current use by the user community?

YES

If YES, where? Reference:

Quran

6. After giving due considerations to the principles in the P&P document must the proposed characters be entirely

in the BMP?

YES

If YES, is a rationale provided?

If Yes, reference:

7. Should the proposed characters be kept together in a contiguous range (rather than being scattered)?

YES

8. Can any of the proposed characters be considered a presentation form of an existing

character or character sequence?

YES

If YES, is a rationale for its inclusion provided?

YES

If Yes, reference:

9. Can any of the proposed characters be encoded using a composed character sequence of either

existing characters or other proposed characters?

YES

If YES, is a rationale for its inclusion provided?

YES

If Yes, reference:

10. Can any of the proposed character(s) be considered to be similar (in appearance or function)

to, or could be confused with, an existing character?

If YES, is a rationale for its inclusion provided?

If Yes, reference:

11. Does the proposal include use of combining characters and/or use of composite sequences?

YES

If YES, is a rationale for such use provided?

YES

If Yes, reference:

Is a list of composite sequences and their corresponding glyph images (graphic symbols) provided?

If Yes, reference:

12. Does the proposal contain characters with any special properties such as

control function or similar semantics?

If YES, describe in detail (include attachment if necessary)

13. Does the proposal contain any Ideographic compatibility characters?

If YES, are the equivalent corresponding unified ideographic characters identified?

If Yes, reference:

	fatha	damma	kasra
ordinary vowel sign	[kfgqpc-h] 2:7 [tunisian] 2:7 [kfgqpc-w] 2:6 [wics] 2:6	[kfgqpc-h] 2:6 [tunisian] 2:6 [kfgqpc-w] 2:5 [wics] 2:5	[kfgqpc-h] 2:6 [tunisian] 2:6 [kfgqpc-w] 2:5 [wics] 2:5
ordinary tanween	[kfgqpc-h] 2:182 [tunisian] 2:182 [kfgqpc-w] 2:181 [wics] 2:181	[kfgqpc-h] 2:7 [tunisian] 2:7 [kfgqpc-w] 2:6 [wics] 2:6	[kfgqpc-h] 2:36 [tunisian] 2:36 [kfgqpc-w] 2:35 [wics] 2:35
open tanween	[kfgqpc-h] 2:182 [tunisian] 2:182 [kfgqpc-w] 2:181 [wics] 2:181	[kfgqpc-h] 2:7 [tunisian] 2:7 [kfgqpc-w] 2:6 [wics] 2:6	[kfgqpc-h] 2:36 [tunisian] 2:36 [kfgqpc-w] 2:35 [wics] 2:35
meem tanween	[kfgqpc-h] 2:95 [tunisian] 2:95 [kfgqpc-w] 2:94 [wics] 2:94	[kfgqpc-h] 2:10 [tunisian] 2:10 [kfgqpc-w] 2:9 [wics] 2:9	[kfgqpc-h] 2:99 [tunisian] 2:99 [kfgqpc-w] 2:98 [wics] 2:98

	fatha	damma	kasra
vowel sign	123,274	37,334	46,769
ordinary tanween	734	578	606
open tanween	2,901	1,807	1,935
meem tanween	106	134	99

	fatha	damma	kasra
vowel sign	U+064E ARABIC FATHA, ccc=30	U+064F ARABIC DAMMA, ccc=31	U+0650 ARABIC KASRA, ccc=32
ordinary tanween	U+064B ARABIC FATHATAN, ccc=27	U+064C ARABIC DAMMATAN, ccc=28	U+064D ARABIC KASRATAN, ccc=29
open tanween	U+08F0 ARABIC OPEN FATHATAN, ccc=27	U+08F1 ARABIC OPEN DAMMATAN, ccc=28	U+08F2 ARABIC OPEN KASRATAN, ccc=29
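
The canonical combining classes in the table above can be cross-checked against the Unicode Character Database that ships with Python. This is a minimal sketch; the names `EXPECTED_CCC` and `actual_ccc` are hypothetical, and the result depends on the Unicode version bundled with the interpreter (the open tanween characters require Unicode 6.1 or later).

```python
import unicodedata

# Code point -> canonical combining class, copied from the table above.
EXPECTED_CCC = {
    0x064E: 30, 0x064F: 31, 0x0650: 32,  # fatha, damma, kasra
    0x064B: 27, 0x064C: 28, 0x064D: 29,  # fathatan, dammatan, kasratan
    0x08F0: 27, 0x08F1: 28, 0x08F2: 29,  # open fathatan, dammatan, kasratan
}


def actual_ccc(code_points=EXPECTED_CCC):
    """Look up the combining class of each code point in the local UCD."""
    return {cp: unicodedata.combining(chr(cp)) for cp in code_points}


if __name__ == "__main__":
    for cp, ccc in actual_ccc().items():
        print(f"U+{cp:04X} {unicodedata.name(chr(cp))}, ccc={ccc}")
```

Each open tanween shares the combining class of the corresponding ordinary tanween, which is the kind of property alignment the proposal asks for when it refers to properties similar to those of the open tanween characters.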

L2/17-377

Proposal for the encoding of three Arabic tanween characters

Eric Muller — Amazon

October 15, 2017
