Chapter 15

South and Central Asia-IV

Other Historic Scripts

This chapter documents other modern and historic scripts of South and Central Asia.

Most of these scripts are historically related to the other scripts of India, and most are ultimately derived from the Brahmi script. None of them were standardized in ISCII. The encoding for each script is done on its own terms, and the blocks do not make use of a common pattern for the layout of code points.

This introduction briefly identifies each script, occasionally highlighting the most salient distinctive attributes of the script. Details are provided in the individual block descriptions that follow.

Syloti Nagri is used to write the modern Sylheti language of northeast Bangladesh and southeast Assam in India.

Kaithi is a historic North Indian script, closely related to the Devanagari and Gujarati scripts. It was used in the area of the present-day states of Bihar and Uttar Pradesh in northern India, from the 16th century until the early 20th century.

Sharada is a historical script that was used to write Sanskrit, Kashmiri, and other languages of northern South Asia; it was the principal inscriptional and literary script of Kashmir from the 8th century CE until the 20th century. It has limited and specialized modern use.

Takri, descended from Sharada, is used in northern India and surrounding countries. It is the traditional writing system for the Chambeali and Dogri languages, as well as several “Pahari” languages. In addition to popular usage for commercial and informal purposes, Takri served as the official script of several princely states of northern and northwestern India from the 17th century until the middle of the 20th century.

During the 17th century, the Brahmi-based Dogra script was used to write the Dogri language in Jammu and Kashmir in the northern region of the Indian subcontinent. The Dogra script was standardized in the 1860s, and is closely related to the Takri script. Dogri is now usually written with the Devanagari script.

Siddham is another Brahmi-based writing system related to Sharada, and structurally similar to Devanagari. It originated in India and was used across South, Central, and East Asia; today it is used predominantly in East Asia. Originally used for writing Buddhist manuscripts, the script is still used by Japanese Buddhist communities.

Mahajani is a Brahmi-based alphabet commonly used by bankers and money lenders across northern India until the middle of the 20th century. It is a specialized commercial script used for writing accounts and financial records. Mahajani has similarities to Landa, Kaithi, and Devanagari.

Khojki is a writing system used by the Nizari Ismaili community of South Asia for recording religious literature. It is one of two Landa scripts—the other being Gurmukhi—that were developed into formal liturgical scripts for use by religious communities. It is still used today.

Khudawadi is a Landa-based script that was used to write the Sindhi language spoken in India and Pakistan. It is related to Sharada. Known as the shopkeeper and merchant script, it was used for routine writing, accounting, and other commercial purposes.

The Multani script was used to write the Seraiki language of eastern and southeastern Pakistan during the 19th and 20th centuries. Multani is related to Gurmukhi and more distantly related to Khudawadi and Khojki. It was used for routine writing and commercial activities.

Tirhuta, another Brahmi-based script, is related to the Bengali, Newari, and Oriya scripts. Tirhuta was the traditional writing system for the Maithili language, which is spoken by more than 35 million people in parts of India and Nepal. Maithili is an official regional language of India and the second most spoken language in Nepal.

Modi is another Brahmi-based script mainly used to write Marathi, a language spoken in western and central India. It emerged in the 16th century and derives from the Nagari scripts. It is still in some use today.

Nandinagari is a Brahmi-based abugida that was used in southern India between the 11th and 19th centuries for manuscripts and inscriptions in Sanskrit. It is related to Devanagari. The script was also used for writing Kannada in Karnataka.

Grantha, a script with a long history, is used to write the Sanskrit language in parts of South India, Sri Lanka and elsewhere. It is in daily use by Vedic scholars and Hindu temple priests.

Tulu-Tigalari is a historic script attested in a large number of manuscripts from Karnataka and northern Kerala dating to as early as 1300 CE. It was used to write Sanskrit, Tulu, and Malayalam, but most attestations are manuscripts of Sanskrit religious texts written by Shivalli, Havyaka, and Kota brahmins. The script is known by a wide variety of names. It is currently undergoing revival among Tulu speakers in Karnataka, with some innovations, as a modern writing system alternative to the Kannada script for that language.

Dives Akuru is a Brahmi-derived script used to write the Dhivehi language in the Maldives from the 9th to the 20th centuries. The script is most closely related to a medieval form of the Sinhala script.

Ahom is a script of northeast India that dates to about the 16th century and was used primarily to write the Tai Ahom language. The script has seen a revival in the 20th century, and continues in some use today.

Sora Sompeng is used to write the Sora language spoken by the Sora people, who live in eastern India between the Oriya- and Telugu-speaking populations. The script was created in 1936 and is used in religious contexts.

#15.1 Syloti Nagri

#15.1.1 Syloti Nagri: U+A800–U+A82F

Syloti Nagri is a lesser-known Brahmi-derived script used for writing the Sylheti language. Sylheti is an Indo-European language spoken by some 5 million speakers in the Barak Valley region of northeast Bangladesh and southeast Assam in India. Worldwide there may be as many as 10 million speakers. Sylheti has commonly been regarded as a dialect of Bengali, with which it shares a high proportion of vocabulary.

The Syloti Nagri script has 27 consonant letters with an inherent vowel of /o/ and 5 independent vowel letters. There are 5 dependent vowel signs that are attached to a consonant letter. Unlike Devanagari, there are no vowel signs that appear to the left of their associated consonant.

Only two proper diacritics are encoded to support Syloti Nagri: anusvara and hasanta. Aside from its traditional Indic designation, anusvara can also be considered a final form for the sequence /-ng/, which does not have a base glyph in Syloti Nagri because it does not occur in other positions. Anusvara can also occur with the vowels U+A824 ꠤ SYLOTI NAGRI VOWEL SIGN I and U+A826 ꠦ SYLOTI NAGRI VOWEL SIGN E, creating a potential problem with the display of both items. It is recommended that anusvara always occur in sequence after any vowel signs, as a final character.
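
The ordering recommendation can be checked mechanically. The following is a minimal Python sketch, not part of the standard. The helper name `anusvara_follows_vowel_signs` is hypothetical, and the sketch assumes that the anusvara is U+A80B SYLOTI NAGRI SIGN ANUSVARA and that the dependent vowel signs occupy U+A823..U+A827.

```python
ANUSVARA = "\uA80B"  # SYLOTI NAGRI SIGN ANUSVARA (assumed code point)
# Dependent vowel signs, assumed to occupy U+A823..U+A827.
VOWEL_SIGNS = {chr(cp) for cp in range(0xA823, 0xA828)}


def anusvara_follows_vowel_signs(text: str) -> bool:
    """Return False if any vowel sign comes directly after an anusvara."""
    for prev, cur in zip(text, text[1:]):
        if prev == ANUSVARA and cur in VOWEL_SIGNS:
            return False
    return True


KO = "\uA807"      # SYLOTI NAGRI LETTER KO
SIGN_I = "\uA824"  # SYLOTI NAGRI VOWEL SIGN I

print(anusvara_follows_vowel_signs(KO + SIGN_I + ANUSVARA))  # recommended order
print(anusvara_follows_vowel_signs(KO + ANUSVARA + SIGN_I))  # discouraged order
```

A check of this kind only inspects adjacent code points; it does not attempt to reorder text or to validate syllable structure.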

#Virama and Conjuncts. Conjuncts are not always necessary in contexts involving a dead consonant, nor are they limited to sequences involving dead consonants. They can also represent a variety of vowel + consonant (VC) syllables, such as ar, al, as, at, ir, and it, as well as the CCV combinations typical of other Indic scripts. In practice, it is rare to overtly indicate a dead consonant with an explicit hasanta, and not always obligatory to use a conjunct.

U+A806 SYLOTI NAGRI SIGN HASANTA, whose glyph is shaped like a circumflex, was introduced into the script relatively recently and is used in limited contexts. The character appears overtly in pedagogical materials introducing readers to the script. More commonly, the hasanta is inserted between consonants to represent a conjunct. Occasionally, it indicates a word-final consonant whose vowel is silenced; however, the hasanta is generally not required in such cases. A second hasanta, U+A82C SYLOTI NAGRI SIGN ALTERNATE HASANTA, specifically indicates a word-final consonant. The glyph for the alternate hasanta resembles U+09CD BENGALI SIGN VIRAMA and is used when the glyph for the circumflex-shaped hasanta would overhang the following space. The alternate hasanta has very limited modern-day use.

#Digits. There are no unique Syloti Nagri digits. When digits do appear in Syloti Nagri texts, they are generally Bengali forms. Any font designed to support Syloti Nagri should include the Bengali digits because there is no guarantee that they would otherwise exist in a user’s computing environment. They should use the corresponding Bengali block code points, U+09E6..U+09EF.

#Punctuation. With the advent of digital type and the modernization of the Syloti Nagri script, one can expect to find all of the traditional punctuation marks borrowed from Latin typography: period, comma, colon, semicolon, question mark, and so on. In addition, the Devanagari single danda and double danda are used with great frequency.

#Poetry Marks. Four native poetry marks are included in the Syloti Nagri block. The script also makes use of U+2055 ⁕ FLOWER PUNCTUATION MARK (in the General Punctuation block) as a poetry mark.

#15.2 Kaithi

#15.2.1 Kaithi: U+11080–U+110CF

Kaithi, properly transliterated Kaithī, is a North Indian script, related to the Devanagari and Gujarati scripts. It was used in the area of the present-day states of Bihar and Uttar Pradesh in northern India.

Kaithi was employed for administrative purposes, commercial transactions, correspondence, and personal records, as well as to write religious and literary materials. As a means of administrative communication, the script was in use at least from the 16th century until the early 20th century, when it was eventually eclipsed by Devanagari. Kaithi was used to write Bhojpuri, Magahi, Awadhi, Maithili, Urdu, and other languages related to Hindi.

#Standards. There is no preexisting character encoding standard for the Kaithi script. The repertoire encoded in this block is based on the standard form of Kaithi developed by the British government of Bihar and the British provinces of northwest India in the 19th century. A few additional Kaithi characters found in manuscripts, printed books, alphabet charts, and other inventories of the script are also included.

#Styles. There are three presentation styles of the Kaithi script, each generally associated with a different language: Bhojpuri, Magahi, or Maithili. The Magahi style was adopted for official purposes in the state of Bihar, and is the basis for the representative glyphs in the code charts.

#Rendering Behavior. Kaithi is a Brahmi-derived script closely related to Devanagari. In general, the rules for Devanagari rendering apply to Kaithi as well. For more information, see Section 12.1, Devanagari.

#Vowel Letters. An independent Kaithi letter for vocalic r is represented by the consonant-vowel combination: U+110A9 KAITHI LETTER RA and U+110B2 KAITHI VOWEL SIGN II.

In print, the distinction between short and long forms of i and u is maintained. However, in handwritten text, there is a tendency to use the long vowels for both lengths.

#Consonant Conjuncts. Consonant clusters were handled in various ways in Kaithi. Some spoken languages that used the Kaithi script simplified clusters by inserting a vowel between the consonants, or through metathesis. When no such simplification occurred, conjuncts were represented in different ways: by ligatures, as the combination of the half-form of the first consonant and the following consonant, with an explicit virama (U+110B9 KAITHI SIGN VIRAMA) between two consonants, or as two consonants without a virama.

Consonant conjuncts in Kaithi are represented with a virama between the two consonants in the conjunct. For example, the ordinary representation of the conjunct mba would be by the sequence:

U+110A7 KAITHI LETTER MA + U+110B9 KAITHI SIGN VIRAMA + U+110A5 KAITHI LETTER BA

Consonant conjuncts may be rendered in distinct ways. Where there is a need to render conjuncts in the exact form as they appear in a particular source document, U+200C ZERO WIDTH NON-JOINER and U+200D ZERO WIDTH JOINER can be used to request the appropriate presentation by the rendering system. For example, to display the explicitly ligated glyph 𑂧𑂹‍𑂥 for the conjunct mba, U+200D ZERO WIDTH JOINER is inserted after the virama:

U+110A7 KAITHI LETTER MA + U+110B9 KAITHI SIGN VIRAMA + U+200D ZERO WIDTH JOINER + U+110A5 KAITHI LETTER BA

To block use of a ligated glyph for the conjunct, and instead to display the conjunct with an explicit virama, U+200C ZERO WIDTH NON-JOINER is inserted after the virama:

U+110A7 KAITHI LETTER MA + U+110B9 KAITHI SIGN VIRAMA + U+200C ZERO WIDTH NON-JOINER + U+110A5 KAITHI LETTER BA
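
The three encodings of mba described above can be assembled programmatically. This is an illustrative Python sketch only; the helper name `kaithi_conjunct` is hypothetical, and whether a ligature or a visible virama actually appears depends on the font and rendering system.

```python
MA = "\U000110A7"      # KAITHI LETTER MA
BA = "\U000110A5"      # KAITHI LETTER BA
VIRAMA = "\U000110B9"  # KAITHI SIGN VIRAMA
ZWJ = "\u200D"         # ZERO WIDTH JOINER
ZWNJ = "\u200C"        # ZERO WIDTH NON-JOINER


def kaithi_conjunct(first: str, second: str, control: str = "") -> str:
    """Join two consonants with a virama, optionally followed by a joiner control."""
    return first + VIRAMA + control + second


sequences = {
    "ordinary": kaithi_conjunct(MA, BA),
    "request ligature": kaithi_conjunct(MA, BA, ZWJ),
    "request explicit virama": kaithi_conjunct(MA, BA, ZWNJ),
}

for label, seq in sequences.items():
    # Show each sequence as a list of code points.
    print(label, " ".join(f"U+{ord(ch):04X}" for ch in seq))
```

In each case the joiner control, when present, sits immediately after the virama and before the second consonant.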

Conjuncts composed of a nasal and a consonant may be written either as a ligature with the half-form of the appropriate class nasal letter, or the full form of the nasal letter with an explicit virama (U+110B9 KAITHI SIGN VIRAMA) and consonant. In Grierson’s Linguistic Survey of India, however, U+110A2 KAITHI LETTER NA is used for all articulation classes, both in ligatures and when the full form of the nasal appears with the virama.

#Ruled Lines. Kaithi, unlike Devanagari, does not employ a headstroke. While several manuscripts and books show a line similar to the Devanagari headstroke, it is actually a ruled line used for emphasis, titling, or sectioning, and is not broken between individual letters. Some Kaithi fonts were nevertheless designed with a headstroke; in those fonts, too, the line is not broken between individual letters, as it would be in Devanagari.

#Nukta. Kaithi includes a nukta sign, U+110BA KAITHI SIGN NUKTA, a dot which is used as a diacritic below various consonants to form new letters. For example, the nukta is used to distinguish the sound va from ba. The precomposed character U+110AB KAITHI LETTER VA is separately encoded, and has a canonical decomposition into the sequence of U+110A5 KAITHI LETTER BA plus U+110BA KAITHI SIGN NUKTA. Precomposed characters are also encoded for two other Kaithi letters, rha and dddha.

The glyph for U+110A8 KAITHI LETTER YA may appear with or without a nukta. Because the form without the nukta is considered a glyph variant, it is not separately encoded as a character. The representative glyph used in the chart contains the dot. The nukta diacritic also marks letters representing some sounds in Urdu or sounds not native to Hindi. No precomposed characters are encoded in those cases, and such letters must be represented by a base character followed by the nukta.
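
Because U+110AB KAITHI LETTER VA has a canonical decomposition, the precomposed letter and the sequence of ba plus nukta compare as equal after normalization. The following Python sketch illustrates this with the standard `unicodedata` module; the helper name `canonically_equivalent` is hypothetical, and the result depends on the Unicode data shipped with the Python runtime in use.

```python
import unicodedata

VA = "\U000110AB"                  # KAITHI LETTER VA (precomposed)
BA_NUKTA = "\U000110A5\U000110BA"  # KAITHI LETTER BA + KAITHI SIGN NUKTA


def canonically_equivalent(a: str, b: str) -> bool:
    """Compare two strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


print(canonically_equivalent(VA, BA_NUKTA))
print([f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFD", VA)])
```

Comparing normalized forms, rather than raw code point sequences, keeps searching and matching independent of which representation a source text happens to use.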

#Punctuation. A number of Kaithi-specific punctuation marks are encoded. Two marks designate the ends of text sections: U+110BE KAITHI SECTION MARK, which generally indicates the end of a sentence, and U+110BF KAITHI DOUBLE SECTION MARK, which delimits larger blocks of text, such as paragraphs. Both section marks are generally drawn so that their glyphs extend to the edge of the text margins, particularly in manuscripts.

The character U+110BD KAITHI NUMBER SIGN is a format control that interacts with digits. It occurs below a digit or sequence of digits, indicating a numerical reference. The related character U+110CD KAITHI NUMBER SIGN ABOVE occurs above a digit or sequence of digits, and indicates a number in an itemized list, similar to U+2116 NUMERO SIGN. Like U+0600 ARABIC NUMBER SIGN and the other Arabic signs that span numbers (see Section 9.2, Arabic), these Kaithi format controls precede the numbers they graphically interact with, rather than following them. U+110BC KAITHI ENUMERATION SIGN is a standalone, spacing symbol for inline usage.
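
The logical order of a number sign and its digits can be sketched as follows. This Python example is illustrative only: the helper names are hypothetical, and it uses the Devanagari digits U+0966..U+096F, which this section recommends for Kaithi text.

```python
KAITHI_NUMBER_SIGN = "\U000110BD"        # format control drawn below the digits
KAITHI_NUMBER_SIGN_ABOVE = "\U000110CD"  # format control drawn above the digits
DEVANAGARI_DIGIT_ZERO = 0x0966


def to_devanagari_digits(n: int) -> str:
    """Convert a non-negative integer to a string of Devanagari digits."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return "".join(chr(DEVANAGARI_DIGIT_ZERO + int(d)) for d in str(n))


def with_number_sign(n: int, sign: str = KAITHI_NUMBER_SIGN) -> str:
    """Place the format control before the digits it graphically spans."""
    return sign + to_devanagari_digits(n)


sequence = with_number_sign(12)
print(" ".join(f"U+{ord(ch):04X}" for ch in sequence))
```

The format control comes first in the stored text even though it is drawn below or above the digits that follow it.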

U+110BB KAITHI ABBREVIATION SIGN, shaped like a small circle, is used in Kaithi to indicate abbreviations. This mark is placed at the point of elision or after a ligature to indicate common words or phrases that are abbreviated, in a similar way to U+0970 DEVANAGARI ABBREVIATION SIGN.

Kaithi makes use of two script-specific dandas: U+110C0 KAITHI DANDA and U+110C1 KAITHI DOUBLE DANDA.

For other punctuation marks occurring in Kaithi texts, available Unicode characters may be used. A cross-shaped character, used to mark phrase boundaries, can be represented by U+002B PLUS SIGN. For hyphenation, users should follow whatever is the recommended practice found in similar Indic script traditions, which might be U+2010 HYPHEN or U+002D HYPHEN-MINUS. For dot-like marks that appear as word-separators, U+2E31 WORD SEPARATOR MIDDLE DOT, or, if the word boundary is more like a dash, U+2010 HYPHEN can be used.

#Digits. The digits in Kaithi are considered to be stylistic variants of those used in Devanagari. Hence the Devanagari digits located at U+0966..U+096F should be employed. To indicate fractions and unit marks, Kaithi uses characters encoded in the Common Indic Number Forms block, U+A830..U+A839.

#15.3 Sharada

#15.3.1 Sharada: U+11180–U+111DF

Sharada is a historical script that was used to write Sanskrit, Kashmiri, and other languages of northern South Asia. It served as the principal inscriptional and literary script of Kashmir from the 8th century CE until the 20th century. In the 19th century, expanded use of the Arabic script to write Kashmiri and the growth of Devanagari contributed to the marginalization of Sharada. Today the script is employed in a limited capacity by Kashmiri pandits for horoscopes and ritual purposes.

#Rendering Behavior. Sharada is a Brahmi-based script, closely related to Devanagari. In general, the rules for Devanagari rendering apply to Sharada as well. For more information, see Section 12.1, Devanagari.

#Vowel Letters. Vowel letters are encoded atomically in Unicode, even if they can be analyzed visually as consisting of multiple parts. Table 15-1 shows the Sharada letter that can be analyzed, the single code point that should be used to represent it in text, and the sequence of code points resulting from analysis that should not be used. In contrast, the atomic U+111C4 SHARADA OM is not recommended for use; the om should be written in Sharada with a character sequence, instead.

#Table 15-1. Sharada Vowel Letters and om

For	Use	Do Not Use
𑆎	1118E 𑆎	<1118D 𑆍, 111BC ◌𑆼>
𑆏𑆀	<1118F 𑆏, 11180 ◌𑆀>	111C4 𑇄
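
The two rows of Table 15-1 can be applied as simple string substitutions. The Python sketch below is one possible way to do so and is not prescribed by the standard; the names `RECOMMENDED` and `apply_sharada_recommendations` are hypothetical.

```python
# Discouraged representation -> recommended representation, per Table 15-1.
RECOMMENDED = {
    "\U0001118D\U000111BC": "\U0001118E",  # <1118D, 111BC> -> 1118E
    "\U000111C4": "\U0001118F\U00011180",  # 111C4 (om) -> <1118F, 11180>
}


def apply_sharada_recommendations(text: str) -> str:
    """Rewrite discouraged Sharada representations to the recommended ones."""
    for discouraged, preferred in RECOMMENDED.items():
        text = text.replace(discouraged, preferred)
    return text


sample = "\U000111C4" + "\U0001118D\U000111BC"
fixed = apply_sharada_recommendations(sample)
print(" ".join(f"U+{ord(ch):04X}" for ch in fixed))
```

Note that the two rows point in opposite directions: the vowel letter is preferred in its atomic form, whereas the om is preferred as a sequence.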

#Ruled Lines. While the headstroke is an important structural feature of a character’s glyph in Sharada, there is no rule governing the joining of headstrokes of characters to other characters. The variation was probably due to scribal preference, and should be handled at the font level.

#Virama. The U+111C0 ◌𑇀 SHARADA SIGN VIRAMA is a spacing mark, written to the right of the consonant letter it modifies. Semantically, it is identical to the virama of Devanagari and of other similar Indic scripts.

#Candrabindu and Avagraha. U+11180 ◌𑆀 SHARADA SIGN CANDRABINDU indicates nasalization of a vowel. It may appear in manuscripts in an inverted form but with no semantic difference. Such glyph variants should be handled in the font. U+111C1 𑇁 SHARADA SIGN AVAGRAHA represents the elision of a word-initial a. Unlike the usual practice in Devanagari in which the avagraha is written at the normal letter height and attaches to the top stroke of the following character, the avagraha in Sharada is written at or below the baseline and does not connect to the neighboring letter.

#Jihvamuliya and Upadhmaniya. The velar and labial allophones of /h/, followed by voiceless velar and labial stops respectively, are written in Sharada with separate signs, U+111C2 𑇂 SHARADA SIGN JIHVAMULIYA and U+111C3 𑇃 SHARADA SIGN UPADHMANIYA. These two signs have the properties of a letter and appear only in stacked conjuncts without the use of virama. Jihvamuliya is used to represent the velar fricative [x] in the context of a following voiceless velar stop:

U+111C2 𑇂 jihvamuliya + U+11191 𑆑 ka → 𑇂𑆑

U+111C2 𑇂 jihvamuliya + U+11192 𑆒 kha → 𑇂𑆒

Upadhmaniya is used to represent the bilabial fricative [ɸ] in the context of a following voiceless labial stop:

U+111C3 𑇃 upadhmaniya + U+111A5 𑆥 pa → 𑇃𑆥

U+111C3 𑇃 upadhmaniya + U+111A6 𑆦 pha → 𑇃𑆦

#Punctuation. U+111C7 𑇇 SHARADA ABBREVIATION SIGN appears after letters or combinations of letters. It marks the sequence as an abbreviation. A word separator, U+111C8 𑇈 SHARADA SEPARATOR, indicates word and other boundaries. Sharada also makes use of two script-specific dandas: U+111C5 𑇅 SHARADA DANDA and U+111C6 𑇆 SHARADA DOUBLE DANDA.

#Digits. Sharada has a distinctive set of digits encoded in the range U+111D0..U+111D9.
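
A conversion between integers and Sharada digits might look like the following Python sketch. It assumes that the ten code points U+111D0..U+111D9 represent the digits zero through nine in ascending order; the function names are hypothetical.

```python
SHARADA_DIGIT_ZERO = 0x111D0  # assumed to be digit zero, with one..nine following


def int_to_sharada_digits(n: int) -> str:
    """Render a non-negative integer with Sharada digits."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return "".join(chr(SHARADA_DIGIT_ZERO + int(d)) for d in str(n))


def sharada_digits_to_int(text: str) -> int:
    """Parse a non-empty string of Sharada digits into an integer."""
    if not text:
        raise ValueError("empty string")
    value = 0
    for ch in text:
        digit = ord(ch) - SHARADA_DIGIT_ZERO
        if not 0 <= digit <= 9:
            raise ValueError(f"not a Sharada digit: U+{ord(ch):04X}")
        value = value * 10 + digit
    return value


encoded = int_to_sharada_digits(2024)
print(" ".join(f"U+{ord(ch):04X}" for ch in encoded))
print(sharada_digits_to_int(encoded))
```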

#15.4 Takri

#15.4.1 Takri: U+11680–U+116CF

Takri is a script used in northern India and surrounding countries in South Asia, including the areas that comprise present-day Jammu and Kashmir, Himachal Pradesh, Punjab, and Uttarakhand. It is the traditional writing system for the Chambeali and Dogri languages, as well as several “Pahari” languages, such as Jaunsari, Kulvi, and Mandeali. It is related to the Gurmukhi, Landa, and Sharada scripts. Like other Brahmi-derived scripts, Takri is an abugida, with consonants taking an inherent vowel unless accompanied by a vowel marker or the virama (vowel killer).

Takri is descended from Sharada through an intermediate form known as Devāśeṣa, which emerged in the 14th century. Devāśeṣa was a script used for religious and official purposes, while its popular form, known as Takri, was used for commercial and informal purposes. Takri became differentiated from Devāśeṣa during the 16th century. In its various regional manifestations, Takri served as the official script of several princely states of northern and northwestern India from the 17th century until the middle of the 20th century. Until the late 19th century, Takri was used concurrently with Devanagari, but it was gradually replaced by the latter.

Owing to its use as both an official and a popular script, Takri appears in numerous records, from manuscripts to inscriptions to postage stamps. There are efforts to revive the use of Takri for languages such as Dogri, Kishtwari, and Kulvi as a means of preserving access to these languages’ literatures.

There is no universal, standard form of Takri. Where Takri was standardized, the reformed script was limited to a particular polity, such as a kingdom or a princely state. The representative glyphs shown in the code charts are taken mainly from the forms used in a variant established as the official script for writing the Chambeali language in the former Chamba State, now in Himachal Pradesh, India. There are a number of other regional varieties of Takri that have varying letterforms, sometimes quite different from the representative forms shown in the code charts. Such regional forms are considered glyphic variants and should be handled at the font level.

#Vowel Letters. Vowel letters are encoded atomically in Unicode, even if they can be analyzed visually as consisting of multiple parts. Table 15-2 shows the letters that can be analyzed, the single code point that should be used to represent them in text, and the sequence of code points resulting from analysis that should not be used.

#Table 15-2. Takri Vowel Letters

For	Use	Do Not Use
𑚁	11681	<11680, 116AD>
𑚇	11687	<11686, 116B2>
𑚈	11688	<11680, 116B4>
𑚉	11689	<11680, 116B5>
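
A validator could flag the sequences in the "Do Not Use" column of Table 15-2. The Python sketch below shows one way to locate them and report the preferred single code point. It is an illustration under the table's data only; the names `DISCOURAGED` and `find_discouraged_takri` are hypothetical.

```python
# Discouraged sequence -> preferred atomic vowel letter, per Table 15-2.
DISCOURAGED = {
    "\U00011680\U000116AD": "\U00011681",
    "\U00011686\U000116B2": "\U00011687",
    "\U00011680\U000116B4": "\U00011688",
    "\U00011680\U000116B5": "\U00011689",
}


def find_discouraged_takri(text: str) -> list:
    """Return sorted (index, preferred) pairs for each discouraged sequence found."""
    hits = []
    for seq, preferred in DISCOURAGED.items():
        start = text.find(seq)
        while start != -1:
            hits.append((start, preferred))
            start = text.find(seq, start + 1)
    return sorted(hits)


sample = "\U0001168A" + "\U00011680\U000116AD"
for index, preferred in find_discouraged_takri(sample):
    print(index, f"U+{ord(preferred):04X}")
```

Indexes are counted in code points, which is how Python strings are indexed.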

#Consonant Conjuncts. Conjuncts in Takri are infrequent and, when written, consist of two consonants, the second of which is always ya, ra, or ha. Takri ya is written as a subjoining form; Takri ra can be written as a ligature or a subjoining form; and Takri ha is written as a half-form.

#Nukta. A combining nukta character is encoded as U+116B7 TAKRI SIGN NUKTA. Sounds that require it, which occur mainly in loanwords and words from other languages, may be represented using the base character plus nukta.

#Headlines. Unlike Devanagari, headlines are not generally used in Takri. However, headlines do appear in the glyph shapes of certain Takri letters. The headline is an intrinsic feature of glyph shapes in some regional varieties such as Dogra Akkhar, where it appears to be inspired by the design of Devanagari characters. There are no fixed rules for the joining of headlines. For example, the headlines of two sequential characters possessing headlines are left unjoined in Chambeali, while the headlines of a letter and a vowel sign are joined in printed Dogra Akkhar.

#Punctuation. Takri uses U+0964 DEVANAGARI DANDA and U+0965 DEVANAGARI DOUBLE DANDA from Devanagari.

#Fractions. Fraction signs and currency marks found in Takri documents use the characters in the Common Indic Number Forms block (U+A830..U+A83F).

#15.5 Siddham

#15.5.1 Siddham: U+11580–U+115FF

Siddham is a Brahmi-based writing system that originated in India, and is presently used primarily in East Asia. The script is also known as Siddhamātṛkā and Kuṭila. The name Siddhamatrika has broad historic and regional usage throughout India and East Asia. However, modern usage is most strongly associated with the Shingon and Tendai Buddhist traditions in Japan, where the script is also known as Bonji. The representative glyphs in the code charts are based upon Japanese forms of Siddham characters.

The historical record shows the use of Siddham in Central Asia, but the predominant examples are of its use for writing Sanskrit in China, Japan, and Korea, notably for Buddhist manuscripts. Today, it is mainly used for ceremonial and ritualistic purposes associated with esoteric Buddhist practices.

Siddham is most closely related to Sharada, another Brahmi-based script that originated in Kashmir.

#Nukta. The sign U+115C0 𑗀 SIDDHAM SIGN NUKTA is used for transcribing sounds that are not native to the writing system. The nukta sign is not a traditional Siddham character, but it is part of modern Siddham, so that it can accommodate the writing of Japanese and English.

#Vowels. The Siddham vowel signs for u and uu may appear in two forms. The regular forms, called “cloud” forms, are represented by U+115B2 SIDDHAM VOWEL SIGN U and U+115B3 SIDDHAM VOWEL SIGN UU. Alternate vowel sign forms, referred to as “warbler” forms, are represented instead by U+115DC SIDDHAM VOWEL SIGN ALTERNATE U and U+115DD SIDDHAM VOWEL SIGN ALTERNATE UU.

The combination of ra and u should be written with the sequence <U+115A8 𑖨 SIDDHAM LETTER RA, U+115DC 𑗜 SIDDHAM VOWEL SIGN ALTERNATE U> and rendered as 𑖨𑗜. For the combination ra and uu, the form 𑖨𑗝 should be employed, represented by the sequence <U+115A8 SIDDHAM LETTER RA, U+115DD SIDDHAM VOWEL SIGN ALTERNATE UU>.
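
An input method or text-cleanup step could apply this rule automatically. The following Python sketch substitutes the alternate vowel signs when u or uu directly follows ra. The function name `use_alternate_u_after_ra` is hypothetical, and the sketch handles only the simple case in which the vowel sign is immediately adjacent to U+115A8.

```python
RA = "\U000115A8"       # SIDDHAM LETTER RA
SIGN_U = "\U000115B2"   # SIDDHAM VOWEL SIGN U
SIGN_UU = "\U000115B3"  # SIDDHAM VOWEL SIGN UU
ALT_U = "\U000115DC"    # SIDDHAM VOWEL SIGN ALTERNATE U
ALT_UU = "\U000115DD"   # SIDDHAM VOWEL SIGN ALTERNATE UU


def use_alternate_u_after_ra(text: str) -> str:
    """Replace u/uu directly after ra with the alternate vowel signs."""
    text = text.replace(RA + SIGN_U, RA + ALT_U)
    return text.replace(RA + SIGN_UU, RA + ALT_UU)


ru = use_alternate_u_after_ra(RA + SIGN_U)
print(" ".join(f"U+{ord(ch):04X}" for ch in ru))
```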

#Virama and Conjuncts. The virama, U+115BF 𑖿 SIDDHAM SIGN VIRAMA, is identical to the corresponding character in Devanagari and silences the inherent vowel of a consonant. The default rendering of the Siddham virama is as a visible sign.

Consonant clusters in Siddham are written as conjuncts and follow the same model as conjuncts in Devanagari. Conjuncts are represented using the Siddham virama, which is written between each consonant in the cluster. Conjuncts may be written vertically, horizontally, or as independent ligatures. There are traditional Chinese and Japanese tabulations for Siddham conjuncts.

Siddham conjuncts may represent clusters with a large number of consonants. For example, rkṣvrya is a conjunct cluster produced by a sequence of six consonants, as shown in Figure 15-1.

#Figure 15-1. Siddham Consonant Cluster
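
The encoded form of such a cluster follows directly from the rule that a virama is written between each consonant. The Python sketch below builds the sequence for the six consonants r, k, ṣ, v, r, y; it is illustrative only, and the helper name `siddham_conjunct` is hypothetical.

```python
VIRAMA = "\U000115BF"  # SIDDHAM SIGN VIRAMA
RA = "\U000115A8"      # SIDDHAM LETTER RA
KA = "\U0001158E"      # SIDDHAM LETTER KA
SSA = "\U000115AC"     # SIDDHAM LETTER SSA
VA = "\U000115AA"      # SIDDHAM LETTER VA
YA = "\U000115A7"      # SIDDHAM LETTER YA


def siddham_conjunct(*consonants: str) -> str:
    """Write a virama between each pair of adjacent consonants."""
    return VIRAMA.join(consonants)


cluster = siddham_conjunct(RA, KA, SSA, VA, RA, YA)
print(" ".join(f"U+{ord(ch):04X}" for ch in cluster))
```

Six consonants therefore yield a sequence of eleven code points, with the final consonant carrying the inherent vowel.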

#Head Marks. The mark U+115C1 𑗁 SIDDHAM SIGN SIDDHAM is written at the beginning of a text. Paleographically, the sign corresponds to characters used in other scripts, such as U+0FD3 ࿓ TIBETAN MARK INITIAL BRDA RNYING YIG MGO MDUN MA. It represents the Sanskrit word siddham, “accomplished,” and the phrase siddhirastu, “may there be success.” A vertically-oriented glyph variant is used for vertical text layout.

#Repetition Marks. Three marks, U+115C6 𑗆 SIDDHAM REPETITION MARK-1, U+115C7 𑗇 SIDDHAM REPETITION MARK-2, and U+115C8 𑗈 SIDDHAM REPETITION MARK-3, are used to indicate text repetition. They are written after the text that is to be repeated.

#Section Signs. A set of fourteen section marks is used in Siddham to indicate the ends of sentences, phrases, verses, and sections. They appear in manuscripts and script manuals. According to the Shingon philosophy, the characters possess esoteric qualities that relay information regarding the interpretation of the text.

#Punctuation. There are five other punctuation marks encoded for Siddham, as shown in Table 15-3. Both Siddham danda and Siddham double danda have graphical variants used in informal Japanese writing of Siddham.

#Table 15-3. Siddham Punctuation Characters

Code Point and Name			Purpose
115C2	𑗂	SIDDHAM DANDA	marks the end of sentences and other short text sections
115C3	𑗃	SIDDHAM DOUBLE DANDA	used at the end of paragraphs and larger text blocks
115C4	𑗄	SIDDHAM SEPARATOR DOT	marks boundaries between syllables, words, and phrases; written at head height
115C5	𑗅	SIDDHAM SEPARATOR BAR	marks boundaries between syllables, words, and phrases
115C9	𑗉	SIDDHAM END OF TEXT MARK	indicates the end or completion of a text

#15.6 Mahajani

#15.6.1 Mahajani: U+11150–U+1117F

Mahajani is a Brahmi-based writing system that was commonly used across northern India until the middle of the 20th century. It is a specialized commercial script used for writing accounts and financial records. It was used for recording several languages: Hindi, Marwari, and Punjabi. Mahajani was taught and used as a medium of education in Punjab, Rajasthan, Uttar Pradesh, Bihar, and Madhya Pradesh in schools where students from merchant and trading communities learned the script and other writing skills required for business. The name “Mahajani” refers to bankers and money lenders, who were the primary users of the script. The majority of Mahajani records are account books. Although the Mahajani script is no longer in general use, it is an important key to the historical financial records of northern India.

Mahajani has similarities to Landa, Kaithi, and Devanagari. In structure and orthography, Mahajani resembles scripts of the Landa family used in Punjab and Sindh, which are related to Sharada.

#Structure. Mahajani is written from left to right. It is based upon the Brahmi model, but it is structurally simpler and behaves as an alphabet. Vowel signs are not used, and there is no virama. Consonant clusters are not written in Mahajani using half-forms or ligatures (except for one ligature for shri), or even a visible virama. The elements of a consonant cluster are written sequentially using regular consonant letters.

Vowel signs are not written. Consonant letters theoretically bear the inherent vowel /a/, but in practice a letter stands for the consonant with any vowel: the glyph for ka, for example, may represent any one of the syllables ka, kā, ki, kī, ke, and so on. In cases where greater precision is required, a vowel letter may be written after a consonant to convey the intended vocalic context. In general, the value of a consonant letter must be inferred at the morphological level.

Nasalization is not represented using special signs, such as anusvara. Instead U+11167 MAHAJANI LETTER NA is used in cases where nasalization is explicitly recorded. In several cases, words are written simply with nasalization deleted.

U+11173 MAHAJANI SIGN NUKTA is used for writing sounds that are not represented by a unique character, such as allophonic variants and sounds that occur in local dialects or in loanwords. It has limited use in Mahajani.

Several letters have glyphic variants. Those variants are not separately encoded.

#Digits. Mahajani does not have distinctive script-specific digits. The Devanagari digits located at U+0966..U+096F should be used.

#Other Symbols. Fraction signs and unit marks are found in Mahajani documents, and may be represented using the characters encoded in the “Common Indic Number Forms” block.

#Punctuation. Mahajani employs a dash, middle dot, and colon, which should be represented by the corresponding Latin characters. For the dandas, Mahajani employs U+0964 DEVANAGARI DANDA and U+0965 DEVANAGARI DOUBLE DANDA. Mahajani also contains two other script-specific punctuation signs, U+11174 MAHAJANI ABBREVIATION SIGN and U+11175 MAHAJANI SECTION MARK. There are no formal rules for punctuation, and word spacing is not generally observed.

#15.7 Khojki

#15.7.1 Khojki: U+11200–U+1124F

Khojki is a writing system used by the Nizari Ismaili community of South Asia for recording religious literature. It was developed in Sindh, now in Pakistan, for representing the Sindhi language. The script spread to surrounding regions and was used for writing Gujarati, Punjabi, and Siraiki, as well as several languages related to Hindi. It was also used for writing Arabic and Persian. Popular Nizari Ismaili tradition states that Khojki was invented and propagated by Pir Sadruddin, an Ismaili missionary.

Khojki is one of two Landa scripts that were developed into formal liturgical scripts for use by religious communities; the other is Gurmukhi, which was developed for writing the sacred literature of the Sikh tradition.

Khojki is also called “Sindhi” and “Khwajah Sindhi.” Khojki was in use by the 16th century CE, as attested by manuscript evidence. The printing of Khojki books flourished after Laljibhai Devraj produced metal types for Khojki in Germany for use at his Khoja Sindhi Printing Press in Mumbai.

While usage of Khojki has declined over the past century, it is used wherever Nizari Ismaili Muslims of South Asian origin reside. The largest communities are found in Pakistan, India, Canada, the United States, the United Kingdom, Kenya, Tanzania, and Uganda. Khojki primers continue to be published in Pakistan for teaching the script. Khojki manuscripts and books are used in Ismaili ceremonies not only in South Asia, but also in east and south Africa, where large diaspora communities formed by the 19th century. The script was also used by communities related to the Nizari Ismailis, such as the Imamshahis of Gujarat.

#Structure. The general structure of Khojki is similar to that of other Brahmi-derived Indic scripts. It is written from left to right.

Khojki has a smaller repertoire of independent vowel letters than other Brahmi-derived scripts. Conventionally, the letters U+11202 KHOJKI LETTER I and U+11203 KHOJKI LETTER U are used for writing both short and long forms of i and u, respectively. However, some Khojki texts distinguish between the short and long forms of i. Those texts should use U+11202 KHOJKI LETTER I to represent long i and U+11240 KHOJKI LETTER SHORT I to represent short i. The letters U+11205 KHOJKI LETTER AI and U+11207 KHOJKI LETTER AU represent diphthongs. Although they are attested in manuscripts and books, Khojki originally did not have unique letters for these vowels. In early Khojki records, diphthongs are generally represented as digraphs. Several variant forms of vowel letters are also attested.

The repertoire of dependent vowel signs is larger than that of independent vowel letters. There are separate signs for U+1122D KHOJKI VOWEL SIGN I and U+1122E KHOJKI VOWEL SIGN II, but no form for uu. Instead, the single sign U+1122F KHOJKI VOWEL SIGN U is used for both short and long forms. U+11232 KHOJKI VOWEL SIGN O is often written by placing the U+11230 KHOJKI VOWEL SIGN E element above the consonant letter.

Geminate consonants are marked by the U+11237 KHOJKI SIGN SHADDA, written above the consonant letter that is doubled. The positioning may change in relation to vowel signs.

Nasalization is indicated by the sign U+11234 KHOJKI SIGN ANUSVARA. It is written to the right of the letter or sign with which it combines.

U+11235 KHOJKI SIGN VIRAMA is identical in function to corresponding characters in other Indic scripts. It is written to the right of a consonant letter.

U+11236 KHOJKI SIGN NUKTA is used for producing characters to represent sounds not native to Sindhi. The sign may be written with vowel letters, vowel signs, and consonant letters. The nukta is written above a letter.

#Vowels. Khojki vowel letters and vowel signs are encoded atomically in Unicode, even if they can be analyzed visually as consisting of multiple parts. Table 15-4 shows the letters and signs that can be analyzed, the single code point that should be used to represent them in text, and the sequence of code points resulting from analysis that should not be used.

#Table 15-4. Khojki Vowels

For	Use	Do Not Use
𑈁	11201	<11200, 1122C>
𑈂	11202	<11240, 1122E>
𑈃	11203	<11206, 1122C>
𑈅	11205	<11200, 11231>
𑈇	11207	<11200, 11233> or <11200, 1122C, 11231>
𑈲	11232	<1122C, 11230>
𑈳	11233	<1122C, 11231>
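The mapping in Table 15-4 lends itself to a mechanical clean-up pass. The following Python fragment is a minimal sketch of such a pass, not a prescribed algorithm: the names `KHOJKI_ATOMIC` and `normalize_khojki_vowels` are hypothetical, and the choice to try longer sequences first is an assumption made so that <11200, 1122C, 11231> is not partially consumed by the shorter <11200, 1122C> entry.

```python
import re

# Discouraged sequence -> preferred atomic code point, copied from Table 15-4.
KHOJKI_ATOMIC = {
    "\U00011200\U0001122C": "\U00011201",
    "\U00011240\U0001122E": "\U00011202",
    "\U00011206\U0001122C": "\U00011203",
    "\U00011200\U00011231": "\U00011205",
    "\U00011200\U00011233": "\U00011207",
    "\U00011200\U0001122C\U00011231": "\U00011207",
    "\U0001122C\U00011230": "\U00011232",
    "\U0001122C\U00011231": "\U00011233",
}

# Assumption: longer alternatives are listed first so that the three-element
# sequence is matched before any two-element sequence that it starts with.
_PATTERN = re.compile(
    "|".join(re.escape(seq) for seq in sorted(KHOJKI_ATOMIC, key=len, reverse=True))
)


def normalize_khojki_vowels(text: str) -> str:
    """Replace each discouraged sequence with its atomic code point."""
    return _PATTERN.sub(lambda match: KHOJKI_ATOMIC[match.group(0)], text)
```

Text that contains none of the listed sequences passes through unchanged, so a pass of this kind could be applied to mixed-script input.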

#Punctuation. Khojki separates words using U+1123A KHOJKI WORD SEPARATOR. U+11238 KHOJKI DANDA and U+11239 KHOJKI DOUBLE DANDA are used to mark the end of sentences. The DOUBLE DANDA is also used to mark verse sections. Typically, DOUBLE DANDA is written with U+1123A KHOJKI WORD SEPARATOR to the left and right of verse numbers.

Section marks appear frequently in Khojki manuscripts as punctuation that delimits the end of a section or another larger block of text. The U+1123B KHOJKI SECTION MARK is generally used to mark the end of a sentence, while U+1123C KHOJKI DOUBLE SECTION MARK is used to delimit larger blocks of text, such as paragraphs. Both generally extend to the margin of the text-block.

Latin punctuation marks are also used in printed Khojki.

U+1123D KHOJKI ABBREVIATION SIGN is used for marking abbreviations.

#Digits. Khojki makes use of Gujarati digits U+0AE6 through U+0AEF.

#15.8 Dogra

#15.8.1 Dogra: U+11800–U+1184F

In the 17th century, the Dogra script was used to write the Dogri language in Jammu and Kashmir in the northern region of the Indian subcontinent. Dogri is an Indo-Aryan language now usually written with the Devanagari script. The Dogra script was standardized in the 1860s, and is closely related to the Takri script. The official form, known as “Name Dogra Akkar” or “New Dogra Script,” appears in administrative documents and literary works, and on currency, postcards, and postage stamps. The unofficial, common written form of the script is called “Old Dogra.” The glyphs in the code chart are based on New Dogra.

#Structure. Dogra is an abugida, based on Brahmi. It is written left to right. The script includes a virama, U+11839 DOGRA SIGN VIRAMA, to create conjuncts and to suppress the inherent vowel.

#Vowels. Because the glyphs for Dogra vowel letters changed over time, the phonetic value of three vowel letters varies between New and Old Dogra. Old Dogra uses U+11802 DOGRA LETTER I for u, U+11803 DOGRA LETTER II for i, and U+11804 DOGRA LETTER U for o and au. The shapes of the vowel signs also vary between Old and New Dogra. Distinct fonts can be used to reflect the Old Dogra vowel shapes, as opposed to the New Dogra shapes.

A feature of Dogra is that the dependent vowel may be represented either by the independent vowel letter, or by the dependent vowel sign. For example, the syllable ke may be represented by 𑠊𑠆 <ka, e> or 𑠊𑠳 <ka, vowel sign e>.
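Because the same syllable can be spelled in either of these two ways, an application that searches Dogra text may want to treat them as equivalent. The following Python sketch shows one possible folding, under stated assumptions: the helper names are hypothetical, the letter-to-sign pairing simply matches characters of the same name in the Dogra block (AA through AU), and the folding is intended only for loose matching, not as a replacement for the stored text.

```python
# Dogra independent vowel letters paired with the dependent vowel signs of the
# same name. LETTER A is omitted because it has no corresponding vowel sign.
DOGRA_LETTER_TO_SIGN = {
    0x11801: 0x1182C,  # AA
    0x11802: 0x1182D,  # I
    0x11803: 0x1182E,  # II
    0x11804: 0x1182F,  # U
    0x11805: 0x11830,  # UU
    0x11806: 0x11833,  # E
    0x11807: 0x11834,  # AI
    0x11808: 0x11835,  # O
    0x11809: 0x11836,  # AU
}


def is_dogra_consonant(cp: int) -> bool:
    """True for U+1180A DOGRA LETTER KA through U+1182B DOGRA LETTER RRA."""
    return 0x1180A <= cp <= 0x1182B


def dogra_search_key(text: str) -> str:
    """Fold <consonant, vowel letter> to <consonant, vowel sign> for matching."""
    out = []
    prev_is_consonant = False
    for ch in text:
        cp = ord(ch)
        if prev_is_consonant and cp in DOGRA_LETTER_TO_SIGN:
            out.append(chr(DOGRA_LETTER_TO_SIGN[cp]))
        else:
            out.append(ch)
        prev_is_consonant = is_dogra_consonant(cp)
    return "".join(out)
```

With this folding, both spellings of ke given above produce the same key, while a word-initial vowel letter is left untouched.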

#Characters Used to Represent Sanskrit. U+11831 DOGRA VOWEL SIGN VOCALIC R, U+11832 DOGRA VOWEL SIGN VOCALIC RR, and U+11828 DOGRA LETTER SSA are used in New Dogra to represent sounds of Sanskrit origin.

#Consonant Conjuncts. Consonant clusters in Dogra may be rendered in different ways. The most common method is to place a virama beneath each bare consonant. Certain consonant clusters may also be written as conjuncts. A conjunct may be an atomic ligature, such as 𑠊𑠹𑠨 kṣa (represented with <ka, virama, ssa>), or a looser ligature, such as 𑠩𑠹𑠔 sṭa (<sa, virama, tta>), in which the individual shapes of each letter are visible.

In particular, although Dogra does not normally use repha to represent the initial ra in a consonant cluster, a non-initial ra is sometimes conjoined to form a ligature. A conjoined non-initial ra is usually attached below the base letter, in a somewhat reduced form. Depending on the graphical structure of the preceding consonant, the non-initial ra may also appear to be the base of the cluster, with the preceding consonant taking a half-form instead. For example, New Dogra consistently uses the conjunct 𑠧𑠹𑠤 śra (<sha, virama, ra>) in the Sanskrit honorific śrī, which shows a half-form of śa.

#Other Symbols. U+11837 DOGRA SIGN ANUSVARA indicates nasalization, and U+11838 DOGRA SIGN VISARGA indicates post-vocalic aspiration in words of Sanskrit origin, while U+1183A DOGRA SIGN NUKTA is used to transcribe sounds that are not native to the Dogri language.

#Punctuation. U+1183B DOGRA ABBREVIATION SIGN denotes abbreviations. U+0964 DEVANAGARI DANDA and U+0965 DEVANAGARI DOUBLE DANDA indicate the ends of sentences and paragraphs.

#Digits and Number Forms. Digits in Dogra vary across written and printed sources: some Old Dogra digits resemble Takri digits, while digits in some New Dogra documents resemble Devanagari. Because of this wide variation, script-specific digits have not been encoded. Devanagari digits should be used to represent digits in Dogra text. For representation of Dogra fraction and currency signs, use characters from the Common Indic Number Forms block.

#15.9 Khudawadi

#15.9.1 Khudawadi: U+112B0–U+112FF

Khudawadi is a script used historically for writing the Sindhi language, which is spoken in India and Pakistan. Official forms of Khudawadi are known as “Hindi Sindhi,” “Hindu Sindhi,” and “Standard Sindhi.” Khudawadi is a Landa-based script and is related to Sharada. Like other Landa writing systems, Khudawadi is a mercantile script used for routine writing, accounting, and other commercial purposes, and was known as the shopkeeper and merchant script. It is associated with the merchant communities of Hyderabad, Sindh. In addition to mercantile records, Khudawadi was used in education, book printing, and for court records.

In the 1860s, Khudawadi was chosen as the basis for a written standard for education and administration in Sindh and was developed as an official script. Official Khudawadi possesses unique characters for each vowel and consonant sound of the Sindhi language, as well as vowel signs. In the late 19th century, an Arabic-based script became the official writing system for Sindhi in Pakistan and India. Sindhi is also written in the Devanagari script in India. Khudawadi is now obsolete.

#Structure. The general structure of Khudawadi is similar to that of other Brahmi-based Indic scripts. It is written from left to right.

#Vowel Letters. Some independent vowel letters may be represented using a combination of a base vowel letter and a dependent vowel sign. This practice is not recommended. The atomic character for the independent vowel letter should always be used.

#Table 15-5. Khudawadi Vowel Letters

For	Use	Do Not Use
𑊱	112B1	112B0 + 112E0
𑊶	112B6	112B0 + 112E5
𑊷	112B7	112B0 + 112E6
𑊸	112B8	112B0 + 112E7
𑊹	112B9	112B0 + 112E8
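A checker can flag the discouraged spellings in Table 15-5 without modifying the text. The Python fragment below is a sketch of that idea; the function name `find_discouraged_vowels` and the shape of its return value are assumptions chosen for illustration, and indexes are counted in code points.

```python
# Table 15-5: <U+112B0 LETTER A, vowel sign> -> atomic independent vowel letter.
KHUDAWADI_A = "\U000112B0"
SIGN_TO_LETTER = {
    "\U000112E0": "\U000112B1",  # AA
    "\U000112E5": "\U000112B6",  # E
    "\U000112E6": "\U000112B7",  # AI
    "\U000112E7": "\U000112B8",  # O
    "\U000112E8": "\U000112B9",  # AU
}


def find_discouraged_vowels(text: str) -> list:
    """Return (index, preferred letter) for each <A, vowel sign> pair in the table."""
    hits = []
    for i in range(len(text) - 1):
        if text[i] == KHUDAWADI_A and text[i + 1] in SIGN_TO_LETTER:
            hits.append((i, SIGN_TO_LETTER[text[i + 1]]))
    return hits
```

A caller could report the hits to an author or use them to drive a replacement step; an empty list means that none of the listed sequences is present.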

#Consonant Conjuncts. Consonant clusters generally consist of two consonants. These are written using a visible virama. The encoded representation is <C1 + virama + C2>. Half-forms and ligated conjunct forms are not attested.

#Nasalization. U+112DF 𑋟 KHUDAWADI SIGN ANUSVARA is used for indicating nasalization.

#Nukta. U+112E9 𑋩 KHUDAWADI SIGN NUKTA is used for representing sounds not native to Sindhi, such as those that may occur in Persian and Arabic loanwords. Attested Khudawadi letters with nukta are shown in Table 15-6, along with the Arabic letters for which they substitute. JA + NUKTA, pronounced za, corresponds to a number of distinct Arabic letters.

#Table 15-6. Representation of Arabic Sounds in Khudawadi

Sound	Khudawadi		Arabic
kha	𑊻𑋩	KHA + NUKTA	U+062E ARABIC LETTER KHAH
ġa	𑊼𑋩	GA + NUKTA	U+063A ARABIC LETTER GHAIN
za	𑋂𑋩	JA + NUKTA	U+0630 ARABIC LETTER THAL; U+0632 ARABIC LETTER ZAIN; U+0636 ARABIC LETTER DAD; U+0638 ARABIC LETTER ZAH
fa	𑋓𑋩	PHA + NUKTA	U+0641 ARABIC LETTER FEH

In principle, the nukta may be written with any Khudawadi vowel or consonant letter. If other combining marks, such as a dependent vowel sign or anusvara, also occur in a combining sequence applied to that base character, then the convention is to represent the nukta first in the combining sequence.
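The nukta-first convention can be enforced on input by a small reordering step. The Python sketch below illustrates one way to do so. It assumes that only the vowel signs (U+112E0..U+112E8), the anusvara (U+112DF), and the nukta (U+112E9) take part in the reordering; the virama is deliberately left out, and the function names are hypothetical.

```python
NUKTA = "\U000112E9"


def is_reorderable_mark(ch: str) -> bool:
    """True for U+112DF ANUSVARA, U+112E0..U+112E8 vowel signs, and U+112E9 NUKTA."""
    return 0x112DF <= ord(ch) <= 0x112E9


def nukta_first(text: str) -> str:
    """Move any nukta to the front of the run of marks that follows a base."""
    out = []
    i = 0
    while i < len(text):
        if is_reorderable_mark(text[i]):
            j = i
            while j < len(text) and is_reorderable_mark(text[j]):
                j += 1
            run = text[i:j]
            # Stable partition: nuktas first, the other marks keep their order.
            out.append("".join(c for c in run if c == NUKTA))
            out.append("".join(c for c in run if c != NUKTA))
            i = j
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

For example, <JA, VOWEL SIGN E, NUKTA> becomes <JA, NUKTA, VOWEL SIGN E>, while a sequence that already has the nukta first is returned unchanged.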

#Punctuation. Khudawadi uses dandas and European punctuation, such as periods, dashes, colons, and semi-colons. Khudawadi dandas are unified with those of Devanagari. Line breaking for Khudawadi characters follows the rules for Devanagari.

#Digits. Khudawadi has a full set of decimal digits. Fraction signs and currency marks are attested in Khudawadi records. These may be represented using characters in the Common Indic Number Forms block found at U+A830..U+A83F.

#15.10 Multani

#15.10.1 Multani: U+11280–U+112AF

The Multani script was used to write the Seraiki language, an Indo-Aryan language spoken in the Punjab in eastern Pakistan and the northern Sindh area of southeastern Pakistan. Multani is a Landa-based script, related to Gurmukhi, and distantly related to Khudawadi and Khojki. The script, also known as Karikki or Sarai, was used for routine writing and commercial activities. The first book in the Multani script was published in 1819. By the latter half of the 19th century, the British administration introduced the Arabic script as the standard for writing the languages of the Sindh, which led to the demise of various non-Arabic scripts, including Multani. The script continued to be used into the 20th century. Today Seraiki is written in the Arabic script.

There is no standard form of the Multani script. The representative glyphs shown in the code charts are based on printed forms from an 1819 version of the New Testament, with additional characters that are found only in handwritten documents. Variant forms of letters are considered glyphic variants and should be handled at the font level.

The script underwent orthographic changes in the first quarter of the 20th century, with a reduction in the character repertoire. The repertoire encoded in this block is based on the set of all characters that are distinctly attested.

#Structure. Although Multani is based on the Brahmi model, it is closer in structure to an abjad than an abugida. There are four independent vowel letters, a, i, u and e, and no dependent vowel signs. Consonants theoretically possess the inherent /a/ vowel, but as vowels are not marked, the actual syllabic vowel of a consonant in running text is ambiguous and must be inferred from context. Consonant clusters are written using independent letters, rather than with conjuncts. There is no virama. Vowels are generally not written unless they occur in isolation, in word initial position, or in the final position of monosyllabic words.

The letter 𑊀 a is used to represent /a/, /a:/ and in some sources /e/ and /æ/. The letter 𑊁 i represents /i/ and /i:/ and commonly the semivowel /j/. The letter 𑊂 u represents /u/, /u:/ and /o/. The letter 𑊃 e represents /e/, and in some sources /æ/ and /o/.

#Digits. The Gurmukhi digits U+0A66..U+0A6F should be employed to represent digits in Multani.

#Punctuation. Multani has only one script-specific punctuation mark, U+112A9 MULTANI SECTION MARK, which indicates the end of a sentence.

#15.11 Tirhuta

#15.11.1 Tirhuta: U+11480–U+114DF

Tirhuta was the traditional writing system for the Maithili language, which is spoken by more than 35 million people in the state of Bihar in India, and in the Koshi and Madhesh provinces of Nepal. Maithili is an official regional language of India and the second most spoken language in Nepal. Tirhuta is a Brahmi-based script derived from Gauḍī, or “Proto-Bengali,” which evolved from the Kuṭila branch of Brahmi by the 10th century. It is related to the Bengali, Newari, and Oriya scripts, which are also descended from Gauḍī, and became differentiated from them by the 14th century.

Tirhuta remained the primary writing system for Maithili until the late 20th century, when it was replaced by Devanagari. The Tirhuta script forms the basis of scholarly and religious scribal traditions that have been associated with the Maithili and Sanskrit languages since the 14th century. Tirhuta continues to be used for writing manuscripts of religious and literary texts, as well as personal correspondence. Since the 1950s, various literary societies, such as the Maithili Akademi and Chetna Samiti, have been publishing literary, educational, and linguistic materials in Tirhuta. The script is also used in signage in Darbhanga and other districts of north Bihar, and as an optional script for writing the civil services examination in Bihar.

Although several Tirhuta characters, ligatures or combined shapes bear resemblance to those of Bengali, these similarities are superficial.

#Structure. The general structure (phonetic order, matra reordering, use of virama, and so on) of Tirhuta is similar to that of other Brahmi-based Indic scripts. The script is written from left to right.

#Vowels. Tirhuta uses independent vowel letters and corresponding combining vowel signs. The signs U+114BA TIRHUTA VOWEL SIGN SHORT E and U+114BD TIRHUTA VOWEL SIGN SHORT O do not have corresponding independent forms, because the sounds they represent do not occur in word initial position.

Vowel letters are encoded atomically in Unicode, even if they can be analyzed visually as consisting of multiple parts. Table 15-7 shows the letters that can be analyzed, the single code point that should be used to represent them in text, and the sequence of code points resulting from analysis that should not be used.

#Table 15-7. Tirhuta Vowel Letters

For	Use	Do Not Use
𑒂	11482	<11481, 114B0>
𑒉	11489	<114AA, 114B5>
𑒊	1148A	<114AA, 114B6>
𑒌	1148C	<1148B, 114BA>
𑒎	1148E	<1148D, 114BA>

#Consonants. Some of the 33 consonants look like Bengali consonants, but represent different sounds. For example, U+114A9 TIRHUTA LETTER RA has the same form as U+09AC BENGALI LETTER BA, and U+114AB TIRHUTA LETTER VA has the same shape as U+09B0 BENGALI LETTER RA.

Consonants combined with vowel signs, combined in conjuncts, or appearing at the end of a word commonly use context-dependent ligatures or glyph combinations. These shapes also contrast with usage in Bengali. For example, the consonant-vowel combination <U+1149E TIRHUTA LETTER TA, U+114B3 TIRHUTA VOWEL SIGN U> in Tirhuta produces the same shape as the conjunct <U+09A4 BENGALI LETTER TA, U+09CD BENGALI SIGN VIRAMA, U+09A4 BENGALI LETTER TA> in the Bengali script.

All variant forms for letters, character elements and conjuncts in Tirhuta should be managed at the font level.

#Virama. U+114C2 TIRHUTA SIGN VIRAMA is identical in function to the corresponding character in other Indic scripts.

#Nasalization. Nasalization is indicated by U+114BF TIRHUTA SIGN CANDRABINDU and U+114C0 TIRHUTA SIGN ANUSVARA. These signs are written centered above the base. If written with an above-base sign or a letter with a graphical element that extends past the headstroke, they are placed to the right of such signs and elements.

#Characters for Representing Sanskrit. Two characters are attested in Vedic and classical Sanskrit manuscripts written in Tirhuta. U+114C1 TIRHUTA SIGN VISARGA represents an allophone of ra or sa at word-final position in Sanskrit orthography. U+114C5 TIRHUTA GVANG represents nasalization. It belongs to the same class of characters as U+1CE9 VEDIC SIGN ANUSVARA ANTARGOMUKHA, U+1CEA VEDIC SIGN ANUSVARA BAHIRGOMUKHA, and so on.

Tirhuta also uses U+1CF2 VEDIC SIGN ARDHAVISARGA which can be found in the Vedic Extensions block.

#Nukta. U+114C3 TIRHUTA SIGN NUKTA is used for writing sounds that are not represented by a unique character, such as allophonic variants and sounds that occur in local dialects or in loanwords. The nukta may be written with any vowel or consonant letter. If other combining marks, such as a vowel sign or anusvara, also appear with the base character, then the nukta is written first.

U+114A5 TIRHUTA LETTER BA and U+114AB TIRHUTA LETTER VA have shapes that include a dot, but this is not semantically equivalent to a nukta. These letters do not decompose to nukta, and are treated as atomic characters.

#Punctuation. Tirhuta uses U+0964 DEVANAGARI DANDA and U+0965 DEVANAGARI DOUBLE DANDA from the Devanagari block.

#Special Signs. U+114C6 TIRHUTA ABBREVIATION SIGN denotes abbreviations. There are also two special script-specific signs in Tirhuta. The first, U+11480 TIRHUTA ANJI, is used in the invocations of letters, manuscripts, books, and charts of the script. The sign anji is said to represent the tusk of the deity Ganesa, patron of learning. The second, U+114C7 TIRHUTA OM, contrasts with the Bengali sign for om, the latter being a simple combination of U+0993 BENGALI LETTER O plus U+0981 BENGALI SIGN CANDRABINDU.

#Digits. Tirhuta has a full set of decimal digits.

#Fractions. Number forms and unit marks are also found in Tirhuta documents. The most common of these are signs for writing fractions and currency, and they are represented using characters in the Common Indic Number Forms block (U+A830..U+A83F). They include U+A831 NORTH INDIC FRACTION ONE HALF, U+A832 NORTH INDIC FRACTION THREE QUARTERS, and so on, as well as U+A838 NORTH INDIC RUPEE MARK. Tirhuta also uses Bengali “currency numerators,” such as U+09F4 BENGALI CURRENCY NUMERATOR ONE.

#15.12 Modi

#15.12.1 Modi: U+11600–U+1165F

Modi is a Brahmi-based script used mainly for writing Marathi. Modi was also used to write other regional languages such as Hindi, Gujarati, Kannada, Konkani, Persian, Tamil, and Telugu. According to an old legend, the Modi script was brought to India from Sri Lanka by Hemadri Pandit, known also as Hemadpant, who was the chief minister of Ramacandra, the last king of the Yadava dynasty, who reigned from 1271 to about 1309. Another tradition credits the creation of the script to Balaji Avaji, secretary of state to the late 17th-century Maratha king Shivaji Raje Bhonsle, also known as Chhatrapati Shivaji Maharaj. While the veracity of such accounts is difficult to ascertain, it is clear that Modi derives from the Nagari family of scripts and is a modification of the Nagari model intended for continuous writing.

Modi emerged as an administrative writing system in the 16th century before the rise of the Maratha dynasties. It was adopted by the Marathas as an official script beginning in the 17th century and was used in such a capacity in Maharashtra until the middle of the 20th century. In the 1950s the use of Modi was formally discontinued and the Devanagari script, known as “Balbodh,” was promoted as the standard writing system for Marathi.

There are thousands of Modi documents preserved in South Asia and Europe. The majority of these are in various archives in Maharashtra, while smaller collections are kept in Denmark and other countries, because of European presence in Tanjore, Pondicherry, and other regions in South Asia through the 19th century. The earliest extant Modi document dates from the early 17th century. While the majority of Modi documents are official letters, land records, and other administrative documents, the script was also used in education, journalism, and other routine activities before the 1950s. Printing in Modi began in the early 19th century after Charles Wilkins cut the first metal fonts for the script in Calcutta. Newspapers were published in Modi; primers were produced to teach the script in schools, and various personal papers and diaries were kept in the script.

#Structure. Modi is a Brahmi-based script related to Devanagari. It is written from left to right. In general, the rules for Devanagari rendering also apply to Modi (see Section 12.1, Devanagari). However, one characteristic feature of Modi is a large number of context-dependent forms of consonants and vowel-signs. Shaping and glyph substitutions for these contextual forms are managed in the font.

#Vowel Letters. Generally, the distinction between regular and long forms of i and u is not preserved in Modi. The letter U+11603 MODI LETTER II may represent both i and ī, and U+11604 MODI LETTER U may be used for writing both u and ū. The same can be said of the corresponding dependent vowel signs. Both regular and long forms appear in the Modi block, because they are attested in documentation about Modi.

The vocalic vowel signs in the range U+11635..U+11638, like the corresponding vocalic letters U+11606..U+11609, are included in the encoding but are not in modern use, as is the case in other Indic scripts. Modi vocalic r may alternatively be written as the sequence <U+11628 MODI LETTER RA, U+11632 MODI VOWEL SIGN II> rī.

Vowel letters are encoded atomically in Unicode, even if they can be analyzed visually as consisting of multiple parts. Table 15-8 shows the letters that can be analyzed, the single code point that should be used to represent them in text, and the sequence of code points resulting from analysis that should not be used.

#Table 15-8. Modi Vowel Letters

For	Use	Do Not Use
𑘊	1160A	<11600, 11639>
𑘋	1160B	<11600, 1163A>
𑘌	1160C	<11601, 11639>
𑘍	1160D	<11601, 1163A>

#Rendering. Many of the consonant-vowel and consonant-consonant combinations in Modi involve special contextual forms of the consonant or vowel-sign or both. These are rendered by means of contextual rules in the font, using specially shaped and positioned glyph pieces or preformed ligatures.

#Consonant Clusters Involving ra. A number of contextual forms are used for U+11628 𑘨 MODI LETTER RA. Some of these are similar to the use of ra in Devanagari. As the first consonant in a cluster it is generally rendered as a repha; however, Modi also uses the eyelash ra in place of repha in certain native Marathi contexts. As in Devanagari, the eyelash ra is produced using the sequence <U+11628 𑘨 MODI LETTER RA, U+1163F 𑘿 MODI SIGN VIRAMA, U+200D ‍ ZERO WIDTH JOINER>.
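The difference between the two encodings of cluster-initial ra can be made concrete in code. The Python fragment below is a minimal sketch that only builds the code point sequences; the helper names are hypothetical, and the actual repha or eyelash shape depends on the font and shaping engine.

```python
MODI_RA = "\U00011628"
MODI_VIRAMA = "\U0001163F"
ZWJ = "\u200D"


def repha_cluster(following: str) -> str:
    """Plain <RA, VIRAMA, consonant>; ra is generally rendered as a repha."""
    return MODI_RA + MODI_VIRAMA + following


def eyelash_ra_cluster(following: str) -> str:
    """<RA, VIRAMA, ZWJ, consonant>; the ZWJ requests the eyelash form of ra."""
    return MODI_RA + MODI_VIRAMA + ZWJ + following
```

Calling `eyelash_ra_cluster("\U00011627")`, where U+11627 is MODI LETTER YA, yields the four code points U+11628, U+1163F, U+200D, U+11627.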

Non-initial ra in conjuncts is typically rendered using one of two subjoined forms; however, some conjuncts with ra are represented as distinct ligatures. The most common of these is the conjunct 𑘝𑘿𑘨 represented by the sequence <U+1161D 𑘝 MODI LETTER TA, U+1163F 𑘿 MODI SIGN VIRAMA, U+11628 𑘨 MODI LETTER RA>. Sequences of ra following some other consonants, such as <ka, ra>, <ka, -aa, ra>, or <sa, ra> are also displayed by distinct ligatures, as shown in Figure 15-2. The sequence of initial ra followed by the rounded consonants kha, dha, or ha, may also appear with distinct ligatures.

#Figure 15-2. Modi Shaping for ra

Unusually, the shape of ra is also influenced at the word level, depending upon the characters in the preceding syllable. See the last example in Figure 15-2. This influence on the shape of ra may even occur preceding punctuation; in certain environments, ra following a danda or double danda is written using a special contextual form. For example:

U+11642 𑙂 double danda + U+11628 𑘨 ra → 𑙂𑘨

To produce this behavior, the danda and double danda characters in the Modi block should be used instead of the ones in the Devanagari block.
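Text that was keyed with Devanagari dandas can be converted with a simple translation table. The Python sketch below assumes that the whole input string is Modi text; a document that mixes Modi and Devanagari would need per-run handling that is not shown here. The function name is hypothetical.

```python
# Map the Devanagari dandas to the script-specific Modi dandas.
DEVANAGARI_TO_MODI_DANDA = str.maketrans({
    "\u0964": "\U00011641",  # DEVANAGARI DANDA -> MODI DANDA
    "\u0965": "\U00011642",  # DEVANAGARI DOUBLE DANDA -> MODI DOUBLE DANDA
})


def use_modi_dandas(modi_text: str) -> str:
    """Return the text with Devanagari dandas replaced by Modi dandas."""
    return modi_text.translate(DEVANAGARI_TO_MODI_DANDA)
```

After the conversion, a font can apply the contextual form of ra following U+11641 or U+11642 as described above.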

#Punctuation and Word Boundaries. Traditionally, word boundaries are not marked in Modi because it is an administrative script, characterized by the practice of rapid writing without lifting the pen. Paragraph and other section boundaries are, however, indicated in some Modi documents through the use of whitespace. Modern practice uses spaces and various punctuation conventions, including danda and Western punctuation marks. Some printed books use a period instead of a danda to indicate a sentence boundary.

#Various Signs. Nasalization is indicated by U+1163D MODI SIGN ANUSVARA, and abbreviations are indicated using U+11643 MODI ABBREVIATION SIGN. U+1163E MODI SIGN VISARGA represents an allophone of ra or sa at word-final position in Sanskrit orthography. U+11640 MODI SIGN ARDHACANDRA is used for transcribing sounds used in English names and loanwords.

U+11644 MODI SIGN HUVA is written as an invocation in several Modi documents. It is derived from the Arabic huwa.

Currency values are written using U+A838 NORTH INDIC RUPEE MARK.

#Numbers. Modi has a full set of decimal digits. Several number forms and unit marks are used for writing Modi and are represented using characters in the Common Indic Number Forms block. They include the base-16 fraction signs U+A830..U+A835. The absence of intermediate units is indicated by U+A837 NORTH INDIC PLACEHOLDER MARK, which is called ali in Marathi. U+A836 NORTH INDIC QUARTER MARK is used for representing anna values.
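The six fraction signs cover the quarters (U+A830 one quarter, U+A831 one half, U+A832 three quarters) and the sixteenths (U+A833 one sixteenth, U+A834 one eighth, U+A835 three sixteenths), so any count of sixteenths below one can be split into a quarters part and a sixteenths part. The Python sketch below shows that arithmetic. The function name is hypothetical, and the output order (quarters sign before sixteenths sign) is an assumption that should be checked against the documents being transcribed.

```python
QUARTER_SIGNS = ["", "\uA830", "\uA831", "\uA832"]    # 1/4, 1/2, 3/4
SIXTEENTH_SIGNS = ["", "\uA833", "\uA834", "\uA835"]  # 1/16, 1/8, 3/16


def sixteenths_to_signs(n: int) -> str:
    """Express n/16 (0 <= n < 16) as a quarters sign plus a sixteenths sign."""
    if not 0 <= n < 16:
        raise ValueError("expected a count of sixteenths in the range 0..15")
    quarters, sixteenths = divmod(n, 4)
    # Assumption: the larger unit is written first.
    return QUARTER_SIGNS[quarters] + SIXTEENTH_SIGNS[sixteenths]
```

For instance, five sixteenths decomposes into one quarter plus one sixteenth, giving the sequence <U+A830, U+A833>.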

#15.13 Nandinagari

#15.13.1 Nandinagari: U+119A0–U+119FF

Nandinagari is a Brahmi-based script that was used in southern India between the 11th and 19th centuries for manuscripts and inscriptions in Sanskrit in south Maharashtra, Karnataka and Andhra Pradesh. It is related to Devanagari, and was the official script of the Vijayanagara kingdom of southern India (1336–1646 CE). There are numerous manuscripts and inscriptions containing Nandinagari text. This script was also used for writing Kannada in Karnataka.

#Structure. With minor historical exceptions, Nandinagari is an abugida written from left to right, in which each consonant letter carries an inherent vowel (usually the sound /a/), as in Devanagari. The absence of the inherent vowel is frequently marked with a virama. The virama sign that suppresses the inherent vowel of the consonant is a combining character.

#Headstrokes. These are an inherent feature of Nandinagari letters, but their behavior differs from headstrokes in modern Devanagari. Headstroke connections in Nandinagari generally are restricted to an aksara (orthographic syllable) and do not extend to neighboring syllables. The headstroke connects vowel or consonant letters and spacing dependent vowels of an aksara, while spaces separate individual aksaras.

#Vowels. There are 12 vowel letters in the range U+119A0..U+119AD and 11 dependent vowel signs in the range U+119D1..U+119DD. U+119D2 NANDINAGARI VOWEL SIGN I is positioned at the top-left edge of letters that have headstrokes. For other letters U+119D2 hangs above the top-left portion of the body. However, the style of writing the sign varies considerably, particularly in handwriting.

#Consonants. There are 35 consonant letters. U+119D0 NANDINAGARI LETTER RRA appears to have been introduced in the 11th century for transcribing the Kannada letter RRA, and is not part of the traditional repertoire of Nandinagari.

#Virama. U+119E0 NANDINAGARI SIGN VIRAMA has two functions, similar to the corresponding Devanagari character. Used as a halanta, it marks the absence of the inherent vowel of a consonant letter. U+119E0 is also a format character used to produce conjuncts.

#Vowel Modifiers. U+119DE NANDINAGARI SIGN ANUSVARA indicates nasalization. It is placed to the right of a base letter or right-side vowel sign. U+119DF NANDINAGARI SIGN VISARGA represents post-vocalic aspiration in words of Sanskrit origin.

#Other Signs. U+119E1 NANDINAGARI SIGN AVAGRAHA marks the elision of word-initial a in Sanskrit as a result of sandhi. The auspicious sign U+119E2 NANDINAGARI SIGN SIDDHAM indicates an invocation at the beginning of documents.

#Punctuation. U+119E3 NANDINAGARI HEADSTROKE is used as a sign of spacing or joining a word. It may connect a word that is broken on account of imperfections on a writing surface. U+119E3 can also serve as a gap filler. Nandinagari uses the danda and double danda marks encoded in the Devanagari block.

#Digits. The Nandinagari digits are glyph variants of the Kannada digits U+0CE6..U+0CEF. No script specific digits are encoded for Nandinagari.
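Several scripts in this chapter have no digits of their own and borrow those of another block. The Python sketch below collects the borrowings stated in the text for Mahajani, Khojki, Dogra, Multani, and Nandinagari into one table and formats a non-negative integer accordingly. The table and function names are hypothetical, and the script keys are plain labels chosen for this example rather than identifiers from any library.

```python
# Code point of DIGIT ZERO in the block each script borrows its digits from.
BORROWED_DIGIT_ZERO = {
    "Mahajani": 0x0966,     # Devanagari digits U+0966..U+096F
    "Khojki": 0x0AE6,       # Gujarati digits U+0AE6..U+0AEF
    "Dogra": 0x0966,        # Devanagari digits U+0966..U+096F
    "Multani": 0x0A66,      # Gurmukhi digits U+0A66..U+0A6F
    "Nandinagari": 0x0CE6,  # Kannada digits U+0CE6..U+0CEF
}


def format_number(n: int, script: str) -> str:
    """Render a non-negative integer with the digits borrowed by the script."""
    if n < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    zero = BORROWED_DIGIT_ZERO[script]
    return "".join(chr(zero + int(d)) for d in str(n))
```

Scripts with their own decimal digits, such as Khudawadi, Tirhuta, and Modi, are intentionally absent from the table.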

#15.14 Grantha

#15.14.1 Grantha: U+11300–U+1137F

The Grantha script descends from Brahmi. The modern form is chiefly used to write the Sanskrit language, including Vedic Sanskrit. It is used primarily in Tamil Nadu, and to a lesser extent in Sri Lanka and other parts of South India.

The Grantha script is frequently mixed with the Tamil script to write Sanskrit words. Grantha has also been used to write the Sanskrit words of Tamil Manipravalam—a mixed Sanskrit-Tamil language—though this usage has become rare. In addition, Grantha characters may occasionally be employed with the Tamil script in the writing systems of minority languages of southern India.

Historically, intermediate forms which gave rise to the Grantha script are attested as of the fourth century CE. The earliest examples are found in inscriptions of the early Pallava kings who ruled over parts of what is currently northern Tamil Nadu and southern Andhra Pradesh. Modern Grantha, which this encoding represents, belongs to the period after the thirteenth century CE.

Modern Grantha is frequently used by Tamil speakers to represent Sanskrit because Grantha’s large set of letters can represent all the sounds of Sanskrit without the use of diacritical marks. The Tamil script has a smaller repertoire of letters that requires diacritical marks to represent Sanskrit directly. This use of diacritical marks often leads to confusion regarding the pronunciation of Sanskrit when written in the Tamil script.

#15.14.2 Rendering Grantha

Although the Grantha script is visually similar to Tamil, its structure is similar to other Indic scripts that are used to write Sanskrit. Written Sanskrit requires support for stacked consonant structures.

#Consonant Clusters. Consonant clusters may be written as stacks, as ligatures, or as a combination of ligatures and stacks. Ligatures are often used in place of stacks.

The typical stack height found in print in non-Vedic Sanskrit is two elements, but it is three in Vedic Sanskrit. Stacks, like ligatures, are equivalent to single consonants for the purpose of application of vowel signs.

Instances requiring more than three elements in a stack require special handling. In these cases, the initial elements are pushed out of the consonant stack and may form their own stacks. Such special cases are illustrated in Figure 15-3. In this situation, a single phonological consonant cluster followed by a vowel may be represented by more than one orthographic cluster.

#Figure 15-3. Splitting Large Conjunct Stacks in Grantha

two elements	→	two-level stack
three elements	→	three-level stack
four elements	→	vowelless element + three-level stack
five elements	→	vowelless two-level stack + three-level stack
six elements	→	vowelless three-level stack + three-level stack
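
The splitting pattern in Figure 15-3 can be sketched as follows. This is only an illustration: the function name split_stack and the max_height parameter are hypothetical, and the behavior for clusters of more than six elements is an extrapolation of the pattern in the figure, not something the text specifies.

```python
def split_stack(elements, max_height=3):
    """Group cluster elements into stacks, filling the final stack first.

    Elements that do not fit in the final stack are pushed out to the
    left and grouped into their own (vowelless) stacks.
    """
    elements = list(elements)
    groups = []
    end = len(elements)
    while end > 0:
        start = max(0, end - max_height)
        groups.insert(0, elements[start:end])
        end = start
    return groups


# Print the stack sizes for clusters of two to six elements.
for n in range(2, 7):
    print(n, [len(group) for group in split_stack(range(n))])
```

Every group except the last would be rendered in its vowelless form; only the final stack carries the vowel sign.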

#Virama. Grantha follows the same virama model as Telugu and Kannada, in which the sequence consonant + virama should be rendered as the vowelless form of the consonant in the desired orthographic style. For example, in the prevalent orthographic style used in modern printing, ta, na, and ma consistently fuse with the virama; ra and la superficially connect with it, and the virama stands apart for all other consonants, as shown in Table 15-9.

#Table 15-9. Rendering of Explicit Virama Forms in Grantha

Fused
ta + virama	𑌤	+	◌𑍍	→	𑌤𑍍
na + virama	𑌨	+	◌𑍍	→	𑌨𑍍
ma + virama	𑌮	+	◌𑍍	→	𑌮𑍍
Connected
ra + virama	𑌰	+	◌𑍍	→	𑌰𑍍
la + virama	𑌲	+	◌𑍍	→	𑌲𑍍
Unconnected
ka + virama	𑌕	+	◌𑍍	→	𑌕𑍍
tta + virama	𑌟	+	◌𑍍	→	𑌟𑍍

These visual distinctions in the rendering of explicit viramas also apply to the various ligated conjuncts of Grantha.
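
A font or shaping engine might record the three classes of Table 15-9 in a simple lookup. The sketch below is illustrative only: the names FUSED, CONNECTED, and virama_form are hypothetical, the code points for ta, na, ma, ra, and la are taken from the Grantha code chart, and the classification reflects only the prevalent modern print style described above.

```python
GRANTHA_VIRAMA = "\U0001134D"

# Consonants named in Table 15-9 (prevalent modern print style).
FUSED = {"\U00011324", "\U00011328", "\U0001132E"}  # ta, na, ma
CONNECTED = {"\U00011330", "\U00011332"}            # ra, la


def virama_form(consonant):
    """Return how the explicit virama joins the given Grantha consonant."""
    if consonant in FUSED:
        return "fused"
    if consonant in CONNECTED:
        return "connected"
    # The virama stands apart for all other consonants.
    return "unconnected"
```

Other orthographic styles would need a different table; the encoded sequence consonant + virama is the same in every style.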

#Vowels. There are two forms of the au vowel sign: U+11357 GRANTHA AU LENGTH MARK is the modern one-part form, while the two-part form, U+1134C GRANTHA VOWEL SIGN AU, is somewhat archaic but is found in manuscripts.

Only two vowel signs touch their base consonant in printed Grantha: U+1133F GRANTHA VOWEL SIGN I and U+11340 GRANTHA VOWEL SIGN II. U+11347 GRANTHA VOWEL SIGN EE and U+11348 GRANTHA VOWEL SIGN AI are rendered to the left of their base. U+1134B GRANTHA VOWEL SIGN OO and the archaic U+1134C GRANTHA VOWEL SIGN AU are two-part vowels with one part placed to the left of the base and one part to the right. All other vowel signs are placed to the right of the base.

Manuscripts written in Grantha will show archaic ligatures of consonants with vowel signs. The vowel signs U+11362 GRANTHA VOWEL SIGN VOCALIC L and U+11363 GRANTHA VOWEL SIGN VOCALIC LL are sometimes placed below and sometimes placed to the right of the base consonant. In contemporary printing practice, vowel signs are placed to the right.

#Signs. Grantha uses the pluta sign to denote vowel lengthening. The pluta is not in current use, but it is found in Vedic manuscripts. The nukta is not used to write Sanskrit, but is used to transcribe words from other languages, such as Irula.

#Cantillation Marks. Grantha uses a number of cantillation marks to represent tone, stress, and breathing in Vedic texts. These marks include the twelve marks encoded in the Grantha block in the range U+11366..U+11374, as well as many marks encoded in other blocks, including those listed in Table 15-10.

#Table 15-10. Additional Svara Marks used in Grantha

Generic Vedic Accents
0951 DEVANAGARI STRESS SIGN UDATTA
0952 DEVANAGARI STRESS SIGN ANUDATTA
Samavedic Marks
1CD0 VEDIC TONE KARSHANA
1CD2 VEDIC TONE PRENKHA
1CD3 VEDIC SIGN NIHSHVASA
20F0 COMBINING ASTERISK ABOVE
Additional Marks
1CF2 VEDIC SIGN ARDHAVISARGA
1CF3 VEDIC SIGN ROTATED ARDHAVISARGA
1CF4 VEDIC TONE CANDRA ABOVE
1CF8 VEDIC TONE RING ABOVE
1CF9 VEDIC TONE DOUBLE RING ABOVE
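
An implementation that needs to recognize these marks could collect them in one set, as in the following sketch. The names are hypothetical. The split of the twelve Grantha-block marks into U+11366..U+1136C and U+11370..U+11374 is taken from the code chart and is an assumption relative to the text above, which gives only the overall range. The second set contains only the marks listed in Table 15-10, which is not an exhaustive list.

```python
# Twelve combining marks in the Grantha block. U+1136D..U+1136F are
# assumed to be unassigned, per the code chart.
GRANTHA_BLOCK_MARKS = set(range(0x11366, 0x1136D)) | set(range(0x11370, 0x11375))

# Marks from other blocks, exactly as listed in Table 15-10.
TABLE_15_10_MARKS = {
    0x0951, 0x0952,                          # generic Vedic accents
    0x1CD0, 0x1CD2, 0x1CD3, 0x20F0,          # Samavedic marks
    0x1CF2, 0x1CF3, 0x1CF4, 0x1CF8, 0x1CF9,  # additional marks
}


def is_grantha_svara(ch):
    """Return True if ch is one of the svara marks collected above."""
    cp = ord(ch)
    return cp in GRANTHA_BLOCK_MARKS or cp in TABLE_15_10_MARKS
```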

These nonspacing marks are normally applied to independent vowels, to consonants with an inherent vowel, and to consonants with vowel signs. Sometimes they are also applied to dead consonants which are displayed with a visible virama.

The preferred placement of svara marks in Grantha is horizontally centered relative to the syllable. These marks should not extend beyond the horizontal span of the base syllable. The svara marks can be applied to either syllables or digits, and used in combination with each other.

#Punctuation. Danda and double danda marks used with Grantha are found in the Devanagari block; see Section 12.1, Devanagari.

#Line Breaking. Line breaks may occur after every orthographic syllable. Hyphens are not used.

#Numbers. Grantha makes use of the Tamil digits U+0BE6 through U+0BEF, as well as the Tamil historical numerals for ten, one hundred, and one thousand at U+0BF0..U+0BF2. Grantha also uses some numbers and symbols from the Tamil Supplement block in the range U+11FC0..U+11FFF, which contains a set of historic fractions and other symbols.
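
Because Grantha has no digits of its own, converting ASCII digits for use in Grantha text amounts to mapping them onto the Tamil digits. A minimal sketch, in which the function name to_tamil_digits is hypothetical and the historical numerals for ten, one hundred, and one thousand are deliberately not handled:

```python
TAMIL_DIGIT_ZERO = 0x0BE6

# Translation table from ASCII digits to Tamil digits U+0BE6..U+0BEF.
_ASCII_TO_TAMIL = {ord("0") + i: TAMIL_DIGIT_ZERO + i for i in range(10)}


def to_tamil_digits(text):
    """Replace ASCII digits with Tamil digits; leave other characters alone."""
    return text.translate(_ASCII_TO_TAMIL)
```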

#15.15 Dives Akuru

#15.15.1 Dives Akuru: U+11900–U+1195F

Dives Akuru or Divehi Akuru was a script used to write the Dhivehi language in the Maldives from the 9th to the 20th centuries. Dives Akuru literally means “islanders’ letters.” The script is most closely related to a medieval form of the Sinhala script. In the 18th century, the Thaana script appeared alongside Dives Akuru. By the turn of the 19th century, Thaana had replaced Dives Akuru as the regular script for Dhivehi. However, individuals and scholars continued to study and use Dives Akuru into the 20th century.

Today, the script style from the 12th to the 14th centuries is termed evēla akuru, while the script style after the 14th century is called dives akuru. Both styles are unified in the Dives Akuru repertoire. Because no traditional documentation of the letter inventory exists for the script, the repertoire is based on texts found on copper plates, paper, and wooden boards, with the broadest repertoire found in the evēla akuru documents. The different styles and specific variants of characters should be handled through fonts.

#Structure. Dives Akuru is an abugida, written left to right. Like other Brahmi-derived scripts, each consonant letter contains an inherent vowel a. To indicate the bare consonant, U+1193D DIVES AKURU SIGN HALANTA is used. Consonant clusters are typically rendered by conjuncts.

#Vowels. Independent vowels are represented either by the distinctive vowel letters (U+11900..U+11909) or by an orthographic syllable composed of U+11925 DIVES AKURU LETTER YA, which acts as a vowel carrier, and the dependent vowel sign.

#Conjuncts. In general, consonant clusters are rendered as conjuncts in Dives Akuru. Most conjuncts consist of clusters of two consonants, but conjuncts with up to three consonants are attested. Four conjoining forms are encoded atomically: two are cluster-initial (U+11941 DIVES AKURU INITIAL RA and U+1193F DIVES AKURU PREFIXED NASAL SIGN) and two are syllable medial (U+11940 DIVES AKURU MEDIAL YA and U+11942 DIVES AKURU MEDIAL RA).

The conjunct structure visually consists of letters that are joined in a distinctive ligature or as a touching ligature. A touching ligature is produced when writing letters together without spaces, so they touch at adjacent edges.

The script uses U+1193E DIVES AKURU VIRAMA to create conjuncts, but no virama is required when using the four atomically-encoded conjoining form characters. Vowel letters may participate in clustering, especially when the second member of the cluster appears in a touching ligature, right after the word boundary. Vowel signs are encoded after the conjunct.
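
The encoding model for clusters can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the function name build_cluster is hypothetical, and the rule applied here (insert the virama between two members unless either of them is one of the four atomically encoded conjoining forms) is one reading of the description above.

```python
VIRAMA = "\U0001193E"  # DIVES AKURU VIRAMA

# The four atomically encoded conjoining forms.
CONJOINING_FORMS = {
    "\U0001193F",  # PREFIXED NASAL SIGN (cluster-initial)
    "\U00011940",  # MEDIAL YA
    "\U00011941",  # INITIAL RA (cluster-initial)
    "\U00011942",  # MEDIAL RA
}


def build_cluster(members, vowel_sign=""):
    """Join cluster members, adding a virama only between ordinary letters."""
    out = []
    for i, member in enumerate(members):
        if i > 0:
            previous = members[i - 1]
            if previous not in CONJOINING_FORMS and member not in CONJOINING_FORMS:
                out.append(VIRAMA)
        out.append(member)
    # Vowel signs are encoded after the whole conjunct.
    out.append(vowel_sign)
    return "".join(out)
```

For example, with U+11925 DIVES AKURU LETTER YA as both members, build_cluster produces the sequence YA, VIRAMA, YA, whereas YA followed by MEDIAL YA is produced without any virama.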

#Halanta. In Dives Akuru, U+1193D DIVES AKURU SIGN HALANTA has multiple functions. As a vowel-killer, the halanta generally attaches to the right-hand side of a letter, and forms a ligature with its base. While the halanta typically suppresses a consonant’s inherent vowel, in some cases the sequence <consonant, halanta> is pronounced as a syllable with /u/. In addition, the consonants ka, na, tta, and ta with an attached halanta may, in certain cases, be rendered as superscripts.

#Nasalization Signs. Post-vocalic nasalization is indicated using U+1193C DIVES AKURU SIGN CANDRABINDU and U+1193B DIVES AKURU SIGN ANUSVARA.

#Nukta. U+11943 DIVES AKURU SIGN NUKTA is used to transcribe sounds that are not native to Dhivehi. The nukta is written below the letter which most closely approximates the foreign sound.

#Digits. Script-specific digits are used for Dives Akuru. They are encoded in the range U+11950..U+11959.

#Punctuation. Three marks of punctuation are encoded for representing Dives Akuru text. A script-specific double danda is encoded at U+11944. Another punctuation mark, U+11945 DIVES AKURU GAP FILLER, is used to fill space at the ends of lines or to signify the end of a document. U+11946 DIVES AKURU END OF TEXT MARK appears at the end of a document, and is often accompanied by U+11945 DIVES AKURU GAP FILLER.

#Line Breaking. Line breaks may occur after any orthographic syllable. Hyphens are not used. Fillers may be used to fill space at the ends of lines, as described under Punctuation above.

#15.16 Ahom

#15.16.1 Ahom: U+11700–U+1174F

The Ahom script is used in northeast India, primarily to write the Tai Ahom language. The oldest surviving Ahom text is the “Snake Pillar” inscription, which was inscribed in the time of King Siuw Hum Miung (1497–1539). The script also appears on other stone inscriptions, coins, brass plates, and a large corpus of manuscripts. Although the use of the Tai Ahom language declined in the late 17th century, traditional priests used the language and the Ahom script in their religious practices throughout the 19th century.

Modern use of the Ahom script is considered to have begun in 1920 with the publication of an Ahom-Assamese-English dictionary. This was followed by publication of other dictionaries, word lists, and primers. The publication of Ahom texts has progressed more rapidly in recent decades, thanks to the availability of computers. Today there are large numbers of books published in Assam that contain some Ahom content.

#Structure. Like most other Brahmi-derived scripts, Ahom is an abugida, in which consonant letters are associated with an inherent vowel “a”. The encoding also includes three medial consonants, in the range U+1171D..U+1171F, which follow and graphically attach to an initial consonant letter. In addition, Ahom has a visible virama that functions as a vowel killer, U+1172B AHOM SIGN KILLER. The use of the killer is obligatory only in modern Ahom.

#Vowels. Ahom has no independent vowels, but instead uses U+11712 AHOM LETTER A followed by the corresponding dependent vowel sign (or signs).

#Syllabic Structure. Ahom has closed syllables, and optional medials may occur after initial consonants. Vowels can occur in sequences of U+11712 AHOM LETTER A and dependent vowel signs, or a series of dependent vowel signs. Final consonants take U+1172B AHOM SIGN KILLER.
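
The syllable structure just described can be approximated with a regular expression. The sketch below is hedged in several ways: the names are hypothetical, the consonant range U+11700..U+1171A and the vowel sign range U+11720..U+1172A are assumptions taken from the code chart rather than from this text, letters added to the block at later code points are ignored, and the killer is treated as required on final consonants, which holds only for modern Ahom.

```python
import re

CONSONANT = "[\U00011700-\U0001171A]"   # assumed range; includes U+11712 LETTER A
MEDIAL = "[\U0001171D-\U0001171F]"      # medial consonants
VOWEL_SIGN = "[\U00011720-\U0001172A]"  # assumed range of dependent vowel signs
KILLER = "\U0001172B"                   # AHOM SIGN KILLER

# Initial consonant, optional medials, any number of vowel signs,
# then an optional final consonant marked with the killer.
SYLLABLE = re.compile(
    f"{CONSONANT}{MEDIAL}*{VOWEL_SIGN}*(?:{CONSONANT}{KILLER})?"
)


def is_ahom_syllable(text):
    """Return True if text matches the simplified syllable pattern."""
    return SYLLABLE.fullmatch(text) is not None
```

A pattern of this kind is suitable for rough validation or for locating syllable boundaries; it does not model historic spellings in which the killer is omitted.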

#Numerals. The original Ahom numeral system was not a decimal radix system; however, in modern use a digit zero has been added, and the digits can be used to express decimal radix numerals. In traditional use, the digits may also be mixed with word spellings when writing out numbers.

The forms of the Ahom digits are derived from several sources. U+11732 AHOM DIGIT TWO is visually identical to U+11701 AHOM LETTER KHA and probably derives from it. The digits 3, 4, and 5 are usually expressed by the Ahom words for those numbers spelled out. U+1173B AHOM NUMBER TWENTY is also just the Ahom word for 20 spelled out.

#Punctuation. Ahom uses two punctuation characters which function similarly to dandas: U+1173C AHOM SIGN SMALL SECTION and U+1173D AHOM SIGN SECTION. The script also uses a paragraph mark, U+1173E AHOM SIGN RULAI, and a symbol that indicates an exclamation, U+1173F AHOM SYMBOL VI.

Modern Ahom uses spaces to indicate word boundaries. This convention is seen in some early Ahom manuscripts, but is not consistent in the early material.

#Variant Forms. A number of variant letterforms are found in manuscripts, but are no longer used in modern Ahom. Specific characters are encoded to represent the historic variants of ta, ga, ba, and the medial ligating ra.

#15.17 Sora Sompeng

#15.17.1 Sora Sompeng: U+110D0–U+110FF

The Sora Sompeng script is used to write the Sora language. Sora is a member of the Munda family of languages, which, together with the Mon-Khmer languages, makes up Austro-Asiatic.

The Sora people live between the Oriya- and Telugu-speaking populations in what is now the Odisha-Andhra border area.

Sora Sompeng was devised in 1936 by Mangei Gomango, who was inspired by a vision he had of the 24 letters. The script was promulgated as part of a comprehensive cultural program, and was offered as an improvement over IPA-based scripts used by linguists and missionaries, and the Telugu and Oriya scripts used by Hindus. Sora Sompeng is used in religious contexts, and is published in a variety of printed materials.

#Encoding Structure. The Sora Sompeng script is an abugida. The consonant letters contain an inherent vowel. There are no conjunct characters for consonant clusters, and there is no visible vowel killer to show the deletion of the inherent vowel. The reader must determine the presence or absence of the inherent schwa based on recognition of each word. The character repertoire does not match the phonemic repertoire of Sora very well.

For instance, U+110E4 SORA SOMPENG LETTER IH is used for both [i] and [ɨ], and U+110E6 SORA SOMPENG LETTER OH is used for both [o] and [ɔ]. The glottal stop is written with U+110DE SORA SOMPENG LETTER HAH, and the sequence of U+110DD SORA SOMPENG LETTER RAH and U+110D4 SORA SOMPENG LETTER DAH is used to write retroflex [ɽ]. There is also an additional “auxiliary” U+110E8 SORA SOMPENG LETTER MAE used to transcribe foreign sounds.

#Character Names. Consonant letter names for Sora Sompeng are derived by adding [aʔa] (written ah) to the consonant.

#Punctuation. Sora Sompeng uses Western-style punctuation.

#Line Breaking. Letters and digits behave as in Latin and other alphabetic scripts.

#15.18 Tulu-Tigalari

#15.18.1 Tulu-Tigalari: U+11380–U+113FF

Tulu-Tigalari was used primarily to write Sanskrit religious texts, but a small number of Tulu and Kannada language texts are also written in this script. Tulu-Tigalari is influenced by scripts such as medieval Grantha, Vatteluttu, and Telugu-Kannada. The script has been used since at least 1250 CE.

#Structure. The structure of the Tulu-Tigalari script is similar to that of other Brahmic scripts. Each consonant letter contains an inherent vowel a. It is an abugida that makes use of a virama. The script is written from left to right.

#Consonant Letters. There are 36 consonants in Tulu-Tigalari, encoded in the range U+11392..U+113B5. Two of the consonants represent Dravidian sounds and are quite rare: U+113B4 𑎴 TULU-TIGALARI LETTER RRA and U+113B5 𑎵 TULU-TIGALARI LETTER LLLA.

#Independent Vowels. Tulu-Tigalari has 14 independent vowels, encoded in the range U+11380..U+11391. These include the two diphthongs, U+1138E 𑎎 TULU-TIGALARI LETTER AI and U+11391 𑎑 TULU-TIGALARI LETTER AU. Similarly to many other Indic scripts, these 14 vowels are encoded atomically.

The alternate or rare forms of vowel letters i, u, vocalic r, vocalic rr and vocalic l should be handled as sequences, as shown in Figure 15-4.

#Figure 15-4. Rare Forms of Tulu-Tigalari Vowels

11382 𑎂	+	113B8 ◌𑎸	→	𑎂𑎸
11382 𑎂	+	113BC ◌𑎼	→	𑎂𑎼
11384 𑎄	+	113BC ◌𑎼	→	𑎄𑎼
11384 𑎄	+	113C9 ◌𑏉	→	𑎄𑏉
113D1 𑏑	+	11386 𑎆	→	𑏑𑎆
113D1 𑏑	+	11387 𑎇	→	𑏑𑎇
113D1 𑏑	+	11388 𑎈	→	𑏑𑎈

#Dependent Vowel Signs. All independent vowels except for U+11380 𑎀 TULU-TIGALARI LETTER A have a corresponding dependent vowel sign, encoded in the range U+113B8..U+113C8. These signs are positioned to the left of, to the right of, or below consonants and conjuncts, replacing the inherent vowel a.

Four Tulu-Tigalari vowel signs are rendered as ligatures which appear below the consonant or conjunct and ligate to the right. These are U+113BB ◌𑎻 TULU-TIGALARI VOWEL SIGN U, U+113BC ◌𑎼 TULU-TIGALARI VOWEL SIGN UU, U+113BD ◌𑎽 TULU-TIGALARI VOWEL SIGN VOCALIC R, and U+113BE ◌𑎾 TULU-TIGALARI VOWEL SIGN VOCALIC RR.

Additionally, the vowel signs u and uu change their shape depending on the consonant or conjunct they combine with. Some consonant plus vowel sign sequences can have alternate forms. A few of the many possible ligatures are shown in Figure 15-5.

#Figure 15-5. Examples of Ligatures in Tulu-Tigalari

𑎦 pa	+	◌𑎻 sign u	→	𑎦𑎻
𑎘 cha	+	◌𑎻 sign u	→	𑎘𑎻	chu
𑎒 ka	+	◌𑎻 sign u	→	𑎒𑎻

The Tulu-Tigalari script encodes several two-part vowel characters. U+113C7 ◌𑏇 TULU-TIGALARI VOWEL SIGN OO and U+113C8 ◌𑏈 TULU-TIGALARI VOWEL SIGN AU are split vowel signs that appear both before and after a character or conjunct. For a detailed discussion of the use of two-part vowels, see “Two-Part Vowels” in Section 12.6, Tamil.

#Canonical Equivalences. Some of the independent and dependent vowels can be visually analyzed as consisting of multiple parts corresponding to the shapes of other vowels, as shown in Figure 15-6. These multipart vowels have canonical decompositions, so that the atomic characters and the corresponding sequences are canonical equivalents. The atomic characters are the typical representation used when generating text.

#Figure 15-6. Tulu-Tigalari Canonical Sequences

11383 𑎃	≡	11382 𑎂	113C9 ◌𑏉
11385 𑎅	≡	11384 𑎄	113BB ◌𑎻
1138E 𑎎	≡	1138B 𑎋	113C2 ◌𑏂
11391 𑎑	≡	11390 𑎐	113C9 ◌𑏉
113C5 ◌𑏅	≡	113C2 ◌𑏂	113C2 ◌𑏂
113C7 ◌𑏇	≡	113C2 ◌𑏂	113B8 ◌𑎸
113C8 ◌𑏈	≡	113C2 ◌𑏂	113C9 ◌𑏉
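
The equivalences in Figure 15-6 can be expressed as a small decomposition table. The sketch below is illustrative only: the names DECOMPOSITIONS, decompose, and canonically_equal are hypothetical, and the table is hard-coded so that the example does not depend on the Unicode version of the runtime. The second element given for U+113C5 is an assumption based on the Unicode Character Database. A conformant implementation would use a full normalization library, which also performs canonical reordering.

```python
# Canonical decompositions of the multipart Tulu-Tigalari vowels.
DECOMPOSITIONS = {
    0x11383: (0x11382, 0x113C9),  # LETTER II
    0x11385: (0x11384, 0x113BB),  # LETTER UU
    0x1138E: (0x1138B, 0x113C2),  # LETTER AI
    0x11391: (0x11390, 0x113C9),  # LETTER AU
    0x113C5: (0x113C2, 0x113C2),  # VOWEL SIGN AI (assumed: sign EE twice)
    0x113C7: (0x113C2, 0x113B8),  # VOWEL SIGN OO
    0x113C8: (0x113C2, 0x113C9),  # VOWEL SIGN AU
}


def decompose(text):
    """Replace each atomic multipart vowel with its two-part sequence."""
    out = []
    for ch in text:
        parts = DECOMPOSITIONS.get(ord(ch), (ord(ch),))
        out.extend(chr(cp) for cp in parts)
    return "".join(out)


def canonically_equal(a, b):
    """Compare two strings after decomposing the vowels in the table."""
    return decompose(a) == decompose(b)
```

With this table, a consonant followed by U+113C7 compares equal to the same consonant followed by U+113C2 and U+113B8, which is the behavior that searching and collation are expected to provide.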

#Various Signs. U+113C9 ◌𑏉 TULU-TIGALARI AU LENGTH MARK is not used on its own as a complete vowel sign. This mark is used to render the two-part vowel sign au and the letter ii.

The U+113CA ◌𑏊 TULU-TIGALARI SIGN CANDRA ANUNASIKA mark is analogous to the candrabindu found in other Indic scripts. It can combine with all letters and vowel signs.

A pure nasal sound is represented by U+113CC ◌𑏌 TULU-TIGALARI SIGN ANUSVARA. U+113CD ◌𑏍 TULU-TIGALARI SIGN VISARGA indicates a voiceless glottal fricative. Both anusvara and visarga are rendered to the right of the affected character.

A spacing mark, U+113B7 𑎷 TULU-TIGALARI SIGN AVAGRAHA, is used when rendering Sanskrit texts. U+113D3 𑏓 TULU-TIGALARI SIGN PLUTA is used to denote vowel lengthening.

U+113E1 ◌𑏡 TULU-TIGALARI VEDIC TONE SVARITA and U+113E2 ◌𑏢 TULU-TIGALARI VEDIC TONE ANUDATTA are tone marks used in the representation of Vedic text in Tulu-Tigalari. These two combining marks are centered directly above or below a cluster, respectively.

#Viramas and Conjoiner. U+113CE ◌𑏎 TULU-TIGALARI SIGN VIRAMA is an inherent vowel killer, and is also used in combination with other vowels to represent the Tulu vowels ŭ [ɯ] and ŭ̄ [ɯː]. Consequently, it can appear after vowel signs. Figure 15-7 shows the usual convention. The virama always appears at the end of and to the top right of a cluster.

#Figure 15-7. Examples of Vowels ŭ and ŭ̄ in Tulu-Tigalari

𑎀 a	+	◌𑏎 virama	→	𑎀𑏎
𑎒 ka	+	◌𑏎 virama	→	𑎒𑏎	k(ŭ)
𑎁 aa	+	◌𑏎 virama	→	𑎁𑏎	ŭ̄
𑎒 ka	+	◌𑎸 sign aa	+	◌𑏎 virama	→	𑎒𑎸𑏎	kŭ̄

Unlike in Devanagari or Kannada, viramas in Tulu-Tigalari do not form conjuncts. Instead, U+113D0 ◌𑏐 TULU-TIGALARI CONJOINER is used for the formation of conjuncts. Consonants can combine horizontally, vertically, or have a combination of both, as shown in Figure 15-8. There is a preference for horizontal ligatures (where attested) over stacked vertical conjuncts. U+113CF ◌𑏏 TULU-TIGALARI SIGN LOOPED VIRAMA is used to form the looped virama ligatures. It is only attested for ka, ga, tta, ta, and na (and some conjuncts that end with these consonants). The looped virama is tightly bound to the preceding character and does not apply at a syllable level. Conjunct sequences that end with a looped virama are rare.

#Figure 15-8. Conjuncts and Viramas in Tulu-Tigalari

𑎒 ka	+	◌𑏎 virama	+	𑎒 ka	→	𑎒𑏎𑎒	k(ŭ)ka
𑎒 ka	+	◌𑏐 conjoiner	+	𑎒 ka	→	𑎒𑏐𑎒	kka
𑎒 ka	+	◌𑏐 conjoiner	+	𑎓 kha	→	𑎒𑏐𑎓	kkha
𑎒 ka	+	◌𑏏 looped virama	+	𑎒 ka	→	𑎒𑏏𑎒	kka
𑎒 ka	+	◌𑏏 looped virama	+	◌𑏐 conjoiner	+	𑎒 ka	→	𑎒𑏏𑏐𑎒	kka
𑎒 ka	+	◌𑏐 conjoiner	+	𑎒 ka	+	◌𑏏 looped virama	→	𑎒𑏐𑎒𑏏
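
The difference between the conjoiner and the virama in Figure 15-8 can be sketched as two small helpers. The function names are hypothetical; the code points for ka and kha used in the usage note are the first two consonants of the range U+11392..U+113B5.

```python
CONJOINER = "\U000113D0"  # TULU-TIGALARI CONJOINER
VIRAMA = "\U000113CE"     # TULU-TIGALARI SIGN VIRAMA


def conjunct(*consonants):
    """Encode a conjunct: consonants joined by the conjoiner."""
    return CONJOINER.join(consonants)


def with_visible_virama(*consonants):
    """Encode consonants separated by viramas; no conjunct is formed."""
    return VIRAMA.join(consonants)
```

For example, conjunct applied to U+11392 and U+11393 yields the sequence for kkha, while with_visible_virama applied to the same letters yields a sequence in which the first letter carries a visible virama.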

The common way of representing gemination is by conjuncts. However, a gemination mark is also used in many manuscripts. U+113D2 ◌𑏒 TULU-TIGALARI GEMINATION MARK is placed after the base letter. Other combining vowel signs are added after the gemination mark.

#Repha. U+113D1 𑏑 TULU-TIGALARI REPHA is used to indicate a ra without the inherent vowel that precedes a vowel, consonant, or semi-vowel. The repha is shown in the code charts with a dashed box to emphasize its unusual behavior in interacting with the following consonant.

The repha most commonly displays as a short vertical line above the base consonant or conjunct, as shown in Figure 15-9.

#Figure 15-9. Repha Rendered as a Short Vertical Line

𑏑 repha	+	𑎒 ka	→	𑏑𑎒	rka

However, repha typically ligates with ma, ya, or va, as shown in Figure 15-10.

#Figure 15-10. Repha Ligating with ma, ya, or va

𑏑 repha	+	𑎪 ma	→	𑏑𑎪	rma
𑏑 repha	+	𑎫 ya	→	𑏑𑎫	rya
𑏑 repha	+	𑎮 va	→	𑏑𑎮	rva

When repha and a virama co-occur in a syllable, the repha visually ligates with the virama, as shown in Figure 15-11.

#Figure 15-11. Repha Ligating with Virama

𑏑 repha	+	𑎒 ka	+	◌𑏎 virama	→	𑏑𑎒𑏎	rk(ŭ)
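
The three display behaviors of the repha shown in Figures 15-9 through 15-11 can be summarized in a short sketch. The function names are hypothetical, the code points for ma, ya, and va are taken from the code chart, and giving the virama case precedence over the ma, ya, and va ligatures is an assumption, since the text does not say which applies when both conditions hold.

```python
REPHA = "\U000113D1"   # TULU-TIGALARI REPHA
VIRAMA = "\U000113CE"  # TULU-TIGALARI SIGN VIRAMA

# Bases with which the repha typically ligates: ma, ya, va.
LIGATING_BASES = {"\U000113AA", "\U000113AB", "\U000113AE"}


def repha_cluster(base, virama=False):
    """Encode repha + base (+ virama); the repha is encoded first."""
    return REPHA + base + (VIRAMA if virama else "")


def repha_display(base, virama=False):
    """Describe how the repha is expected to display for this cluster."""
    if virama:
        return "ligates with virama"
    if base in LIGATING_BASES:
        return "ligates with base"
    return "short vertical line above base"
```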

#Digits. The Kannada digits U+0CE6..U+0CEF should be employed to represent digits in Tulu-Tigalari.

#Punctuation. Tulu-Tigalari has script-specific forms of the danda and double danda punctuation marks: U+113D4 𑏔 TULU-TIGALARI DANDA and U+113D5 𑏕 TULU-TIGALARI DOUBLE DANDA.

U+113D7 𑏗 TULU-TIGALARI SIGN OM PUSHPIKA can either represent the om sound or it can be used as an indicator for beginnings, pauses, endings, or space fillers. Although om pushpika and U+113D8 𑏘 TULU-TIGALARI SIGN SHRII PUSHPIKA may superficially resemble the corresponding phonetic syllables, they are used as space fillers and for other decorative purposes.