Unicode Frequently Asked Questions

Bengali (Bangla) / Assamese

Q: Where can I find documentation about Bengali in Unicode?

The main documentation for the Bengali script can be found in Section 12.2, Bengali, of The Unicode Standard. Also see the Bengali code chart.

Q: The name "Bangla" should be used in the Unicode Standard instead of Bengali. What can I do to correct the spelling?

The Unicode Standard does not intend to limit the names that people use for their own scripts, languages or characters. The particular labels used in the standard to identify characters and blocks are chosen as unambiguous and fixed identifiers in the context of the standard. As formal identifiers, they are subject to stability constraints and cannot be changed. In the case of Bengali, annotations and explanations have been added to the standard regarding other preferred names, such as Bangla.

Q: Why isn't Assamese encoded as a separate script? Assamese is a recognized script of India.

The meaning of "Bengali script" in the Unicode Standard includes all of the letters used not only for the Bangla language of Bangladesh and of West Bengal state in India, but also for the Assamese language (and other languages) of Assam state. There are some letters used in Bangla that are not used in Assamese, and some letters in Assamese that are not used in Bangla, but in the Unicode Standard, the Bengali script refers to the whole set of letters needed for both.

This situation for the Bengali script can be compared to that for the Arabic script, for example. The Arabic script is used to write the Arabic language, of course, but it is also used to write the Persian language in Iran, as well as the Urdu language in India, and many others. The set of Arabic letters needed to write the Arabic language is different from the set of Arabic letters needed to write Persian. In the Unicode Standard, the "Arabic script" refers to the superset of Arabic letters needed for writing all of those different languages.

Q: Where is the Bengali khanda ta letter? This letter is needed to form words such as utkarsha.

The unique Bengali khanda ta letter was added to the Unicode Standard as of Version 4.1 in 2005. It is encoded at: U+09CE "ৎ" BENGALI LETTER KHANDA TA. The appropriate use of this character is described in Section 12.2, Bengali.

Q: Why is the Bengali hasant called a "virama" in Unicode?

The Bengali hasant is the mark used below a consonant letter to indicate the "killing" of the inherent vowel when that consonant is the first in a conjunct sequence. It is encoded as U+09CD " ্" BENGALI SIGN VIRAMA in the Unicode Standard. This is structurally analogous to the virama in the Devanagari script, which is also known as the halant. In the Unicode Standard, many of these consonant killer marks in related scripts of India are called "virama". This is just a matter of consistent terminology in the text of the Unicode Standard; the local name for the hasant can always be used wherever that name is preferred.
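The consistent "virama" naming can be checked directly in the Unicode Character Database. The following is a minimal sketch using Python's standard unicodedata module. The choice of Gujarati as a third script is only illustrative and is not taken from the text above.

```python
import unicodedata

# Vowel-killer marks in three related scripts (Gujarati is an illustrative extra).
VIRAMAS = {
    "Bengali": "\u09CD",     # the hasant
    "Devanagari": "\u094D",  # also known as the halant
    "Gujarati": "\u0ACD",
}

def virama_info(ch):
    """Return (formal name, canonical combining class) for a character."""
    return unicodedata.name(ch), unicodedata.combining(ch)

for script, ch in VIRAMAS.items():
    name, ccc = virama_info(ch)
    print(f"{script}: U+{ord(ch):04X} {name} (ccc={ccc})")
```

Each of these characters carries a formal name ending in SIGN VIRAMA and the same canonical combining class, which reflects the structural analogy described above.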

Q: How do I write Bengali ya-phalā?

When U+09AF BENGALI LETTER YA (antaḥstha ya) occurs as the last member of a consonant cluster, it takes a special shape called ya-phalā.

To produce this shape in Unicode, just type the underlying sequence of characters as you would for any other consonant cluster. For example, ত্য at the end of the word সাহিত্য "literature" is written <U+09A4 ত ta, U+09CD ্ hasant, U+09AF য ya>. The font should produce the correct shape.
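The sequence described above can be inspected code point by code point. This is a minimal sketch in Python. The helper name describe is an assumption made for illustration, and the word is spelled with escapes so that the example does not depend on font support.

```python
import unicodedata

# The cluster from the text: ta, hasant (virama), ya.
TA, HASANT, YA = "\u09A4", "\u09CD", "\u09AF"
cluster = TA + HASANT + YA

def describe(text):
    """Return (code point, formal name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# The word for "literature": sa, aa, ha, i, then the ta + hasant + ya cluster.
word = "\u09B8\u09BE\u09B9\u09BF" + cluster

for cp, name in describe(cluster):
    print(cp, name)
print(word.endswith(cluster))
```

No special character is stored for the ya-phalā shape. The text holds only the three ordinary code points, and the shaping is left to the font.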

If ya follows ra in a consonant cluster, the font will normally produce the reph over the full form of ya, as in পর্যন্ত "until". On the rare occasions when you want to retain the ya-phalā shape when ya follows ra, e.g. র‍্য, add U+200D ZERO WIDTH JOINER before the hasant.
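The difference between the two encodings can be shown as data. This is a minimal sketch in Python. The names reph_form, ya_phala_form and code_points are assumptions introduced for illustration only.

```python
RA, HASANT, YA = "\u09B0", "\u09CD", "\u09AF"
ZWJ = "\u200D"  # ZERO WIDTH JOINER

# Default sequence: the font normally shows reph over the full form of ya.
reph_form = RA + HASANT + YA

# ZWJ before the hasant requests the ya-phalaa shape after ra.
ya_phala_form = RA + ZWJ + HASANT + YA

def code_points(text):
    """List the code points of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]

print(code_points(reph_form))
print(code_points(ya_phala_form))
```

The two strings differ only by the joiner, so they are distinct in searching and comparison unless an application chooses to ignore U+200D.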

If your browser does not show these examples clearly, see the documentation about ya-phalā in Section 12.2, Bengali.

Q: Does it take 3 separate keystrokes on my keyboard in order to type ya-phalā or similar characters?

A well-designed keyboard can provide individual keys for any sequence that users in a particular language would consider a single entity. So a Bengali keyboard can easily provide a single key for the entire ya-phalā sequence <U+200D, U+09CD, U+09AF> = <ZWJ, hasant, ya>, or for other sequences, as needed.
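One way to picture such a layout is as a table from key labels to strings. The following is a minimal sketch in Python. The key labels KEY_TA and KEY_YA_PHALA and the function type_keys are hypothetical and do not correspond to any real keyboard layout or input method API.

```python
ZWJ, HASANT, YA, TA = "\u200D", "\u09CD", "\u09AF", "\u09A4"

# Hypothetical layout table: one key press may emit several code points.
KEY_MAP = {
    "KEY_TA": TA,
    "KEY_YA_PHALA": ZWJ + HASANT + YA,  # the sequence given in the text
}

def type_keys(keys):
    """Concatenate the strings emitted by a series of key presses."""
    return "".join(KEY_MAP[key] for key in keys)

typed = type_keys(["KEY_TA", "KEY_YA_PHALA"])
print([f"U+{ord(ch):04X}" for ch in typed])
```

Two key presses here yield four code points, so the number of keystrokes is a property of the layout rather than of the encoding.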

Q: What are the Bengali characters used to transcribe the sound [æ] (as in English "bat") in Unicode?

When the foreign sound [æ] occurs at the beginning of a word borrowed into the Bangla language, it is usually written either as "অ্যা" or "এ্যা". These consist of sequences of the independent vowel letter a (U+0985 "অ") or e (U+098F "এ") followed by hasant, the letter ya (U+09AF) and vowel sign -aa (U+09BE "া"). As for other special-use sequences, a Bengali keyboard could map these entire sequences to single keys, for convenience in entry.
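The two spellings can be built from their component characters as follows. This is a minimal sketch in Python. The variable names are assumptions made for illustration.

```python
import unicodedata

A, E = "\u0985", "\u098F"         # independent vowel letters a and e
HASANT, YA = "\u09CD", "\u09AF"   # hasant (virama) and letter ya
AA = "\u09BE"                     # vowel sign -aa

# The two conventional ways of writing word-initial [ae], as described above.
ae_with_a = A + HASANT + YA + AA
ae_with_e = E + HASANT + YA + AA

for seq in (ae_with_a, ae_with_e):
    print([unicodedata.name(ch) for ch in seq])
```

Both spellings are four code points long and differ only in the initial vowel letter.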

If your browser does not show these examples clearly, see the documentation about representation of [æ] in Section 12.2, Bengali.

Q: The Bangla full stop known as dari is similar to the Devanagari danda (U+0964), but the corresponding point in the Bengali block at U+09E4 is reserved. What should I use for dari?

Many punctuation characters are shared across multiple scripts in the Unicode Standard. So, for example, most scripts share the common Western punctuation marks, such as U+002C "," COMMA and U+002E "." FULL STOP, without having to encode separate characters for these marks for every different script. In South Asia, many scripts also share the danda and double danda punctuation marks originally derived from Brahmi text usage. For Bengali, a dari is simply represented by using U+0964 "।". It does not matter that this punctuation mark is encoded in the Devanagari block, any more than it matters that the comma is encoded in the Basic Latin block. Both punctuation marks are equally accessible for a Unicode implementation, including an implementation of the Bengali script. A keyboard for Bengali should simply map the dari to U+0964.
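The use of U+0964 in Bengali text can be sketched in Python as follows. The sample sentences and the helper split_sentences are assumptions made for illustration, and real sentence segmentation would need more care than a plain split.

```python
import unicodedata

DARI = "\u0964"  # encoded in the Devanagari block, shared by Bengali

print(unicodedata.name(DARI))            # formal name of the shared mark
print(unicodedata.name("\u09E4", None))  # reserved code point: no name

def split_sentences(text):
    """Split text on the dari and drop empty pieces (a naive sketch)."""
    return [part.strip() for part in text.split(DARI) if part.strip()]

# Two short illustrative Bangla sentences, each ended with U+0964.
text = "আমি যাই" + DARI + " তুমি যাও" + DARI
print(split_sentences(text))
```

The block in which a character is encoded plays no role here: the same code point is typed, stored, and searched in Bengali text just as in Devanagari text.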

Q: I have questions about other scripts of India and South Asia. Where can I find answers?

See Indic Scripts and Languages.