RE: Continue:Glaring mistake in the code list for South Asian Script from Peter Constable on 2011-09-10 (Unicode Mail List Archive)

From: Peter Constable <petercon_at_microsoft.com>
Date: Sat, 10 Sep 2011 16:59:14 +0000

Once a script is encoded, the reference name used in the Standard for the script becomes part of stable character identifiers that _cannot be changed_. This is not just Unicode policy; this is policy of ISO JTC1/SC2. The reference name "Bengali" for the script in question cannot be changed. The most that could be done would be to add a comment indicating that the script is also known as "Eastern Nagari" or that the script is used for Assamese, Manipuri, and other languages as well as the Bengali language. But, in fact, the Standard already says this--see TUS 6.0, section 9.2, page 985 (http://www.unicode.org/versions/Unicode6.0.0/ch09.pdf):

<quote>

9.2 Bengali (Bangla)

Bengali: U+0980–U+09FF

The Bengali script is a North Indian script closely related to Devanagari. It is used to write
the Bengali language primarily in the West Bengal state and in the nation of Bangladesh. In
India and Bangladesh, the preferred name for the script and the language is Bangla. The
script is also used to write Assamese in Assam and a number of other minority languages,
such as Bishnupriya Manipuri, Daphla, Garo, Hallam, Khasi, Mizo, Munda, Naga, Rian,
and Santali, in northeastern India.

</quote>
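
The stable identifiers and the block range quoted above can be inspected directly. What follows is a minimal sketch, assuming Python's standard unicodedata module (its data version depends on the Python release in use); the helper name describe and the constant BENGALI_BLOCK are hypothetical names chosen for illustration only.

```python
import unicodedata

# Block range as quoted from the Standard: Bengali, U+0980-U+09FF.
# BENGALI_BLOCK is an illustrative name, not an official identifier.
BENGALI_BLOCK = range(0x0980, 0x0A00)


def describe(code_point: int) -> str:
    """Return 'U+XXXX NAME' using the Unicode data bundled with Python."""
    name = unicodedata.name(chr(code_point), "<no name>")
    return f"U+{code_point:04X} {name}"


# The two letters discussed in this thread, both used for Assamese.
for cp in (0x09F0, 0x09F1):
    print(describe(cp), "| in Bengali block:", cp in BENGALI_BLOCK)
```

The names printed carry the "BENGALI" prefix as part of the character identifier itself, which is why a rename of the script's reference name is not possible; an annotation about usage for Assamese is the kind of addition that remains open.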

If there is any reasonable revision to this informative text that you think would improve it, you should submit that feedback; you can do that using the online feedback mechanism at http://www.unicode.org/reporting.html.

Peter

-----Original Message-----
From: unicode-bounce_at_unicode.org [mailto:unicode-bounce_at_unicode.org] On Behalf Of anbu_at_peoplestring.com
Sent: Saturday, September 10, 2011 2:09 AM
To: kent.karlsson14_at_telia.com
Cc: delexr_at_indiatimes.com; unicode_at_unicode.org
Subject: Re: Continue:Glaring mistake in the code list for South Asian Script

Hi Unicode Community!

I recommend to Unicode that this grievance is taken into account. No one consonant in this code range is used by only one language. Refer:

http://en.wikipedia.org/wiki/Eastern_Nagari_alphabet#Consonants

The Indian census of 1961 recognised 1,652 different languages in India (including languages not native to the subcontinent). The 1991 census recognizes 1,576 classified "mother tongues". Refer:

http://en.wikipedia.org/wiki/Languages_of_India#Inventories

The Eastern Nagari script is an Abugida system of writing belonging to the Brahmic family of scripts whose use is associated with the Assamese, Bengali, Bishnupriya Manipuri, Maithili, Mising, Meitei Manipuri, Sylheti, and Chittagonian languages. Refer:

http://en.wikipedia.org/wiki/Eastern_Nagari_alphabet

The Bengali alphabet (Bengali: বাংলা লিপি bangla lipi or Bengali: বঙ্গলিপি bôņgôlipi) is the writing system for the Bengali language. The same script is the basis for the Assamese, Meitei, Bishnupriya Manipuri, Kokborok, Garo and Mundari alphabets. All these languages are spoken in the eastern region of South Asia. Refer:

http://en.wikipedia.org/wiki/Bengali_alphabet

I propose to Unicode that it renames this code range as "Eastern Nagari" or "East(ern) South Asian" Script.

Regards,
Anbu Kaveeswarar Selvaraju

On Sat, 10 Sep 2011 02:44:59 +0200, Kent Karlsson <kent.karlsson14_at_telia.com> wrote:
> Den 2011-09-10 00:53, skrev "delex r" <delexr_at_indiatimes.com>:
>
>> I figure out that Unicode has not addressed the sovereignty issues of
>> a language
>
> Which, I daresay, is irrelevant from a *character* encoding perspective.
>
>> while trying to devise an ASCII-like encoding system for almost all
>> the characters and symbols used on earth. I am continuing with my
>> observation of the glaring mistake done by Unicode by naming a South
>> Asian Script as "Bengali". Here I would like to give certain
>> information that I think will be of some help for Unicode in its
>> endeavour to faithfully represent a Universal Character encoding
>> standard truer to even micro-facts.
>>
>> India is believed to have at least 1652 mother tongues out of which
>> only 22
>
> One list of languages in India is given in
> http://www.ethnologue.com/show_country.asp?name=IN
> (I did not count the number of entries)
>
>> are recognized by the Indian Constitution as official languages for
>> administrative communication among local governments and to the
>> citizens. And the constitution has not explicitly recognized any
>> official script. As Unicode has listed the languages and scripts, the
>> Indian Constitution has also listed
>
> Unicode does not list any languages at all. Ok, the CLDR subproject
> copies a list of language codes from the IANA language subtag registry,
> which (in a complex manner) takes its language codes from (among
> others) the ISO 639-3 registry, which largely is in sync with
> Ethnologue (as in the list above); but I guess that is not what you
> referred to.
>
>> the official languages ( In its 8th schedule). The first entry in
>> that list is the Assamese language. Assamese is a sovereign language
>> with its own grammar
>
> Which I don't think is in dispute at all.
>
>> and "script" that contains some unique characters that you will not
>> find in any of the scripts so far discovered by Unicode. At least 30
>> million people
>
> Unicode (at this stage) does not do any "discovery". Unicode and
> ISO/IEC 10646 are driven by applications (proposals) to encode
> characters (and define properties of characters).
>
>> call it the "Assamese Script" and if provided with computers and
>> internet
>
> If you want to disunify the Bengali script (and characters) from
> Assamese, you need to show, in a proposal document, that they really
> are different scripts, and should not be unified as just different
> uses of the same script.
>
>> connection can bomb the Unicode e-mail address with confirmations.
>> These
>
> Hmm, an email bombing threat... I'm sure Sarasvati can find a way to
> block those (or we may all simply file them away as spam).
>
>> characters are, I repeat, the one that is given a Hexcode 09F0 and
>> the other with 09F1 by this universal character encoding system but
>> unfortunately has described both as "Bengali" Ra etc. etc. I don't
>> know who has advised Unicode to use the tag "Bengali" to name the
>> block that includes these two characters.
>>
>> If you are not an Indian then just google an image of an Indian
>> Currency note. There on one side of the note you will find a box
>> inside which the value of the currency note is written in words in at
>> least 15 scripts of official Indian languages. (I don't know why it
>> is not 22.) At the top, the script is Assamese as Assamese is the
>> first officially recognized language (script?).
>> Next below it you will find almost similar shapes. That is in Bengali.
>> India officially recognises the distinction between these two scripts
>> which although shaped similar but sounds very different at many
>> points. And the standard
>
> Minor font differences are not a reason for disunification. Different
> pronunciations of the same letters are not a reason for disunification
> either. Just think of how many different ways Latin letters (and
> letter combinations) are pronounced in different languages (x, j, h,
> v, w, f, ...; even "a" gets different pronunciation in British English
> vs. US English, and that is within the same language...; and most
> orthographies aren't very accurately phonetic anyway, with quite a bit
> of varying (contextual and dialectal) pronunciation for the letters).
>
>> assamese alphabet set has extra characters which are never bengali
>> just like London is never in Germany.
>
> There are 8 London in the USA, two in Canada, one in Kiribati, ... ;-)
> (http://en.wikipedia.org/wiki/London_(disambiguation))
>
>> Coming again to the Hexcodes 09F0 (Raw) and 09F1 (wabo). Both have
>> nothing Bengali in them and interestingly 09F1 (sounds WO or WA when
>> used within words) has even nothing 'Ra' sound in it. Thus you know,
>> with actual Bengali alphabet set one can't write anything to produce
>> the sound "Watt" as in James Watt and instead need to combine three
>> alphabets but even then only to sound like "OOYAT" in Bengali itself.
>
> Yes, English has a rather peculiar pronunciation for the letter W...
> ;-) Several languages will pronounce Watt (without changing the
> spelling) as Vatt, and regard that as a normal pronunciation of Watt.
>
>> Therefore Unicode must consider terming the block range as "Assamese"
>> which will faithfully describe the block range with 09F0 and 09F1 in
>> it and replace all tags "Bengali" with "Assamese" in the code
>> descriptions and vice versa.
>> London is in England and Berlin is in Germany. You just can't bring
>> London into Germany and then say England is in Germany. You can't
>> live with a lie or wrong too long.
>
> See above re. London. ;-) As for Berlin: see
> http://en.wikipedia.org/wiki/Berlin_(disambiguation)...
> (I still fail to see how this would be analogous in any way whatsoever
> to your quest.)
>
>
> Yes, I have responded with a quite large dose of irony. Drier and
> more to-the-point responses by others seem to have passed unnoticed.
>
> /Kent K
Received on Sat Sep 10 2011 - 12:04:41 CDT
