I am forwarding this article of mine, published on my blog/website and, with a changed title and other alterations, in the online journal Times of Assam. I have also written a detailed report on the issue and forwarded it to all concerned, including the Unicode Consortium. I hope a solution comes through the co-operation of all involved in the issue.
ASSAMESE AND BENGALI CONTROVERSY IN UNICODE STANDARD ::::: SOLUTIONS
The Unicode Consortium, a non-governmental body headquartered in the
U.S.A. whose members include governmental agencies of many countries,
has standardised and maintains a Universal Character Set (UCS), i.e.
a standard that defines, in one place, all the characters needed for
writing the majority of living languages in use on computers. It aims
to be, and to a large extent already is, a superset of all other
character sets that have been encoded. Unicode (as the UCS is
commonly referred to) has room for over a million characters, of which
about 100,000 have already been defined. These include characters for
all the world's main languages along with a selection of symbols for
various purposes.
REASONS FOR DISSENSION AMONG THE ASSAMESE :
1. Non-representation or misrepresentation of the Assamese writing
system in the Unicode Standard, because the Unicode Consortium and
the Government of India think that the current Bengali code chart
will serve the purpose of using the Assamese language on computers.
2. The script is named Bengali, and all character descriptors in the
Unicode code chart follow Bengali nomenclature. Assamese users are
forced to use it, and neither the Government of India nor the Unicode
Consortium is willing to do anything positive about it. Both treat it
as a political issue, cite multiple technical difficulties in solving
it, and try to convince the complainants that nothing is wrong with it.
3. The fact remains that the Assamese letter "ৰ" (Ro) is described as
the Bengali letter "র" (Ro) with middle diagonal in the Bengali chart
of the Unicode Standard.
4. The Assamese letter "ৱ" (Wobo) is described as the Bengali letter
"র" (Ro) with lower diagonal in the Bengali chart of the Unicode
Standard.
5. Thirteen other Assamese letters are similarly misrepresented in the
Bengali chart of the Unicode Standard.
6. The Assamese letter "ক্ষ" (Khya) is not represented at all in the
Bengali code chart of the Unicode Standard.
7. This results in gross collation errors when sorting software is run
on Assamese text, because "ৰ" (Ro) and "ৱ" (Wobo) are not in their
proper places and "ক্ষ" (Khya) is not represented at all in the Bengali
code chart of the Unicode Standard.
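To make point 7 concrete, here is a minimal Python sketch of what happens when Assamese letters are sorted purely by code point, compared with sorting by an explicit ordering table. The list ASSAMESE_ORDER is only an assumed fragment of the Assamese consonant sequence written down for illustration; it is not taken from the Unicode Standard or from any collation library, and the function names are hypothetical.

```python
# Illustrative only: ASSAMESE_ORDER is an assumed fragment of the
# Assamese consonant sequence, not data from the Unicode Standard.
ASSAMESE_ORDER = [
    "\u09af",              # য
    "\u09f0",              # ৰ (Ro)
    "\u09b2",              # ল
    "\u09f1",              # ৱ (Wobo)
    "\u09b6",              # শ
    "\u09b7",              # ষ
    "\u09b8",              # স
    "\u09b9",              # হ
    "\u0995\u09cd\u09b7",  # ক্ষ (Khya), stored as three code points
]
RANK = {letter: i for i, letter in enumerate(ASSAMESE_ORDER)}


def code_point_sort(letters):
    """Sort the way plain string comparison does: by code point value."""
    return sorted(letters)


def tailored_sort(letters):
    """Sort by position in the assumed ordering table above."""
    return sorted(letters, key=lambda s: RANK[s])


letters = list(reversed(ASSAMESE_ORDER))
by_code_point = code_point_sort(letters)
by_tailoring = tailored_sort(letters)

# Khya has no code point of its own; it is a sequence of three.
khya_code_points = [f"U+{ord(ch):04X}" for ch in "\u0995\u09cd\u09b7"]

print("code point order:", " ".join(by_code_point))
print("tailored order:  ", " ".join(by_tailoring))
print("Khya is encoded as:", " ".join(khya_code_points))
```

In code point order ৰ (U+09F0) and ৱ (U+09F1) fall after হ, and ক্ষ sorts under ক because it begins with U+0995. The ordering table puts them back in the assumed sequence, which is the kind of correction that sorting software working only from the code chart does not make by itself.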
SOLUTIONS UNDER CONSIDERATION :
1. RENAMING OF THE SCRIPT AND ALTERNATIVE NOMENCLATURE OF THE CHARACTER DESCRIPTORS
This is stated first because the Government of India seems more
interested in solving it that way. Renaming the current Bengali script
in the Unicode Standard with a name acceptable to all has been
proposed by many. The renaming solution has problems on both the
Bengali and the Assamese side and, most importantly, a technical
problem is associated with it.
A. Will the Bengali community agree to it, considering that the
present Bengali code chart serves their purpose quite well? The
Bengali community lives in two sovereign countries, India and
Bangladesh.
B. The major problem lies on the Assamese side: will the renaming be
limited to the name of the script and code chart only, or will it also
include the nomenclature of the misrepresented character descriptors?
For example, the following Assamese characters have Bengali
descriptors different from how they would have been described in
Assamese.
Code point  Char  UTF-8 (hex)  Unicode name                            Assamese name                           IPA (Assamese)
U+099A      চ     e0 a6 9a     BENGALI LETTER CA                       ASSAMESE LETTER SA (PRATHAM)            s
U+099B      ছ     e0 a6 9b     BENGALI LETTER CHA                      ASSAMESE LETTER SA (DWITIYA)            s
U+099F      ট     e0 a6 9f     BENGALI LETTER TTA                      ASSAMESE LETTER TA (MURDHENYA)          t
U+09A0      ঠ     e0 a6 a0     BENGALI LETTER TTHA                     ASSAMESE LETTER THA (MURDHENYA)         th
U+09A1      ড     e0 a6 a1     BENGALI LETTER DDA                      ASSAMESE LETTER DA (MURDHENYA)          d
U+09A2      ঢ     e0 a6 a2     BENGALI LETTER DDHA                     ASSAMESE LETTER DHA (MURDHENYA)         dh
U+09A3      ণ     e0 a6 a3     BENGALI LETTER NNA                      ASSAMESE LETTER NA (MURDHENYA)          n
U+09AF      য     e0 a6 af     BENGALI LETTER YA                       ASSAMESE LETTER ZA (ANTUSTYA)           z
U+09B6      শ     e0 a6 b6     BENGALI LETTER SHA                      ASSAMESE LETTER XA (TALOBYA)            x
U+09B7      ষ     e0 a6 b7     BENGALI LETTER SSA                      ASSAMESE LETTER XA (MURDHENYA)          x
U+09B8      স     e0 a6 b8     BENGALI LETTER SA                       ASSAMESE LETTER XA (DONTIYA)            x
U+09C0      ী     e0 a7 80     BENGALI VOWEL SIGN II                   ASSAMESE VOWEL SIGN I (DIRGHA)          i
U+09C2      ূ     e0 a7 82     BENGALI VOWEL SIGN UU                   ASSAMESE VOWEL SIGN U (DIRGHA)          u
U+09CD      ্     e0 a7 8d     BENGALI SIGN VIRAMA                     ASSAMESE SIGN REF
U+09CE      ৎ     e0 a7 8e     BENGALI LETTER KHANDA TA                ASSAMESE LETTER HASANTA TA              t
U+09D7      ৗ     e0 a7 97     BENGALI AU LENGTH MARK                  ASSAMESE VOWEL SIGN AU (TIBETO-BURMAN)
U+09DC      ড়     e0 a7 9c     BENGALI LETTER RRA                      ASSAMESE LETTER RA (DORE)
U+09DF      য়     e0 a7 9f     BENGALI LETTER YYA                      ASSAMESE LETTER YA                      iɒ
U+09F0      ৰ     e0 a7 b0     BENGALI LETTER RA WITH MIDDLE DIAGONAL  ASSAMESE LETTER RA                      r
U+09F1      ৱ     e0 a7 b1     BENGALI LETTER RA WITH LOWER DIAGONAL   ASSAMESE LETTER WA                      w
U+09FA      ৺     e0 a7 ba     BENGALI ISSHAR                          ASSAMESE SIGN SWARGIO (LATE/HEAVENLY)
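The code point, UTF-8 and Unicode name columns of the table above can be checked with Python's standard unicodedata module. The sketch below is only an illustration: the choice of sample code points is arbitrary and describe is a hypothetical helper name. The names it prints are the ones stored in the Unicode Character Database bundled with the Python build in use. The Assamese names are not part of that database, so they cannot be looked up this way.

```python
import unicodedata

# A few code points from the table above; the selection is arbitrary.
SAMPLE = [0x099A, 0x09B8, 0x09CE, 0x09F0, 0x09F1]


def describe(cp):
    """Return (code point label, UTF-8 bytes in hex, Unicode name)."""
    ch = chr(cp)
    utf8 = " ".join(f"{b:02x}" for b in ch.encode("utf-8"))
    return f"U+{cp:04X}", utf8, unicodedata.name(ch)


rows = [describe(cp) for cp in SAMPLE]
for label, utf8, name in rows:
    print(label, utf8, name, sep="  ")
```

Every name returned for this block begins with BENGALI, which is the nomenclature issue raised in point 2 of the list of reasons.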
Supposing renaming is taken up as the best solution for resolving the
controversy, then the whole current Bengali code chart of the Unicode
Standard will need alternative nomenclature, beginning with the title
of the script, such as ASSAMESE AND BENGALI, and the individual
characters will also need alternative character descriptors like this :
U+09B8 "স" e0 a6 b8 = BENGALI LETTER SA / ASSAMESE LETTER XA (DONTIYA)
U+09AF "য" e0 a6 af = BENGALI LETTER YA / ASSAMESE LETTER ZA (ANTUSTYA)
If such an alteration is possible, and every character is given both
the Assamese and the Bengali descriptor, the script is renamed with an
acceptable name, and the displaced and missing Assamese characters
"ৰ" (Ro), "ৱ" (Wobo) and "ক্ষ" (Khya) are put in their proper places in
the chart for proper collation, the problem may be solved.
But as per the basic principle of a unique code, one particular
entity can have only one identifier; in this case around fifteen
characters will have one identifier for two entities.
If the Unicode Consortium or the Indian Government thinks that this
basic principle of unique codification can be violated, then the
matter may be acceptable to the Assamese and Bengali alike.
2. SEPARATE SLOT/RANGE FOR THE ASSAMESE SCRIPT
If renaming in the way described above is not possible, then
allocation of a separate slot/range for the Assamese script remains
the only solution, which is perhaps easier for the Unicode Consortium
to do. The Government of Assam has also moved the Government of India
seeking a separate slot/range for the Assamese script. Allocation of a
separate slot/range for the Assamese script will mean the Unicode
Consortium allowing and accepting duplication of characters.
The Unicode Consortium has already allowed and accepted not only
duplication but, for some characters, triplication across the three
major European writing systems, viz. Cyrillic, Greek and Latin.
Consequently the Unicode Standard has at least the following numbers
of duplicate characters :
a=2, A=3, B=3, c=2, C=2, e=2, E=3, H=3, i=2, I=3, j=2, J=2, K=2, M=3, N=2,
o=2, O=3, p=2, P=3, s=2, S=2, T=2, x=2, X=3, y=2, Y=2 and Z=2
Here alone there are a total of 63 (sixty-three) characters duplicated
between the three major European writing systems, Cyrillic, Greek and
Latin; the actual number is more than this.
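As a small illustration of this kind of duplication, the Python sketch below looks up three capital letters that look alike in Latin, Greek and Cyrillic and shows that each has its own code point and its own name. It also adds up the counts listed above. The variable names are hypothetical and the choice of the letter A is only an example.

```python
import unicodedata

# Latin A, Greek Alpha and Cyrillic A: similar shapes, separate code points.
LOOKALIKE_A = ["\u0041", "\u0391", "\u0410"]
names = [unicodedata.name(ch) for ch in LOOKALIKE_A]

for ch, name in zip(LOOKALIKE_A, names):
    print(f"U+{ord(ch):04X}", ch, name)

# Counts copied from the list in the text above.
COUNTS = {
    "a": 2, "A": 3, "B": 3, "c": 2, "C": 2, "e": 2, "E": 3, "H": 3,
    "i": 2, "I": 3, "j": 2, "J": 2, "K": 2, "M": 3, "N": 2, "o": 2,
    "O": 3, "p": 2, "P": 3, "s": 2, "S": 2, "T": 2, "x": 2, "X": 3,
    "y": 2, "Y": 2, "Z": 2,
}
total = sum(COUNTS.values())
print("total of the listed counts:", total)
```

The total of the listed counts comes to 63, matching the figure given in the text.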
Number-wise, the duplication of characters will perhaps be much less
than this if the Bengali and Assamese scripts are duplicated and
allocated separate slots/ranges of their own.
CONCLUSION :
The solution therefore lies in duplication. In the first option there
will be duplication of the unique codes, meaning a single code for two
entities, and in the second option there will be duplication of
characters, meaning two characters of the same appearance. The Unicode
Consortium and the Government of India have to choose between the two.
Duplication of characters already exists in the Unicode Standard, but
whether duplication of unique codes exists, or whether it is
acceptable to the experts and justified, is not known, because
duplication itself means loss of uniqueness of any unique code.
For full details on the issue go to this webpage:
http://drsatyakamphukan.wordpress.com/assamese-and-unicode
Dr Satyakam Phukan
General Surgeon
Jorpukhuripar, Uzanbazar
Guwahati, Assam
P.I.N : 781001
Phone: 99540 46357
Website : http://drsatyakamphukan.wordpress.com
Received on Sat Jul 07 2012 - 14:43:29 CDT