From: Michael S. Kaplan (michka@trigeminal.com)
Date: Sat Feb 09 2008 - 12:58:09 CST
They technically have been told this on more than one occasion by more than
one person over the last 7-8 years.
Some messages are harder to hear than others.
MichKa
----- Original Message -----
From: "Bala" <bala@cse.mrt.ac.lk>
To: "'James Kass'" <thunder-bird@earthlink.net>; "'Unicode Mailing List'"
<unicode@unicode.org>
Sent: Saturday, February 09, 2008 9:19 AM
Subject: RE: minimizing size (was Re: allocation of Georgian letters)
> James Kass----
> Another shame is telling Tamil users that Unicode won't standardize
> a duplicate encoding until a certain event happens. This gives the
> misleading impression that there's at least a possibility that Unicode
> might encode TACE/TUNE.
>
> It would have been much better, in my opinion, to simply have told people
> up front that there is absolutely no possibility whatsoever for such a
> duplicate encoding in the standard. In which case, the people who have
> spent time and effort towards such an encoding could have been doing
> something productive with their time and resources instead of wasting
> them. Like, for example, solving problems with the PDF format
> related to complex scripts.
> -----
>
> I attended the Chennai meeting last month. The UTC stated very clearly
> at the meeting that dual encoding is not possible at any stage.
> They suggested a few other solutions in case TACE is to be used
> (such as IANA).
>
> Anyway, this does not mean that Tamil is a complex script. In present-day
> Tamil we have a defined set of elements (326) which are used to build the
> text. Among the Indic languages, to my understanding, Sinhala has the most
> letters and Tamil has the fewest. Except for Tamil, the Indic
> languages have combined forms which produce combined letters
> and make the language complex. In Sinhala a few thousand letters
> can be logically generated. People do not use some of these letters in
> text, but logically such letters exist. In Tamil, however, there is no
> concept of combined letters at all.
>
> In Tamil only 1 conjoint consonant (ksh) and 1 conjoint syllable
> (Shrii) are presently used in text. These are entirely borrowed
> elements. This is why Tamil should not be considered a complex script and
> why a Level 1 encoding is expected in Unicode. However, Unicode was very
> clear at the Chennai meeting that dual encoding is not possible and that
> the present encoding cannot be deprecated either.
>
>
> Thank you
>
> Kind Regards
> Bala
>
> -----Original Message-----
> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
> Behalf Of James Kass
> Sent: Saturday, February 09, 2008 3:25 PM
> To: Unicode Mailing List
> Subject: Re: minimizing size (was Re: allocation of Georgian letters)
>
>
> Doug Ewell wrote,
>
>> As much as I like BabelPad (it has replaced SC UniPad as my favorite
>> full-service-Unicode editor), I have had serious problems pasting text
>> into BabelPad from the clipboard. Sometimes there is a large chunk of
>> random text after the "real" data; there have been other symptoms as
>> well. I assume Andrew will be able to resolve these when he has a
>> chance to update the program.
>>
>> Except in the presence of bugs such as this, Unicode data can be copied
>> and pasted from one Unicode-aware program to another Unicode-aware
>> program with 100% fidelity, regardless of the encoding model.
>
> (Andrew responds well to reported problems, but how can he fix bugs
> in third-party PDF applications?)
>
> The operative phrase is "Unicode-aware application". I believe it would
> be possible to copy/paste text back-and-forth between BabelPad and
> Notepad until the mouse wore out without data corruption.
>
> PDF has long been touted as *the* way to safely send text with the
> assurance that the recipients will be able to display that text exactly
> as the author intended. While it's true that the recipient sees what
> was intended, it does not seem to be true that actual text is being
> sent. Once the material is in PDF format, no further text processing
> appears to be possible; the actual text has been lost somewhere along
> the way. (ASCII text notwithstanding.)
>
> Without any real knowledge of the PDF format and what happens when
> converting a file to PDF, it appears to me that it is not text which is
> being embedded. Rather, the process is embedding glyphs. If a glyph
> is mapped to a Unicode value, at least some applications can return that
> value. But, if the glyph is not mapped to a Unicode value (which is
> normally the case with presentation forms used in complex scripts),
> there does not seem to be any effort made to preserve the Unicode
> string which generated the presentation form. And that's really a
> shame.
>
> Another shame is telling Tamil users that Unicode won't standardize
> a duplicate encoding until a certain event happens. This gives the
> misleading impression that there's at least a possibility that Unicode
> might encode TACE/TUNE.
>
> It would have been much better, in my opinion, to simply have told people
> up front that there is absolutely no possibility whatsoever for such a
> duplicate encoding in the standard. In which case, the people who have
> spent time and effort towards such an encoding could have been doing
> something productive with their time and resources instead of wasting
> them. Like, for example, solving problems with the PDF format
> related to complex scripts.
>
> Best regards,
>
> James Kass
>
> P.S. - There's a special FAQ page for Tamil encoding issues here:
> http://unicode.org/faq/tamil.html
>
> Suggested additions to that page might include:
>
> Q: Is there any possibility that a new character encoding scheme for
> Tamil which considers ligatures as characters will either be added to
> Unicode side-by-side with the existing Unicode Tamil encoding or
> replace the current Tamil Unicode encoding model altogether?
>
> A: No.
>
>
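The copy/paste problem James Kass describes for PDF (glyphs being embedded, with presentation forms that map back to no Unicode value) can be illustrated with a toy model. This is only a sketch: the glyph IDs, the `CMAP` and `LIGATURES` tables, and the `shape` and `extract` helpers are invented for illustration and do not reflect the PDF format's actual data structures. The only facts taken as given are the Unicode code points for the Tamil "ksh" conjunct (KA + VIRAMA + SSA) and the vowel sign I.

```python
# Toy model only: glyph IDs and tables are hypothetical, not real font or PDF data.
KSH = "\u0b95\u0bcd\u0bb7"  # TAMIL LETTER KA + TAMIL SIGN VIRAMA + TAMIL LETTER SSA

# One glyph per code point (hypothetical glyph IDs).
CMAP = {"\u0b95": 10, "\u0bbf": 11, "\u0bcd": 12, "\u0bb7": 13}
# A presentation form: the whole "ksh" sequence drawn with a single glyph.
LIGATURES = {KSH: 900}


def shape(text):
    """Turn a code point string into glyph IDs, preferring ligature glyphs."""
    glyphs = []
    i = 0
    while i < len(text):
        for seq, gid in LIGATURES.items():
            if text.startswith(seq, i):
                glyphs.append(gid)
                i += len(seq)
                break
        else:
            # No ligature starts here: fall back to the one-to-one mapping.
            glyphs.append(CMAP[text[i]])
            i += 1
    return glyphs


def extract(glyphs, glyph_to_text):
    """Recover text from glyph IDs; unmapped glyphs become U+FFFD."""
    return "".join(glyph_to_text.get(g, "\ufffd") for g in glyphs)


# Reverse table that only knows glyphs with a single Unicode value.
simple_map = {gid: ch for ch, gid in CMAP.items()}
# Reverse table that also records the source string behind each ligature glyph.
full_map = {**simple_map, **{gid: seq for seq, gid in LIGATURES.items()}}

text = KSH + "\u0bbf"                 # "kshi" as stored in a Unicode text file
glyphs = shape(text)                  # what ends up drawn on the page
lossy = extract(glyphs, simple_map)   # ligature glyph has no mapping
lossless = extract(glyphs, full_map)  # source string preserved per glyph
```

With the simple table the ligature glyph comes back as U+FFFD, which is the loss described above; keeping the generating string alongside each presentation form is what lets the original text round-trip.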