From: James Kass (thunder-bird@earthlink.net)
Date: Sat Jan 03 2009 - 19:18:08 CST
Asmus Freytag wrote,
>>>> ... I don't need to be able to
>>>> speak Hindi in order to enter, store, search, and collate text
>>>> written in Devanagari, and neither does my plain-text editor.
>>>>
>>> But your plain text database on your web server cannot present Hindi
>>> words in the order a user of your website in India would expect them,
>>> unless the text (that is its character codes) can be interpreted.
>>
>> Method # 1 (easy way)...
>>
>Followed by several screenfuls of text, belying the "easy".
The original jibe was aimed at my statement that neither my
plain-text editor nor I needed to speak Hindi. We don't;
I proved it. It's easy.
>And when I read to the end, it's only binary, or dictionary based sort,
No, you may have skipped something in that section. Or you
may have lost track that your challenge was to present Hindi
words in the order expected by Hindi users. Or, maybe I've
missed something. Are you saying that Hindi users expect
their word lists to be sorted in something other than dictionary
order?
All computer sorts are binary, by the way.
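
A minimal sketch of that point, assuming Python and its standard
unicodedata module (neither is mentioned in this thread). Both sorts
below are binary comparisons; the only difference is whether the
comparison runs over the stored code points or over a key derived
from them. The NFD key is just an illustrative stand-in. It is not
the Unicode Collation Algorithm, and it makes no claim about what
order a Hindi dictionary actually uses.

```python
import unicodedata

# Three sample strings, written as escapes so the code points are visible.
KAMAL = "\u0915\u092e\u0932"   # KA, MA, LA
KHAG = "\u0916\u0917"          # KHA, GA
QALAM = "\u0958\u0932\u092e"   # QA (precomposed KA + nukta), LA, MA

words = [QALAM, KHAG, KAMAL]


def nfd_key(word):
    """Illustrative sort key: canonical decomposition (NFD), nothing more."""
    return unicodedata.normalize("NFD", word)


# Binary order of the stored text: U+0958 sorts after U+0916,
# so the QA word lands after the KHA word.
by_code_point = sorted(words)

# Binary order of the derived keys: U+0958 decomposes to U+0915 U+093C,
# so the QA word lands among the KA words.
by_nfd_key = sorted(words, key=nfd_key)

print([w.encode("unicode_escape").decode() for w in by_code_point])
print([w.encode("unicode_escape").decode() for w in by_nfd_key])
```

Either way the machine only compares numbers. Any linguistic
knowledge sits in how the key is built, and a real collation table
would take the place of nfd_key here.
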
>that wouldn't work on general text. Or requires human intervention
>(hiring the right engineer), or requires getting a charset definition
>for the PUA scheme and custom built implementation to that. All things
>that should be unnecessary when I use Unicode.
I'm practically speechless. Human beings interpret text.
Computer programs, written by human beings, might ape that
behavior. Whenever you use Unicode, you use applications
designed by people. Sounds like hiring the right engineer
happens a lot.
>> These aren't compatibility characters unless the book definition
>> in 5.0 has been trashed/revised.
>>
>See my reply to Everson.
I'll check the archives; your reply to Michael Everson does
not seem to have arrived here. I guess one of my previous
responses to you about compatibility didn't get archived
or distributed by the list, either.
>> Compatibility characters are variants of characters which
>> already exist in Unicode or they have a compatibility decomposition.
>> This has shifted around some over the years, but that's it, isn't it?
>>
> No.
That's what it says in the book 5.0 on page 23.
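
The "compatibility decomposition" half of that description can be
checked mechanically. A rough sketch, assuming Python's standard
unicodedata module (the helper name is made up for illustration):
the character database marks compatibility mappings with a tag in
angle brackets, while canonical mappings carry no tag. This says
nothing about the "variant of an existing character" half, which
is a judgment the data file does not record, and the module reflects
whatever Unicode version ships with the interpreter, not 5.0.

```python
import unicodedata


def has_compat_decomposition(ch):
    """True if the character database lists a tagged (compatibility) mapping.

    Hypothetical helper for illustration; it only inspects the
    decomposition field and nothing else.
    """
    return unicodedata.decomposition(ch).startswith("<")


samples = {
    "\ufb01": "LATIN SMALL LIGATURE FI",
    "\u0958": "DEVANAGARI LETTER QA",
    "\u0915": "DEVANAGARI LETTER KA",
}

for ch, name in samples.items():
    # Show the raw decomposition field next to the helper's verdict.
    print(name, repr(unicodedata.decomposition(ch)), has_compat_decomposition(ch))
```
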
>> Can you please point me to the new definition of compatibility
>> in this regard?
>>
>See the glossary.
The glossary definition is general; the specifics amplifying
that general definition are in the text portion of the book.
>> Emoji and emoticon-as-rich-text are identical concepts.
>
>They have similarities / overlap, but they are distinct phenomena. One
>has fallbacks using ASCII "markup", the other uses single, dedicated
>character codes.
Calling ASCII an emoji "fallback" is imprecise. Calling icon
substitution for ASCII strings "fall-forward" would be better.
Substituting pictures for text isn't done in plain-text.
Rather than insisting on preserving a nebulous distinction
between emoji and emoticon, it would be better to maintain
technical distinctions between plain- and rich-text.
>Too long. I had to skip some sections.
Too bad. You might've missed something.
Best regards,
James Kass