From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Sat Mar 01 2008 - 21:43:43 CST
On 3/1/2008 9:58 AM, David Starner wrote:
> On Sat, Mar 1, 2008 at 12:53 AM, Javier SOLA <lists@khmeros.info> wrote:
>
>> Dear Ngwe Tun,
>>
>> Why is it important to encode the fractions? Are they a common part of
>> Myanmar text? Fractions are usually not encoded in other languages.
>>
>
> Unicode intends to be a complete standard for encoding text, including
> a huge array of characters including the interrobang, obscure phonetic
> characters, characters for obsolete orthographies for small languages,
> and Chinese characters found only in dictionaries. A character does
> not need to be a common part of Myanmar text to be worth encoding.
> None of this affects Kent Karlsson's argument that they've already
> been encoded, of course.
>
>
>
>
The question whether a character is common (or part of a common script)
is not as important as whether it is something that (1) can and (2)
should be standardized.
(1) Things for which there is conflicting or sketchy usage information,
or uncertainty about meaning, about appearance (salient features and
permitted variation), or both, simply can't be standardized, because
the act of standardization requires that such information be available,
sufficient, and settled.
(2) Things that are private, short-lived, whimsical, idiosyncratic, or
that don't fit a reasonable description of "character" are things that
shouldn't be standardized. (So are things that would violate stability
policies.) Character codes are forever, and standardization implies that
there are both senders and receivers that will be interested in
exchanging this character, and (at least some) implementations will
incur the costs to make the possibility of such interchange real.
Characters that are not common, and members of scripts that are not
common, often suffer from the first problem, until such time that
research has caught up with them. That, however, merely reflects the
difficulty of getting access to sufficient information about something
that's rare or obscure from the vantage point of the character coder. It
does not reflect a desire to not encode something simply because it's not
common.
Entities that should not be coded, on the other hand, can be quite
common (e.g. the Apple logo). If commonality were the primary yardstick,
it would have long since been part of Unicode.
Anybody who has followed the discussion knows that the actual boundary
is not drawn in black and white, but that resolving the status of
proposed entities that are questionable involves a lot of judgment and
discussion. Sometimes, decisions to rule out characters are even
overturned later, unless doing so would violate a stability policy, of
course.
A./