From: Doug Ewell (doug@ewellic.org)
Date: Tue Nov 18 2008 - 20:41:45 CST
<abysta at yandex dot ru> wrote:
> If I need a multi-character letter “s with acute”, I have to choose
> between 015B and 0073+0301. Wouldn’t it be better not to have to
> choose?
In some ways, it probably would have been better. It certainly would
have made things simpler to understand and to explain.
However, at the time Unicode was conceived, it would have been
impossible to persuade vendors and developers to make the switch from
existing 8-bit character sets, such as those in the ISO 8859 family,
unless most (if not all) of the mappings from these character sets to
Unicode were 1-to-1.
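To make the 1-to-1 point concrete, here is a minimal sketch using Python's built-in "iso8859_2" codec. ISO 8859-2 is picked only as a convenient member of the family, and the variable names are illustrative. It checks that each byte decodes to a single code point and that the text encodes back to the same bytes.

```python
# Illustration only: ISO 8859-2 is one member of the ISO 8859 family.
all_bytes = bytes(range(256))

# Decode every possible byte value; each one yields exactly one code point.
decoded = all_bytes.decode("iso8859_2")

# Encoding the result again should give back the original bytes.
round_trip = decoded.encode("iso8859_2")

# In ISO 8859-2, byte 0xB6 is "s with acute", which maps to the
# precomposed U+015B rather than to a base letter plus combining mark.
s_acute = b"\xb6".decode("iso8859_2")

print(len(decoded), round_trip == all_bytes, f"U+{ord(s_acute):04X}")
```

A mapping to <0073 0301> instead would have turned one legacy character into two code points, which is the kind of complication that would have made the switch harder to sell.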
At the same time, the Unicode pioneers realized that the set of
letters-with-diacritics was more or less open-ended, and that encoding
them all as precomposed characters would be somewhere between extremely
time-consuming and downright impossible. For this reason and others,
the combining characters were also added.
When you choose between <015B> and <0073 0301>, you are essentially
choosing a normalization form, and at that point, the rest of your
decision process is fairly straightforward -- keep all your text in the
same normalization form. This means you would not want to use both
<015B> and <0073 0301> in the same text.
Sometimes there are external influences that steer you toward one form
or another. For example, the specifications for some protocols strongly
recommend that you use Normalization Form C, in which you would use
<015B> rather than <0073 0301>, but in which you would be obligated to
use <04E9 0304> since there is no precomposed equivalent.
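The difference between the two cases can be checked with a short sketch, again assuming Python's unicodedata module. The helper name codepoints is made up for this illustration.

```python
import unicodedata


def codepoints(text):
    """Hypothetical helper: list the code points of text as hex strings."""
    return [f"{ord(ch):04X}" for ch in text]


# s + combining acute: NFC composes this into the precomposed U+015B.
s_acute_nfc = unicodedata.normalize("NFC", "s\u0301")

# Cyrillic barred o + combining macron: there is no precomposed
# character, so NFC leaves the two-code-point sequence as it is.
barred_o_macron_nfc = unicodedata.normalize("NFC", "\u04E9\u0304")

print(codepoints(s_acute_nfc))
print(codepoints(barred_o_macron_nfc))
```

So text in Normalization Form C is not free of combining characters. It uses a precomposed character wherever one exists and keeps the combining sequence otherwise.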
--
Doug Ewell * Thornton, Colorado, USA * RFC 4645 * UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages