From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Sun Aug 16 2009 - 14:05:49 CDT
On 8/16/2009 5:18 AM, Shriramana Sharma wrote:
> Hello. I am new to the Unicode list. Please be patient. I promise to
> do my homework by googling at least once before asking questions here.
>
> This is with reference to the following text from the P&P document N3452.
>
> <quote>A strong case of disunification occurs where there is prevalent
> practice of using the existing character. A weak case of
> disunification occurs where there is little or no use of the existing
> character for the purpose for which the new character is intended.
>
> Example: Adding a period in a new script is a weak disunification if
> we assume that nobody has an existing implementation of that script
> using the regular period. Adding a clone of a Latin letter for use
> with Cyrillic script is a strong disunification as mixed
> Latin/Cyrillic character sets exist and have been used for encoding
> the languages that the new characters are intended for.</quote>
>
> I would like to know what exactly the adjectives "strong" and "weak"
> are supposed to mean. Does "strong case" mean that the case is highly
> supportive of disunification or that strong reasons need to be
> supplied before the disunification is accepted? Similarly for "weak
> case".
A "strong case" is not the same as a "strong disunification".
"Implementation" in some sense means the existence and use of a
"character set".
For Cyrillic, many character sets exist (and have existed for a long
time, even prior to Unicode) that contain _both_ the Latin alphabet and
the Cyrillic alphabet. The shape "a" occurs in both alphabets, and has
been encoded using two character codes. On the other hand, the shape "z"
is thought to occur only in one alphabet (the Latin) and is coded only
once. If some not-so-well-known language has been written in Cyrillic,
but using the "z" shape, all digitally encoded documents created would
have to have used the "z" shape with the character code in the Latin
alphabet section of those character sets.
If, after many years of such practice, someone proposes a *new*
character code for the same shape "z", to be used only when that shape
occurs in the Cyrillic context, the P&P document calls that a "strong
disunification": suddenly users have a choice, and old documents have
necessarily made the "wrong" one. As a result, the status of the "z" in
the Latin alphabet section has changed, and in some contexts (like
searching) one now needs to treat *two* characters as identical. "Strong"
disunification means breaking apart the uses of an existing character in
a potentially disruptive way and spreading them over two characters, one
of them new.
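To make the searching problem concrete, here is a minimal Python sketch. It
first shows the "a" case (one shape, two character codes), then simulates a
strong disunification. The "z" clone above is hypothetical, so the sketch
uses U+051B CYRILLIC SMALL LETTER QA as a stand-in for "a Cyrillic clone of
a Latin letter"; the FOLD table and the fold() and find() helpers are my own
illustrative assumptions, not anything defined by the P&P document.

```python
import unicodedata

# One shape, two character codes: the "a" case.
LATIN_A = "\u0061"
CYRILLIC_A = "\u0430"
print(unicodedata.name(LATIN_A))     # name of U+0061
print(unicodedata.name(CYRILLIC_A))  # name of U+0430

# Hypothetical fold table (an assumption for illustration only): map a
# Cyrillic clone back to the Latin letter that older documents had to use.
FOLD = {"\u051b": "q", "\u051a": "Q"}

def fold(text: str) -> str:
    """Replace each cloned letter with its Latin counterpart (1:1 mapping)."""
    return "".join(FOLD.get(ch, ch) for ch in text)

def find(needle: str, haystack: str) -> int:
    """Substring search that treats a clone and its original as identical."""
    return fold(haystack).find(fold(needle))

# An "old" document typed the letter with the Latin code point; a "new"
# query uses the Cyrillic clone. The surrounding letters are Cyrillic.
old_doc = "\u043a\u0443q"         # Cyrillic ka, Cyrillic u, Latin q
new_query = "\u043a\u0443\u051b"  # Cyrillic ka, Cyrillic u, Cyrillic qa

plain_hit = old_doc.find(new_query)    # naive search
folded_hit = find(new_query, old_doc)  # search with folding
print(plain_hit, folded_hit)
```

The naive search misses the old document because the two strings differ in
one code point even though they look the same; only the folded search
matches. Every tool that compares text needs some such table once the
disunification has happened, which is the disruption described above.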
If a script is first computerized at the time it is encoded in Unicode,
then adding a clone of the period does not disrupt users of 002E
(standard period) in the same way - there are no old (digitally encoded)
documents to worry about, and users of the new scripts know from day one
to use the new character. This is a "weak" disunification (in the sense
of the P&P document) because the 002E had never been used in the context
of the new script before.
A proposal that results in a strong disunification requires a very
strong case in favor.
However, even a proposal that results in a weak disunification still
requires a justification - for example, there are many scripts that use
Western punctuation "as is" and therefore those code points don't
necessarily need to be duplicated.
A./