From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Tue Dec 15 2009 - 11:00:57 CST
On 12/15/2009 5:28 AM, Doug Ewell wrote:
> Jeroen Ruigrok van der Werven <asmodai at in dash nomine dot org> wrote:
>
>> Actually ij is unbreakable from a language point of view. You cannot
>> hyphenate any words using it like blijdschap into bli-jdschap.
>
> I'm not sure this particular argument proves what you want it to
> prove. In English you cannot insert a hyphen between the T and H in
> "bother," or between the S and H in "fishing." But that says nothing
> about whether the two characters are considered a single letter in
> English, or whether they should or must be written as a ligature.
> Your other arguments are more convincing.
I think it serves as a sort of sufficient condition: if you can insert a
hyphen, then the thing is not unbreakable. (But the implication doesn't
work in the opposite direction: being unable to insert a hyphen does not
show that the thing is a single unit.)
Whether a single entity in a writing system gets encoded as a singleton
or as a code sequence is initially a matter of choice. A sequence is,
in principle, just as good a representation of an entity as a single
code value (but, from a practical point of view, may require
more/different support in an implementation). The real issue comes when
you look at what the elements of the sequence encode by themselves.
If Unicode had encoded "left half of ligature oe" and "right half of
ligature oe" then these two code points in sequence would be
distinguishable from the sequence of "o" and "e", even though the
ligature-derived entity is not coded with a single code value (in this
hypothetical example).
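As a rough illustration of the singleton-versus-sequence point, here is a
small Python sketch using the Dutch ij from earlier in the thread. It
assumes only the standard unicodedata module; the helper names are made
up for this example. The singleton U+0133 and the sequence "i" + "j" are
different code point strings, but the elements of the sequence are just
the ordinary letters, so nothing in the sequence itself marks it as
special, and compatibility normalization (NFKC) folds the singleton onto
it.

```python
import unicodedata

IJ_SINGLETON = "\u0133"  # LATIN SMALL LIGATURE IJ, a single code value
IJ_SEQUENCE = "ij"       # U+0069 U+006A, two ordinary letters


def code_points(text):
    """List the code points of a string in U+XXXX notation (hypothetical helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


def same_after_compat_folding(a, b):
    """Compare two strings after NFKC compatibility normalization."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


print(code_points(IJ_SINGLETON))
print(code_points(IJ_SEQUENCE))
# The encodings differ as stored ...
print(IJ_SINGLETON == IJ_SEQUENCE)
# ... but the singleton folds onto the plain letter sequence.
print(same_after_compat_folding(IJ_SINGLETON, IJ_SEQUENCE))
```

The hypothetical "left half" / "right half" code points would behave
differently: no folding would be needed to tell them apart from "o" and
"e", because the elements themselves would carry the distinction.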
If a letter pair has special behavior in a particular language, then you
have the choice of putting the burden of identifying that pair on the
user or the implementation. If a pair is absolutely consistently treated
as a pair, then asking the user to identify it as such is unnecessary
(think of the lam-alif ligature in Arabic).
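The lam-alif case can be sketched in Python as follows, again assuming
only the standard unicodedata module (the helper name to_nominal is
invented for this example). The user types the two ordinary letters; the
ligature is left to the implementation. Unicode does carry a
compatibility presentation form for the ligature, and NFKC maps it back
to the same two letters, which is one way to see that the pair needs no
marking by the user.

```python
import unicodedata

LAM = "\u0644"                    # ARABIC LETTER LAM
ALEF = "\u0627"                   # ARABIC LETTER ALEF
LAM_ALEF_PRESENTATION = "\ufefb"  # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM


def to_nominal(text):
    """Fold compatibility presentation forms back to the nominal letters."""
    return unicodedata.normalize("NFKC", text)


# What the user stores: two ordinary letters, with no ligature marking.
stored = LAM + ALEF
for ch in stored:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# The presentation form carries no extra information beyond the pair.
print(to_nominal(LAM_ALEF_PRESENTATION) == stored)
```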
Otherwise, if the implementation can't correctly identify when a pair is
special, you have no choice but to give the user a means to identify it.
If the distinction is orthographic, it belongs in the encoding;
otherwise it could live in meta-data.
The problem comes in situations that aren't pure. Whether to use
ligation for a text is a stylistic choice, but use of ligation for
specific words can be prohibited in ways that are orthographic (and not
deterministic).
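One possible shape of "a means to identify it" in the encoding, sketched
in Python. This is an assumption on my part for illustration, not
something established above: it uses U+200C ZERO WIDTH NON-JOINER as the
user-supplied mark, and the German word "Auflage" (where an fl ligature
across the morpheme boundary is conventionally avoided) as the example.
The function name ligature_candidates is hypothetical.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def ligature_candidates(text, pair):
    """Return the indices where `pair` occurs as directly adjacent characters."""
    hits = []
    start = text.find(pair)
    while start != -1:
        hits.append(start)
        start = text.find(pair, start + 1)
    return hits


unmarked = "Auflage"              # nothing tells the implementation about f|l
marked = "Auf" + ZWNJ + "lage"    # the user has marked the boundary in the text

print(ligature_candidates(unmarked, "fl"))
print(ligature_candidates(marked, "fl"))
# Stripping the mark recovers the unmarked spelling.
print(marked.replace(ZWNJ, "") == unmarked)
```

A stylistic, text-wide choice to ligate or not would not need such a
mark; it could live in meta-data, as described above.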
A./