From: Doug Ewell (doug@ewellic.org)
Date: Tue Feb 02 2010 - 19:53:20 CST
Mark E. Shoulson <mark at kli dot org> replied to spir:
>> Also, these definitions seem to imply that a combining sequence
>> cannot be originally defined with the base following a combining
>> mark, eg that a source text holding <U+0307 combining dot above,
>> U+0064 latin small letter d> is simply illegal. Is this true? If
>> yes, a sequence of 2 codes can only be properly ordered and we can
>> safely start reordering from the *third* code.
>
> COMBINING DOT ABOVE followed by LATIN SMALL LETTER D would not be a
> valid sequence, correct, but you should start working from the d, not
> the code that follows. After all, the "d" by itself *IS* a valid
> sequence, whether or not a combining character comes after it. It's
> the orphaned combining dot that is defective.
There's another problem with spir's original statement. You can't say
that "a source text holding <0307, 0064> is illegal" because the U+0307
might not be orphaned at all, but might be preceded by another base
character. The bracketed text [ėd] consists of the sequence <0065,
0307, 0064> and is perfectly legal.
Perhaps spir meant "a source text containing *only* that sequence" or
"starting with that sequence." This is a nitty detail, but when dealing
with an inherently stateful concept like combining sequences, nitty
details matter.
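
To make the distinction concrete, here is a minimal Python sketch. It assumes that "combining mark" can be approximated by General Category M* (Mn, Mc, Me) as reported by the standard unicodedata module, and it only checks the start of the text; the function names are hypothetical, not taken from any library or from the standard.

```python
import unicodedata


def is_combining_mark(ch: str) -> bool:
    # Assumption: treat any character whose General Category
    # starts with "M" (Mn, Mc, Me) as a combining mark.
    return unicodedata.category(ch).startswith("M")


def starts_with_orphaned_mark(text: str) -> bool:
    # A mark at the very start of the text has no base before it,
    # so the text begins with a defective combining sequence.
    return bool(text) and is_combining_mark(text[0])


def contains_dot_then_d(text: str) -> bool:
    # Merely *containing* <0307, 0064> says nothing about legality;
    # it depends on what precedes the U+0307.
    return "\u0307\u0064" in text


only_pair = "\u0307\u0064"        # <0307, 0064>
with_base = "\u0065\u0307\u0064"  # <0065, 0307, 0064>

for sample in (only_pair, with_base):
    print(
        [f"{ord(c):04X}" for c in sample],
        contains_dot_then_d(sample),
        starts_with_orphaned_mark(sample),
    )
```

Both samples contain <0307, 0064>, but only the first begins with an orphaned combining mark; in the second, the U+0307 attaches to the preceding U+0065.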
--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ http://is.gd/2kf0s