From: spir (denis.spir@free.fr)
Date: Wed Feb 03 2010 - 05:19:45 CST
On Tue, 2 Feb 2010 18:53:20 -0700
"Doug Ewell" <doug@ewellic.org> wrote:
> > COMBINING DOT ABOVE followed by LATIN SMALL LETTER D would not be a
> > valid sequence, correct, but you should start working from the d, not
> > the code that follows. After all, the "d" by itself *IS* a valid
> > sequence, whether or not a combining character comes after it. It's
> > the orphaned combining dot that is defective.
>
> There's another problem with spir's original statement. You can't say
> that "a source text holding <0307, 0064> is illegal" because the U+0307
> might not be orphaned at all, but might be preceded by another base
> character. The bracketed text [ėd] consists of the sequence <0065,
> 0307, 0064> and is perfectly legal.
>
> Perhaps spir meant "a source text containing *only* that sequence" or
> "starting with that sequence." This is a nitty detail, but when dealing
> with an inherently stateful concept like combining sequences, nitty
> details matter.
What I meant is: is it legal to encode a "user-perceived character" in really great disorder, e.g. with a combining mark preceding what is obviously the base character? In the example, that means having the <dot above> come first. I interpret your answers as meaning no, it's illegal.
The consequence would be that only the codes following the base character can be out of order. If codes are already "stacked" (grouped into combining sequences) before normalization, then we can safely ignore "stacks" with fewer than 3 codes, _and_ begin comparisons at the 3rd code (comparing it with the 2nd), so that the base is never moved. Pseudocode:
foreach stack in stacks do
    size = size(stack)
    if size < 3 then
        next    # next stack
    end
    # kind of bubble sort, but ignoring first code
    repeat
        no_swap = true
        for i=3 to size do    # here index base = 1
            code1, code2 = stack[i-1], stack[i]
            ccc1, ccc2 = getCCC(code1), getCCC(code2)
            # a code with ccc = 0 must never be moved
            if ccc1 > ccc2 and ccc2 > 0 then
                stack[i-1], stack[i] = code2, code1    # swap codes
                no_swap = false
            end
        end
    until no_swap
end
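For anyone who wants to try this, the pseudocode above can be sketched in Python roughly as follows. This is only an illustration under some assumptions: the function name reorder_stack is made up here, each "stack" is assumed to be already segmented (one base code followed by its combining marks), and the extra "ccc2 > 0" test follows the standard's notion of a reorderable pair, so that a mark with combining class 0 (such as U+034F COMBINING GRAPHEME JOINER) is never moved. The only library call used is unicodedata.combining() from the standard library, which returns the canonical combining class of a character.

```python
import unicodedata


def reorder_stack(stack: str) -> str:
    """Reorder the combining marks of one pre-built "stack".

    Assumes `stack` is a single base character followed by its
    combining marks; the first code is never moved.
    """
    codes = list(stack)
    if len(codes) < 3:
        # Base alone, or base + one mark: nothing can be out of order.
        return stack

    swapped = True
    while swapped:  # kind of bubble sort, ignoring the first code
        swapped = False
        # 0-based index 2 is the "3rd code" of the pseudocode.
        for i in range(2, len(codes)):
            ccc1 = unicodedata.combining(codes[i - 1])
            ccc2 = unicodedata.combining(codes[i])
            # Swap only when the second code has a non-zero class,
            # so that marks with ccc = 0 stay where they are.
            if ccc1 > ccc2 > 0:
                codes[i - 1], codes[i] = codes[i], codes[i - 1]
                swapped = True
    return "".join(codes)


if __name__ == "__main__":
    # d + DOT ABOVE (ccc 230) + DOT BELOW (ccc 220): the marks get swapped.
    sample = "d\u0307\u0323"
    result = reorder_stack(sample)
    print([f"{ord(c):04X}" for c in result])
    # Cross-check against the library's own canonical decomposition.
    print(result == unicodedata.normalize("NFD", sample))
```

Because the sort is stable and only adjacent codes with different classes are exchanged, marks of the same class keep their relative order, which matters for how they stack visually.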
So, if "stacks" are built before normalization, only a small proportion of combining sequences *possibly* require reordering (and an even smaller proportion actually are reordered).
Side-question: why are disordered combining sequences even allowed?
Denis
________________________________
la vita e estrany