UTC/1999-015
Subject: Re: Brief note on length of ideograph descriptions
Asmus,
> 
> What does "eight levels deep" mean precisely?
Good question. If John really means 8 levels deep, this would
be 8 levels of recursion, which, given the prefix notation, would
mean no more than 8 IDC's in a row are interpretable.
> What's the longest sequence that is eight levels deep? If I assume that IDS
> operators can take 2 arguments, the length for a full tree of level 8
> would be
> sum (0 -> 8) 2**n == 2**9-1.
> this comes out to 511 characters. That is a limit, but not a very practical
> one.
Well, actually they can take up to 3 arguments, for the totally
absurd case of stacking repeatedly 3-wise sideways and 3-wise top
and bottom.
For binary operators:
sum (0 -> 8) 2**n = 511.
For ternary operators:
sum (1 -> 8) 3**(n-1) + 3 * 3**7 = 9841
This is getting into the realm of bizarre, indeed.
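As a quick check of the two totals above, here is a minimal Python sketch. The function name full_tree_length and its parameters are illustrative only. It assumes a full tree in which every operator on each of the 8 levels takes the same number of arguments and every leaf is a single character.

```python
def full_tree_length(arity, levels):
    """Length of a full prefix-notation tree: all operators plus all leaves."""
    # Level n (1-based) holds arity**(n-1) operators.
    operators = sum(arity ** (n - 1) for n in range(1, levels + 1))
    # Below the last operator level sit arity**levels terminal characters.
    leaves = arity ** levels
    return operators + leaves


print(full_tree_length(2, 8))  # binary operators
print(full_tree_length(3, 8))  # ternary operators
```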
Practically, I think what John means, based on Cora's work, is
no more than 8 IDC's total in a description. An example of something
like this might be U+9EA4. A practical description for the 3 stacked
deer would be:
X
-
@ --> X | X
Where @ is a term further broken down, and each of the X's is a terminal
character, in this case U+9E7F.
This serializes to -X|XX
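The serialization step can be sketched in Python as follows. This is only an illustration: the nested-tuple representation and the name serialize are assumptions of the sketch, and "-" and "|" are the ASCII stand-ins used in this message rather than the actual IDC code points.

```python
def serialize(node):
    """Write a description tree in prefix notation.

    A node is either a terminal (a one-character string) or a tuple
    (operator, operand, operand, ...).
    """
    if isinstance(node, str):
        return node
    op, *operands = node
    return op + "".join(serialize(child) for child in operands)


# One X stacked above two X's side by side.
three_deer = ("-", "X", ("|", "X", "X"))
print(serialize(three_deer))  # -X|XX
```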
But each U+9E7F could be broken down to another schema that looks just
the same:
A
-
@ --> B | C
If you combined these, you get:
@ --> A
      -
      @ --> B | C
-
@ --> @ --> A           | @ --> A
            -                   -
            @ --> B | C         B | C
And if you count up the "-"'s and "|"'s you get 8.
This serializes to:  --A|BC|-A|BC-A|BC
The interesting thing about this is that the recursion limit is
actually 4 here.
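Both numbers can be checked mechanically. The following Python sketch walks a serialized description and reports the total operator count and the deepest operator nesting. The name measure and the ARITY table are assumptions of the sketch, which treats "-" and "|" as binary operators and every other character as a terminal.

```python
ARITY = {"-": 2, "|": 2}  # ASCII stand-ins from this message, both binary


def measure(ids):
    """Return (operator count, maximum operator nesting depth)."""
    pos = 0
    count = 0

    def parse(depth):
        nonlocal pos, count
        if pos >= len(ids):
            raise ValueError("sequence ends before all operands are supplied")
        ch = ids[pos]
        pos += 1
        if ch not in ARITY:
            return depth  # terminal: `depth` operators enclose it
        count += 1
        # Operands are read left to right, each one level deeper.
        return max(parse(depth + 1) for _ in range(ARITY[ch]))

    deepest = parse(0)
    if pos != len(ids):
        raise ValueError("trailing characters after a complete description")
    return count, deepest


print(measure("--A|BC|-A|BC-A|BC"))  # (8, 4)
```

An implementation that wanted to enforce either kind of limit could reject a sequence as soon as count or depth passes its chosen bound, instead of parsing to the end.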
You could go crazy and pick apart the top half of the deer character
even further, but even then you would recurse no more than 4 deep.
I might be able to find some case where you could argue for a 5 deep
recursion, but who would want to?
> 
> I note that we do not provide any limits on the use of combining characters,
> where all of Mark's arguments presumably apply.
True. But for practical purposes combining more than about 3 is very
rare, and even the oddest cases could be topped out at less than 8,
by my reckoning. Any implementation that chose to have some kind of
limiting behavior at that point would seem perfectly reasonable to me.
> 
> I'm very leery of trying to do a formal limitation, especially as that
> would put us immediately out of sync with ISO 10646.
No, it wouldn't, any more than putting formal limitations on the
usage of bidi formatting controls does. ISO 10646 says nothing
about this. Unicode would, once again, be giving practical
limits and guidance regarding what is meaningful and what is not
for combining these things.
> 
> I also fail to understand why all of a sudden there is this rush to provide
> perfect support for these characters. In the WG2 meetings the US was always
> forcing China to support the mantra "these are just graphic characters, not
> controls" and here we are worrying about limits for algorithms that treat
> these as controls.
The magic word is "equivalence". I'm not opposed to setting limits on
the combinatorial significance of these things, as long as there is
no language anywhere that requires a conforming implementation to
do anything more than treat these as blorts.
> 
> BTW, there should also be a rule that no word can be longer than the line
> it's on, so that line breaking is always possible. ;-)
> 
> Most seriously, if this is not about just giving guidance on expected depth
> in normal cases, this should NO LONGER be discussed on this alias.
Why not?
--Ken
> 
> A./
> 
> At 04:51 PM 6/1/99 -0700, John Jenkins wrote:
> >> John,
> >> 
> >> This sounds good and passes the reasonableness test.
> >> 
> >> Are you going to propose language for a position paper
> >> for UTC to discuss on this?
> >>
> >> Or are you suggesting that we simply add a paragraph
> >> to the book describing our estimate of the recursion
> >> level one might be expected to find in the worst case
> >> (for applications which choose to interpret IDC's as
> >> other than dingbats)?
> >>
> >
> >Mark has pointed out that unless there is a formal limit on the complexity
> >of recursive ideographic description sequences, things could get out of
> >hand.  If you enter a text stream at a random point in the middle of
> >ideographs, you would have to back-track all the way to the beginning of the
> >document before you *know* whether or not you're at the beginning of an
> >ideographic description sequence.
> >
> >Even if you treat the IDC's as visual blobs for rendering purposes, you may
> >choose to take them into account for other operations (cursor movement, say,
> >or line-breaking), and I think that this latter awareness is quite likely.
> >We've already got it on the docket at Apple, for example, to implement.
> >
> >Given that, we should have a formal limitation on the depth of the
> >recursion.  E.g., we should say that an ideographic description sequence
> >more than eight levels deep is invalid.
> >
> >I'm willing to draft language to that effect (in fact, I think I just did)
> >and have the UTC approve it so we can stick it in the book.  Mark can
> >present the issue at the meeting; neither Rick nor I can attend.
> >
> >
> >=====
> >John H. Jenkins
> >jenkins@apple.com
> >tseng@blueneptune.com
> >http://www.blueneptune.com/~tseng
> >
> >
> >
> 
>