Re: writing Chinese dialects

From: vunzndi@vfemail.net
Date: Tue Feb 06 2007 - 06:57:01 CST

    Dear Philippe,

    I agree with almost every word you say here. There are in fact two
    separate issues:

    (1) What form should the "human" enter the IDS in?
           This should be a form which is easy to visualise, since most of
    those who input IDS are not programmers. A system that is hard to
    visualise makes things difficult for both the inputters and the
    checkers. Most IDS lists are long: several thousand entries, often
    tens of thousands. The key element is to get the data correct. The
    system mentioned here is really about the human form.

    (2) What form should the data be in for efficient searching?
         Polish notation, forward or reversed, is definitely better for
    this. Flattening and splitting into parts also helps. As a programmer
    I would love it if people could input the information that way
    correctly. There are other types of inconsistencies in the way people
    put in data, such as using one of two or more characters that look
    the same, or that are variants of each other.

    Since the data is input tens of thousands, and maybe even millions,
    of times by different people, the efficient way is to "compile" (or
    is that "recompile"?) it into the form best for searching. This is
    more work for the programmer, but with one programmer to dozens of
    inputters and many times more end users, it is a fair exchange.
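
    The "compile" step described above can be sketched roughly as
    follows. This is only an illustration under stated assumptions: the
    table RAW_IDS and the helpers normalize, flatten, compile_table and
    search are hypothetical names, not part of any existing tool, and the
    hand-entered form is assumed to already be in prefix (Polish) order.
    Look-alike characters are folded with Unicode NFKC normalization,
    which maps Kangxi radical code points to the ordinary ideographs;
    real data would likely need a larger, hand-made variant table.

```python
import unicodedata

# Hypothetical hand-entered table, prefix (Polish) order.
RAW_IDS = {
    "明": "⿰\u2F47月",  # inputter typed KANGXI RADICAL SUN (U+2F47), not 日
    "盟": "⿱明皿",
    "萌": "⿱艹明",
    "好": "⿰女子",
}

def normalize(ids):
    """Fold look-alike compatibility characters onto one code point."""
    return unicodedata.normalize("NFKC", ids)

def flatten(ids, table, seen=frozenset()):
    """Replace every component that has its own entry by that entry's IDS."""
    out = []
    for ch in ids:
        if ch in table and ch not in seen:
            # 'seen' guards against entries that refer back to themselves.
            out.append(flatten(table[ch], table, seen | {ch}))
        else:
            out.append(ch)
    return "".join(out)

def compile_table(raw):
    """Normalize, then flatten, every entry into its search form."""
    clean = {ch: normalize(ids) for ch, ids in raw.items()}
    return {ch: flatten(ids, clean) for ch, ids in clean.items()}

def search(compiled, fragment):
    """Plain substring search over the flattened prefix strings."""
    fragment = normalize(fragment)
    return sorted(ch for ch, ids in compiled.items() if fragment in ids)

COMPILED = compile_table(RAW_IDS)
print(COMPILED["盟"])             # ⿱⿰日月皿
print(search(COMPILED, "⿰日月"))  # ['明', '盟', '萌']
```

    The inputters keep typing whatever is easiest to check by eye, and
    the substring search only ever sees the normalized, flattened form.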

    Quoting Philippe Verdy <verdy_p@wanadoo.fr>:

    > From: <vunzndi@vfemail.net>
    >> PS my congratulations to anyone who can change (a+b/c+d)/(e/f+g)
    >> into reverse Polish order in less than five seconds in their head
    >
    > I do agree that the reverse Polish order is not easy to visualize if
    > the operator is leading, but if you put the operator at the end, it
    > gets simpler for many programmers (at least those who use common
    > languages like PostScript, or are trained with assembly language
    > and finite state machines with a stack), so yes, I can easily read
    > this one:
    > abc/+d+ef/g+/
    > (your expression rewritten with operators after their operands),
    > rather than this one:
    > /++a/bcd+/efg
    > (your expression written with operators before their operands)
    >

    I agree the first is easier to read than the second, but the second
    is easier to programme (at least for me).
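
    For readers who would rather not do the conversion in their head, a
    minimal sketch of it follows. It assumes single-letter operands, the
    four operators + - * / with the usual priorities, and left-to-right
    grouping; parse, postfix and prefix are hypothetical helper names
    chosen for this illustration.

```python
def parse(expr):
    """Parse an infix expression into a nested (op, left, right) tree."""
    tokens = list(expr.replace(" ", ""))
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        return tok

    def atom():
        tok = take()
        if tok == "(":
            node = total()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return node
        if not tok.isalpha():
            raise ValueError("unexpected token: " + tok)
        return tok

    def product():
        node = atom()
        while peek() in ("*", "/"):
            op = take()
            node = (op, node, atom())
        return node

    def total():
        node = product()
        while peek() in ("+", "-"):
            op = take()
            node = (op, node, product())
        return node

    tree = total()
    if pos != len(tokens):
        raise ValueError("trailing input: " + "".join(tokens[pos:]))
    return tree

def postfix(node):
    """Operators after their operands (reverse Polish)."""
    if isinstance(node, str):
        return node
    op, left, right = node
    return postfix(left) + postfix(right) + op

def prefix(node):
    """Operators before their operands (Polish)."""
    if isinstance(node, str):
        return node
    op, left, right = node
    return op + prefix(left) + prefix(right)

tree = parse("(a+b/c+d)/(e/f+g)")
print(postfix(tree))  # abc/+d+ef/g+/
print(prefix(tree))   # /++a/bcd+/efg
```

    Both output strings agree with the two forms quoted above.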

    > I did not want to criticize your notations, which are
    > extremely clear; but the main interest of the "Polish" (or reversed
    > Polish) notation is that it can be made by simple concatenation of
    > its components, so it allows simple substring searches (no need to
    > worry about operator priorities and possible parentheses). This may
    > be useful in an input editor when looking for matching ideographs
    > containing some radicals.
    >
    > And the - operator proves to be useful when there are missing (still
    > unencoded) basic radicals (or strokes), and only a composite one is
    > encoded.
    >
    > Some other similar notations could be used to denote the overlapping
    > composition of strokes on top of another ideograph, because such
    > overlaps are not correctly represented with the current set of IDC
    > symbols.
    >

    I haven't thought out a good way to do this yet. At present I have a
    separate field that is 1 for all parts separate, 2 for some parts
    touching and 3 for some parts overlapping. This designation, however,
    only works well for type 1; it is too vague for types 2 and 3. The
    question is then how far to take such a process and still be IDS and
    not CDL.
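
    As a rough picture of that extra field, here is a minimal sketch.
    The names Contact, IdsEntry, ENTRIES and with_contact are
    hypothetical, and the two sample entries are illustrative only, not
    taken from the actual data.

```python
from dataclasses import dataclass
from enum import IntEnum

class Contact(IntEnum):
    SEPARATE = 1     # all parts separate
    TOUCHING = 2     # some parts touching
    OVERLAPPING = 3  # some parts overlapping

@dataclass(frozen=True)
class IdsEntry:
    char: str
    ids: str
    contact: Contact  # one flag for the whole character

# Illustrative sample entries (assumed for this sketch).
ENTRIES = [
    IdsEntry("明", "⿰日月", Contact.SEPARATE),
    IdsEntry("巫", "⿻工从", Contact.OVERLAPPING),
]

def with_contact(entries, contact):
    """Return the characters whose flag equals the given value."""
    return [e.char for e in entries if e.contact is contact]

print(with_contact(ENTRIES, Contact.SEPARATE))  # ['明']
```

    A single flag per character cannot say which of the parts touch or
    overlap, which is the vagueness noted above for types 2 and 3.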

    > Also I suggested that the IDS could contain some informational
    > diacritics to denote the fact that a basic radical or stroke is
    > significantly altered from its base glyph form (notably when an
    > ideograph is composed using juxtapositions like
    > surrounding/enclosing: the surrounding or enclosing radical or
    > stroke may often be altered to leave space for the central radicals
    > or strokes).
    >
    >

    Yes, the IDCs include overlapping and enclosing symbols; a
    touching diacritic would be useful.

    John



