Re: writing Chinese dialects

From: vunzndi@vfemail.net
Date: Tue Feb 06 2007 - 06:57:01 CST

Next message: vunzndi@vfemail.net: "Re: writing Chinese dialects"

Previous message: Philippe Verdy: "Re: writing Chinese dialects"
In reply to: Philippe Verdy: "Re: writing Chinese dialects"
Next in thread: vunzndi@vfemail.net: "Re: writing Chinese dialects"

Dear Philippe,

I agree with almost every word you say here. There are in fact two
separate issues:

(1) What form should the "human" enter the IDS in?
This should be a form which is easy to visualise; most of those
who input IDS are not programmers. A system that is hard to visualise
makes the work difficult for both the inputters and the checkers. Most
IDS lists are long: several thousand entries, often tens of thousands.
The key element is to get the data correct. The system mentioned here
is really about the human form.

(2) What form should the data be in for efficient searching?
Polish notation, forward or reversed, is definitely better for this.
Flattening and splitting into parts also helps. As a programmer I
would love it if people could input the information that way
correctly. There are other types of inconsistencies in the way people
put in data, such as using one of two or more characters that look the
same, or that are variants of each other.
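
As a rough illustration of the look-alike problem, the sketch below folds
each character of an entered sequence to a single preferred form before it
is stored. It assumes that Unicode NFKC normalization is acceptable for
this purpose (it maps, for example, KANGXI RADICAL ONE U+2F00 to the
unified ideograph U+4E00, but it also folds compatibility ideographs,
which may not always be wanted). The names fold_component, fold_sequence
and EXTRA_FOLDS are hypothetical, and the one table entry is only an
example of a project-specific mapping.

```python
import unicodedata

# Hypothetical project-specific table for look-alikes that are handled
# explicitly here; the single entry is illustrative only.
EXTRA_FOLDS = {
    "\u2E85": "\u4EBB",  # CJK RADICAL PERSON -> U+4EBB
}


def fold_component(ch: str) -> str:
    """Fold one character to a preferred form (assumption: NFKC is suitable)."""
    folded = unicodedata.normalize("NFKC", ch)
    return EXTRA_FOLDS.get(folded, folded)


def fold_sequence(seq: str) -> str:
    """Fold every character of an entered sequence independently."""
    return "".join(fold_component(ch) for ch in seq)
```

Folding at this stage means two inputters who picked different but
visually identical characters end up with the same stored sequence.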

Since the data is input tens of thousands, and maybe even millions, of
times by different people, the efficient way is to "compile", or is
that "recompile", it into the form best for searching. This is more work
for the programmer, but with one programmer to dozens of inputters and
many times more end users it is a fair exchange.
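
A minimal sketch of such a "compile" step is given below, using ordinary
arithmetic operators as a stand-in for the human-entered form. It assumes
single-character operands, the four binary operators + - * /, and the
usual precedence and left associativity; the function names are
hypothetical. The infix-to-postfix conversion is the standard
shunting-yard method, not necessarily what any IDS tool actually uses.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def to_postfix(infix: str) -> str:
    """Convert an infix string to postfix (reverse Polish) order."""
    output, stack = [], []
    for tok in infix.replace(" ", ""):
        if tok in PRECEDENCE:
            # Pop operators of equal or higher precedence (left associative).
            while (stack and stack[-1] != "("
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the "("
        else:
            output.append(tok)  # operand
    while stack:
        output.append(stack.pop())
    return "".join(output)


def to_prefix(infix: str) -> str:
    """Convert an infix string to prefix (Polish) order via the postfix form."""
    stack = []
    for tok in to_postfix(infix):
        if tok in PRECEDENCE:
            right = stack.pop()
            left = stack.pop()
            stack.append(tok + left + right)
        else:
            stack.append(tok)
    return stack[0]


def contains(whole: str, part: str) -> bool:
    """Search for a sub-expression by plain substring match on the prefix form."""
    return to_prefix(part) in to_prefix(whole)
```

For example, contains("(a+b/c+d)/(e/f+g)", "b/c") is true, while
contains("(a+b/c+d)/(e/f+g)", "c+d") is false, because c+d is not a
sub-expression once precedence is taken into account.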

Quoting Philippe Verdy <verdy_p@wanadoo.fr>:

> From: <vunzndi@vfemail.net>
>> PS my congratulations to anyone who can change (a+b/c+d)/(e/f+g)
>> into reverse polish order in a less than five seconds in their head
>
> I do agree that the reverse polish order is not easy to visualize if
> the operator is leading, but if you put the operator at end, it
> gets simpler for many programmers (at least those that use common
> languages like PostScript, or are trained with the assembly language
> and finite state machines with a stack), so yes I can easily read
> this one:
> abc/+d+ef/g+/
> (your expression rewritten with operators after their operands),
> rather than this one:
> /++a/bcd+/efg
> (your expression written with operators before their operands)
>

I agree the first is easier than the second, but the second is
easier to programme (at least for me).
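
One possible reason the operator-first form is easy to program is that a
reader can be written as a very short recursive function: the operator
is seen before its operands, so no explicit stack or precedence table is
needed. The sketch below assumes single-character operands and binary
operators only; parse_prefix and to_infix are hypothetical names.

```python
def parse_prefix(s: str, pos: int = 0):
    """Parse one prefix expression starting at pos.

    Returns (tree, next_pos), where tree is either an operand character
    or a tuple (operator, left_tree, right_tree).
    """
    tok = s[pos]
    if tok in "+-*/":
        left, pos = parse_prefix(s, pos + 1)
        right, pos = parse_prefix(s, pos)
        return (tok, left, right), pos
    return tok, pos + 1


def to_infix(tree) -> str:
    """Write a parsed tree back out as fully parenthesised infix."""
    if isinstance(tree, str):
        return tree
    op, left, right = tree
    return f"({to_infix(left)}{op}{to_infix(right)})"
```

Applied to /++a/bcd+/efg, the parser consumes all 13 characters and
to_infix gives (((a+(b/c))+d)/((e/f)+g)), which is the original
expression with every grouping made explicit.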

> I did not want to send criticism of your notations, which are
> extremely clear; but the main interest of the "Polish" (or reversed
> Polish) notation is that it can be made by simple concatenation of
> its components, so it allows simple substring searches (no need to
> worry about operator priorities and possible parentheses). This may
> be useful in an input editor when looking for matching ideographs
> containing some radicals.
>
> And the - operator proves to be useful when there are missing (still
> unencoded) basic radicals (or strokes), and only a composite one is
> encoded.
>
> Some other similar notations could be used to denote the overlapping
> composition of strokes on top of another ideograph, because such
> overlaps are not correctly represented with the current set of IDC
> symbols.
>

I haven't thought out a good way to do this yet. At present I have a
separate field that is 1 for all parts separate, 2 for some parts
touching and 3 for some parts overlapping. This designation, however,
only works well for type 1; it is too vague for types 2 and 3. The
question is then how far to take such a process and still be IDS and
not CDL.

> Also I suggested that the IDS could contain some informational
> diacritics to denote the fact that a basic radical or stroke is
> significantly altered from its base glyph form (notably when an
> ideograph is composed using juxtapositions like
> surrounding/enclosing: the surrounding or enclosing radical or
> stroke may often be altered to leave space for the central radicals
> or strokes).
>
>

Yes, the IDCs include overlapping and enclosing symbols; a
touching diacritic would be useful.

John

-------------------------------------------------
This message sent through Virus Free Email
http://www.vfemail.net


This archive was generated by hypermail 2.1.5 : Tue Feb 06 2007 - 07:00:05 CST