Re: Tibetan/Burmese/Khmer

From: Norbert Klein (norbert@forum.org.kh)
Date: Tue Jan 21 1997 - 08:45:32 EST

Next message: Yung-Fong Tang: "Re: FW: MS-Windows and Unicode Support"
Previous message: Martin J. Duerst: "Re: private spaces"
Maybe in reply to: Michael Forgey: "Tibetan/Burmese/Khmer"
Next in thread: Francois Yergeau: "Re: Tibetan/Burmese/Khmer"

> Date: Mon, 20 Jan 1997 09:56:03 -0500 (EST)
> From: Maurice J Bauhahn <bauhahnm@river.it.gvsu.edu>
> Reply-to: Maurice J Bauhahn <bauhahnm@river.it.gvsu.edu>
> To: unicode@Unicode.ORG
> Cc: mduerst@ifi.unizh.ch, Khmer@Unicode.ORG
> Subject: Re: Tibetan/Burmese/Khmer

> Martin, thank you for your thoughtful reply.

[snip]

> A later message from you filled in this sentence, although I am not
> positive what you are saying: possibly you are pointing out that
> the number of glyph combinations to make syllables is not that great
> so everything could be handled by a lookup chart without concern for
> phonetic ordering of Thai.
>
> > > I wish I could calculate the theoretical limits to settle that
> > > question. All I know is the difficulty which I have experienced
> > > in creating a sorting algorithm for the language. There are five
> > > levels of dependencies with up to 35 members in each dependency.
> > > Of course the real language does not have all combinations, but
> > > the variations are enough that a simple dictionary lookup does
> > > not seem practical.

I do not know the theoretical limit on the number of glyphs in
Khmer, but a count of Alain Daniel's dictionary showed that it
contains about 2400 Khmer glyphs, which gives some idea of the
range involved.

> >
> > Do those things you call "levels" work similarly to the following
> > levels in sorting Latin script:
> >   - base letters
> >   - accents
> >   - case
> > I.e., just as you only start to consider accents if two words are
> > completely equal with respect to base letters, do you only start
> > to check subjoined consonants when comparing two words if the two
> > words are identical with respect to plain consonants?
> >
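
Martin's comparison with Latin sorting can be made concrete with a small multi-level key. This is a minimal sketch for illustration only, assuming plain Python `unicodedata` decomposition; the function name `sort_key` is hypothetical, and real collation systems such as the Unicode Collation Algorithm use weight tables and handle many more cases. It shows the core idea: accents are consulted only when the base letters tie, and case only when base letters and accents both tie.

```python
import unicodedata


def sort_key(word: str) -> tuple[tuple[str, ...], tuple[str, ...], tuple[int, ...]]:
    """Hypothetical three-level key: base letters, then accents, then case.

    Python compares tuples left to right, so the accent level only matters
    when the base letters are identical, and the case level only matters
    when both the base letters and the accents are identical.
    """
    base: list[str] = []     # level 1: letters with accents removed, lowercased
    accents: list[str] = []  # level 2: combining marks attached to each letter
    case: list[int] = []     # level 3: 1 for an uppercase letter, else 0
    # NFD splits precomposed letters such as "ô" into "o" + U+0302.
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch):
            if accents:  # attach the mark to the preceding base character
                accents[-1] += ch
        else:
            base.append(ch.lower())
            accents.append("")
            case.append(1 if ch.isupper() else 0)
    return tuple(base), tuple(accents), tuple(case)


words = ["côté", "Cote", "côte", "coté", "cote"]
print(sorted(words, key=sort_key))  # ['cote', 'Cote', 'coté', 'côte', 'côté']
```

All five words tie at the first level, so the accent level decides most of the order, and "cote" versus "Cote" is settled only at the case level.
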
> Sorting in Khmer is largely based on syllables, at least for the
> first four levels (base consonant, first subscript, second
> subscript, vowel):
>
> (1) First the base consonant by itself.
> (2) Then the base consonant plus a sign (a rather rare occurrence
>     in the past, but with the new rules about yukaleapintu and
>     anusvara this will be more prevalent).
> (3) Then the base consonant plus a second base consonant.
> (4) Then the base consonant plus the above-mentioned second base
>     consonant and a sign (the fifth level). Normally, however, the
>     sign is on the second consonant, with the two-syllable word
>     carrying it coming after an otherwise identical word without
>     the sign. In this regard the sign level seems to differ from the
>     other levels, even though it affects the pronunciation of the
>     vowel on the first base consonant of the first syllable.
> (5) This can go rippling off to the right with vowels, subscripts
>     and signs on the second or n-th consonants in one word.
> (6) Then the base consonant plus a different base consonant
>     (cycling through all possible second consonants).
> (7) Then the base consonant plus a vowel.
> (8) Then the base consonant plus a first subscript.
> (9) Then the base consonant plus the first subscript plus a second
>     subscript.
> (10) Then the base consonant plus the first subscript plus the
>     second subscript plus a vowel....
>
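
The order in items (1) to (10) of the quoted list can be read as one lexicographic comparison over a word's components, provided each kind of component gets a fixed rank. The sketch below is only an illustration under strong assumptions: it presumes each word has already been segmented into tagged components, and the `Kind` ranks, the `word_key` function, and the Latin placeholder letters standing in for Khmer characters are all hypothetical. The ranks were chosen just to reproduce the quoted list; they have not been checked against a real Khmer dictionary.

```python
from enum import IntEnum


class Kind(IntEnum):
    """Hypothetical component ranks, chosen only to reproduce items (1)-(10).

    After a base consonant, a sign sorts first, then a following base
    consonant, then a vowel, then a subscript. A word that simply stops
    sorts before all of these, because Python treats a tuple that is a
    prefix of another tuple as smaller.
    """
    SIGN = 1
    CONSONANT = 2
    VOWEL = 3
    SUBSCRIPT = 4


# Assumption: a word is pre-segmented into (kind, letter) components.
# Latin placeholder letters stand in for Khmer characters; their
# alphabetical order stands in for the traditional Khmer order.
Component = tuple[Kind, str]


def word_key(components: list[Component]) -> tuple[tuple[int, str], ...]:
    """Compare component by component: rank first, then letter."""
    return tuple((int(kind), letter) for kind, letter in components)


if __name__ == "__main__":
    C, S, V, SUB = Kind.CONSONANT, Kind.SIGN, Kind.VOWEL, Kind.SUBSCRIPT
    # One placeholder word per item in the quoted list.
    words = {
        "(1)": [(C, "k")],
        "(2)": [(C, "k"), (S, "~")],
        "(3)": [(C, "k"), (C, "g")],
        "(4)": [(C, "k"), (C, "g"), (S, "~")],
        "(5)": [(C, "k"), (C, "g"), (V, "a")],
        "(6)": [(C, "k"), (C, "t")],
        "(7)": [(C, "k"), (V, "a")],
        "(8)": [(C, "k"), (SUB, "r")],
        "(9)": [(C, "k"), (SUB, "r"), (SUB, "v")],
        "(10)": [(C, "k"), (SUB, "r"), (SUB, "v"), (V, "a")],
    }
    scrambled = sorted(words, reverse=True)  # deliberately out of order
    print(sorted(scrambled, key=lambda label: word_key(words[label])))
```

The point of the design is that Maurice's "levels" need not be coded as separate passes: once each component kind has a rank, a single left-to-right comparison, with shorter words winning ties, yields the listed order, including the rippling to the right described in item (5).
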
> Fairly recently a committee of the country's leading linguists
> decided that the anusvara and yukaleapintu are only signs and not
> vowels. This has greatly reduced the number of glyph combinations
> that make up vowels, and with it the number of vowels. This
> decision is not yet reflected in the dictionaries or school
> textbooks.
>
> > As one of the authors of RFC 2070, I would be very happy to offer
> > a neat solution. But it's a chicken-and-egg problem. You cannot
> > discuss the encoding of a script while already assuming an
> > encoding. So please use inline bitmaps, aka GIFs. This is actually suggested in
> > RFC 2070, at the end of section 2.2 :-).
>
> I'm dreaming of a day....
>
> >
> > > > None of this sounds like "root" in the sense in which Tibetan
> > > > uses the term.
> > >
> > > Please post a URL to a document which describes what 'root' does
> > > mean when referring to Tibetan.
> >
> > I'm not an expert in Tibetan, but to give you a very rough idea,
> > take English words like "know", "knife", "psyche",.... Here, "n"
> > or "s" would be the root, not "k" or "p". In Tibetan, consonants
> > before the root can change how the root is pronounced, or may
> > perhaps be pronounced themselves in some dialects or in older times.
> > There are grammatical rules to find out which letter is the root,
> > but they are quite complex.
> >
> In Khmer the pronunciation of the consonants is not affected, but the
> pronunciation of the vowels is affected by the combinations of
> consonants or signs in their vicinity.
>
> Sincerely,
>
> Maurice Bauhahn
>
>
Norbert Klein system@forum.org.kh
System Operator +855-23-360345
Open Forum Information Exchange P. O. Box 177
Phnom Penh / CAMBODIA


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:33 EDT