From: Doug Ewell (dewell@adelphia.net)
Date: Sat Jun 03 2006 - 15:41:57 CDT
Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
>> All precomposed letters necessary for Vietnamese are already encoded,
>> and have been since Unicode 1.1.
>
> Is it true for all Vietnamese letters? I mean here all the
> combinations of one of the 12 Vietnamese vowels (including 5 base
> vowels from the ASCII set, plus a few vowels with a single diacritic
> like the circumflex or a right hook) and one of the 5 tone diacritics
> (marked by combining accents like acute, grave, tilde, macron, dot
> below)?
It is true for all, and also for the barred-D (Đ, đ) which represents
what we think of as a "true" D sound. (The plain letter D is sounded as
"y" or "zh" depending on dialect.)
> Are there additional combinations in Vietnamese? Or does Vietnamese
> really use only this small subset of combinations that is easy to
> support in most fonts? I have always assumed that Vietnamese was not
> as complicated as many people think, despite the apparent
> complexity of Windows-1258 or VISCII.
All letters necessary for Vietnamese are covered; the orthography is not
compromised. Twelve base vowels, five tone marks, 72 vowels total
(12 x 6, counting vowels with no tone mark), plus the barred-D and the dong
sign, plus the rest of the Latin alphabet (not all of which is used in
Vietnamese), times 2 for uppercase/lowercase.
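The count above can be checked with a short Python sketch. The vowel inventory and the five combining marks (acute, grave, hook above, tilde, dot below) are my assumptions about what is being counted; they are not spelled out in this message. The sketch composes every combination with NFC and checks that each one collapses to a single precomposed code point.

```python
import unicodedata

# Twelve base vowels (assumed inventory): a ă â e ê i o ô ơ u ư y
BASE_VOWELS = "a\u0103\u00e2e\u00eaio\u00f4\u01a1u\u01b0y"

# "No tone mark" plus five combining tone marks (assumed set):
# acute, grave, hook above, tilde, dot below
TONE_MARKS = ["", "\u0301", "\u0300", "\u0309", "\u0303", "\u0323"]

def vietnamese_vowels():
    """Return every base-vowel/tone combination, both cases, NFC-normalized."""
    letters = []
    for base in BASE_VOWELS + BASE_VOWELS.upper():
        for tone in TONE_MARKS:
            letters.append(unicodedata.normalize("NFC", base + tone))
    return letters

vowels = vietnamese_vowels()
# A combination is precomposed if NFC reduces it to one code point.
precomposed = [v for v in vowels if len(v) == 1]
print(len(vowels), len(precomposed))  # 144 144
```

If any combination lacked a precomposed form, the second number would be smaller than the first.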
I don't think VISCII is complex at all. Everything needed by the
Vietnamese language is encoded in one byte. The only tricky part about
VISCII is that because there are so many letters, it has to encroach
upon the C0 and C1 control-character space, which can cause problems for
rendering engines that assume everything in that space is not visible.
Windows-1258 is a bit more complex, using a combination of precomposed
vowels and combining marks to stay out of the control space.
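To illustrate the Windows-1258 approach, here is a rough Python sketch using the standard `cp1258` codec. The helper name `to_cp1258` and the choice to split off only the tone mark are my own assumptions for illustration, not a description of how any particular converter works. A fully precomposed letter such as ệ has no single byte, so the sketch falls back to a precomposed vowel followed by a combining tone mark.

```python
import unicodedata

# Combining tone marks that Windows-1258 encodes as separate bytes:
# grave, acute, tilde, hook above, dot below
TONE_MARKS = {"\u0300", "\u0301", "\u0303", "\u0309", "\u0323"}

def to_cp1258(text):
    """Encode text as Windows-1258, splitting off tone marks where needed."""
    out = bytearray()
    for ch in unicodedata.normalize("NFC", text):
        try:
            # Many letters (ASCII, â, ê, ô, ơ, ư, đ ...) have a single byte.
            out += ch.encode("cp1258")
            continue
        except UnicodeEncodeError:
            pass
        # Otherwise keep the vowel-quality mark on the base letter and
        # emit the tone mark as a trailing combining character.
        decomposed = unicodedata.normalize("NFD", ch)
        base = "".join(c for c in decomposed if c not in TONE_MARKS)
        tones = "".join(c for c in decomposed if c in TONE_MARKS)
        # Raises UnicodeEncodeError if the character is not representable.
        out += (unicodedata.normalize("NFC", base) + tones).encode("cp1258")
    return bytes(out)

word = "Vi\u1ec7t"  # "Việt", with ệ as one precomposed code point
encoded = to_cp1258(word)
decoded = encoded.decode("cp1258")
print(len(word), len(encoded))
```

Decoding the bytes and normalizing the result to NFC gives back the original string, so nothing is lost; the cost is one extra byte for each vowel whose tone mark has to be split off.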
> What I don't know is whether Vietnamese considers the tone marks (encoded
> as diacritical accents) important at the primary level for the
> language. If they are not that important, then people can accept not
> always encoding the tone marks, and the number of characters to
> support in applications like SMS on cell phones is dramatically
> reduced (text input becomes easy for the 12 phonetic Vietnamese
> base vowels, and tone marks can optionally be entered after those base
> vowels).
Vietnamese is a tonal language, and just as in Chinese or any other
tonal language, two words can have totally different meanings based on
tone. It is up to the writer to decide when it is acceptable to drop
tone marks without causing miscommunication or even offense. Some
writers are more picky about this than others; some make greater demands
on the reader than others. In principle, though perhaps not in
frequency, the issue is not much different from dropping accents in
French.
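A small Python sketch shows what is lost when tone marks are dropped. The function name and the list of tone marks are assumptions of mine, chosen for illustration. It removes only the five tone marks and keeps the circumflex, breve and horn, which belong to the vowel itself.

```python
import unicodedata

# Combining tone marks (assumed set): grave, acute, tilde, hook above, dot below
TONE_MARKS = {"\u0300", "\u0301", "\u0303", "\u0309", "\u0323"}

def strip_tone_marks(text):
    """Drop tone marks but keep vowel-quality marks such as the circumflex."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)

# Six differently toned syllables: ma, má, mà, mả, mã, mạ
words = ["ma", "m\u00e1", "m\u00e0", "m\u1ea3", "m\u00e3", "m\u1ea1"]
stripped = {strip_tone_marks(w) for w in words}
print(stripped)  # {'ma'}
```

All six spellings collapse to the same string, so the reader has to recover the intended word from context alone.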
> Why then would it be more complicated to compose text like this,
> instead of using VIQR that would require composing roughly the same
> number of symbols (and sometimes more...)?
Vietnamese composition becomes tricky when working with fully decomposed
vowels: ậ (U+1EAD), for example, decomposes to "U+0061 plus U+0323 plus
U+0302." Not all rendering systems (even today) can handle placing two or
more diacritical marks on a single base letter.
Additionally, this decomposition ("a" plus dot-below plus circumflex)
doesn't match the way Vietnamese view this letter (("a" plus circumflex)
plus dot-below). This is not a Unicode problem, but entering the
diacritics in the language-appropriate order might be a problem if a
rendering engine insists on canonical order.
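The two orders can be compared with Python's standard `unicodedata` module. This is only a sketch; the variable names are mine. It shows that normalization reorders the language-appropriate sequence into the canonical one, and that both compose to the same precomposed letter.

```python
import unicodedata

def code_points(text):
    """Format a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

precomposed = "\u1ead"            # ậ as a single code point
language_order = "a\u0302\u0323"  # a + circumflex + dot below
canonical_order = "a\u0323\u0302" # a + dot below + circumflex

# Full canonical decomposition of the precomposed letter.
nfd = unicodedata.normalize("NFD", precomposed)
print(code_points(nfd))  # U+0061 U+0323 U+0302

# Canonical reordering puts the dot below (class 220) before the
# circumflex (class 230), whichever order the user typed them in.
reordered = unicodedata.normalize("NFD", language_order)
composed = unicodedata.normalize("NFC", language_order)
print(reordered == canonical_order, composed == precomposed)  # True True
```

A system that normalizes its input before rendering therefore sees the same sequence either way; the difficulty arises only where input is passed through unnormalized.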
-- Doug Ewell Fullerton, California, USA http://users.adelphia.net/~dewell/