From: Joan_Wardell@sil.org
Date: Wed Jul 30 2003 - 11:57:57 EDT
Ted,
I agree 100% with your description of the characters that have not been
encoded in Unicode. There are certainly marks and consonants that mean two
completely different things, as you have so accurately described. But there
are two approaches to encoding: "Code what you see" and "Code what is
meant". Both your analysis and the original SIL Ezra font encoding go with
"Code what is meant". This means that we have two shevas (one pronounced
and one silent), a holem-waw character and a shureq character. Unicode, on
the other hand, is totally "Code what you see". It makes no attempt to
analyze the marks on the page. If there is a
mark, code it. If it is identical to another mark, then it gets the same
codepoint. (Of course, there are exceptions, but this is the general rule.)
So with Unicode, there is no way even to separate vowels from consonants,
since the waw in a shureq, the waw in a holem-waw, and a plain waw will
always be
encoded the same. Some of us are trying to make this approach usable by
allowing at least a holem-waw to be distinguished from waw holem, by
placing the holem first.
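
To make the point concrete, here is a minimal Python sketch of what "a lot
of context" looks like in practice. It assumes the ordering convention
mentioned above (holem stored before the waw means holem-waw, holem stored
after the waw means waw holem). The helper names describe() and
classify_waw() are made up for illustration and are not part of any
standard or SIL tool.

```python
import unicodedata

WAW = "\u05D5"     # HEBREW LETTER VAV
HOLAM = "\u05B9"   # HEBREW POINT HOLAM
DAGESH = "\u05BC"  # HEBREW POINT DAGESH OR MAPIQ (the dot in a shureq)


def describe(text):
    """List the codepoints actually stored for a string."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def classify_waw(text, i):
    """Guess the role of the waw at index i from its neighbours.

    Hypothetical helper: it relies only on encoding order, under the
    assumed convention that a holem placed first marks a holem-waw.
    """
    if text[i] != WAW:
        raise ValueError("no waw at index %d" % i)
    before = text[i - 1] if i > 0 else ""
    after = text[i + 1] if i + 1 < len(text) else ""
    if after == HOLAM:
        return "waw holem (consonant waw carrying a holem)"
    if before == HOLAM:
        return "holem-waw (holem placed first)"
    if after == DAGESH:
        # Same codepoints either way: encoding alone cannot decide.
        return "shureq or waw with dagesh (ambiguous)"
    return "plain waw"


# The waw is U+05D5 in every case; only the neighbours differ.
print(describe(WAW + DAGESH))
print(classify_waw("\u05E7\u05B9\u05D5\u05DC", 2))        # holem first
print(classify_waw("\u05E2\u05B8\u05D5\u05B9\u05DF", 2))  # holem after waw
print(classify_waw("\u05E7\u05D5\u05BC\u05DD", 1))        # waw + dagesh
```

Note that the shureq case stays ambiguous: a shureq and a consonantal waw
with dagesh are the same two codepoints, so telling them apart needs
context beyond the neighbouring marks.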
For the encoders, it is fairly straightforward. For the people trying to
actually use the encoding, it's going to take a lot of context to determine
what you've got.
Joan Wardell
NRSI-SIL