On 08/02/2002 03:17:56 PM "Sean B. Palmer" wrote:
>If anyone has any comments on this, or any references to previous
>discussions, they would be gladly received.
Any discussion of encoding Latin digraphs as units makes an unvalidated
assumption that there is some benefit to be gained. We have had several
decades of English text processing without ever encoding English digraphs
(th, ch, ph, wh, ff, gh, tt, ck, ou, ei, ie, ea, ee, oo, oa, etc., and
arguably a...e, e...e, i...e, o...e, u...e as well) as single characters,
and without ever feeling a need to. We have decades of experience dealing
with implementations of Latin script, and less time dealing with
implementations of Indic scripts. Yet for these scripts, with which we have
less experience, we encode some complex multigraphs (especially those
representing vowels) in scripts such as Thai as sequences of multiple
characters, and no one says there is a problem whose solution requires
encoding digraphs. Why is it, then, that for the script with which we have
rather more experience, people feel that encoding digraphs is necessary?
(Those are my thoughts, at any rate.)
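To make the comparison concrete, here is a small sketch using Python's
standard unicodedata module. It lists the code points in the English
digraph "th" and in the Thai word "rian" (to study), which I chose as an
illustration. Its vowel "ia" is written in three parts around the consonant
and is encoded as three separate characters. The helper name describe() is
hypothetical, written only for this example.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text.

    describe() is a hypothetical helper written for this illustration.
    """
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# English digraph "th": just two ordinary Latin letters, no special character.
for cp, name in describe("th"):
    print(cp, name)

# Thai word "rian". The vowel "ia" is written as SARA E before the consonant
# RO RUA, SARA II above it and YO YAK after it. Each part is its own
# character, stored in the order written, followed by the final NO NU.
for cp, name in describe("\u0e40\u0e23\u0e35\u0e22\u0e19"):
    print(cp, name)
```

In both cases the multigraph is simply a sequence of ordinary characters.
Nothing in the encoding marks it as a unit; any grouping is left to the
software that processes the text.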
- Peter
---------------------------------------------------------------------------
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>