Re: Unicode and transliteration

From: peter_constable@sil.org
Date: Thu Aug 26 1999 - 09:23:13 EDT


>My reading of Unicode is that it does not handle the mapping
       from deep to surface structure; it only does the surface, the
       images used in written language.

       That depends upon what you mean by surface. Unicode is a
       "character" encoding, where character just means an abstract
       unit of textual information. In most cases, Unicode characters
       are defined in terms of abstract orthographic units, sometimes
       called graphemes. Defining a Unicode character in terms of
       glyphs, which is what most people would probably think of as
       "surface", is the exception. It is also the
       case that Unicode characters are not normally defined directly
       in terms of actual linguistic units, such as phonemes or
       morphemes.

       In a simple model, there are three distinct layers here:

       - glyphs
       - orthography / graphemes
       - linguistic (which itself has many levels, but for text
       purposes, phonology is the most relevant)

       and Unicode works primarily on the middle layer, though there
       are exceptions.
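
       As a rough illustration of the three layers, here is a small
       Python sketch using the standard unicodedata module. The choice
       of examples (Arabic BEH, the "fi" ligature) and the helper name
       fold_to_orthographic are my own assumptions for illustration;
       NFKC folding is only one approximate way of mapping glyph-level
       compatibility characters back onto the orthographic layer.

```python
import unicodedata

# Orthographic layer: one abstract character for Arabic BEH,
# regardless of which contextual shape a font draws for it.
beh = "\u0628"

# Glyph-layer exceptions: compatibility characters that each encode
# one specific shape of BEH (isolated, final, initial, medial).
beh_forms = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]


def fold_to_orthographic(text: str) -> str:
    """Hypothetical helper: fold compatibility (glyph-like) characters
    onto their ordinary counterparts using NFKC normalization."""
    return unicodedata.normalize("NFKC", text)


# All four presentation forms fold back to the single abstract letter.
folded_forms = [fold_to_orthographic(c) for c in beh_forms]

# Another glyph-layer exception: the "fi" ligature is a single code
# point, but orthographically it is just the letters f and i.
ligature = "\uFB01"
folded_ligature = fold_to_orthographic(ligature)

# Linguistic layer: not encoded at all. The "c" in "cat" and the "c"
# in "city" stand for different sounds but are the same character.
same_letter = "cat"[0] == "city"[0]

if __name__ == "__main__":
    print(unicodedata.name(beh))
    for form in beh_forms:
        print(unicodedata.name(form))
    print(folded_forms == [beh] * 4)
    print(folded_ligature)
    print(same_letter)
```

       The presentation forms and the ligature are among the
       exceptions mentioned above: characters that sit closer to the
       glyph layer than to the orthographic one.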

       Peter


