>My reading of Unicode is that it does not handle the mapping
>from deep to surface structure; it only does the surface, the
>images used in written language.
That depends upon what you mean by surface. Unicode is a
"character" encoding, where character just means an abstract
unit of textual information. In most cases, Unicode characters
are defined in terms of abstract orthographic units, sometimes
called graphemes. Defining a Unicode character in terms of
glyphs, which is what most people would probably think of as
"surface", is considered the exception. It is also the
case that Unicode characters are not normally defined directly
in terms of actual linguistic units, such as phonemes or
morphemes.
In a simple model, there are three distinct layers here:
- glyphs
- orthography / graphemes
- linguistic (which itself has many levels, but for text
purposes, phonology is the most relevant)
and Unicode works primarily on the middle layer, though there
are exceptions.
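
As a rough illustration of the layers, here is a minimal Python sketch
using the standard unicodedata module. The helper name describe() and
the sample words are my own choices for illustration, not anything
defined by Unicode. It shows one of the glyph-level exceptions (the
"fi" ligature, which compatibility normalization folds back to two
ordinary letters) and one case where a single orthographic character
stands for different sounds.

```python
import unicodedata

def describe(ch):
    """Return (character name, NFKC-normalized form) for one character.

    Hypothetical helper, used only for this illustration.
    """
    return unicodedata.name(ch), unicodedata.normalize("NFKC", ch)

# Glyph-level exception: U+FB01 encodes a ligature, i.e. a presentation
# form, rather than an abstract orthographic unit. NFKC maps it back to
# the two letters it represents.
lig_name, lig_folded = describe("\uFB01")
print(lig_name, "->", [unicodedata.name(c) for c in lig_folded])

# Orthographic layer: the "c" in "cat" and the "c" in "cell" are usually
# pronounced differently in English, but they are the same character.
c_in_cat = "cat"[0]
c_in_cell = "cell"[0]
same_character = c_in_cat == c_in_cell
print(unicodedata.name(c_in_cat), same_character)
```

Nothing in the character data records the difference in pronunciation;
that belongs to the linguistic layer, which is outside the encoding.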
Peter