Re: Tagging orthographic systems (was: (iso639.186) the

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Sep 13 2000 - 17:04:50 EDT


Tom Emerson wrote:

> One (well, the only) problem I have with explicit orthographic tagging
> is that it makes assumptions that a consistent orthography is being
> used throughout a document, which isn't necessarily the case. This is
> particularly prevalent in East Asian languages:
>
> Japanese verbs will have a standard form, along with several possible
> okurigana variants as well as possible use of hiragana instead of
> kanji. Consider a literal translation of "A hen that lays golden eggs"
> --- 'kin no tomago wa umu niwatori'. There are 24 different ways one
> could write this, all valid.

tamago ;-)

I'm not sure that I would consider this type of variation to
constitute a difference in *orthography* per se. The Japanese
alternation rules for use of kanji versus hiragana forms, as
well as variation in okurigana, are really just permissible
alternations *within* standard Japanese orthography. This kind
of variation means that Japanese spell-checking is very complicated,
but it doesn't mean that you could pick out and specify particular
orthographies from the variability involved.

A comparable case for English might be the following:

traveled vs. travelled constitute alternate acceptable spellings
*within* a particular English orthography.

On the other hand, the "-ize" versus "-ise" spelling distinctions
are rather systematic differences between American and British
conventions, and could be considered to represent an orthographic
difference, since there are strongly held opinions regarding
correctness on both sides of the wall, and spellcheckers should
distinguish between them based on expressed preferences.
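
To make the distinction concrete, here is a rough sketch of how a
spellchecker might treat the two cases differently. Everything in it is
an assumption for illustration: the tiny word lists, the preference tags
"en-US" and "en-GB", and the check() function are hypothetical, not taken
from any real spellchecker.

```python
# Hypothetical word lists for illustration; a real checker would use full dictionaries.

# Free alternation *within* one orthography: accepted whatever the preference.
FREE_VARIANTS = {"traveled", "travelled"}

# Systematic differences between conventions: accepted only under the matching preference.
SYSTEMATIC = {
    "en-US": {"organize", "realize"},
    "en-GB": {"organise", "realise"},
}


def check(word, preference):
    """Classify a word as 'ok', 'wrong-convention', or 'unknown' for the given preference."""
    w = word.lower()
    if w in FREE_VARIANTS or w in SYSTEMATIC[preference]:
        return "ok"
    # The word is a known spelling, but it belongs to a different convention.
    for tag, words in SYSTEMATIC.items():
        if tag != preference and w in words:
            return "wrong-convention"
    return "unknown"


if __name__ == "__main__":
    for word in ("traveled", "travelled", "organize", "organise"):
        print(word, check(word, "en-US"), check(word, "en-GB"))
```

The point of the sketch is only that the first set never needs a tag to
be checked, while the second set cannot be checked without one.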

--Ken


