Re: ZWJ, ZWNJ and VS in Latin and other Greek-derived scripts

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Jan 26 2007 - 15:18:38 CST

    Asmus wrote:

    > The rules for the use of long s, and for ligatures (in German), both
    > require that you know the word boundaries inside a compound word. As has
    > been demonstrated on this list many times, there are cases where even
    > dictionary-based approaches must fail, ...

    > There's no debate that the amount of text intervention would be
    > considerable, that there are definite limits to what you can do (or
    > assist the user with) by software, and that doing even that would
    > require considerable modifications/adjustments to existing architectures
    > and dictionary data.

    Short summary:

    German text is in the Latin script, whether represented using
    Antiqua fonts or Blackletter fonts, and is encoded as such,
    using sc=Latn Unicode characters.

    German text represented in "Fraktur" is using a different
    *writing system* than German text represented in "Roman".

    No one in their right mind assumes that text in one writing
    system can be automatically converted to another writing system
    without heavy intervention from a lot of rather
    complex software (spell-checkers and dictionaries, contextual
    analysis, specialized linebreak and hyphenation rules)
    *and*, in most cases, a moderate to significant
    amount of editorial intervention for the hard cases.
    What would be bizarre is to assume that German in
    Fraktur could be converted to German in Roman by simply
    making a font change, with no adjustments
    to the underlying character representation when
    converting from one writing system to another.
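
To make the asymmetry concrete, here is a minimal Python sketch, not anything from the thread itself. It assumes a Fraktur-convention text that encodes long s as U+017F and suppresses ligatures with U+200C (ZWNJ), and it uses a deliberately oversimplified long-s rule (round s only at the end of a morpheme) that ignores the real orthographic exceptions. The function names and the pre-segmented input are hypothetical; the segmentation is exactly the word-boundary knowledge that the text alone does not carry. "Wachstube" is a commonly cited ambiguous compound.

```python
LONG_S = "\u017F"  # LATIN SMALL LETTER LONG S
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER


def fraktur_to_roman(text: str) -> str:
    """Fold Fraktur-convention text to Roman convention.

    This direction is lossy but mechanical: long s becomes round s
    and ligature-suppressing ZWNJs are dropped.
    """
    return text.replace(LONG_S, "s").replace(ZWNJ, "")


def apply_long_s(segment: str) -> str:
    """Simplified rule (an assumption for illustration only):
    every lowercase s becomes long s, except in segment-final position."""
    if not segment:
        return segment
    body, last = segment[:-1], segment[-1]
    return body.replace("s", LONG_S) + last


def roman_to_fraktur(segments: list[str]) -> str:
    """Build the Fraktur-convention spelling from morpheme segments.

    The caller must supply the segmentation (from a dictionary or an
    editor); it cannot be recovered from the Roman spelling alone.
    """
    return "".join(apply_long_s(seg) for seg in segments)


# The same Roman spelling has two Fraktur-convention spellings,
# depending on where the compound boundary falls.
guard_room = roman_to_fraktur(["Wach", "stube"])  # Wach + Stube
wax_tube = roman_to_fraktur(["Wachs", "tube"])    # Wachs + Tube

print(guard_room != wax_tube)
print(fraktur_to_roman(guard_room) == fraktur_to_roman(wax_tube))
```

Folding to Roman maps both spellings onto the same string, so going the other way requires information (the compound boundary) that has to come from dictionaries, context, or an editor, and the underlying character sequence changes in either direction.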

    --Ken



    This archive was generated by hypermail 2.1.5 : Fri Jan 26 2007 - 15:20:40 CST