From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Jan 26 2007 - 15:18:38 CST
Asmus wrote:
> The rules for the use of long s, and for ligatures (in German), both
> require that you know the word boundaries inside a compound word. As has
> been demonstrated on this list many times, there are cases where even
> dictionary-based approaches must fail, ...
> There's no debate that the amount of text intervention would be
> considerable, that there are definite limits to what you can do (or
> assist the user with) by software, and that doing even that would
> require considerable modifications/adjustments to existing architectures
> and dictionary data.
Short summary:
German text is in the Latin script, whether represented using
Antiqua fonts or Blackletter fonts, and is encoded as such,
using sc=Latn Unicode characters.
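As a rough illustration of that point, the sketch below lists the Unicode names of the letters in a word spelled with long s and sharp s. The sample word and the helper name `latin_letter_names` are made up for this example. Python's standard library does not expose the Script property itself, so the "LATIN" prefix of the character name is used here only as a stand-in for sc=Latn, not as a real script lookup.

```python
import unicodedata

# Hypothetical sample word in older orthography:
# long s (U+017F) twice, then sharp s (U+00DF).
sample = "Waſſerſtraße"

def latin_letter_names(text):
    """Return (char, Unicode name) pairs for the alphabetic characters in text."""
    return [(ch, unicodedata.name(ch)) for ch in text if ch.isalpha()]

for ch, name in latin_letter_names(sample):
    # The name prefix is only a proxy for the Script property (assumption).
    print(f"U+{ord(ch):04X} {name}")
```

Every name printed begins with "LATIN", including LATIN SMALL LETTER LONG S, whichever font is later used to display the text.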
German text represented in "Fraktur" is using a different
*writing system* than German text represented in "Roman".
No one in their right mind assumes that text in one writing
system can be automatically converted to another writing system
without the heavy intervention of a lot of rather
complex software (spell-checkers and dictionaries, contextual
analysis, specialized linebreak and hyphenation rules)
*and* in most cases a moderate to significant
amount of editorial intervention for the hard cases.
What would be bizarre would be to assume that German in
Fraktur could be converted to German in Roman by simply
making a font change and assuming that no adjustments would
be necessary to underlying character representation when
converting from one writing system to another.
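A minimal sketch of what "adjustments to underlying character representation" can mean for one character, long s (U+017F). The function names are hypothetical and the rules are deliberately simplistic. Fraktur to Roman is shown as a plain character replacement. The reverse direction uses a naive "s before a letter becomes long s" rule, which goes wrong at compound boundaries, the kind of case Asmus describes.

```python
import re

LONG_S = "\u017f"  # LATIN SMALL LETTER LONG S

def fraktur_to_roman(text):
    """Replace long s with round s; everything else is left untouched."""
    return text.replace(LONG_S, "s")

def roman_to_fraktur_naive(text):
    """Naive rule (an assumption, not the real orthography):
    a lowercase s directly followed by another letter becomes long s."""
    return re.sub(r"s(?=[^\W\d_])", LONG_S, text)

print(fraktur_to_roman("Waſſer"))             # long s removed
print(roman_to_fraktur_naive("Wasser"))       # plausible result
print(roman_to_fraktur_naive("Hausaufgabe"))  # wrong: "Haus" should keep its round s
```

The last call turns the s at the end of "Haus" into a long s because the rule has no knowledge of the boundary in Haus+aufgabe. Fixing that takes word-boundary information that is not in the character stream.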
--Ken