Philippe Verdy <verdy_p_at_wanadoo.fr> writes:
> In all cases, you need knowledge of the language before trying to
> implement a word-breaker for that language. The solution in UAX#29
> will still provide some basic breaks to reduce the number of cases and
> to more easily detect exceptions, and it can be a good first
> processing step used in actually working word breakers for spell
> checkers, grammatical analysis, and automated translators, and for
> disambiguating leading and trailing apostrophes from leading and
> trailing quotation marks.
I’m off on a tangent here, but I can add that when my native Swedish
uses single quotation marks it traditionally uses RIGHT SINGLE QUOTATION
MARK both before and after a quotation. This is yet another example of
how you need knowledge of the language, and yet another example of how
hard word-breaking can be.
Then if you read
Jag såg ’na när hon spela’ piano.
= Jag såg henne när hon spelade piano.
= I saw her when she played the piano.
no simple algorithm would know that there isn’t a quote “na när hon
spela” in there.
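
To make that concrete, here is a minimal Python sketch of the kind of
simple algorithm I mean. It is only an illustration under my own
assumptions, not a real word-breaker and not anything from UAX#29; the
function name naive_quotes is hypothetical. It pairs up successive
RIGHT SINGLE QUOTATION MARK characters and treats whatever lies between
them as a quotation, which is correct for a genuine Swedish quotation
and wrong for the sentence above.

```python
import re

RSQM = "\u2019"  # RIGHT SINGLE QUOTATION MARK

def naive_quotes(text):
    """Hypothetical naive rule: pair successive U+2019 marks and
    return the text between each pair as a 'quotation'."""
    pattern = re.compile(RSQM + "(.*?)" + RSQM)
    return pattern.findall(text)

# A genuine quotation, with U+2019 on both sides as in Swedish usage.
quoted = "Hon sa \u2019hej\u2019 till mig."

# The example sentence: both marks are apostrophes marking elision.
elided = "Jag såg \u2019na när hon spela\u2019 piano."

print(naive_quotes(quoted))  # the real quotation is found
print(naive_quotes(elided))  # a quotation is reported where there is none
```

Both calls return one match, so nothing in the characters themselves
tells the two cases apart. Only knowing that ’na and spela’ are elided
forms of henne and spelade does.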
Received on Tue Jul 05 2011 - 09:06:16 CDT