Re: FW: New version of TR29:

From: John Cowan (jcowan@reutershealth.com)
Date: Tue Aug 20 2002 - 07:39:38 EDT


Marco Cimarosti scripsit:

> The issue is making the error window as narrow as possible. My assumption is
> that common words such as "c'", "d'", "j'", "l'", "n'", "qu'", "s'", "t'"
> or "v'" are more common than edge cases like "prud'homme".

How about this heuristic:

Break after an apostrophe that is the second or third character in the
word. Do not break after apostrophes that come later. This neatly
handles (I think) all the English, Italian, and Esperanto cases, and
a good many of the French ones.
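
A rough sketch of that heuristic in Python, just to make the positions
concrete. The function name, the APOSTROPHES set, and the choice not to
split when nothing follows the apostrophe are my own assumptions for
illustration; none of this comes from TR29.

```python
# Characters treated as apostrophes here (an assumption for this sketch):
# the ASCII apostrophe and the right single quotation mark.
APOSTROPHES = {"'", "\u2019"}


def split_at_early_apostrophe(word):
    """Split a word after an apostrophe that is its 2nd or 3rd character.

    Apostrophes that come later in the word never cause a break.
    Returns a list with one or two pieces.
    """
    for index in (1, 2):  # 0-based positions of the 2nd and 3rd characters
        # Require at least one character after the apostrophe, so that a
        # bare "l'" is returned whole rather than with an empty tail.
        if index < len(word) - 1 and word[index] in APOSTROPHES:
            return [word[:index + 1], word[index + 1:]]
    return [word]


if __name__ == "__main__":
    for sample in ("l'amour", "qu'il", "prud'homme"):
        print(sample, split_at_early_apostrophe(sample))
```

With this sketch "l'amour" and "qu'il" are split after the apostrophe,
while "prud'homme" stays in one piece because its apostrophe is the
fifth character.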

-- 
John Cowan  jcowan@reutershealth.com  www.reutershealth.com  www.ccil.org/~cowan
Consider the matter of Analytic Philosophy.  Dennett and Bennett are well-known.
Dennett rarely or never cites Bennett, so Bennett rarely or never cites Dennett.
There is also one Dummett.  By their works shall ye know them.  However, just as
no trinities have fourth persons (Zeppo Marx notwithstanding), Bummett is hardly
known by his works.  Indeed, Bummett does not exist.  It is part of the function
of this and other e-mail messages, therefore, to do what they can to create him.


