Asmus,
This discussion reminds me of my ill-fated efforts to produce a manageable
set of rules for automatic title casing, starting with French text. It
would have required either special dictionaries or entering the text in a
special way. And if the text had to be entered in a special way, one might
as well enter it in the proper title case to begin with.
If you are entering Danish city names, then enter them as Ålborg. You should
only use Aalborg where the font does not support Å. For matching logic you
can equate Å to Aa; then the issue of compound words goes away.
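
A minimal sketch of that matching idea in Python, under stated assumptions:
the helper name match_key is hypothetical, and folding every "aa" to "å" is
a simplification for illustration, not a real collation tailoring. Because
both the query and the stored text are folded the same way, an accidental
"aa" in a compound word is folded identically on both sides.

```python
import unicodedata

def match_key(text: str) -> str:
    """Fold text so that 'Å' and 'Aa' compare equal (hypothetical helper)."""
    # NFC turns 'A' + combining ring (U+030A) into the single letter 'Å';
    # casefold() then lowercases it to 'å'.
    folded = unicodedata.normalize("NFC", text).casefold()
    # Assumption: every 'aa' pair is treated as 'å' for matching purposes.
    return folded.replace("aa", "å")

# "Aalborg" and "Ålborg" produce the same key, so either spelling finds
# the other. "dataanalyse" also folds consistently on both sides.
print(match_key("Aalborg") == match_key("Ålborg"))
```

The trade-off is that this only makes matching consistent. It says nothing
about where a word should sort.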
Carl
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]On
> Behalf Of Asmus Freytag
> Sent: Saturday, September 08, 2001 5:56 PM
> To: Mark Davis; [email protected]; Francesco Zappa Nardelli
> Subject: Re: [OT] o-circumflex
>
>
> At 02:45 PM 9/8/01 -0700, Mark Davis wrote:
> >If you use a Danish tailoring of the UCA that equates Å and AA
> (at least at
> >a primary and secondary level), then they will sort the same
> way. A string
> >search that uses the same tailoring will also find "Ålborg" when given
> >"Aalborg" (and vice versa).
>
> But if you do this, all compound words starting with "data" and
> continuing
> with another word starting with "a" will be sorted incorrectly!
>
> To achieve this effect, you would have to mark which AAs are A-Rings and
> which ones are accidental adjacencies. In Danish one can use the
> SHY (soft
> hyphen) to break the latter, as these accidental pairs occur at
> legal word
> break points. In fact, that's the recommended solution, but it requires
> that the input data are in a specific form.
>
> A./
>
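
The soft-hyphen approach quoted above can be sketched in Python as follows.
This is an illustration only: the sort_key helper and the simplified
29-letter ranking are assumptions, not the UCA or any library's Danish
tailoring. An "aa" pair is folded to "å" (which sorts last in Danish)
unless a SHY (U+00AD) separates the two letters.

```python
import unicodedata

SHY = "\u00ad"  # soft hyphen, marks an accidental "a" + "a" adjacency
# Assumption: a simplified Danish alphabet order, with æ, ø, å at the end.
DANISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
RANK = {ch: i for i, ch in enumerate(DANISH_ALPHABET)}

def sort_key(word: str) -> list:
    """Hypothetical sort key: 'aa' ranks as 'å' unless broken by SHY."""
    folded = unicodedata.normalize("NFC", word).casefold()
    # 'a' + SHY + 'a' does not contain the substring 'aa', so it survives
    # the first replace as two separate letters; the SHY is dropped after.
    folded = folded.replace("aa", "å").replace(SHY, "")
    # Characters outside the alphabet all share one rank after 'å'.
    return [RANK.get(ch, len(DANISH_ALPHABET)) for ch in folded]

words = ["Ålborg", "Zebra", "data" + SHY + "analyse", "datb"]
print(sorted(words, key=sort_key))
```

Without the SHY, "dataanalyse" would be keyed as "datånalyse" and sort
after "datb", which is the incorrect ordering described above.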