From: Peter Kirk (peterkirk@qaya.org)
Date: Sun Jul 03 2005 - 13:02:20 CDT
On 03/07/2005 13:06, David Perry wrote:
> ...
>
>More important, these characters should be used with great caution, if at
>all. Consider the fact that if a user searches a document for the word
>"biblion" and types a standard beta at both positions, he won't find the
>word if it is encoded using the "curly" beta in the middle. ...
>
But he or she would find the word correctly if the search is based on
the Unicode Collation Algorithm - and if the two betas are collated as
the same at the top level, as they certainly should be (but I haven't
checked this). So the issue with using a different character is not as
serious as you suggest.
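To make the point concrete, here is a minimal sketch of a search that folds the two betas together. It is not the Unicode Collation Algorithm: Python's standard library has no UCA implementation, so this stand-in assumes that compatibility decomposition (NFKD) plus removal of combining marks is a fair approximation of a primary-level comparison for this case. The function names `search_key` and `contains` are hypothetical.

```python
import unicodedata

def search_key(text: str) -> str:
    """Rough stand-in for a primary-level key (an assumption, not real UCA)."""
    # NFKD replaces compatibility variants with their base letters,
    # e.g. U+03D0 GREEK BETA SYMBOL becomes U+03B2 GREEK SMALL LETTER BETA,
    # and splits accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks so that accents do not affect matching.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def contains(haystack: str, needle: str) -> bool:
    """True if needle occurs in haystack once both are folded."""
    return search_key(needle) in search_key(haystack)

# "biblion" written with the curly beta in the middle of the word
word_with_curly_beta = "βι\u03d0λίον"
found_raw = "βιβλίον" in word_with_curly_beta          # naive code point search
found_folded = contains(word_with_curly_beta, "βιβλίον")  # folded search
```

With a naive code point comparison the word is missed; with the folded keys it is found, which is the behaviour a collation-based search is expected to give.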
The issue is of course very similar to that of final form sigma, which
is already clearly encoded as a separate character (although it could
have been encoded as a positional variant), and is in fact sometimes
used in the middle of a word e.g. at the end of a prefix (I have
certainly seen this usage in 19th century works). The correct approach
here is for searches to treat the two forms of sigma as equivalent,
rather than expect the user to choose the correct form in the search
box. And the same should happen for the two forms of beta, and of some
other letters.
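One simple way to get this behaviour, sketched here under assumptions, is an explicit equivalence table applied to both the text and the search string. The table name `VARIANT_TO_BASE`, the function names, and the particular choice of "other letters" (theta, phi, kappa, rho and pi symbol forms) are illustrative assumptions, not a list taken from any standard.

```python
# Map variant letterforms to the ordinary letter the user is likely to type.
# The selection of entries is an illustrative assumption.
VARIANT_TO_BASE = str.maketrans({
    "\u03c2": "\u03c3",  # final sigma  -> sigma
    "\u03d0": "\u03b2",  # beta symbol  -> beta
    "\u03d1": "\u03b8",  # theta symbol -> theta
    "\u03d5": "\u03c6",  # phi symbol   -> phi
    "\u03f0": "\u03ba",  # kappa symbol -> kappa
    "\u03f1": "\u03c1",  # rho symbol   -> rho
    "\u03d6": "\u03c0",  # pi symbol    -> pi
})

def fold_variants(text: str) -> str:
    """Replace each variant form with its base letter (one-to-one mapping)."""
    return text.translate(VARIANT_TO_BASE)

def find_word(text: str, query: str) -> int:
    """Return the offset of query in text, treating variant forms as equal.

    Every mapping is one character to one character, so an offset in the
    folded text is also a valid offset in the original text. Returns -1
    if there is no match.
    """
    return fold_variants(text).find(fold_variants(query))

# Final sigma used at the end of a prefix, in the middle of a word
text = "θα προςφέρω"
offset = find_word(text, "προσφέρω")
```

The user types an ordinary sigma and still finds the word spelled with a final sigma inside it, without having to know which form the text uses.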
...
>The best way to get alternate letter shapes is to use advanced font
>technologies such as AAT or OpenType that allow the display of alternate
>glyphs without modifying the underlying Unicode values.
>I know that support for these technologies is still limited, but it is
>improving and will probably be more widespread with the next release of
>Windows, which will support OT features at the system level. ...
>
An alternative which could be considered is to encode the special
variant form as a variation sequence. Such sequences are already
supported by OpenType fonts in an application-independent way. And the
searching problem disappears if the variation selector is ignored, as it
should be in all searches. But I doubt whether the UTC would accept a
variation sequence for something that is already encoded as a character.
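Ignoring variation selectors in a search can be sketched as below. The sequence used in the example, beta followed by U+FE00, is purely hypothetical: no such variation sequence is defined, and it stands in only to show the mechanism. The function names are likewise assumptions.

```python
import re

# Variation selectors: VS1..VS16 (U+FE00..U+FE0F) and
# VS17..VS256 (U+E0100..U+E01EF).
VARIATION_SELECTORS = re.compile("[\ufe00-\ufe0f\U000e0100-\U000e01ef]")

def strip_variation_selectors(text: str) -> str:
    """Remove variation selectors, leaving the base characters untouched."""
    return VARIATION_SELECTORS.sub("", text)

def search_ignoring_vs(haystack: str, needle: str) -> bool:
    """Substring search in which variation selectors are ignored."""
    return strip_variation_selectors(needle) in strip_variation_selectors(haystack)

# Hypothetical encoding of the curly beta as <beta, VS1>
word = "βιβ\ufe00λίον"
found = search_ignoring_vs(word, "βιβλίον")
```

Because the base character is still the ordinary beta, the search needs no knowledge of Greek letterforms at all; it only has to skip the selectors.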
--
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/