From: Hans Aberg (haberg@math.su.se)
Date: Wed Feb 16 2005 - 11:39:19 CST
At 09:20 -0800 2005/02/14, Mark Davis wrote:
> 3. The UTR had for some time recommended the development of data on visually
> confusables, and we will be starting to collect data to test the feasibility
> of different approaches.
One way to handle confusables might be, instead of attempting to prohibit
characters, to declare certain groups of characters (or character sequences)
equivalent. Only one name in each equivalence class will be accepted. Then, if
somebody tries to define a look-alike name, it will be viewed as already
occupied.
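As a rough illustration of the "already occupied" rule, here is a minimal
Python sketch. The names NameRegistry, register and toy_key are hypothetical,
and toy_key is only a stand-in for a real equivalence: it merely treats
Cyrillic small a (U+0430) as Latin "a".

```python
from typing import Callable, Dict, Optional


class NameRegistry:
    """Accepts at most one name per equivalence class (illustrative only)."""

    def __init__(self, class_key: Callable[[str], str]) -> None:
        # class_key maps a name to a value shared by all equivalent names.
        self._class_key = class_key
        self._taken: Dict[str, str] = {}

    def register(self, name: str) -> Optional[str]:
        """Return None on success, or the earlier name occupying the class."""
        key = self._class_key(name)
        if key in self._taken:
            return self._taken[key]
        self._taken[key] = name
        return None


def toy_key(name: str) -> str:
    # Assumption for the demo: only Cyrillic U+0430 is folded to Latin "a".
    return name.replace("\u0430", "a")


registry = NameRegistry(toy_key)
first = registry.register("example")        # all-Latin spelling
second = registry.register("ex\u0430mple")  # look-alike with Cyrillic a
```

Here the first registration succeeds, while the look-alike is refused and the
registry reports the name that already occupies its class.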
One way to define such equivalences might be to look for another character
set C (which might be the full, or a subset of the Unicode set), and then
map the Unicode characters into the set of finite sequences from C. For
example, homographs will be mapped to the same character. In order to check
whether two Unicode sequences are equivalent, one only has to compute their
mapped C-sequences and see if these are equal. This C-sequence will normally
not be the preferred written one, even if C is the Unicode set: For example,
if both Latin and Greek upper/lower case letters are defined as equivalent,
and further, Latin "A" and Greek "Alpha" are viewed as homographs, then lower
case "a" and "alpha" will also be equivalent, but one will still be able to
use one form rather than the other in specific cases.
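The mapping into C-sequences can be sketched in Python as follows. This is
only a toy under stated assumptions: the table TO_C and the functions
c_sequence and equivalent are hypothetical names, the table holds just a few
entries chosen for the Latin/Greek example (plus one ligature to show a
character mapping to a sequence of length two), and unlisted characters are
assumed to map to themselves.

```python
from typing import Dict

# Hypothetical toy table: each character maps to a finite sequence over C.
# Here C is taken to be a subset of Unicode itself.
TO_C: Dict[str, str] = {
    "A": "a",        # Latin capital A, by case equivalence
    "\u0391": "a",   # Greek capital Alpha, homograph of Latin A
    "\u03b1": "a",   # Greek small alpha, by case equivalence with Alpha
    "\ufb01": "fi",  # Latin small ligature fi, mapped to a two-character sequence
}


def c_sequence(name: str) -> str:
    """Map a Unicode string to its C-sequence; unlisted characters map to themselves."""
    return "".join(TO_C.get(ch, ch) for ch in name)


def equivalent(x: str, y: str) -> bool:
    """Two strings are equivalent exactly when their C-sequences are equal."""
    return c_sequence(x) == c_sequence(y)


latin_vs_greek_upper = equivalent("A", "\u0391")
latin_vs_greek_lower = equivalent("a", "\u03b1")
ligature_vs_letters = equivalent("\ufb01x", "fix")
```

Note that the C-sequence is used only for comparison. The name as written,
whether with "a" or with alpha, is what would actually be stored and shown.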
Hans Aberg