From: Peter Constable (petercon@microsoft.com)
Date: Mon May 24 2004 - 21:10:12 CDT
> From: Philippe Verdy [mailto:verdy_p@wanadoo.fr]
> Sent: Monday, May 24, 2004 3:28 PM
> Is it a joke? UTF-8 designates Unicode codepoints referring to
> Unicode abstract characters with all their semantics (including
> the character name and properties).
No, it is not a joke. For years, many scholars working with electronic
versions of Biblical texts have used the MCW (not MCS -- a typo on my
part) representation, which is effectively a Latin cipher of Hebrew and
Greek characters. The abstract characters are entirely Basic Latin
characters, but they are standing for Hebrew or Greek characters.
> You can't say that the table above is ASCII, or Unicode either.
> It's only a separate legacy 7-bit encoding.
It certainly could be considered ASCII or Unicode Basic Latin
characters: they are always documented as such, and viewed as such. One
*could* also consider it a legacy encoding of non-Latin characters, but
in practice it's not used that way -- it's only at a higher level of
interpretation (on the part of the user, not the system) that these are
Hebrew or Greek characters.
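
To make the distinction concrete, here is a minimal sketch in Python. The
three-entry table is a hypothetical, partial mapping chosen for
illustration only; it is not offered as the actual MCW table, and the
function names are likewise made up. It shows that the stored data
consists of ordinary Basic Latin code points, and that the Hebrew reading
only appears when a separate, higher-level mapping is applied.

```python
# Hypothetical, partial cipher table for illustration only (an assumption,
# not the real MCW table): Basic Latin characters standing for Hebrew letters.
LATIN_TO_HEBREW = {
    ")": "\u05D0",  # HEBREW LETTER ALEF
    "B": "\u05D1",  # HEBREW LETTER BET
    "R": "\u05E8",  # HEBREW LETTER RESH
}


def is_basic_latin(text):
    """Return True if every code point lies in the Basic Latin block."""
    return all(ord(ch) < 0x80 for ch in text)


def interpret_as_hebrew(text):
    """Apply the higher-level interpretation; unmapped characters pass through."""
    return "".join(LATIN_TO_HEBREW.get(ch, ch) for ch in text)


stored = "BR)"                     # what is actually stored and interchanged
encoded = stored.encode("utf-8")   # UTF-8 of Basic Latin is plain ASCII bytes
interpreted = interpret_as_hebrew(stored)
```

As far as the system is concerned, `stored` is just three Basic Latin
characters; nothing in the encoding changes when a reader chooses to
treat them as Hebrew letters.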
> which is probably
> not widely interoperable because unimplemented or not documented
> in the same common places as where ASCII and Unicode are defined.
Well, actually, it *is* interoperable within the sizeable community that
has adopted that convention -- they can and do interchange data using
this. You can find content using this representation in such places as
the Oxford Text Archive.
Peter Constable