From: D. Starner (shalesller@writeme.com)
Date: Sun Dec 28 2003 - 23:47:31 EST
> As to harm, where's the harm in encoding Japanese kanzi separately, or
> Latin uncial, or a complete set of small capitals as a third case?
> Where's the harm in encoding Latin Renaissance scripts separately?
Spell checking, for one. Should you use T-cedilla or T-comma for Romanian?
What if your keyboard emits one and your spellchecker accepts the other?
(I guess T-comma is the correct answer, but there's a lot of Latin-2
data and old keyboards running around that use T-cedilla.) An Irish
spellchecker should work whether you use uncial or antiqua fonts.
Japanese kanzi is a slightly different matter, but the separate encoding
of over ten thousand characters is a problem in itself.
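To make the Romanian case concrete, here is a minimal sketch (Python, with
a made-up function name) of the kind of folding a spellchecker could apply
so that legacy Latin-2 cedilla data matches the preferred comma-below forms:

```python
# Sketch only: fold the legacy cedilla forms onto the comma-below forms
# before dictionary lookup. Codepoints:
#   U+0162/U+0163 T/t WITH CEDILLA -> U+021A/U+021B T/t WITH COMMA BELOW
#   U+015E/U+015F S/s WITH CEDILLA -> U+0218/U+0219 S/s WITH COMMA BELOW
CEDILLA_TO_COMMA = str.maketrans({
    "\u0162": "\u021A",
    "\u0163": "\u021B",
    "\u015E": "\u0218",
    "\u015F": "\u0219",
})

def fold_romanian(text: str) -> str:
    """Normalize legacy cedilla spellings to comma-below forms."""
    return text.translate(CEDILLA_TO_COMMA)

print(fold_romanian("\u0163ar\u0103"))  # legacy t-cedilla folds to t-comma
```

A spellchecker that does this accepts either keyboard's output while
keeping a single dictionary in the preferred forms.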
> But should a difference in appearance count in a decision to code
> separately within Unicode when *every* other feature of two "scripts" is
> identical, including origin?
Intra-script, a difference in appearance can call for separate encodings.
Inter-script, if the appearance is dissimilar enough to be a bar to
reading, and there's a disjoint population of users (so that one is
not a handwriting or cipher variant of the other), there is reason to
encode a separate script.
> Emerson's division
> would suggest four different scripts ought to be used for coding the
> same texts with the same logical characters with the same names,
Yes. Look at Serbo-Croat: the same texts, with the same logical
characters, exist once in Latin and once in Cyrillic. I'd be
surprised to find that the only case; I would assume some of the
Turkic languages that switched from Cyrillic to Latin did so by
changing glyphs instead of any deeper script features.
Indeed, by the same argument, we could encode a lot of scripts
together. ISCII did it for Indic scripts. I'm sure we could do
some serious merging among syllabic scripts - 12A8(ከ) is the same
as 13A7(Ꭷ) and 1472(ᑲ) with different glyphs - and among alphabetic
scripts, and even within a single alphabetic script - I mean, 015D(ŝ) is
basically the same as 015F(ş) and 0283(ʃ), isn't it?
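Those codepoints can be checked directly; a quick look-up of the formal
character names with Python's standard unicodedata module shows how
Unicode keeps the look-alikes apart:

```python
import unicodedata

# Look-alike characters that Unicode nevertheless encodes separately,
# in three different syllabic scripts and three different Latin letters.
lookalikes = ["\u12A8", "\u13A7", "\u1472",   # ke-like: ከ Ꭷ ᑲ
              "\u015D", "\u015F", "\u0283"]   # s-like:  ŝ ş ʃ
for ch in lookalikes:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

The printed names make the point: the first three belong to the Ethiopic,
Cherokee, and Canadian Syllabics blocks respectively, and the last three
are distinct Latin letters despite the similar shapes.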
(One just-for-fun idea that's been bouncing around in my head is
a universal character set that encodes something closer to the
underlying phonemic characters and applies orthography selectors.
Under this system, English unfortunately moves from a language that
can be supported on the most ancient bitty-box to one that takes
serious work to get right.)
> There may also be some thinking of HTML/XML/XHTML web display of
> characters where forcing of font is not reliable. One would not want a
> discussion of ancient Phoenician characters to display modern Hebrew
> forms! But this same problem currently applies to runes, medieval Latin
> characters, Han characters and so forth. One shouldn't let the current
> shortcomings of one display method among many dictate Unicode encodings.
One display method? Of the common document types:
PDF and PostScript embed fonts and don't have this problem, but aren't
editable.
A Word document doesn't embed fonts (usually?), and neither do OpenOffice,
RTF, HTML, XML, and most other word-processing formats or data
exchange formats. So font choice is not reliable in these formats.
A plain text document can't embed fonts or even programmatically suggest
a font.
As for Phoenician, perhaps a scholar may be happy with it as a font variant
of Hebrew, but I don't see why it's not equally a font variant of Greek. No
non-scholarly user (and Phoenician may well have a few) will understand why
Phoenician is considered Hebrew, because they don't look alike.
This archive was generated by hypermail 2.1.5 : Mon Dec 29 2003 - 00:37:46 EST