I've got a question about one particular aspect of encoding
polytonic Greek, specifically having to do with word-initial
capital (not all caps) vowels with accents and/or breathing
marks:
Let's suppose that we're encoding the text using the Greek
Extended block at U+1Fnn in addition to the Greek block at
U+03nn. I might encode a capital alpha with psili as U+1F08.
Since the psili is often written to the left of the capital alpha,
though, this could be encoded as U+1FBF U+0391. Both
alternatives are possible for any of the vowel/accent/breathing
combinations. I've seen some sample text that uses the latter
approach (there was a posting in September 1998 in which
someone was looking for sample text in Ancient Greek, and one
responder provided a URL:
http://titus.uni-frankfurt.de/unicode/samples/grbeisp.htm).
These two solutions have a very important difference, however:
- U+1F08 has a canonical decomposition of 0391 0313.
- U+1FBF U+0391 has a canonical decomposition of 1FBF 0391 and
  a compatibility decomposition of 0020 0313 0391.
Taking another example pair:
- U+1F0C (capital alpha w/ psili and oxia) has a canonical
  decomposition of 0391 0313 0301.
- U+1FCE U+0391 has a canonical decomposition of 1FBF 0301 0391
  and a compatibility decomposition of 0020 0313 0301 0391.
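For anyone who wants to check the decompositions listed above, here is a minimal sketch using Python's standard unicodedata module. The helper name code_points is my own and purely illustrative, and I'm assuming that NFD and NFKD are the right stand-ins for "canonical decomposition" and "compatibility decomposition" respectively.

```python
import unicodedata


def code_points(text, form):
    """Normalize `text` to `form` and return its code points as hex strings.

    `code_points` is a hypothetical helper for this illustration only.
    """
    return [f"{ord(ch):04X}" for ch in unicodedata.normalize(form, text)]


# The two example pairs from the post: a precomposed capital alpha
# versus a spacing breathing/accent character followed by U+0391.
samples = {
    "U+1F08": "\u1F08",
    "U+1FBF U+0391": "\u1FBF\u0391",
    "U+1F0C": "\u1F0C",
    "U+1FCE U+0391": "\u1FCE\u0391",
}

for label, text in samples.items():
    # NFD applies canonical decomposition only; NFKD also applies
    # compatibility decomposition.
    print(label)
    print("  NFD: ", " ".join(code_points(text, "NFD")))
    print("  NFKD:", " ".join(code_points(text, "NFKD")))
```

Since the decomposed sequences differ in each pair, the two spellings would not compare equal after normalizing to either form, which is the difference I'm asking about.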
Q: Which approach should be considered preferable?
Peter