From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Wed Aug 04 2010 - 12:56:32 CDT
On 8/2/2010 5:04 PM, Karl Pentzlin wrote:
> I have compiled a draft proposal:
> Proposal to add Variation Sequences for Latin and Cyrillic letters
> The draft can be downloaded at:
>  http://www.pentzlin.com/Variation-Sequences-Latin-Cyrillic2.pdf (4.3 MB).
> The final proposal is intended to be submitted for the next UTC
> starting next Monday (August 9).
>
> Any comments are welcome.
>
> - Karl Pentzlin
This is an interesting proposal to deal with the glyph selection problem 
caused by the unification process inherent in character encoding.
When Unicode was first contemplated, the web did not exist and the 
expectation was that it would nearly always be possible to specify the 
font to be used for a given text and that selecting a font would give 
the correct glyph.
As the proposal noted, universal fonts and viewing documents on other 
platforms and systems across the web have made this solution 
unattractive for general texts.
We are left, then, with these five scenarios:
1) Free variation
2) Orthographic variation of isolated characters (by language, e.g. 
different capitals)
3) Orthographic variation of entire texts (e.g. italic Cyrillic forms, 
by language)
4) Orthographic variation by type style (e.g. Fraktur conventions)
5) Notational conventions (e.g. IPA)
For free variation of a glyph, the only possible solutions are either 
font selection or use of a variation sequence. I concur with Karl that 
in this case, where notable variations have been unified, adding 
variation selectors is a much more viable means of preserving authorial 
intent than font selection.
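To make the mechanism concrete, here is a minimal Python sketch of what 
a variation sequence looks like in encoded text: a base character 
followed by a variation selector. The pairing of "a" with VS1 is 
hypothetical and used only for illustration (it is not taken from the 
proposal or from the standard), and the helper names are my own.

```python
import unicodedata

# VS1..VS16 occupy U+FE00..U+FE0F; VS17..VS256 occupy U+E0100..U+E01EF.
VS1 = "\uFE00"


def is_variation_selector(ch: str) -> bool:
    """Return True if ch is one of the generic variation selectors."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def strip_variation_selectors(text: str) -> str:
    """Drop all variation selectors, leaving only the base characters."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


# Hypothetical sequence, for illustration only: base "a" followed by VS1.
marked = "a" + VS1

print(len(marked))                        # two code points in the string
print(unicodedata.name(VS1))              # character name of the selector
print(strip_variation_selectors(marked))  # the underlying base character
```

A process that does not know a given sequence can ignore the selector 
and still has the base character to work with, which is what makes this 
approach degrade more gracefully than relying on font selection.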
If text is language tagged, then OpenType mechanisms exist in principle 
to handle scenarios 2 and 3. For full texts in a certain language, using 
variation selectors throughout is unappealing as a solution.
However, variation selectors may be a viable way to embed correctly 
rendered citations in other text, given that language tagging can become 
separated from the document and that automatic language tagging may 
detect large chunks of text, but not short runs.
The Fraktur problem is one where one type style requires additional 
information (e.g. when to select long s) that is not required for 
rendering the same text in another type style. If it is indeed desirable 
(and possible) to create a correctly encoded string that can be rendered 
automatically, without further change, in both type styles, then adding 
any necessary variation sequences to ensure that ability might be useful. 
However, that needs to be addressed in the context of a precise 
specification of how to encode texts so that they are dual-renderable. 
Addressing only some isolated variation sequences makes no sense.
Notational conventions are addressed in Unicode by duplicate encoding 
(IPA) or by variation sequences. The scheme has holes, in that in a few 
cases it is not possible to select one of the variants explicitly. 
Instead, the ambiguous form has to be used, in the hope that the font 
in use will have the proper variant in place for the ambiguous form.
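As a rough illustration of the duplicate-encoding approach and its gap, 
the following Python sketch prints two generic Latin letters next to the 
separately encoded forms used in IPA. The choice of these two pairs is 
mine, and the describe helper is a hypothetical name; the character 
names come from the Unicode Character Database via the unicodedata 
module.

```python
import unicodedata


def describe(ch: str) -> str:
    """Format a character as 'U+XXXX NAME'."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# (generic letter, separately encoded form with a fixed shape)
PAIRS = [
    ("\u0061", "\u0251"),  # a  vs  single-story alpha
    ("\u0067", "\u0261"),  # g  vs  single-story script g
]

for generic, specific in PAIRS:
    print(describe(generic), "|", describe(specific))
```

In each pair only one shape has a code point of its own. The other shape 
can be requested only through the generic letter, whose glyph depends on 
the font.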
Adding a few variation sequences (like the one to allow the "a" at 0061 
to be the two-story one needed for IPA) would fill the gap for times 
when control over the precise display font is not available.
However, there's no need to add variation sequences to select an 
*ambiguous* form. Those sequences should be removed from the proposal.
Overall, a valuable starting point for a necessary discussion.
A./