From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Wed Aug 04 2010 - 12:56:32 CDT
On 8/2/2010 5:04 PM, Karl Pentzlin wrote:
> I have compiled a draft proposal:
> Proposal to add Variation Sequences for Latin and Cyrillic letters
> The draft can be downloaded at:
> http://www.pentzlin.com/Variation-Sequences-Latin-Cyrillic2.pdf (4.3 MB).
> The final proposal is intended to be submitted for the next UTC
> starting next Monday (August 9).
>
> Any comments are welcome.
>
> - Karl Pentzlin
>
>
This is an interesting proposal to deal with the glyph selection problem
caused by the unification process inherent in character encoding.
When Unicode was first contemplated, the web did not exist and the
expectation was that it would nearly always be possible to specify the
font to be used for a given text and that selecting a font would give
the correct glyph.
As the proposal noted, universal fonts and viewing documents on other
platforms and systems across the web have made this solution
unattractive for general texts.
We are left, then, with these five scenarios:
1) Free variation
2) Orthographic variation of isolated characters (by language, e.g.
different capitals)
3) Orthographic variation of entire texts (e.g. italic Cyrillic forms,
by language)
4) Orthographic variation by type style (e.g. Fraktur conventions)
5) Notational conventions (e.g. IPA)
For free variation of a glyph, the only possible solutions are either
font selection or use of a variation sequence. I concur with Karl that
in this case, where notable variations have been unified, adding
variation selectors is a much more viable means of preserving authorial
intent than font selection.
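
To make the mechanics concrete, here is a minimal sketch in Python (my
choice of language for illustration only). It assumes nothing beyond the
fact that the variation selectors VS1..VS16 sit at U+FE00..U+FE0F; the
sequence used in the example is hypothetical and is not claimed to be a
registered variation sequence. The helper names are made up for this
sketch.

```python
import unicodedata

# Variation selectors VS1..VS16 occupy U+FE00..U+FE0F.
# (The supplementary selectors at U+E0100..U+E01EF are ignored here.)
VS_FIRST, VS_LAST = 0xFE00, 0xFE0F


def is_variation_selector(ch: str) -> bool:
    """Return True if ch is one of VS1..VS16."""
    return VS_FIRST <= ord(ch) <= VS_LAST


def strip_variation_selectors(text: str) -> str:
    """Drop all VS1..VS16, leaving the unified base characters."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


# Hypothetical example: a base letter followed by VS1. Whether any font
# maps this pair to a particular glyph is an assumption of the sketch.
marked = "D\uFE00anke"
plain = strip_variation_selectors(marked)

print(len(marked), len(plain))
print(unicodedata.name(marked[1]))
```

The point of the sketch is that the author's glyph preference travels
with the text itself, while a process that ignores or strips the
selectors still sees the ordinary unified characters.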
If text is language tagged, then OpenType mechanisms exist in principle
to handle scenarios 2 and 3. For full texts in a certain language, using
variation selectors throughout is unappealing as a solution.
However, it may be a viable solution for embedding correctly rendered
citations in other text, given that language tagging can become
separated from the document and that automatic language tagging may
detect large chunks of text, but not short runs.
The Fraktur problem is one where one typestyle requires additional
information (e.g. when to select long s) that is not required for
rendering the same text in another typestyle. If it is indeed desirable
(and possible) to create a correctly encoded string that can be rendered
without further change automatically in both typestyles, then adding any
necessary variation sequences to ensure that ability might be useful.
However, that needs to be addressed in the context of a precise
specification of how to encode texts so that they are dual renderable.
Only addressing some isolated variation sequences makes no sense.
Notational conventions are addressed in Unicode by duplicate encoding
(IPA) or by variation sequences. The scheme has holes: in a few cases it
is not possible to select one of the variants explicitly. Instead, the
ambiguous form has to be used, in the hope that the font in use has the
proper variant in place for the ambiguous form.
Adding a few variation sequences (like the one to allow the "a" at 0061
to be the two-story one needed for IPA) would fill the gap for cases
where control over the precise display font is not available.
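
As a rough illustration of the IPA case, the sketch below (Python, for
illustration only) contrasts the two mechanisms: the single-story form
has its own code point, U+0251 LATIN SMALL LETTER ALPHA, while the
two-story form relies on the ambiguous "a" at 0061. The pairing of 0061
with VS1 is purely an assumption made for the example; it is not claimed
to be a registered sequence, and the names in the code are hypothetical.

```python
import unicodedata

VS1 = "\uFE00"  # VARIATION SELECTOR-1

# Hypothetical sequence: "a" + VS1 standing for "two-story a required".
# Choosing VS1 for this purpose is an assumption of this sketch.
ipa_two_story_a = "a" + VS1

# Duplicate encoding: the single-story form has a separate code point.
IPA_ALPHA = "\u0251"


def describe(text: str) -> list[str]:
    """List each code point of text with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for line in describe(ipa_two_story_a) + describe(IPA_ALPHA):
    print(line)
```

Without such a sequence, a bare 0061 leaves the choice between the two
forms entirely to the font.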
However, there's no need to add variation sequences to select an
*ambiguous* form. Those sequences should be removed from the proposal.
Overall, a valuable starting point for a necessary discussion.
A./