From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Mon May 19 2008 - 02:00:50 CDT
Eric Muller wrote:
> One of the goals of the UDHR in Unicode project is indeed to show (via
> the translations themselves) and to document (via the notes) what
> could be called "best practices for Unicode use".
For that purpose, shouldn’t the data have been checked _before_ making
it public via the “UDHR in Unicode” http://unicode.org/udhr/ pages?
According to http://unicode.org/udhr/index_by_name.html there are only 4
“complete and reviewed” translations out of 347. And even these use
HYPHEN-MINUS (instead of a dash) in main headings such as “Universal
Declaration of Human Rights - French”, and they incorrectly leave spaces
around the dash in the boilerplate expression “© 1996 – 2007” below that
heading.
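As an illustration, the two heading problems above are the kind of thing a simple automated check could flag before publication. The sketch below is hypothetical: `dash_problems` and its two regular expressions are assumptions, not part of any UDHR in Unicode tooling, and they would certainly miss other cases.

```python
import re

# Hypothetical patterns for the two problems noted above (assumptions,
# not rules taken from the UDHR in Unicode project).
SPACED_HYPHEN_MINUS = re.compile(r" - ")         # U+002D standing in for a dash
SPACED_RANGE_DASH = re.compile(r"\d \u2013 \d")  # spaces around an EN DASH in a range

def dash_problems(text):
    """Return human-readable warnings for one line of text."""
    problems = []
    if SPACED_HYPHEN_MINUS.search(text):
        problems.append("HYPHEN-MINUS used as a dash")
    if SPACED_RANGE_DASH.search(text):
        problems.append("spaces around EN DASH in a numeric range")
    return problems

print(dash_problems("Universal Declaration of Human Rights - French"))
print(dash_problems("\u00a9 1996 \u2013 2007"))
```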
> Most of the texts have been "rescued" from the UN site,
If there’s a point to that, then the data should be substantially more
correct than the UN pages. But that doesn’t seem to be the case.
What is the added value when the stated purpose is to demonstrate the
use of Unicode and even “best practices for Unicode use”?
Moreover, I find it highly questionable whether, e.g., the declared policy
(see http://unicode.org/udhr/tech_whichcharacter.html ) of using U+2010
HYPHEN is best practice on web pages today or in the near future. It
gains nothing in practice but loses quite a lot when the browser or
associated software (such as a speech synthesizer) cannot handle U+2010
HYPHEN but has no problem with U+002D HYPHEN-MINUS.
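For concreteness, the two characters are distinct code points, and a site following the argument above could map one to the other on output. This is only a sketch of that idea; `safe_hyphens` is a hypothetical helper, not an established API.

```python
import unicodedata

# Hypothetical output filter: replace U+2010 HYPHEN with U+002D HYPHEN-MINUS,
# which has far wider support in fonts and other software.
HYPHEN_TO_HYPHEN_MINUS = str.maketrans({"\u2010": "\u002d"})

def safe_hyphens(text):
    return text.translate(HYPHEN_TO_HYPHEN_MINUS)

for ch in "\u2010\u002d":
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
print(safe_hyphens("well\u2010known"))
```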
The use of U+2019 RIGHT SINGLE QUOTATION MARK as a punctuation
apostrophe, or otherwise where applicable, is safe enough to justify
calling it best practice, though this is somewhat debatable. Yet this
issue isn’t even mentioned on the “Which character?” page. Promoting the
use of characters like U+2010 that have relatively limited support in
fonts simply isn’t right. For reasonably up-to-date information on font
support for it, consult
http://www.fileformat.info/info/unicode/char/2010/fontsupport.htm
(which lists just a set of Lucida fonts; there are some additional, less
common fonts that contain it).
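Font coverage can also be checked locally against the fonts actually installed. A minimal sketch follows, assuming the third-party fontTools package (`pip install fonttools`); `load_cmap`, `missing_chars`, and the demo cmap are hypothetical names used only for illustration.

```python
def load_cmap(path):
    """Map code points to glyph names for the font file at `path`."""
    from fontTools.ttLib import TTFont  # optional third-party dependency
    return TTFont(path).getBestCmap() or {}  # getBestCmap may return None

def missing_chars(cmap, text):
    """Return the distinct characters of `text` not covered by `cmap`."""
    return sorted({ch for ch in text if ord(ch) not in cmap})

# Made-up cmap standing in for a font that lacks U+2010 HYPHEN.
demo_cmap = {0x2D: "hyphen", 0x2013: "endash", 0x2019: "quoteright"}
print([f"U+{ord(c):04X}" for c in missing_chars(demo_cmap, "\u2010-\u2013")])
```

With a real font, `missing_chars(load_cmap("SomeFont.ttf"), text)` would list what that font cannot render on its own.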
In many situations, a browser will take the glyph for a character from a
different font when the primary font lacks it. While this is often
useful, e.g. for technical symbols, it easily leads to confusion,
especially for characters like hyphens, dashes, and their relatives. For
them, length is essential, and most (though not all) fonts give them
reasonable widths _relative to each other_. For example, a hyphen
(HYPHEN-MINUS or HYPHEN) is shorter than an EN DASH in most fonts. But
when such characters are mixed from different fonts, these relationships
are often lost.
Somewhat similarly, the various apostrophe-like characters might be
well implemented in a given font, but a mixture of fonts may mess
this up.
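The mixing effect can be seen in a toy model of per-character fallback. This assumes a simplified "first font in the list with a glyph wins" rule; real browsers use more elaborate matching, and the font names and coverage sets here are invented.

```python
# Invented fonts and coverage sets, for illustration only.
FONTS = [
    ("PrimaryFont", {"-", "\u2013", "\u2014", "a", "b"}),  # no U+2010 HYPHEN
    ("FallbackFont", {"-", "\u2010", "\u2013"}),
]

def font_for(ch, fonts=FONTS):
    """Return the name of the first font covering `ch`, or None."""
    for name, coverage in fonts:
        if ch in coverage:
            return name
    return None  # no font covers it; typically a missing-glyph box is shown

for ch in "-\u2010\u2013":
    print(f"U+{ord(ch):04X} -> {font_for(ch)}")
```

Here U+2010 is drawn from FallbackFont while the EN DASH next to it comes from PrimaryFont, so nothing guarantees that the hyphen will look shorter than the dash.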
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/