Re: Latin w/ diacritics (was Re: benefits of unicode)

From: James Kass (jameskass@worldnet.att.net)
Date: Wed Apr 18 2001 - 05:07:24 EDT


Peter Constable wrote:

>
> >..., the old 386's
> >... may not be able
> >to support an OS capable of using new rendering technology.
>
> That is indeed a problem. It's not one that technologists are good at
> solving, if for no other reason than because they have little option but to
> develop for collective newer technology. E.g. revamping Win3.x code to
> provide support for Unicode and for smart-font rendering that can run on a
> 386 with 4MB RAM wouldn't exactly be enjoyable work, even if such a project
> could be given resources.
>

Indeed. And it wouldn't be fair to fault businesses reluctant to
invest millions of dollars to target an impoverished market.

> There is also an issue of practical feasibility, though: smart-font
> rendering technologies are not fast. They depend on fast CPUs to give
> adequate performance. Running Uniscribe/OT or Graphite on a 25MHz 386SX
> probably wouldn't be pleasant for the user.
>

It might be faster for the user to go out and buy a new
computer rather than waiting for her screen to form...

>
> But I wonder if they won't get better results and sooner as spillover from
> advances in globalised software based on Unicode and smart fonts. If the
> next version of Uniscribe turns on OT shaping for Latin so that stacking
> diacritics can be supported, then that will probably work in IE and in
> Office XP. Building apps is very difficult and likely beyond the reach of
> most in Sudan. Building OT fonts and input methods isn't easy, but is
> attainable by more. If apps and OSes are written to be generally friendly
> to the world's scripts, then people can build fonts and IMs and start
> working with their less-well known writing systems using the same
> commercial-grade applications that those in commercially-viable markets get
> to enjoy. (There is still the problem of users having only older equipment,
> though.)
>

Software like FAIRY for HTML, and discussions like the recent post
from William Overington about providing Unicode access to users of
older systems offer some hope. Your point about friendly new apps
is well taken. As the effects from the new technologies "trickle-down"
to the less fortunate, these issues that concern us now will fade away.
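
To make the stacking-diacritics case from the quoted text concrete, here is a minimal sketch using Python's standard unicodedata module. It assumes nothing about Uniscribe or any particular font. It only shows that some base-plus-marks sequences have a precomposed code point while others do not, and the latter are the ones that depend on the renderer to position the marks. The function name and the sample strings are illustrative choices, not taken from the thread.

```python
import unicodedata


def nfc_length(text: str) -> int:
    """Return the number of code points left after NFC normalization."""
    return len(unicodedata.normalize("NFC", text))


# e + combining circumflex + combining acute:
# a precomposed form exists (U+1EBF), so NFC collapses the sequence.
stacked_with_precomposed = "e\u0302\u0301"

# open e (U+025B) + combining acute:
# no precomposed form exists, so the combining mark remains separate
# and correct display depends on the font and rendering engine.
stacked_without_precomposed = "\u025b\u0301"

print(nfc_length(stacked_with_precomposed))
print(nfc_length(stacked_without_precomposed))
```

Sequences of the second kind are the ones for which smart-font shaping of Latin would matter.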

> >...Private Use Area of Unicode as one alternative. Existing hardware
> >and software already provide some support for PUA characters.
>
> You'll have to wait rather longer to see Uniscribe provide rendering
> support for PUA characters than for Latin & diacritics.
>

Precomposed Latin characters in the PUA don't require
any special rendering support; they'd be rendered the same
as any precomposed BMP Latin character. They would need
custom input support, though, and there are issues with PUA
display in many applications. If you placed a script like
Pahawh Hmong in the PUA and made an HTML file using
the PUA encoding, you might be astonished to see how poorly
the page is displayed. Inappropriate line breaks are one problem.
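
One plausible reason for this, offered as an assumption rather than a diagnosis of any specific browser: a PUA code point carries no letter or script properties in the Unicode Character Database, so software has little to go on when deciding how to treat it. The sketch below uses only Python's standard unicodedata module, and the helper name describe is made up for illustration.

```python
import unicodedata


def describe(ch: str) -> tuple:
    """Return (code point label, general category, character name) for one character."""
    label = f"U+{ord(ch):04X}"
    # unicodedata.name() takes a default for characters that have no name.
    return (label, unicodedata.category(ch), unicodedata.name(ch, "<unnamed>"))


# A precomposed Latin letter encoded in the BMP: category Ll (lowercase letter).
print(describe("\u00e9"))

# The first Private Use Area code point: category Co (private use), no name.
print(describe("\ue000"))

# Generic text tests do not treat the PUA character as a letter.
print("\u00e9".isalpha(), "\ue000".isalpha())
```

How an application actually breaks lines around such characters is up to that application; the example only shows the default properties it starts from.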
 
>
> >... perhaps...PUA registry...along the same lines as ...ConScript
>
> We will likely do something like that within SIL and some partner
> organisations. This has been discussed in relation to OLAC (Open Language
> Archives Community) as well. I think OLAC is the most appropriate forum for
> this.
>

Thank you.

>
> >...custom (8-bit) code pages...
>
> I've done it numerous times, and I still do it on occasion. I still call it
> a "hack", though, since that's what it is, in many cases at least: The cmap
> in TrueType fonts for Windows uses Unicode. People think they're putting
> their favourite character on an 8-bit codepoint, but in the font they are
> actually hacking with Unicode, breaking conformance rules C6 and especially
> C7.
>
>

The 'cmap' in TrueType fonts for Windows uses double-byte encoding.
(Windows NT supports the new specs which allow multi-byte.)
Unicode is one multi-byte standard, but there are others. Custom
code pages predate both Unicode and Windows. (I recently came
across an archive of old DOS fonts which included a font for the
Voynich Manuscript!) "Hack" has a negative connotation, even if
it's used to refer to one's own activity. Perhaps we could be more
respectful to people who are only doing the best they can with the
tools they have available (and may not even know rules exist).
Consider the following linked page:

http://www.linguistics.unimelb.edu.au/research/hmong/hmongaustpahawh.html#pahawh

If you want to view the page properly, you'll download their font.
There is no alternative for web master or visitor.
(I know about PDFs and GIFs; they aren't HTML.)

The custom TrueType fonts for Windows are simply legitimate
descendants of older custom fonts in other formats. In many cases,
non-Latin letters are mapped to ASCII positions in order to allow
basic keyboard entry in the quickest possible fashion. Users of
non-Latin scripts need an "A" at decimal value 65 in their fonts
just as much as they would an "A" on their keyboards.
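
As a rough illustration of what such a font mapping implies for the stored text, the sketch below uses an entirely hypothetical custom font that draws Greek letters at the ASCII positions for "a", "b" and "g". The mapping table and function name are invented for the example and do not describe any real font. The stored data remains plain ASCII, and only a conversion table that knows the font's layout can recover the intended characters.

```python
# Hypothetical layout of an imaginary custom font: ASCII position -> intended letter.
LEGACY_TO_UNICODE = str.maketrans({
    "a": "\u03b1",  # GREEK SMALL LETTER ALPHA
    "b": "\u03b2",  # GREEK SMALL LETTER BETA
    "g": "\u03b3",  # GREEK SMALL LETTER GAMMA
})


def to_unicode(legacy_text: str) -> str:
    """Convert text typed for the hypothetical custom font into real Unicode letters."""
    return legacy_text.translate(LEGACY_TO_UNICODE)


stored = "abg"  # what the user typed and what the file actually contains

# The stored values are ordinary ASCII codes (97, 98, 103).
print([ord(c) for c in stored])

# With the font's layout known, the intended text can be recovered.
print(to_unicode(stored))
```

Without the custom font or a table like this one, a reader sees only the Latin letters, which is the situation described above for the linked page.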

> And if the designers of the pyramids hadn't told the people in the
> quarries, "The blocks shall have exactly these dimensions..."?
>

What would have happened? (I'm sorry if the punch line is obvious to
everyone else...)

Best regards,

James Kass.


