RE: Furigana

From: Murray Sargent (murrays@exchange.microsoft.com)
Date: Tue Aug 13 2002 - 17:15:58 EDT


As Ken says, the Unicode interlinear annotation characters are for
internal use only. Specifically, their meanings can be different for
different programs. If you have your nice marked up text in memory and
want to export it for use by some program, you need to use a
higher-level protocol that translates the interlinear annotation
characters to a standardized external format, such as HTML. In addition
to U+FFF9 - U+FFFB, there are other characters for internal use only,
namely U+FDD0 - U+FDEF. The meanings of these characters also can (and
do) differ for different programs. Originally it was hoped that the
interlinear annotation characters might be able to describe ruby
adequately, but it became clear that additional information is necessary
to express ruby unambiguously. Hence the UTC adopted them for internal
use only, with associated information presumably stored elsewhere to
resolve the ambiguities.
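
To illustrate the export step, here is a minimal Python sketch of such a
translation to HTML. It assumes the simplest possible internal convention:
non-nested sequences of U+FFF9 (anchor), base text, U+FFFA (separator),
annotation text, U+FFFB (terminator). The function name annotations_to_ruby
is hypothetical, and a real program would have to consult whatever extra
information it stores elsewhere.

```python
import html
import re

ANCHOR, SEPARATOR, TERMINATOR = "\uFFF9", "\uFFFA", "\uFFFB"

# One non-nested annotation: anchor, base text, separator,
# annotation text, terminator (an assumed internal convention).
_ANNOTATION = re.compile(
    ANCHOR + "([^\uFFF9-\uFFFB]*)" + SEPARATOR + "([^\uFFF9-\uFFFB]*)" + TERMINATOR
)


def annotations_to_ruby(text):
    """Translate simple interlinear annotations into HTML <ruby> markup."""
    out = []
    pos = 0
    for match in _ANNOTATION.finditer(text):
        # Escape the plain text that precedes this annotation.
        out.append(html.escape(text[pos:match.start()]))
        base, annotation = match.group(1), match.group(2)
        out.append(
            "<ruby>%s<rt>%s</rt></ruby>"
            % (html.escape(base), html.escape(annotation))
        )
        pos = match.end()
    # Escape whatever follows the last annotation.
    out.append(html.escape(text[pos:]))
    return "".join(out)
```

Annotation characters that do not form a complete sequence are passed
through unchanged by this sketch; an exporter would have to decide what to
do with them.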

Frankly, IMHO the best thing for a program to do on reading such
characters is to delete them. This isn't quite what one might think from
the Standard, since they unfortunately aren't labeled as noncharacters.
But if a program uses them internally with a well-defined meaning,
getting them in from an external source can violate the internal usage.
To actually roundtrip these "rogue" characters would require some extra
internal protocol to ignore them when they've been read in. So my edit
engine (RichEdit), which uses them for table row delimiters, simply
deletes them on input and only exports them for RichEdit-specific
contexts.
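
As a rough sketch of the delete-on-input idea (not RichEdit's actual code),
an import filter in Python could look like the following. The name
strip_internal_use is hypothetical, and the character ranges are simply the
two mentioned above.

```python
import re

# Characters that a program may reserve for its own internal use:
# the interlinear annotation characters U+FFF9..U+FFFB and
# the range U+FDD0..U+FDEF.
_INTERNAL_USE = re.compile("[\uFFF9-\uFFFB\uFDD0-\uFDEF]")


def strip_internal_use(text):
    """Delete internal-use characters from text read from an external source."""
    return _INTERNAL_USE.sub("", text)
```

A reader would apply this to incoming text before it reaches the internal
buffer, so that externally supplied copies of these characters cannot be
mistaken for the program's own delimiters.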

Murray

-----Original Message-----
From: Michael Everson [mailto:everson@evertype.com]
Sent: Tuesday, August 13, 2002 7:52 AM
To: unicode@unicode.org
Cc: Ken Whistler
Subject: Re: Furigana

At 12:11 -0700 2002-08-08, Kenneth Whistler wrote:

>Ah, but read the caveats carefully. The Unicode interlinear annotation
>characters are *not* intended for interchange, unlike the HTML4 <ruby>
>tag. See TUS 3.0, p. 326. They are, essentially, internal-use anchor
>points.

What does this mean? That if I have a text all nice and marked up
with furigana in Quark I can't export it to Word and reimport it in
InDesign and expect my nice marked up text to still be marked up?

Surely all Unicode/10646 characters are expected to be preserved in
interchange. What have I got wrong, Ken?

-- 
Michael Everson *** Everson Typography *** http://www.evertype.com


