[unicode] Re: Malay (Latin) characters in Unicode?

From: dvdeug@hushmail.com
Date: Fri Mar 23 2001 - 14:57:47 EST


At Fri, 23 Mar 2001 00:13:33 -0800, Rick McGowan <rick@unicode.org>
wrote:
>David Starner wrote:
>
>> I have a copy of Shellabear's Practical Malay Grammar that I'm preparing
>> to transcribe for Project Gutenberg. Unfortunately, he represents the
>> Malaysian alphabet in a Latin transliteration that includes ng as a
>> single ligatured form, and I don't know how to transcribe it in Unicode.
>
>Could you perhaps post or point to a picture of what it looks like? I
>suppose it's an "N" with a loopy tail of some type.

More like rg. A picture is attached. (Was attached. Rick probably has a
copy, but it seems to have got lost between here and the Unicode mailing
list.)

>The character you are looking for is probably U+014B in lowercase or
>U+014A in uppercase. I would be rather surprised if that's not what
>you're looking for.

It's not exactly what I was looking for. I may just use it and make
a note that the glyph is probably not exactly right.
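
For reference, the two code points Rick suggests can be inspected with
Python's standard unicodedata module. This is only a minimal sketch for
checking the suggestion; the helper name describe() is made up for
illustration and is not part of any tool mentioned in this thread.

```python
import unicodedata

# The two code points suggested in the quoted message.
ENG_LOWER = "\u014b"
ENG_UPPER = "\u014a"

def describe(ch):
    """Return (code point label, Unicode name, UTF-8 bytes) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), ch.encode("utf-8"))

for ch in (ENG_LOWER, ENG_UPPER):
    print(describe(ch))
```

Whether a given font draws these characters like the ligature in the
book is a separate glyph question that the code point itself does not
settle.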

>BTW, a bit off topic here but: I think it's high time that Project
>Gutenberg adopted some very clear character encoding guidelines now
>that they're expanding so widely. Or have they already adopted them and
>I've just missed the policy statement...? They're in for a real mess if
>they don't specify character encodings in a very controlled way.

In some places, they are already a real mess. You can dig
through the Gutenberg archives and find various (unlabeled)
encodings for the Latin-1 coverage. There's at least one
Japanese document that just says "you need a Japanese
OS to read this." 8-bit documents are usually labeled as
8-bit, without any indication of encoding. The Bulgarian files
are clearly labeled Windows-1251, at least.
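
To illustrate why an unlabeled 8-bit file is a problem, here is a small
sketch in Python showing the same bytes decoding to different text under
different assumed encodings. The sample word and the list of candidate
encodings are my own choices for illustration, not taken from any
Gutenberg file.

```python
# Sample text (an assumption for illustration), stored as Windows-1251 bytes.
raw = "България".encode("cp1251")

def decode_as(data, encodings):
    """Decode the same bytes under each candidate encoding."""
    return {enc: data.decode(enc) for enc in encodings}

# Without a label, a reader has to guess which of these was intended.
guesses = decode_as(raw, ["cp1251", "latin-1", "koi8-r"])
for enc, text in guesses.items():
    print(enc, text)
```

Only the cp1251 guess recovers the original word; the other two decode
without any error and simply produce the wrong characters, which is why
the label matters.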

OTOH, the policy of doing everything possible in ASCII has
saved Gutenberg some problems. They're moving towards
Unicode for any files that can't be released in a standard
8-bit encoding (and a few that can are double released),
and a number of new books are being released in both
ASCII and Unicode editions.

See
ftp://metalab.unc.edu/pub/docs/books/gutenberg/GUTINDEX.02
and GUTINDEX.01 for recent examples. Most of the unmarked
stuff is ASCII, but there are a number of files clearly marked
as Unicode or as "8-bit German".

-- 
David Starner - dstarner98@aasaa.ofe.org


