Re: Compiling a list of Semitic transliteration characters from Richard Wordingham on 2012-09-21 (Unicode Mail List Archive)

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Fri, 21 Sep 2012 17:40:54 +0100

On Thu, 20 Sep 2012 18:09:03 -0500
Naena Guru <naenaguru_at_gmail.com> wrote:

> Statements like,
>
> Using Unicode is recommended in preference to any code page because
> it has better language support and is less ambiguous than any of the
> code pages.
>
> are trying to assert untruths, that people tend to believe without
> concrete reasons. 'better language support' and 'less ambiguous'?

With anything but Windows-1252, the language support is likely to be
made available via Unicode. The removal of ambiguity comes on two
fronts:

1) Some ASCII characters are overworked, and have been split into
separate characters in Unicode.

2) Tagging of 'plain text' is fairly poor, so the code page of a piece
of text often has to be guessed.
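
Point 2 can be illustrated with a minimal sketch, assuming Python 3 and
its standard codecs (the byte value 0xE9 is an arbitrary choice for the
illustration). The same untagged byte decodes to a different letter
under each code page, whereas a Unicode code point identifies a single
character:

```python
# One untagged byte, three plausible Windows code pages.
raw = b"\xe9"

# Decode the same byte under each code page and record the result.
decoded = {cp: raw.decode(cp) for cp in ("cp1252", "cp1251", "cp1253")}

for cp, ch in decoded.items():
    # Print the code page, the resulting code point, and the character.
    print(cp, f"U+{ord(ch):04X}", ch)
```

Without a reliable tag, a reader of the byte cannot tell whether a
Latin, Cyrillic or Greek letter was intended.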

> That statement is by Microsoft right in the registration of
> Windows-1252 that plainly contravenes Unicode:
> http://msdn.microsoft.com/en-US/goglobal/cc305145.aspx
>
> All languages in the Developed countries in the West including
> English, use Windows-1252!

Actually, I think Wales counts as a developed region. Windows-1252
does not support accents on 'w'.

Presumably you are treating the dots above in Irish as irrelevant
because the use of 'h' has largely replaced them.

I presume you are unimpressed by the fact that Latin as written in my
school textbooks could not be written in Windows-1252 - the vowels with
macron and breve are unsupported by it!
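
The coverage gaps above can be checked with a short sketch, assuming
Python 3 and its built-in "cp1252" codec. The helper name in_cp1252 and
the choice of sample letters are illustrative only:

```python
import unicodedata


def in_cp1252(ch: str) -> bool:
    """Return True if the single character ch can be encoded in Windows-1252."""
    try:
        ch.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


# Sample letters: Welsh w with circumflex, Irish b with dot above,
# Latin a with macron, Latin a with breve, and e with acute for contrast.
samples = "\u0175\u1e03\u0101\u0103\u00e9"

coverage = {ch: in_cp1252(ch) for ch in samples}

for ch, ok in coverage.items():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {'yes' if ok else 'no'}")
```

Only the last sample, e with acute, is encodable; the Welsh, Irish and
Latin textbook letters are not.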

Nowadays, I usually use the minus sign (U+2212), which is not in
Windows-1252, for negative numbers in text in Unicode-capable systems.
It gives better results than hyphen-minus, both visually, and for line
breaking. I also get better cutting and pasting of single Greek
letters if they are entered as characters rather than symbols. Oddly
enough, I hardly notice the absence of an ohm symbol.
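
A small sketch, assuming Python 3 and its unicodedata module, shows how
the overworked ASCII hyphen-minus and the minus sign are distinct
characters, and that the minus sign cannot be encoded in Windows-1252.
It also shows that the ohm sign (U+2126) normalizes to the Greek
capital omega, which may be relevant to the remark about the ohm
symbol. The helper name encodable is illustrative only:

```python
import unicodedata

HYPHEN_MINUS = "\u002d"  # the overworked ASCII character
MINUS_SIGN = "\u2212"    # the dedicated Unicode minus sign
OHM_SIGN = "\u2126"      # the ohm symbol


def encodable(ch: str, codec: str = "cp1252") -> bool:
    """Return True if ch can be encoded in the given codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Character names as recorded in the Unicode Character Database.
names = [unicodedata.name(c) for c in (HYPHEN_MINUS, MINUS_SIGN, OHM_SIGN)]

# Whether the minus sign survives a round trip through Windows-1252.
minus_in_cp1252 = encodable(MINUS_SIGN)

# Canonical normalization replaces the ohm sign with a Greek letter.
ohm_nfc = unicodedata.normalize("NFC", OHM_SIGN)

print(names)
print(minus_in_cp1252)
print(unicodedata.name(ohm_nfc))
```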

> I agree that following ISCII, whatever it is, might be the problem.

ISCII = Indian Script Code for Information Interchange.

> It is presumptuous to say, "the rest of the post is irrelevant".

It was irrelevant to compiling a list of Semitic transliteration
characters. Semitic transliteration characters have the advantage of
being in the Latin script, which in general behaves as programmers used
to the Latin script expect. There was, though, an Egyptian
transliteration character that gave grief, because of subtle
differences in behaviour between a Greek and a Latin diacritic that
had been unified. The solution was to declare the diacritic to be the
Cyrillic version of the diacritic, because it had not been unified
with the other two, so the character was decreed to be <U+0069 LATIN
SMALL LETTER I, U+0486 COMBINING CYRILLIC PSILI PNEUMATA>.
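
The sequence named above can be inspected with a brief sketch, assuming
Python 3 and its unicodedata module. It lists the name, general
category and combining class of each code point, and checks that NFC
normalization leaves the sequence as two code points:

```python
import unicodedata

# <U+0069 LATIN SMALL LETTER I, U+0486 COMBINING CYRILLIC PSILI PNEUMATA>
seq = "\u0069\u0486"

# Name, general category and canonical combining class per code point.
info = [
    (f"U+{ord(ch):04X}", unicodedata.name(ch),
     unicodedata.category(ch), unicodedata.combining(ch))
    for ch in seq
]
for row in info:
    print(*row)

# NFC leaves the sequence unchanged if no precomposed character exists.
nfc = unicodedata.normalize("NFC", seq)
print(len(nfc), nfc == seq)
```

The second code point is a nonspacing mark, so the pair is rendered as
a single letter with a diacritic while remaining two code points in
storage.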

Richard.
Received on Fri Sep 21 2012 - 11:45:23 CDT