Re: RTF language codes

From: Marc Durdin (mcdurdin@tavultesoft.com)
Date: Mon Jul 23 2001 - 22:33:50 EDT


At 04:51 PM 23/07/2001 -0700, Michael (michka) Kaplan wrote:
>From: "jgo" <john@nisus.com>
>
>> I don't see such a table via search from the Unicode site.
>> Is this just another M$ non-standard "standard" subject to
>> change at a whim? (Does the consortium have anything to do
>> with it at all?)
>
> This is probably a less than fair characterization.... this is a feature
> that has existed for Windows for at least the entire lifetime of Win32
> (perhaps even longer?) and has never changed, ever. In fact, the team at
> Microsoft which owns these codes is required to keep a degree of stability
> that is quite phenomenal, since there are so many MS products that depend on
> the values. Even in MLang and their new NLS extensions in the .NET CLR
> (common language runtime) there are methods to obtain LCIDs.

I must disagree with this statement. I know of quite a few changes to the LCID list, some of which have caused me considerable pain in the past.

In the MSDN library, at least in Nadine Kano's book "Developing International Software for Windows 95 and Windows NT", which is included on the CD, the language Lao (or Laotian) is listed with LCID 0x42B. This LCID was set to Armenian in Windows 2000, with no explanation. Microsoft told me that the original listing of Lao was a 'mistake'. This mistake is still listed in the MSDN library that I have installed.

There is no apparent reference to the old Lao LCID on Microsoft's website any more, but a PDF from around the same time at another site still lists it (look at page 7):
  http://dec.bournemouth.ac.uk/forth/euro/ef98/pelcetal98.pdf

An MS Word 2000 FAQ also lists Lao in its table of supported LCIDs (now as 0x454), but none of the MSDN tables do:
  http://support.microsoft.com/support/kb/articles/Q221/4/35.ASP

Another example of changes is the split of the single LCID Serbo-Croatian to two LCIDs for Serbian and Croatian. The RTF 1.0 specification listed:
  Serbo-Croatian (Latin) = 0x41a
  Serbo-Croatian (Cyrillic) = 0x81a

This has now changed to:
  Croatian = 0x41a
  Serbian (Cyrillic) = 0x0c1a
  Serbian (Latin) = 0x081a

The logical split into two languages would be fine if it were well documented, although deprecating the original LCIDs would have been more appropriate. But changing the script for LCID 0x81a from Cyrillic to Latin is never acceptable!
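To make the relationship between these three values easier to see, here is a minimal sketch that splits the language part of an LCID into its primary-language and sublanguage fields. It assumes the layout used by the Windows SDK macros PRIMARYLANGID and SUBLANGID (low 10 bits for the primary language, next 6 bits for the sublanguage). The function name `split_lcid` is hypothetical and is only there for illustration.

```python
from typing import Tuple


def split_lcid(lcid: int) -> Tuple[int, int]:
    """Return (primary_language, sublanguage) for the language part of an LCID.

    Assumes the Windows LANGID layout: bits 0-9 are the primary language,
    bits 10-15 are the sublanguage. Sort-order bits above bit 15 are ignored.
    """
    langid = lcid & 0xFFFF      # keep only the 16-bit language identifier
    primary = langid & 0x3FF    # low 10 bits
    sublang = langid >> 10      # remaining high 6 bits
    return primary, sublang


# The three values quoted above for Croatian and Serbian.
for lcid in (0x041A, 0x081A, 0x0C1A):
    primary, sublang = split_lcid(lcid)
    print(f"0x{lcid:04x}: primary=0x{primary:02x} sublang={sublang}")
```

All three values share the same primary language field and differ only in the sublanguage, which is why reassigning 0x81a silently changes the meaning of existing documents that carry that code.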

I opened up my copy of the RTF specification 1.0. The LCID 'Rhaeto-Romanic' (0x0417) does not appear in later lists, not even as 'deprecated'.

Finally, try to compare the following two tables (Word 2000 vs Platform SDK), and tell me which one is correct:
  http://support.microsoft.com/support/kb/articles/Q221/4/35.ASP vs
  http://msdn.microsoft.com/library/en-us/intl/hh/winbase/nls_8xo3.asp.
Look for instance at Syriac, Welsh, Lao, Khmer, Gaelic, Frisian Netherlands, French West Indies, Divehi.... I found these discrepancies in 2 minutes.
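This kind of comparison is easy to mechanise. The following is only a sketch: it does not reproduce the two tables linked above. The sample data is limited to the RTF 1.0 entries and their later replacements quoted earlier in this message, and `diff_tables` is a hypothetical helper name.

```python
def diff_tables(old, new):
    """Compare two {lcid: name} tables.

    Returns (removed, added, changed), where 'changed' maps each LCID present
    in both tables to its (old_name, new_name) pair when the names differ.
    """
    removed = {k: old[k] for k in old.keys() - new.keys()}
    added = {k: new[k] for k in new.keys() - old.keys()}
    changed = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return removed, added, changed


# Entries as listed in the RTF 1.0 specification (from this message).
rtf_1_0 = {
    0x041A: "Serbo-Croatian (Latin)",
    0x081A: "Serbo-Croatian (Cyrillic)",
    0x0417: "Rhaeto-Romanic",
}

# The same range of codes as they appear in later lists (from this message).
later = {
    0x041A: "Croatian",
    0x081A: "Serbian (Latin)",
    0x0C1A: "Serbian (Cyrillic)",
}

removed, added, changed = diff_tables(rtf_1_0, later)
for label, table in (("removed", removed), ("added", added), ("changed", changed)):
    for lcid in sorted(table):
        print(f"{label}: 0x{lcid:04x} {table[lcid]}")
```

Feeding any two published tables into the same function would list every LCID that was dropped, introduced, or quietly given a different meaning.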

So, there are significant issues with Microsoft's LCIDs:

1. The tables are not static: significant changes have been made in the past, and errors in published documentation have never been documented or rectified. There is no guarantee that these numbers will stay the same.
2. The list of LCIDs needs to be managed as a formal standard. I see no indication that it is.
3. What is the official source for LCIDs? I have found at least 10 tables of LCIDs on Microsoft's site, *all different* in the languages they list. Microsoft should publish just one table of LCIDs on their website, not one for each technology.

Marc Durdin
Tavultesoft


