From: Mark Davis (mark.davis@icu-project.org)
Date: Tue Aug 23 2005 - 12:27:15 CDT
Some responses below.
Theo Veenker wrote:
> Mark Davis wrote:
>
>> Let me explain what is going on. Quite a bit of the structure of (and
>> constraints on) LDML is in the specification, and cannot be
>> encapsulated in the DTD. For most elements in LDML, we allow for
>> alternate elements. So you could have the following, for example.
>>
>> <week>
>> <minDays count="1"/>
>> <firstDay day="sun"/>
>> <firstDay day="mon" alt="financial" draft="true"/>
>> <weekendStart day="sat"/>
>> <weekendEnd day="sun"/>
>> </week>
>
>
> I see. But how does one know which alternate forms exist (in this
> particular case, for example)? It isn't a key/type option in a localeID.
> Of course I see the alternate forms when I parse them, but my application
> still wouldn't know which one applies.
Sorry, the above example is only illustrative; 'financial' doesn't actually
occur. The available alt values are listed in Appendices K and L.
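For what it's worth, here is a rough Python sketch of how an application might choose between a default element and an alternate when reading the illustrative <week> fragment quoted above. The helper name pick() and its fall-back-to-default behaviour are assumptions made for this example, not the resolution rule from the specification, and 'financial' is still only a made-up alt value.

```python
import xml.etree.ElementTree as ET

# The illustrative fragment from the message; 'financial' is not a real alt value.
WEEK_XML = """\
<week>
  <minDays count="1"/>
  <firstDay day="sun"/>
  <firstDay day="mon" alt="financial" draft="true"/>
  <weekendStart day="sat"/>
  <weekendEnd day="sun"/>
</week>
"""


def pick(parent, tag, alt=None):
    """Return the child <tag> whose alt attribute equals `alt`.

    alt=None selects the element with no alt attribute (the default).
    Assumption for this sketch: if the requested alternate is absent,
    fall back to the default element.
    """
    default = None
    for child in parent.findall(tag):
        value = child.get("alt")
        if value == alt:
            return child
        if value is None:
            default = child
    return default


week = ET.fromstring(WEEK_XML)
default_first_day = pick(week, "firstDay").get("day")
financial_first_day = pick(week, "firstDay", alt="financial").get("day")
fallback_first_day = pick(week, "firstDay", alt="nonexistent").get("day")
```

The caller still has to know which alt value it wants, which is exactly the gap the metadata is meant to fill.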
The metadata we currently have is in
<http://unicode.org/cldr/data/common/supplemental/supplementalData.xml>:
search for "<metadata>"
We have an RFE to add more metadata
<http://dev.icu-project.org/cgi-bin/locale-bugs?findid=641>; we had
originally intended to add it in 1.3, but delayed it to 1.4. (If you
have any comments on 641 you can add a reply.)
>
>>
>> You may ask: how about XML Schema? While this would be better than a
>> DTD in describing more of the structure, it would still be far from
>> complete. So it hasn't been a high priority because it wouldn't buy
>> us that much.
>
>
> Looks like it's best to just ignore the DTD. <mumble>I hope to wake up one
> morning to find out that XML and associated crap had never been invented.
> Do we really have to XML-ize everything? Apparently yes, because the
> format is there and everybody else does.</mumble>
The design of XML could have been significantly simplified, e.g. by not
having CDATA or entities (except NCRs), by only using UTF-8, etc. That
being said, it is a huge improvement in terms of having a standardized
format that all tools can use and interchange. So despite the few warts
(many for historical reasons), it's definitely the way forward. Reminds
me of a few other technologies...
>
>> What we have been doing is adding metadata to the supplemental data
>> file so that particular areas can be mechanically checked. There are
>> undoubtedly still areas where the description of the structure can be
>> improved in the spec (see the working draft for the next release at
>> http://unicode.org/cldr/data/docs/web/tr35.html) or where metadata
>> can be added; if you have suggestions for improvements, you can file
>> them at http://unicode.org/cldr/filing_bug_reports.html.
>>
>> (BTW, we are planning to move this particular element into the
>> supplemental data in the next release; the goal is to only have
>> language-based data in the locale files such as
>> http://unicode.org/cldr/data/common/main/fr.xml, and all other data
>> in the supplemental data file
>> (http://unicode.org/cldr/data/diff/supplemental/supplemental.html):
>> information about territories, currencies, scripts, timezones, etc.)
>
>
> Sounds good. Will this move have taken place before the 1.4 Phase 2 Beta
> Release, or will the restructuring continue until the final 1.4 release?
Our goal is to get all the structural changes in early, and have the
final phase be only gathering data.
>
> Theo
This archive was generated by hypermail 2.1.5 : Tue Aug 23 2005 - 12:29:06 CDT