
CLDR Ticket #10117(accepted data)

Opened 8 months ago

Last modified 5 weeks ago

Missing Mapping in "windowsZones.xml"

Reported by: robert.looyengoed@… Owned by: yoshito
Component: supplemental Data Locale:
Phase: dsub Review:
Weeks: Data Xpath:
Xref:

Description

In the Windows time zone ID <-> tz database time zone ID mapping file "windowsZones.xml", one value is missing for Windows 10 version 1607. The Windows time zone ID "Kamchatka Standard Time" exists but is not mapped to a corresponding tz database ID.

In order to ensure correct mapping I believe the below line should be added. However, I am wondering if it's possible to have a duplicate territory? Not sure if this might break some rules.

<mapZone type="Asia/Kamchatka" territory="RU" other="Kamchatka Standard Time"/>

file: common/supplemental/windowsZones.xml

For comparison, I can attach the entire list of Windows time zone IDs for Windows 10 version 1607 to this report.

Attachments

WindowsZones1607.txt (3.1 KB) - added by robert.looyengoed@… 8 months ago.
List of all windows zone Ids in version 1607

Change History

Changed 8 months ago by robert.looyengoed@…

List of all windows zone Ids in version 1607

comment:1 Changed 7 months ago by yoshito

The Windows zone "Kamchatka Standard Time" is marked as obsolete. Our current policy is to exclude such zones from the mapping data.

There are a number of other zones that were previously available in Windows but are now obsolete. It looks like Windows Update does not delete the registry keys for these obsolete zones; instead, it marks them as obsolete so that they do not show up in the time zone selection UI.

comment:2 follow-up: ↓ 3 Changed 7 months ago by robert.looyengoed@…

That makes sense (it has 'Old' added to the display name in the English language). I also understand the simplicity of the current policy to exclude obsolete zones. However, it might be better to include old zones for compatibility if there is no technical reason against it.

There are a few different reasons for this:

  1. When accessing Windows time zones from .NET, there appears to be no indication of which zones are obsolete. I also can't find any Microsoft documentation that would let me determine this myself from the registry. You are correct that these zones are filtered from the Windows time zone selection UI, though, so this information must exist somewhere.
  2. In practice, many systems are not running the latest version of the OS, yet most software needs to work on several different versions. Without backwards compatibility in this list, Windows developers using it are forced to maintain their own table of exceptions. This rather defeats the simplicity and usefulness of CLDR here. I might as well just use my own complete lookup table, for performance if nothing else.

I would be all for adding historical windows time zone ids all the way from Windows 7 to present. Perhaps these could also be marked as obsolete in the CLDR windowsZones.xml file with a new attribute. I'm sure many software projects would benefit from this.

comment:3 in reply to: ↑ 2 Changed 6 months ago by yoshito

Replying to robert.looyengoed@…:

That makes sense (it has 'Old' added to the display name in the English language). I also understand the simplicity of the current policy to exclude obsolete zones. However, it might be better to include old zones for compatibility if there is no technical reason against it.


There are a few reasons.

  1. Simply put, they are difficult to track. For example, there are probably more than a dozen Windows zones that have been used at some point since the Windows NT era. Unless Microsoft has all the historic zone data, it's very difficult to collect these deprecated zones.
  2. Tooling. Because it's not easy to maintain correct mappings manually, we internally use a tool that reads Windows TZInfo and compares the UTC offset transitions with those of the IANA time zones. So we really need actual live data from a Windows system. It's not possible to get TZInfo for a zone that was introduced in the Windows XP era, removed later, and is not even included in a Windows 10 installation.
  3. Reverse mapping. Some CLDR consumers expect a two-way mapping. If we keep old Windows zones, a different structure may be required, because this introduces one-to-many mappings.

There are a few different reasons for this:

  1. When accessing Windows time zones from .NET, there appears to be no indication of which zones are obsolete. I also can't find any Microsoft documentation that would let me determine this myself from the registry. You are correct that these zones are filtered from the Windows time zone selection UI, though, so this information must exist somewhere.


Although the .NET API documentation has a list of zones (which may include some obsolete ones), I thought the actual implementation was backed by the underlying Windows registry data. I need Microsoft's input on this.


  2. In practice, many systems are not running the latest version of the OS, yet most software needs to work on several different versions. Without backwards compatibility in this list, Windows developers using it are forced to maintain their own table of exceptions. This rather defeats the simplicity and usefulness of CLDR here. I might as well just use my own complete lookup table, for performance if nothing else.


Microsoft pushes the latest time zone data through Windows Update to all supported operating systems. As long as Windows Update is enabled, you'll get the latest time zone definitions even on older operating systems. Although a user on an old operating system can still keep an obsolete time zone as the system time zone after an update, the user will likely select a new one to get proper time tracking. For these reasons, I think support for obsolete zones is becoming less important.

I would be all for adding historical windows time zone ids all the way from Windows 7 to present. Perhaps these could also be marked as obsolete in the CLDR windowsZones.xml file with a new attribute. I'm sure many software projects would benefit from this.


To support reverse mapping, we would need to introduce a new data structure if we want historic mappings. BTW, how did you collect these historic zone IDs? I think you would need to start with the first revision of Windows 7 and then incrementally apply every time zone update to get the full list of historic zones.

comment:4 Changed 6 months ago by robert.looyengoed@…

Sorry for the long reply.

Simply put, they are difficult to track. For example, there are probably more than a dozen Windows zones that have been used at some point since the Windows NT era. Unless Microsoft has all the historic zone data, it's very difficult to collect these deprecated zones.

I would be happy to help find obsolete TZInfo zones. I could even install older Windows versions and check there too. I think it's important to accept that the obsolete TZInfo information may not be complete, and that this is OK. As new zones are found, they can be added. Partial information is better than none, which I believe is the practice throughout the CLDR data.

it's very difficult to collect these deprecated zones.

It's very difficult to collect a lot of things in the CLDR project. However, you guys are doing an awesome job! I am happy to help where I can too.

Tooling. Because it's not easy to maintain correct mappings manually, we internally use a tool that reads Windows TZInfo

Using a tool is a good idea. However, this situation exposes a limitation of that approach. I think it may be better to split the work into parts: (1) generate the mapping automatically using the tool, and (2) manually add a section of obsolete zones. This part of the XML would not change much and could be maintained by hand relatively easily. It would also be necessary to (3) check for duplicates between the generated and the manual data.
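The three steps above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the function name `merge_zone_entries`, the tuple layout, and the sample data are hypothetical and are not part of any CLDR tooling.

```python
# Hypothetical sketch: combine tool-generated entries with a hand-maintained
# list of obsolete zones, rejecting anything that appears in both.
# Each entry is a (windows_id, territory, iana_ids) tuple.

GENERATED = [
    ("Russia Time Zone 11", "001", "Asia/Kamchatka"),
    ("Russia Time Zone 11", "RU", "Asia/Kamchatka Asia/Anadyr"),
]

# Assumed manual section; "Kamchatka Standard Time" is the zone from this ticket.
OBSOLETE = [
    ("Kamchatka Standard Time", "001", "Asia/Kamchatka"),
]


def merge_zone_entries(generated, obsolete):
    """Return generated entries followed by obsolete ones.

    Raises ValueError if an obsolete entry reuses a (windows_id, territory)
    pair that the generated data already defines.
    """
    seen = {(windows_id, territory) for windows_id, territory, _ in generated}
    duplicates = [e for e in obsolete if (e[0], e[1]) in seen]
    if duplicates:
        raise ValueError(f"already present in generated data: {duplicates}")
    # Generated (current) entries come first, manual obsolete entries last.
    return list(generated) + list(obsolete)


merged = merge_zone_entries(GENERATED, OBSOLETE)
```

Keying the duplicate check on the Windows ID plus territory is a design assumption; a real check might also need to compare the IANA IDs.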

Reverse mapping. Some CLDR consumers expect a two-way mapping. If we keep old Windows zones, a different structure may be required, because this introduces one-to-many mappings.

Yes, I see that a one-to-many mapping will be needed. However, the data already does this in other ways. Additionally, those consumers will run into the same problem I have. What if their IANA time zone maps to a new Windows TZInfo zone that isn't present on an old system? This can easily happen.

The example zone I'm looking at, "Kamchatka Standard Time (Old)", is now "Russia Time Zone 11", which is listed in windowsZones.xml as:

<!-- (UTC+12:00) Anadyr, Petropavlovsk-Kamchatsky -->
<mapZone other="Russia Time Zone 11" territory="001" type="Asia/Kamchatka"/>
<mapZone other="Russia Time Zone 11" territory="RU" type="Asia/Kamchatka Asia/Anadyr"/>

From this, there are already multiple ways to map TZInfo->IANA that consumers have to take into account. If the Windows system has territory information, it can use that to choose a better type. And if the territory is "RU", there are actually two types it could map to. So this concept is already in the data, but unfortunately only for the TZInfo->IANA conversion. I definitely think we need to expect similar concepts going in the other direction, IANA->TZInfo.

"Kamchatka Standard Time" is simply treated as another zone, so TZInfo->IANA is fairly simple. Going in the other direction, IANA->TZInfo, I think consumers could be made aware of an 'obsolete' attribute as shown below. They can then ignore the obsolete entry and map to "Russia Time Zone 11" instead (unless their system doesn't have it, in which case they can use the obsolete "Kamchatka Standard Time").
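To illustrate the TZInfo->IANA lookup described here, below is a minimal Python sketch that works on the two lines quoted above. The function name `windows_to_iana` and the wrapping `mapTimezones` fragment are assumptions made for the example; real consumers may resolve territories differently.

```python
import xml.etree.ElementTree as ET

# The two entries quoted above, wrapped in a root element so they parse.
FRAGMENT = """<mapTimezones>
  <mapZone other="Russia Time Zone 11" territory="001" type="Asia/Kamchatka"/>
  <mapZone other="Russia Time Zone 11" territory="RU" type="Asia/Kamchatka Asia/Anadyr"/>
</mapTimezones>"""


def windows_to_iana(xml_text, windows_id, territory="001"):
    """Return the IANA IDs for a Windows zone ID, or [] if it is unmapped.

    A territory-specific entry is preferred; otherwise the "001" (world)
    entry is used as the default.
    """
    root = ET.fromstring(xml_text)
    by_territory = {}
    for map_zone in root.iter("mapZone"):
        if map_zone.get("other") == windows_id:
            # "type" may hold several space-separated IANA IDs.
            by_territory[map_zone.get("territory")] = map_zone.get("type").split()
    return by_territory.get(territory) or by_territory.get("001", [])


default_ids = windows_to_iana(FRAGMENT, "Russia Time Zone 11")
russian_ids = windows_to_iana(FRAGMENT, "Russia Time Zone 11", "RU")
missing_ids = windows_to_iana(FRAGMENT, "Kamchatka Standard Time")
```

With this fragment, the default lookup yields a single ID, the "RU" lookup yields two, and "Kamchatka Standard Time" yields an empty list, which is the gap this ticket is about.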

<!-- Obsolete zones -->
<mapZone other="Kamchatka Standard Time" territory="001" obsolete="true" type="Asia/Kamchatka"/>

This could be added as a section at the END of 'windowsZones' for two reasons:

  1. Consuming software that has not been updated to recognize the obsolete attribute may simply look for the first match in the list. In that case, the latest TZInfo ID should come first.
  2. The beginning could still be generated automatically, so only the end has to be updated manually. A clear separation here keeps things simple and less error-prone.

Although a user on an old operating system can still keep an obsolete time zone as the system time zone after an update, the user will likely select a new one to get proper time tracking. For these reasons, I think support for obsolete zones is becoming less important.
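A consumer that understands the proposed attribute might handle IANA->TZInfo roughly as in the Python sketch below. Note that `obsolete="true"` is only the attribute suggested in this comment and does not exist in CLDR; the function name `iana_to_windows` and the `installed` parameter are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Current entries first, proposed obsolete section at the end.
# The obsolete attribute is this ticket's suggestion, not real CLDR data.
FRAGMENT = """<mapTimezones>
  <mapZone other="Russia Time Zone 11" territory="001" type="Asia/Kamchatka"/>
  <mapZone other="Russia Time Zone 11" territory="RU" type="Asia/Kamchatka Asia/Anadyr"/>
  <!-- Obsolete zones -->
  <mapZone other="Kamchatka Standard Time" territory="001" obsolete="true" type="Asia/Kamchatka"/>
</mapTimezones>"""


def iana_to_windows(xml_text, iana_id, installed=None):
    """Return a Windows zone ID for an IANA ID, or None if there is no match.

    Current zones are preferred. An obsolete zone is returned only when no
    current candidate is in `installed`, the set of zone IDs available on the
    target system. If `installed` is None, every zone is assumed available.
    """
    root = ET.fromstring(xml_text)
    current, obsolete = [], []
    for map_zone in root.iter("mapZone"):
        if iana_id not in map_zone.get("type", "").split():
            continue
        bucket = obsolete if map_zone.get("obsolete") == "true" else current
        if map_zone.get("other") not in bucket:
            bucket.append(map_zone.get("other"))
    for candidate in current + obsolete:
        if installed is None or candidate in installed:
            return candidate
    return None


new_system = iana_to_windows(FRAGMENT, "Asia/Kamchatka")
old_system = iana_to_windows(
    FRAGMENT, "Asia/Kamchatka", installed={"Kamchatka Standard Time"}
)
```

A consumer that ignores the attribute and takes the first match would still get "Russia Time Zone 11" here, which is why the obsolete entries are placed last.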

I have a different opinion about this for a few reasons.

  1. I know Windows time zones are stored in the registry and updated through Windows Update. .NET code also uses these TZInfo registry entries, through the method TimeZoneInfo.GetSystemTimeZones(). However, one problem is that there is no flag to indicate obsolete zones. It's also possible that a user running Windows 10 with updates enabled is between update cycles. Obsolete zones may therefore still be presented to the user, and the user may select one. Maybe they are more familiar with the old zone name anyway. You cannot make assumptions about what the user will do. Users will do anything.
  2. Also, consider my use case. Software may be running on both a new and an old system and need to exchange data between them. I store the IANA zone ID in a file after the user selects a time zone. It must be the IANA zone ID for cross-platform support, and I think everyone can agree that Windows TZInfo isn't very good. However, if the user selects an old zone, I cannot map it to an IANA ID because this table doesn't contain it, and the time zone is lost from then on. Presenting Windows zones is expected, because that is what Windows users are most familiar with on their platform.

To support reverse mapping, we would need to introduce a new data structure if we want historic mappings.

This was discussed above. My idea is to add the obsolete="true" attribute to the XML and then let consuming software use it as it wishes.

I know I'm probably oversimplifying it. As for the binary CLDR data, I don't know anything about it. Unfortunately, I can see that it could make things much more complicated, considering the tooling already in place.

comment:5 Changed 7 weeks ago by mark

  • Owner changed from anybody to yoshito
  • Priority changed from assess to critical
  • Type changed from unknown to data
  • Status changed from new to accepted
  • Milestone changed from UNSCH to 32

comment:6 Changed 7 weeks ago by mark

  • Priority changed from critical to major

comment:7 Changed 5 weeks ago by emmons

  • Milestone changed from 32 to 33