CLDR Ticket #3572 (accepted data)

Opened 4 years ago

Last modified 7 weeks ago

Proposal to contribute address metadata to CLDR

Reported by: jia.shao.peng@…
Owned by: jchye
Component: main
Data Locale:
Phase:
Review:
Weeks:
Data Xpath:
Xref:

Description

I am proposing to contribute the Google Address Widget Metadata to be included in the Unicode Common Locale Data Repository (CLDR) 2.0. Please find the proposal in the following Google Doc:

https://docs.google.com/document/pub?id=1KL6hmksP1D-qnpyI6FpLv_C-rQBEcB_4B18SNCaN39M

Please let me know if you have any feedback.

Thanks,

Shaopeng

Attachments

mypatch.diff (95.0 KB) - added by jia.shao.peng@… 4 years ago.
Patch for adding new XML file and updating DTD
mypatch.2.diff (102.4 KB) - added by jia.shao.peng@… 4 years ago.
Updated patch
mypatch.3.diff (102.3 KB) - added by jia.shao.peng@… 4 years ago.
Updated patch
mypatch.4.diff (92.2 KB) - added by jia.shao.peng@… 4 years ago.
Updated patch
mypatch.5.diff (96.4 KB) - added by jia.shao.peng@… 4 years ago.
Updated patch

Change History

comment:1 Changed 4 years ago by jia.shao.peng@…

By the way, I will add the XML file to this ticket in the next few days. Meanwhile, please feel free to let me know if there is any feedback to the proposal itself.

Thanks,

Shaopeng

Changed 4 years ago by jia.shao.peng@…

Patch for adding new XML file and updating DTD

comment:2 follow-up: ↓ 37 Changed 4 years ago by jia.shao.peng@…

Hello,

I have uploaded the patch containing the new XML file and updates to the DTD. Please let me know if there is any feedback to my proposal or the patch.

Thanks,

Shaopeng

comment:3 follow-up: ↓ 8 Changed 4 years ago by kent.karlsson14@…

I'm not an expert on postal addresses, particularly not around the world.
But let me take a look at a region I'm familiar with...

        <postalCountry iso3166="SE"> 
             <localFormat>%O%n%N%n%A%nSE-%Z %C</localFormat> 
             <uppercaseFields>C</uppercaseFields> 
             <requiredFields>ACZ</requiredFields> 
             <zipPrefix>SE-</zipPrefix> 
             <zipValidationRule>\d{3}[ ]?\d{2}</zipValidationRule> 
             <zipType>postal</zipType> 
             <stateNameType>province</stateNameType> 
             <centralPostOfficeURL>www.posten.se</centralPostOfficeURL> 
         </postalCountry>

I think this is an oversimplification (and there are some strange things as well).

Assuming this is for mail to someone in a company/organisation: The mail could be
to whoever holds a function (or similar), which maybe could be covered by saying
that "name" (%N presumably) is the name of the function, rather than the name of
a particular person. But even if to a particular person in the company, there are
two forms

	NNNN
	Company
	.....

and

	Company
	att: NNNN
	.....

The first is definitely to that person, and no-one else (maybe exception for *personal*
secretaries, but that would be rare). The second form is primarily to that person, but
if he/she is absent or gets too much mail, the mail can be handled by someone filling
in for that person, or by an assistant. ("att" here is abbreviation for "attention",
the French word...)

One thing about home addresses (not business addresses), is that it is not uncommon
to have a "care of" (℅) address. I.e. the person addressed sublets (part of) the
apartment/house of someone else, and the name on the letterbox/door is that of the
owner or primary renter of the apartment/house.

And some of the strange things (assuming I interpret this correctly):

The SE- part is a *part* of the zipcode, so zipPrefix is sort of correct. However you
have it both as "zipPrefix" and in the "localFormat", which seems wrong.

When the mail is sent *within* the (European mostly) country in question, the country
prefix part of the zip code is usually left out, and so is (generally, globally, I presume)
the country name (%C). So the zipcode prefix as well as the country name (there is a
redundancy here if country code is used) can be, and usually is, left out for in-country
mail. And the country name can be written in either the language of the country from where
the mail is *sent* from, or in French, which is the international postal language.

The country name does not really have to be in uppercase. The uppercase is just for
emphasis. In practice, underlining or double underlining works just as well for emphasis
here, so does adding an empty line and/or using larger font.

Then there is a part in the XML record quoted above: <stateNameType>province</stateNameType>.
Not sure what that is for. Yes, there is a division into provinces ("län" in Swedish), but
these never occur in postal addresses.

You seem to have swept under the carpet where street numbers are placed by just using
%A which seems to subsume "everything not covered by other fields and fit between
fields so-and-so"... But granted, that avoids lots of complications, not just number on
a street but also floor number, apartment number, block name, non-city (rural...)
postal addresses, and surely more.

comment:4 follow-up: ↓ 6 Changed 4 years ago by mark

  • Owner changed from somebody to umesh
  • Priority changed from assess to major
  • Status changed from new to assigned

Notes from discussion:

Some way to have multiple locale formats? Attribute?

  • For example, structure may vary according to the 'state'.
  • May have different structure if some fields are missing?

stateNameType => provinceNameType or subdivisionNameType because 'state' can mean nation.
zip => postalCode

Agreed to post as a PRI. Need the data in a public place first. (Add link to Unicode conference presentation if available)

comment:5 Changed 4 years ago by pedberg

  • Cc chrish added

comment:6 in reply to: ↑ 4 Changed 4 years ago by jia.shao.peng@…

Thanks for the comments! I have updated the proposal with the changes suggested. Please let me know if there is anything else you want me to change. I will upload a new patch tomorrow if the change I made to the proposal looks Ok to you.

Thanks,

Shaopeng

Replying to mark:

Notes from discussion:

Some way to have multiple locale formats? Attribute?

  • For example, structure may vary according to the 'state'.
  • May have different structure if some fields are missing?

stateNameType => provinceNameType or subdivisionNameType because 'state' can mean nation.
zip => postalCode

Agreed to post as a PRI. Need the data in a public place first. (Add link to Unicode conference presentation if available)

comment:7 Changed 4 years ago by jia.shao.peng@…

I have regenerated the patch. Changes made:

stateNameType renamed to provinceNameType
zipPrefix renamed to postalCodePrefix
zipValidationRule renamed to postalCodeValidationRule
zipType renamed to postalCodeType
localFormat and latinFormat combined into one "format" with a "locales" attribute.

Changed 4 years ago by jia.shao.peng@…

Updated patch

comment:8 in reply to: ↑ 3 Changed 4 years ago by jia.shao.peng@…

Thanks for the comment, Kent! I will try to address your questions below. Some of your suggestions are about the data itself, which might be better left to be corrected in the CLDR vetting process (so that we have more opinions) if this proposal is eventually approved.

Replying to kent.karlsson14@…:

I'm not an expert on postal addresses, particularly not around the world.
But let me take a look at a region I'm familiar with...

        <postalCountry iso3166="SE"> 
             <localFormat>%O%n%N%n%A%nSE-%Z %C</localFormat> 
             <uppercaseFields>C</uppercaseFields> 
             <requiredFields>ACZ</requiredFields> 
             <zipPrefix>SE-</zipPrefix> 
             <zipValidationRule>\d{3}[ ]?\d{2}</zipValidationRule> 
             <zipType>postal</zipType> 
             <stateNameType>province</stateNameType> 
             <centralPostOfficeURL>www.posten.se</centralPostOfficeURL> 
         </postalCountry>

I think this is an oversimplification (and there are some strange things as well).

Assuming this is for mail to someone in a company/organisation: The mail could be
to whoever holds a function (or similar), which maybe could be covered by saying
that "name" (%N presumably) is the name of the function, rather than the name of
a particular person. But even if to a particular person in the company, there are
two forms

	NNNN
	Company
	.....

and

	Company
	att: NNNN
	.....

The first is definitely to that person, and no-one else (maybe exception for *personal*
secretaries, but that would be rare). The second form is primarily to that person, but
if he/she is absent or gets too much mail, the mail can be handled by someone filling
in for that person, or by an assistant. ("att" here is abbreviation for "attention",
the French word...)

One thing about home addresses (not business addresses), is that it is not uncommon
to have a "care of" (℅) address. I.e. the person addressed sublets (part of) the
apartment/house of someone else, and the name on the letterbox/door is that of the
owner or primary renter of the apartment/house.

As I mentioned in the proposal, the "format" here is meant to specify the most common way of laying out address fields for users to input. For that purpose, we have to pick a layout, and we picked the most common one. The "Name" is a separate field so that it can be retrieved and validated when needed (e.g. credit card payment). The "Address Lines" could be multiple fields, and for the second example you gave, both the "Company" and "att: NNNN" could go into "Address Lines".
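
To illustrate the layout-only role of "format" described above, here is a minimal Python sketch of expanding such a template into address lines. The function name `expand_layout` and the sample field values are assumptions made for illustration; they are not part of the proposal or the patch.

```python
import re

def expand_layout(template, fields):
    """Expand a template such as '%O%n%N%n%A%n%Z %C' into address lines.

    `fields` maps a field letter (e.g. 'N', 'O', 'A', 'Z', 'C') to its value.
    Lines that end up empty because a field is missing are dropped.
    """
    lines = []
    for raw_line in template.split("%n"):          # %n marks a line break
        # Replace each %X token with its value, or "" when the field is absent.
        text = re.sub(r"%([A-Z])", lambda m: fields.get(m.group(1), ""), raw_line)
        text = " ".join(text.split())              # collapse leftover whitespace
        if text:
            lines.append(text)
    return lines

# Hypothetical input: no organization (%O) was entered.
example = expand_layout(
    "%O%n%N%n%A%n%Z %C",
    {"N": "Anna Svensson", "A": "Storgatan 1", "Z": "123 45", "C": "Stockholm"},
)
```

With the Swedish template quoted earlier, the empty organization field simply drops its line, leaving name, address line, and postal code plus city.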

And some of the strange things (assuming I interpret this correctly):

The SE- part is a *part* of the zipcode, so zipPrefix is sort of correct. However you
have it both as "zipPrefix" and in the "localFormat", which seems wrong.

I agree with you on this aspect. There is really no point repeating this. I could remove it from "format".

When the mail is sent *within* the (European mostly) country in question, the country
prefix part of the zip code is usually left out, and so is (generally, globally, I presume)
the country name (%C). So the zipcode prefix as well as the country name (there is a
redundancy here if country code is used) can be, and usually is, left out for in-country
mail.

When address fields are being laid out for users to enter their address, we are not quite sure whether it will be an in-country or out-country address. Also, sometimes it could be a personal preference of whether to always write the address in the full form, or whether to write the minimum that could work. Unless it would be misleading some people in a country, maybe it is better to retain the full form.

And the country name can be written in either the language of the country from where
the mail is *sent* from, or in French, which is the international postal language.

Yes. The "format" data in the XML file specifies the order to lay out individual address fields, but doesn't impose any restriction on the language of the value that will be entered in each field.

The country name does not really have to be in uppercase. The uppercase is just for
emphasis. In practice, underlining or double underlining works just as well for emphasis
here, so does adding an empty line and/or using larger font.

Agreed. Uppercase does seem to be the most common way internationally to place emphasis on country name (when the name could be uppercased). The "uppercaseFields" is not used to validate the data user entered, but used to format the address later.
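
As a rough sketch of that distinction (uppercasing applied when formatting, not when validating input), assuming the field letters are treated as opaque keys and that `apply_uppercase` is a hypothetical helper name:

```python
def apply_uppercase(fields, uppercase_fields):
    """Return a copy of `fields` with the listed field letters uppercased.

    `uppercase_fields` is the content of <uppercaseFields>, e.g. "C" or "CS".
    The user's original input is left untouched.
    """
    return {
        letter: (value.upper() if letter in uppercase_fields else value)
        for letter, value in fields.items()
    }

entered = {"A": "Storgatan 1", "C": "Stockholm"}
formatted = apply_uppercase(entered, "C")
```

Note that Python's `str.upper()` is not locale-sensitive, so a real formatter would probably need locale-aware case mapping.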

Then there is a part in the XML record quoted above: <stateNameType>province</stateNameType>.
Not sure what that is for. Yes, there is a division into provinces ("län" in Swedish), but
these never occur in postal addresses.

The purpose of this is to distinguish the type of the administrative area. If the province never occurs in postal addresses, then the value in the "format" tag shouldn't contain it. Again, this is about the data itself, and could be better corrected in the vetting process.

You seem to have swept under the carpet where street numbers are placed by just using
%A which seems to subsume "everything not covered by other fields and fit between
fields so-and-so"... But granted, that avoids lots of complications, not just number on
a street but also floor number, apartment number, block name, non-city (rural...)
postal addresses, and surely more.

Yes, this is a simplification, and it is consistent with how it is normally done in the industry. We have looked into making the address field more detailed for a dozen countries, and things get quite complicated. We may propose that as an addition to this data in the future, but have no plan to do it right now.

Changed 4 years ago by jia.shao.peng@…

Updated patch

comment:9 Changed 4 years ago by jia.shao.peng@…

I have uploaded a new patch. In the patch I have removed duplicate postal code prefix in "format", as suggested by Kent.

comment:10 follow-up: ↓ 12 Changed 4 years ago by kent.karlsson14@…

"

  1. <postalCodeType>

Required/Optional

Optional. Default value: postal

Meaning

Contains an enum that denotes the type of label for the postal code field. Currently, the valid values include:

  1. postal
  2. zip

"

This part does not seem to have any useful purpose. "zipcode" is just a nifty name for postal code, and there is no difference in intent or purpose AFAICS.


"

  1. <provinceNameType>

Required/Optional

Optional. Default value: province

Meaning

Contains an enum that denotes the type of label for the "state" [%S] field. ...

"

This part should be left out for regions where the %S field is not used in addresses, and the default value should be empty ("").


"2. <uppercaseFields>

Required/Optional

Optional. Default value: C
"

Thus "<uppercaseFields>C</uppercaseFields>" can and should be omitted.


""localFormat" attribute": there is no such attribute; but there is a "format" element.

comment:11 follow-up: ↓ 13 Changed 4 years ago by kent.karlsson14@…

There seems to be no %-letter for "country". For international papermail that would be required (along with the postal code prefix when such is used), but placed somewhat differently in different address layout schemes. For in-country papermail neither country nor "postal code prefix" (which is a country code...) is needed. The documentation should say that %Z includes the postal code prefix if there is such a prefix and the mail is not (or maybe is not) in-country. It should also say that %T, or whatever letter you choose for territory/country (C is taken for "city"/locality; note that some cities may have postal subdivisions which are used instead of the city name), is (along with any %n just before it) left out if the mail is known to be in-country.
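
One possible reading of this rule, sketched in Python. `%T` is only the letter suggested in this comment, not one defined by the proposal, and `domestic_layout` is a hypothetical helper name:

```python
import re

def domestic_layout(template, country_token="%T"):
    """Drop the country token, together with any %n just before it.

    Intended for mail known to be in-country; the international
    template is left as the source of truth.
    """
    pattern = r"(?:%n)?" + re.escape(country_token)
    return re.sub(pattern, "", template)

international = "%N%n%O%n%A%n%Z %C%n%T"   # assumed template with a country line
domestic = domestic_layout(international)
```

This only handles a country line that follows a line break; a layout that starts with the country would need the following %n removed instead.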

comment:12 in reply to: ↑ 10 Changed 4 years ago by jia.shao.peng@…

Replying to kent.karlsson14@…:

"

  1. <postalCodeType>

Required/Optional

Optional. Default value: postal

Meaning

Contains an enum that denotes the type of label for the postal code field. Currently, the valid values include:

  1. postal
  2. zip

"

This part does not seem to have any useful purpose. "zipcode" is just a nifty name for postal code, and there is no difference in intent or purpose AFAICS.

It is true that there is no difference in intent or purpose between zipcode and postal code, but we still need to retain the information about which is used in which countries, so that the correct label can be displayed when we present the address form for users to fill in. This is similar to the provinceNameType element, in the sense that we know a "province" in China is just like a "state" in the USA, but we still need to retain this information so that we can label the input field as "Province" for China and "State" for the USA, assuming both are in an English UI.
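
A small sketch of how these enums could drive form labels in an English UI. The label strings, the `state` enum value, and the helper `field_label` are assumptions for illustration; only `postal`, `zip`, and the default `province` are named in the excerpts quoted above.

```python
# English UI labels keyed by element name and enum value (illustrative only).
LABELS_EN = {
    "postalCodeType": {"postal": "Postal code", "zip": "ZIP code"},
    "provinceNameType": {"province": "Province", "state": "State"},
}

# Defaults as quoted from the design doc in this ticket.
DEFAULT_VALUES = {"postalCodeType": "postal", "provinceNameType": "province"}

def field_label(element, value=None):
    """Return the UI label for an element, falling back to its default value."""
    return LABELS_EN[element][value or DEFAULT_VALUES[element]]

us_state_label = field_label("provinceNameType", "state")
cn_province_label = field_label("provinceNameType", "province")
```

The stored address data is the same in both cases; only the label shown next to the input field differs.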


"

  1. <provinceNameType>

Required/Optional

Optional. Default value: province

Meaning

Contains an enum that denotes the type of label for the "state" [%S] field. ...

"

This part should be left out for regions where the %S field is not used in addresses, and the default value should be empty ("").

Ok, I could do that.


"2. <uppercaseFields>

Required/Optional

Optional. Default value: C
"

Thus "<uppercaseFields>C</uppercaseFields>" can and should be omitted.

The "Default value: C" is specified in the design doc, but I didn't find an easy way to specify that in the XML without filling that info for every country. Do you know a better way to specify the default in the CLDR XML file?
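
One alternative, sketched here only as an assumption about how a consumer might work, is to leave the element out of the XML and apply the documented defaults when the data is read. The sample record below is assembled from the renamed elements mentioned in this ticket and is not copied from the patch; `read_postal_country` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Defaults as stated in the design doc excerpts quoted in this ticket.
DEFAULTS = {"uppercaseFields": "C", "postalCodeType": "postal"}

def read_postal_country(xml_text):
    """Parse one <postalCountry> element and fill in omitted optional elements."""
    element = ET.fromstring(xml_text)
    data = dict(DEFAULTS)                  # start from the defaults
    for child in element:                  # explicit elements override them
        data[child.tag] = (child.text or "").strip()
    data["iso3166"] = element.get("iso3166")
    return data

# Illustrative record without <uppercaseFields> or <postalCodeType>.
SAMPLE = r"""<postalCountry iso3166="SE">
  <layout order="SmallToLarge">%O%n%N%n%A%n%Z %C</layout>
  <requiredFields>ACZ</requiredFields>
  <postalCodePrefix>SE-</postalCodePrefix>
  <postalCodeValidationRule>\d{3}[ ]?\d{2}</postalCodeValidationRule>
</postalCountry>"""

se = read_postal_country(SAMPLE)
```

This keeps the XML small, at the cost of every consumer having to know the defaults from the documentation.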


""localFormat" attribute": there is no such attribute; but there is a "format" element.

I have updated this. Thanks for catching them :-)

comment:13 in reply to: ↑ 11 ; follow-up: ↓ 19 Changed 4 years ago by jia.shao.peng@…

Replying to kent.karlsson14@…:

There seems to be no %-letter for "country". For international papermail that would be required (along with the postal code prefix when such is used), but placed somewhat differently in different address layout schemes. For in-country papermail neither country nor "postal code prefix" (which is a country code...) is needed. The documentation should say that %Z includes the postal code prefix if there is such a prefix and the mail is not (or maybe is not) in-country. It should also say that %T, or whatever letter you choose for territory/country (C is taken for "city"/locality; note that some cities may have postal subdivisions which are used instead of the city name), is (along with any %n just before it) left out if the mail is known to be in-country.

I have added a comment to "4. <postalCodePrefix>" to mention postal code prefix will be inserted in front of %Z when it is present.
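
A minimal sketch of that behaviour, assuming a hypothetical helper `format_postal_code`. The `domestic` flag reflects Kent's suggestion about in-country mail and is not something the proposal itself defines.

```python
def format_postal_code(code, prefix="", domestic=False):
    """Return the postal code as it would be substituted for %Z.

    The prefix (e.g. "SE-") is inserted in front of the code when one is
    defined, unless the mail is known to be in-country or the user already
    typed the prefix.
    """
    if domestic or not prefix or code.startswith(prefix):
        return code
    return prefix + code

international_code = format_postal_code("123 45", prefix="SE-")
domestic_code = format_postal_code("123 45", prefix="SE-", domestic=True)
```

Because the prefix is applied here, the layout template itself does not need to repeat it.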

I have also added a comment to "1. <format locale=..>" to address your comment on in-country vs. out-country format and defining a letter for country/territory.

Changed 4 years ago by jia.shao.peng@…

Updated patch

comment:14 Changed 4 years ago by jia.shao.peng@…

I have uploaded a new patch which removes the <provinceNameType> for countries that don't have %S in the format, as suggested by Kent.

Changed 4 years ago by jia.shao.peng@…

Updated patch

comment:15 Changed 4 years ago by jia.shao.peng@…

I have just uploaded a new patch and updated the proposal. Here is a quick summary of changes:

  • renamed "format" to "layout" to make its purpose clearer
  • "layout" now contains an "order" attribute which specifies relevant orders of layout
  • modified root.xml and relevant locale-specific XML files to specify the order for different locales
  • updated relevant DTDs
  • updated proposal to reflect the above changes, and added a new section on future plans.

comment:16 follow-up: ↓ 29 Changed 4 years ago by jefsey@…

The preferred language, script and normative national referent to be used should be included with the way they are encoded. A necessity in a global world, a prophylactic measure against unreadable spam and mail.

It should be possible to add digital ecosystemic information such as (not limited to): fax, e-mail, radio-frequency, social networks, telephone numbers, etc.

comment:17 follow-up: ↓ 30 Changed 4 years ago by dwayne@…

US example seems to be incorrect. The uppercaseFields tag defines C&S as requiring capitals. Yet in the example only state is capitalised and city is not. So either this capitalisation is optional, in which case the text needs updating, or the example is incorrect.

comment:18 follow-up: ↓ 31 Changed 4 years ago by Bruno Haible <bruno.haible@…>

The <uppercaseFields> element for Germany is wrong. It should be empty.

Rationale: In Germany, customarily the city line is *not* uppercased. The line which is uppercased
is the country line, for international shipments. This country line is listed after the
zip code + city line, and is not even modeled in the current CLDR proposal.

comment:19 in reply to: ↑ 13 ; follow-up: ↓ 32 Changed 4 years ago by kent.karlsson14@…

Replying to jia.shao.peng@…:

Replying to kent.karlsson14@…:

There seems to be no %-letter for "country". For international papermail that would be required (along with the postal code prefix when such is used), but placed somewhat differently in different address layout schemes. For in-country papermail neither country nor "postal code prefix" (which is a country code...) is needed. The documentation should say that %Z includes the postal code prefix if there is such a prefix and the mail is not (or maybe is not) in-country. It should also say that %T, or whatever letter you choose for territory/country (C is taken for "city"/locality; note that some cities may have postal subdivisions which are used instead of the city name), is (along with any %n just before it) left out if the mail is known to be in-country.

I have added a comment to "4. <postalCodePrefix>" to mention postal code prefix will be inserted in front of %Z when it is present.

I have also added a comment to "1. <format locale=..>" to address your comment on in-country vs. out-country format and defining a letter for country/territory.

I find no motivation for omitting a %-letter for country, nor for omitting it from the formats. Please add a %letter for country (I'd suggest %T) and add it to the formats.

comment:20 follow-up: ↓ 33 Changed 4 years ago by Bruno Haible <bruno.haible@…>

The ISO/IEC 14652 draft http://www.open-std.org/jtc1/sc22/wg20/docs/n972-14652ft.pdf
(section 4.11) describes how to formalize the address fields for POSIX systems. GNU libc
implements this facility.

In contrast to the CLDR proposal, it also foresees the fields

  • Firm, Department, Building
  • Street and house number as separate fields
  • Room number
  • Floor number

Have you considered this?

comment:21 follow-up: ↓ 34 Changed 4 years ago by bobevers@…

There are a few things wrong for the Netherlands, first the section from the proposal:

<postalCountry iso3166="NL">
	<layout order="SmallToLarge">%O%n%N%n%A%n%Z %C</layout>
	<uppercaseFields>C</uppercaseFields>
	<requiredFields>ACZ</requiredFields>
	<postalCodeValidationRule>\d{4}[ ]?[A-Z]{2}</postalCodeValidationRule>
	<postalCodeType>postal</postalCodeType>
	<centralPostOfficeURL>www.tntpost.nl</centralPostOfficeURL> 
</postalCountry>

the space in the postalcode is (officially) mandatory and because of this space there should be 2 spaces between the postalcode and the cityname.
The space in the postalcode is in practice left out more and more, reducing the need for 2 spaces before the cityname, but if it is mentioned 2 spaces are needed.

so valid examples are:
1000 AA  AMSTERDAM
1000AA AMSTERDAM
1000AA  AMSTERDAM
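
For reference, a quick Python sketch of what the quoted postalCodeValidationRule accepts. The rule string is the one from the proposal excerpt above; the helper name `is_valid_postal_code` is an assumption.

```python
import re

# Rule quoted from the NL record in the proposal.
NL_RULE = re.compile(r"\d{4}[ ]?[A-Z]{2}")

def is_valid_postal_code(code, rule=NL_RULE):
    """True when the whole string matches the rule (no extra leading/trailing text)."""
    return rule.fullmatch(code) is not None

with_space = is_valid_postal_code("1000 AA")
without_space = is_valid_postal_code("1000AA")
```

As written, the rule treats the space inside the postal code as optional, so it accepts both the official form and the shortened one. It says nothing about the spacing between postal code and city, which is determined by the layout template.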

The province name is missing, while this is normally not used, in a few cases (multiple cities with the same name) this is common (for the small city).

comment:22 follow-up: ↓ 35 Changed 4 years ago by helenashih@…

In the <layout-order> element, you might consider having an "Area" and "District" under the "sub-locality" notion. In Japan and Taiwan, this is commonly known as "里" (area), "區" (district) and then "市" (city). This is probably not a huge deal for commercial mailings and the time I see a strict mailing requirement on this is usually during election time when whom you might be able to vote for is directly related to this information. Sometimes it's possible the different areas of the same district within a city would have different voting ballots. For most of the governmental mailings, this is usually followed to the T even outside election cycles.

comment:23 Changed 4 years ago by Shawn Steele (Microsoft) <shawnste@…>

Although improving addresses is a good idea, I'm concerned about the rather quick (to me at least) review period, and this seems to gloss over lots of challenges with lots of addresses. It's been years, but I was tangentially associated with mailing stuff, and i18n addresses were covered in a rather large book.

It'd be good if companies that actually mail stuff a lot took a look at this (like a direct marketing firm). For keeping track of, say, subscriber information, the fields provided here are not necessarily sufficient, even for the US (for example, carrier route sorting).

Other fields aren't as simple as they seem. For example, the "postal code" in Europe often implicitly includes the country as well. Although it could be treated as opaque, the correlation is interesting, particularly for validation.

Another concern is that format is defined by the host country, however international mail likely has additional requirements from in-country mail (like the country name), and some postal services in the sending country may prefer some form that isn't quite like what the receiving country's postal service would prefer.

For example, what about addresses in different scripts? (Latin versus native script?)

I'm also curious what other standards exist or are in progress for address data formats? For example vCard (which doesn't seem to say much.)

At the minimum I think that this needs a rigorous review with the appropriate postal standards, and I'd include references to those documents. Better would be review by people that do a lot of shipping. I'd rather take a little extra time and get it right.

comment:24 Changed 4 years ago by jefsey@…

Interestingly enough, I did not see any reference to the S42 and S53 international standards for postal addresses.

Maybe you want to consider http://www.upu.int/en/activities/addressing/addressing-standards.html before reinventing them, or maybe create a common working group. Maybe Unicode could get a UPU certificate of recognition. Or maybe locale files could simply (or additionally) provide the local postal address template?

  1. The UPU has developed two addressing standards: S42 and S53.

Standard S42: International postal address components and templates

The S42 international addressing standard consists of two parts:

S42 part A defines:

  • a set of postal address components used in worldwide addresses;
  • languages for expressing address templates, i.e. formal descriptions of address formatting rules.

S42 part B:

  • provides country-specific address templates. A country defining its S42 template provides precise information about its address elements and formatting rules.
  • creates a library of templates that can be easily incorporated into computer systems for managing addresses.

Standard S53: Exchange of name and address data

S53 facilitates exchange of name and address data between postal authorities, businesses, mailers and other organizations. It incorporates the S42 elements and templates.

  2. The following document tables and models might help you:

http://www.cs.up.ac.za/cs/scoetzee/docs/Coetzee_ISOFocus2010_WhatAddressStandardsTellUsAboutAddresses.pdf

  3. Another set of metadata would be to match other metadata registry standards in foreseeing a place for the date of entry (permitting one to know which address is obsolete) and a confidence factor.


  4. Maybe you could also contact people from JTC1/SC32/SG2 for coordination?

comment:25 follow-up: ↓ 38 Changed 4 years ago by Shawn Steele (Microsoft) <shawnste@…>

This is being discussed on the cldr-users list as well, cc'ing here:

I want to call out existing standards with the address form CLDR proposal. I think this is a pretty big extension of what Unicode’s been doing, so I re-added unicore.

 Other standards

OASIS was mentioned, however there are apparently several additional common formats:
The USPS has an EDI format for addresses, used mainly by large institutions, to support domestic addresses. The Electronic Commerce Code Management Association (ECCMA) supports the International Address Element Code (IAEC) list on its Web site. The OASIS xNAL standard is widely referenced within e-commerce efforts such as ebXML. Internationally, the UPU S42 standard includes an element list, address templates and rendition instructions. ADIS is interoperable with all these efforts, can use UPU S42 elements, templates, and rendition instructions, and supports mail production, address management, and USPS Intelligent Mail(TM).

It seems that some of these are widely deployed, so it is probably critical to describe how the input fields map to these standards. I think that this proposal MUST speak to interoperability with the existing standards and probably why Unicode is the right place to maintain YAMS (yet another mail standard), considering that Unicode has not historically been an expert organization WRT mailing systems.

I’m not opposed to including address data within the CLDR. Or, perhaps pointers to other standard(s) with the relevant information. However “addresses” are a huge space, and there’s already a rather large focus on this problem by other organizations, and adoption by other organizations. I think it needs to be very clear what business problem Unicode is solving by adding another address related standard and why it isn’t handled by one of the above (or some other) standards.

If there is a specific “input” problem that these groups are missing, then we should work with those groups, since they are experts, to make sure the problem is fully understood and ensure that Unicode is providing value to the mail addressing industry rather than adding to confusion or causing interoperability problems between the data input and data processing layers.

This needs far more review than the current schedule has allowed.

  • Shawn

comment:26 follow-up: ↓ 39 Changed 4 years ago by Shawn Steele (Microsoft) <shawnste@…>

More comments from the cldr-users thread:

Trying to combine a few threads 

In general I think there’re a couple things to be solved. The intent is data entry, which must be in a recognizable form for the user because they’re the one that has to enter the data. OTOH, the data, once input, must be stored by the computer in a useful form. Those aren’t necessarily the same forms, but they certainly influence one another.

 This is supported with the postalCodePrefix field in the design. For example, a Zurich postal code could look like "CH-8001".

There’s also a country field, so it’d be nice to be able to tell that the “CH” and “Switzerland” match. For that matter, there’s a problem with “Switzerland” vs. “Schweiz” vs. whatever else, particularly if someone’s mailing locally or between countries.

 To my best knowledge, for international mailing, the destination country name needs to be written in the international postal language (e.g. French or English), and the other part of the address could be written in the in-country form (because it will be handled by the postman in the destination country). That being said, the metadata we are proposing here specifies the layout of address components in-country for entering addresses. To format the data entered properly, a better address formatter might need to be implemented, possibly relying on additional address metadata.

I believe there are conflicts, particularly between the USPS and others. The CJK examples would probably be an example where the country would be funny in CJK. I also (perhaps incorrectly) believe that some high-volume traffic isn’t just sent “to the receiving country”, but rather sent to a specific region. Eg: mail from London to USA might be put on a plane to the East Coast or West Coast depending on Zip Code. That requires that some sort of understanding of the receiving country’s conventions in a form the sending country can understand.

 In CJK countries, when address is written in the local script, the order of address fields is reversed. We support the two orders in our design depending on the language/script. E.g. for Taiwan:

<layout order="LargeToSmall">%Z%n%S%C%n%A%n%O%n%N</layout>
<layout order="SmallToLarge">%N%n%O%n%A%n%C, %S %Z</layout>
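
A sketch of how a client might choose between the two, under the assumption that the choice is keyed on the script of the UI or input locale. The script-code set and the helper `pick_layout` are illustrative guesses; the proposal specifies the order per locale in the locale XML files rather than by a rule like this.

```python
# The two Taiwan layouts quoted above.
LAYOUTS_TW = {
    "LargeToSmall": "%Z%n%S%C%n%A%n%O%n%N",
    "SmallToLarge": "%N%n%O%n%A%n%C, %S %Z",
}

# ISO 15924 script codes assumed to use the large-to-small order (an assumption).
LARGE_TO_SMALL_SCRIPTS = {"Hant", "Hans", "Jpan", "Kore"}

def pick_layout(layouts, script):
    """Pick the layout template for the given script code."""
    order = "LargeToSmall" if script in LARGE_TO_SMALL_SCRIPTS else "SmallToLarge"
    return layouts[order]

local_layout = pick_layout(LAYOUTS_TW, "Hant")
latin_layout = pick_layout(LAYOUTS_TW, "Latn")
```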

That helps me understand the purpose of these, but that’s kind of my problem. I had no idea when to use which. I’m still not sure I have any idea. Also how does BIDI impact this?

 Yes, personal names are very important, which is why we don't want to get them wrong. At this moment, the name field in this design is a blank field, in which users can enter anything. By saying formatting of names is out of scope in the design doc, we meant we will output the name exactly as it is entered in the name field. No metadata support is included to help format names as nicknames or forms like "Mr. xxx".

That makes sense, but is it extensible later? If I start getting a bunch of underspecified input and want to be smarter about it later, then it’s going to be a problem to parse the names later.

 Similarly, department within an organization can have a big meaning. Ignoring someone’s department could be offensive in some cases.
 This could be entered into the organization field. No validation is done for this field.

Organization and department often have to be on different lines, so I think this needs more thought.

 This (carrier route) seems to be applicable only to the US, and it is optionally used by some businesses to do saturation mailing campaigns to get a discount on mailing, as you mentioned. When they do that, they are unlikely to use the ZIP+4 code at all, and the 9-digit carrier route information could possibly be stored in the ZIP field (it serves the same purpose anyway), as it might not be optimal to create a separate field for carrier route info that will only be used for some use cases in one country.

Actually, I think you had a similar example for France using the sorting code. CRRT sorted stuff REQUIRES zip+4, the field cannot be reused. It also goes in a special place when printed. X might work though, but this shows some of the missing complexity. Granted carrier routes would never be input by hand, however I think that the spec needs to consider this field.

 C/O John Smith

This is like department. It “could” be part of the 1st line, however the USPS and Zip+4/carrier route sorting software in the US is very strict about the format of the “lines” of the address, and that’s decidedly NOT part of the address. It might be better to treat this as a second line of the recipient. (That might work with organization/department as well). Brainstorming, you could have:

Recipient
Organization
Address

Where recipient could be:

Shawn Steele
c/o Someone Else

And organization could be:
Microsoft Corporation
Windows

And address could be

1 Microsoft Way

I’m not sure how well that works with the USPS form, but it’s worth thinking about. That could also handle the “legal garbage” that James mentioned.

-Shawn

comment:27 follow-up: ↓ 36 Changed 4 years ago by Jeroen Ruigrok van der Werven <asmodai@…>

The postal code rule for NL should be changed from <postalCodeValidationRule>\d{4}[ ]?[A-Z]{2}</postalCodeValidationRule> to <postalCodeValidationRule>[1-9]\d{3}[ ]?[A-Z]{2}</postalCodeValidationRule> if the purpose is to not allow invalid postal codes. Our postal codes can never start with a zero (0).
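
As an illustration only (not part of the proposed metadata), the difference between the two rules can be checked with a short Python sketch. The two patterns are copied from this comment; the helper name is_valid and the sample codes are assumptions made for the example, and the sketch assumes the rule must match the whole postal code.

```python
import re

# Patterns copied from the comment above. fullmatch() is used on the
# assumption that the rule must cover the entire postal code.
CURRENT_RULE = re.compile(r"\d{4}[ ]?[A-Z]{2}")
PROPOSED_RULE = re.compile(r"[1-9]\d{3}[ ]?[A-Z]{2}")


def is_valid(rule, postal_code):
    """Return True if the whole postal code matches the given rule."""
    return rule.fullmatch(postal_code) is not None


# Sample codes are made up for illustration; "0123 AB" starts with a zero.
for code in ["1000 AA", "1000AA", "0123 AB"]:
    print(code, is_valid(CURRENT_RULE, code), is_valid(PROPOSED_RULE, code))
```

Under these assumptions, a code with a leading zero passes the current rule but is rejected by the proposed one, while codes with or without the space pass both.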

comment:28 follow-up: ↓ 40 Changed 4 years ago by emmons

IBM's position on the subject PRI is as follows:

IBM cannot support this proposal until the following has been satisfied:

  1. The scope of what this addressing schema is supposed to cover is defined.

It says "This metadata is intended for presenting a form for users to fill in with address data." This seems to be a pretty limited definition, since address data can cover a wide variety of needs. We need a little more precision and possibly some articulation of what this doesn't cover.
We gather from reading the ISO19120 draft that there are many differing needs for addressing schemas: some for international postal addresses, some for local country postal addresses, some for identifying addresses for objects, and some for identifying addresses to send things to. (It appears that there is a distinction between those last two things. One distinction is between an address for a house for sending an emergency vehicle to and a post office box address for the same household for mailing purposes.) Others might be the address of a fire hydrant, water pipes or power line poles, addresses for map data, and addresses for other purposes. There seem to be clear needs for historical information or aliases to be maintained for certain purposes. Lots of differing needs. So what part of the problem is CLDR attempting to address?

Admittedly, we don't think our language has sufficient terminology to distinguish between all of the different needs for addresses of various sorts. Defining the scope will not be easy to articulate. The word "address" in itself is too broad a term since there are many types of addresses. Is this for postal addresses or for map addresses or IP addresses or object addresses, or what? If postal, Is this for international postal addresses or domestic postal addresses or both? etc. We need more precise scope.

2). The relationship to one or more existing addressing standards is outlined.
There should not be any need to invent something new.
The only standards mentioned here are the international phone standard and an Oasis addressing standard. UPU is never hinted at or any of the other multitude of international and local address standards that exist. We were particularly surprised about the UPU and local country standards being ignored.

Which existing schema or schemas is CLDR going to follow? Why this one versus others?
Are those schemas fully implemented or only partially implemented in CLDR?

3) Are there any other open source implementations of addressing standards?
If others have already invented this wheel, should we simply use that other open source implementation?

Should CLDR have some relationship to the other work similar to the CLDR relationship with the Olson TZ work group?

4). The role of the CLDR vetters is defined.
What role do the CLDR vetters have in populating this data versus the governments who are populating the UPU data or the governments that are defining local data standards within their countries or other ISO standards that are being maintained independently of CLDR? This is especially of concern since this data is outside of the normal scope of the CLDR vetting process and will only be modified through CLDR bugs.

5). We have a clear understanding of how and when this data would be incorporated into ICU.
If APIs that access this data become a part of ICU, then certainly our programmers will use them as part of the "ICU accepted solution" for address formatting. This has the potential for problems down the road if CLDR's data doesn't closely align with the addressing standards that are being implemented and in some cases required by various governments.

comment:29 in reply to: ↑ 16 Changed 4 years ago by jia.shao.peng@…

Replying to jefsey@…:

Thanks for the comment!

The preferred language, script and normative national referent to be used should be included with the way they are encoded. A necessity in a global world, a prophylactic measure against unreadable spam and mail.

These could mostly be inferred from existing data in CLDR. To avoid repetition, they are not included here.

It should be possible to add digital ecosystemic information such as (but not limited to): fax, e-mail, radio-frequency, social networks, telephone numbers, etc.

I agree. These are definitely some of the possible future enhancements. The current proposal is focused on address input, which is a first step to move closer to the goal of handling international contact input.

I have added your suggestion as possible future work in the proposal:

https://docs.google.com/document/pub?id=1KL6hmksP1D-qnpyI6FpLv_C-rQBEcB_4B18SNCaN39M

comment:30 in reply to: ↑ 17 Changed 4 years ago by jia.shao.peng@…

Replying to dwayne@…:

US example seems to be incorrect. The uppercaseFields tag defines C&S as requiring capitals. Yet in the example only state is capitalised and city is not. So either this capitalisation is optional, in which case the text needs updating, or the example is incorrect.

Thanks for noticing this! I added the example myself to illustrate the meaning of each address field, and didn't notice this inconsistency. I have made the change in the example.
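
To make the intended meaning of uppercaseFields concrete, here is a minimal Python sketch. It assumes that the element lists the %-letters of the fields to be uppercased and that field values are kept in a dict keyed by those letters; the function name apply_uppercase and the sample address are hypothetical and not taken from the proposal.

```python
def apply_uppercase(fields, uppercase_fields):
    """Uppercase every field whose %-letter appears in uppercase_fields.

    `fields` maps a field letter (e.g. "C" for city, "S" for state) to its
    value. This keying scheme is an assumption made for the sketch.
    """
    return {
        letter: value.upper() if letter in uppercase_fields else value
        for letter, value in fields.items()
    }


# Hypothetical US address; "CS" follows the comment above, which says the
# US entry marks both city and state as requiring capitals.
us_address = {"A": "1 Example St", "C": "Mountain View", "S": "ca"}
print(apply_uppercase(us_address, "CS"))
```

With uppercaseFields set to "CS", both the city and the state come out in capitals, which is the consistency the example in the proposal was missing.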

We have made our best effort to vet the metadata. In case there is any problem with the metadata we haven't spotted, it could be corrected easily in the vetting process of CLDR once it is contributed.

Thanks again for pointing this out!

comment:31 in reply to: ↑ 18 Changed 4 years ago by jia.shao.peng@…

Replying to Bruno Haible <bruno.haible@…>:

The <uppercaseFields> element for Germany is wrong. It should be empty.

Rationale: In Germany, customarily the city line is *not* uppercased. The line which is uppercased
is the country line, for international shipments. This country line is listed after the
zip code + city line, and is not even modeled in the current CLDR proposal.

If there is an error in the uppercaseFields for Germany, it would be better to correct it later in the CLDR vetting process, where many people can give their opinion and vote on it.

Regarding country: the current metadata specifies how to lay out address fields within a country, so a country needs to be selected first, which is why country is not covered in the layout information. Additional data on formatting the country could be added in the future. I have added this to the future plans section of the proposal.

comment:32 in reply to: ↑ 19 Changed 4 years ago by jia.shao.peng@…

Replying to kent.karlsson14@…:

Replying to jia.shao.peng@…:

Replying to kent.karlsson14@…:

There seems to be no %-letter for "country". For international papermail that would be required (along with the postal code prefix when such is used), but placed somewhat differently in different address layout schemes. For in-country papermail neither country nor "postal code prefix" (which is a country code...) is needed. The documentation should say that %Z includes the postal code prefix if there is such a prefix and the mail is not (or maybe is not) in-country. It should also say that %T, or whatever letter you choose for territory/country (C is taken for "city"/locality; note that some cities may have postal subdivisions which are used instead of the city name), is (along with any %n just before it) left out if the mail is known to be in-country.

I have added a comment to "4. <postalCodePrefix>" to mention that the postal code prefix will be inserted in front of %Z when it is present.

I have also added a comment to "1. <format locale=..>" to address your comment on in-country vs. out-of-country formats and on defining a letter for country/territory.
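
The prefix behaviour discussed here can be sketched in a few lines of Python. This is only an illustration: the function name render_postal_code, the in_country flag, and the assumption that the prefix value already contains its separator (e.g. "CH-") are all hypothetical. The proposal itself only states that the prefix is inserted in front of %Z when present; dropping it for in-country mail follows the suggestion quoted above.

```python
def render_postal_code(postal_code, prefix=None, in_country=False):
    """Return the text substituted for %Z.

    Assumptions for this sketch: `prefix` includes its separator, and the
    prefix is omitted for in-country mail, as suggested in the comment above.
    """
    if prefix and not in_country:
        return f"{prefix}{postal_code}"
    return postal_code


# The Zurich example from earlier in this ticket.
print(render_postal_code("8001", prefix="CH-"))
print(render_postal_code("8001", prefix="CH-", in_country=True))
```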

I find no motivation for omitting a %-letter for country, nor for omitting it from the formats. Please add a %letter for country (I'd suggest %T) and add it to the formats.

I just replied in the previous comment. I have added this as a possible future addition. Honestly, we don't have this data. If anyone has the data for that, I will be glad to see that as a follow-up proposal to add that to the metadata we are proposing now.

comment:33 in reply to: ↑ 20 Changed 4 years ago by jia.shao.peng@…

Replying to Bruno Haible <bruno.haible@…>:

The ISO/IEC 14652 draft http://www.open-std.org/jtc1/sc22/wg20/docs/n972-14652ft.pdf
(section 4.11) describes how to formalize the address fields for POSIX systems. GNU libc
implements this facility.

In contrast to the CLDR proposal, it also foresees the fields

  • Firm, Department, Building
  • Street and house number as separate fields
  • Room number
  • Floor number

Have you considered this?

Thanks for the comment! Yes we have considered this, and it could get really messy going down that road. Below is a quote provided by Shawn Steele from Microsoft who pinged his friend from the mailing industry:

<quote>
I don't think it's a good idea to go too far with this, however, and create fields for address parts, such as house number, street name, suffix etc. There are too many variables and combinations...look at Chicago's "fire route" addresses, like "123 Fire Route 2", which is the 123rd house in some mysterious path away from the fire station number 2 (this form of address was created because the Chicago fire department couldn't figure out the address of O'Leary's cow...no joke)
</quote>

In fact we have started to experiment with detailed formatting internally for a few countries. It is still not stable and comprehensive enough yet.

I have added this as a possible addition in the future plans section of the proposal.

comment:34 in reply to: ↑ 21 Changed 4 years ago by jia.shao.peng@…

Replying to bobevers@…:

There are a few things wrong for the Netherlands, first the section from the proposal:

<postalCountry iso3166="NL">
	<layout order="SmallToLarge">%O%n%N%n%A%n%Z %C</layout>
	<uppercaseFields>C</uppercaseFields>
	<requiredFields>ACZ</requiredFields>
	<postalCodeValidationRule>\d{4}[ ]?[A-Z]{2}</postalCodeValidationRule>
	<postalCodeType>postal</postalCodeType>
	<centralPostOfficeURL>www.tntpost.nl</centralPostOfficeURL> 
</postalCountry>

The space in the postal code is (officially) mandatory, and because of this space there should be 2 spaces between the postal code and the city name.
In practice the space in the postal code is left out more and more, reducing the need for 2 spaces before the city name, but if the space is present, 2 spaces are needed.

So valid examples are:
1000 AA AMSTERDAM
1000AA AMSTERDAM
1000AA AMSTERDAM

The province name is missing, while this is normally not used, in a few cases (multiple cities with the same name) this is common (for the small city).

Thanks for pointing this out! Both issues here could be fixed in the CLDR vetting process. And I will take note of these to propose them during the survey period if this metadata is later integrated into CLDR. Thanks again for the information!
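
For readers who want to see how an entry like the NL one above might be consumed, here is a hedged Python sketch. It parses the postalCountry element quoted in this comment with the standard library and checks a set of entered fields against requiredFields and postalCodeValidationRule. The function name check_address, the dict keyed by %-letters, and the sample values are assumptions for illustration, not part of the proposal.

```python
import re
import xml.etree.ElementTree as ET

# The NL entry as quoted in this comment (raw string keeps the regex intact).
NL_XML = r"""
<postalCountry iso3166="NL">
  <layout order="SmallToLarge">%O%n%N%n%A%n%Z %C</layout>
  <uppercaseFields>C</uppercaseFields>
  <requiredFields>ACZ</requiredFields>
  <postalCodeValidationRule>\d{4}[ ]?[A-Z]{2}</postalCodeValidationRule>
  <postalCodeType>postal</postalCodeType>
  <centralPostOfficeURL>www.tntpost.nl</centralPostOfficeURL>
</postalCountry>
"""


def check_address(country_xml, fields):
    """Return a list of problems found in `fields` (a dict keyed by %-letter)."""
    country = ET.fromstring(country_xml.strip())
    problems = []
    # Each character of requiredFields is treated as one required field letter.
    for letter in country.findtext("requiredFields", default=""):
        if not fields.get(letter, "").strip():
            problems.append(f"missing required field %{letter}")
    rule = country.findtext("postalCodeValidationRule")
    postal_code = fields.get("Z", "")
    # Assumes the rule must match the whole postal code.
    if rule and postal_code and re.fullmatch(rule, postal_code) is None:
        problems.append(f"postal code {postal_code!r} does not match {rule}")
    return problems


print(check_address(NL_XML, {"A": "Example Street 1", "C": "AMSTERDAM", "Z": "1000 AA"}))
print(check_address(NL_XML, {"A": "Example Street 1", "Z": "10 AA"}))
```

Note that the rule as quoted accepts the postal code with or without the space, so the spacing issues raised above would need a data change rather than a code change.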

comment:35 in reply to: ↑ 22 Changed 4 years ago by jia.shao.peng@…

Replying to helenashih@…:

In the <layout-order> element, you might consider having an "Area" and "District" under the "sub-locality" notion. In Japan and Taiwan, this is commonly known as "里" (area), "區" (district) and then "市" (city). This is probably not a huge deal for commercial mailings; the time I see a strict mailing requirement on this is usually during election time, when whom you might be able to vote for is directly related to this information. Sometimes different areas of the same district within a city have different voting ballots. For most governmental mailings, this is usually followed to the T even outside election cycles.

Sorry I don't quite understand. The Sub-locality in the proposal denotes the territorial unit which is smaller than "city". That could represent a district, a neighbourhood, suburb or dependent locality depending on the country. Did you mean we should create another address field which is even smaller? Under the current proposal, they will be entered in Address Lines, which can be 2-3 lines.

comment:36 in reply to: ↑ 27 Changed 4 years ago by jia.shao.peng@…

Replying to Jeroen Ruigrok van der Werven <asmodai@…>:

The postal code rule for NL should be changed from <postalCodeValidationRule>\d{4}[ ]?[A-Z]{2}</postalCodeValidationRule> to <postalCodeValidationRule>[1-9]\d{3}[ ]?[A-Z]{2}</postalCodeValidationRule> if the purpose is to not allow invalid postal codes. Our postal codes can never start with a zero (0).

Thanks for the info! This seems to be an issue with the data itself. I will take note of this and propose the change during the survey period if this metadata is later integrated into CLDR.

comment:37 in reply to: ↑ 2 Changed 4 years ago by jia.shao.peng@…

Copying over relevant discussions from the cldr-users mailing list. Replies are from Frank Tang (Google), who is the original author of the metadata file.

I forgot to ask, if anyone knows: Are there any existing standards that the mailing industry used for interchanging addresses?

It depends on how you define "mailing industry". If you define it as "post offices of the world", the best document about interchanging addresses on postal mail can be found at:

http://www.upu.int/en/activities/addressing/postal-addressing-systems-in-member-countries.html

But it talks about interchanging addresses in "paper" form, not electronic form.

As John mentioned, OASIS has a standard which defines all the possible address fields around the world. What it doesn't define is how the fields could be laid out together for each country, and what are the valid entries for some of the fields:

http://www.oasis-open.org/committees/ciq/Downloads/xNAL/xAL/Versions/xALv2_0/xAL.dtd

I looked at xAL 5 years ago when I started the work. The problem with xAL is that it has a strong bias toward the postal systems of Western countries and does not really address the gap between "semantic fields" and what is "printed on the paper". It is a standard worth referring to, but it lacks important elements.

My recollection is that, some time ago, some vendors had conventions for their software, and sometimes that was supported by other consumers, however there was still significant variation. Additionally the stuff I was aware of was very US-centric. Obviously if there is some sort of pre-existing convention, it’d be good to see what it is and how well it works.

A quick Bing shows http://www.idealliance.org/specifications/adis/faq (one example, sorry guys; in fairness I tried Google and found http://www.winswim.com/ftp/Standard%20Data%20Interchange%20Format.pdf , which is, at least, a computer interchange format)

It looks like that specification aims to deal with "interchange". When I started the work 5 years ago, our goal was not just "interchange" in electronic format, but how to convert to and from what is shown on the envelope, which a post office worker anywhere in the world knows how to deal with. In the purely electronic world of "interchange", the order is mainly artificial. But for an address printed on paper, the order is "cultural/country driven". That part is the most important value we intend to contribute here. In other words, the emphasis is on visualization/presentation according to the user's culture from a set of reasonable, electronically interchangeable fields, rather than on how to interchange the electronic message across different organizations.

comment:38 in reply to: ↑ 25 Changed 4 years ago by jia.shao.peng@…

Replying to Shawn Steele (Microsoft) <shawnste@…>:

This is being discussed on the cldr-users list as well, cc'ing here:

I want to call out existing standards with the address form CLDR proposal. I think this is a pretty big extension of what Unicode’s been doing, so I re-added unicore.

 Other standards

OASIS was mentioned, however there are apparently several additional common formats:
The USPS has an EDI format for addresses, used mainly by large institutions, to support domestic addresses. The Electronic Commerce Code Management Association (ECCMA) supports the International Address Element Code (IAEC) list on its Web site. The OASIS xNAL standard is widely referenced within e-commerce efforts such as ebXML. Internationally, the UPU S42 standard includes an element list, address templates and rendition instructions. ADIS is interoperable with all these efforts, can use UPU S42 elements, templates, and rendition instructions, and supports mail production, address management, and USPS Intelligent Mail(TM).

It seems that some of these are widely deployed, so it is probably critical to describe how the input fields map to these standards. I think that this proposal MUST speak to interoperability with the existing standards, and probably to why Unicode is the right place to maintain YAMS (yet another mail standard), considering that Unicode has not historically been an expert organization WRT mailing systems.

I’m not opposed to including address data within the CLDR. Or, perhaps pointers to other standard(s) with the relevant information. However “addresses” are a huge space, and there’s already a rather large focus on this problem by other organizations, and adoption by other organizations. I think it needs to be very clear what business problem Unicode is solving by adding another address related standard and why it isn’t handled by one of the above (or some other) standards.

If there is a specific “input” problem that these groups are missing, then we should work with those groups, since they are experts, to make sure the problem is fully understood and ensure that Unicode is providing value to the mail addressing industry rather than adding to confusion or causing interoperability problems between the data input and data processing layers.

This needs far more review than the current schedule has allowed.

  • Shawn

Copying over reply from Frank Tang in the cldr-users mailing list:

I agree with you that address is a big space. I don't think this CLDR proposal intends to address all the issues in the address space. It only tries to address issues that other standards do not cover. Notice that the proposal is NOT proposing an "interchange format" for addresses, but rather how to format/parse/validate fields for each country/region transmitted in some interchange format defined by other organizations.

So far, we have seen no other standard that specifies, in a well-defined machine-readable specification:

  1. The different postal code formats used in each country/region
  2. The order of the different "components" used in each country/region
  3. How to "format" address fields into lines of text for each country/region

etc.

It is true that many other organizations define interchange formats for addresses. However, to my knowledge, none of them defines the issues I listed above. Are you aware of any?

Regards,
Frank

comment:39 in reply to: ↑ 26 Changed 4 years ago by jia.shao.peng@…

Replying to Shawn Steele (Microsoft) <shawnste@…>:

More comments from the cldr-users thread:

Trying to combine a few threads 

In general I think there’re a couple things to be solved. The intent is data entry, which must be in a recognizable form for the user because they’re the one that has to enter the data. OTOH, the data, once input, must be stored by the computer in a useful form. Those aren’t necessarily the same forms, but they certainly influence one another.

 This is supported with the postalCodePrefix field in the design. For example, a Zurich postal code could look like "CH-8001".

There’s also a country field, so it’d be nice to be able to tell that the “CH” and “Switzerland” match. For that matter, there’s a problem with “Switzerland” vs. “Schweiz” vs. whatever else, particularly if someone’s mailing locally or between countries.

As I mentioned in previous comments, the current proposal doesn't cover the country field. It is added to possible future plans.

 To the best of my knowledge, for international mailing, the destination country name needs to be written in the international postal language (e.g. French or English), and the rest of the address can be written in the in-country form (because it will be handled by the postman in the destination country). That being said, the metadata we are proposing here specifies the in-country layout of address components for entering addresses. To format the entered data properly, a better address formatter might need to be implemented, possibly relying on additional address metadata.

I believe there are conflicts, particularly between the USPS and others. The CJK examples would probably be an example where the country would be funny in CJK. I also (perhaps incorrectly) believe that some high-volume traffic isn’t just sent “to the receiving country”, but rather sent to a specific region. E.g., mail from London to the USA might be put on a plane to the East Coast or West Coast depending on ZIP Code. That requires some sort of understanding of the receiving country’s conventions in a form the sending country can understand.

 In CJK countries, when an address is written in the local script, the order of address fields is reversed. We support the two orders in our design depending on the language/script. E.g. for Taiwan:

<layout order="LargeToSmall">%Z%n%S%C%n%A%n%O%n%N</layout>
<layout order="SmallToLarge">%N%n%O%n%A%n%C, %S %Z</layout>

That helps me understand the purpose of these, but that’s kind of my problem. I had no idea when to use which. I’m still not sure I have any idea. Also how does BIDI impact this?

When to use which is specified in root.xml and relevant locale-specific XML files.

The layout specifies the order of address fields. How could Bidi impact that? ;-) If you are thinking about the texts that will go into each address field, then whether to enable bidi for the input field itself could be inferred from existing data in CLDR depending on the country and language.
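
To show how a layout string drives field order independently of the text inside each field, here is a minimal Python sketch. The mapping from %-letters to field names is inferred from the discussion in this ticket and should be treated as an assumption, as should the function name format_address and the placeholder address values; it is not the formatter used by any existing implementation.

```python
import re

# Assumed meaning of the %-letters, inferred from this ticket's discussion.
FIELD_NAMES = {
    "N": "name",
    "O": "organization",
    "A": "address_lines",
    "C": "city",
    "S": "state",
    "Z": "postal_code",
    "X": "sorting_code",
}

# The two Taiwan layouts quoted above.
LARGE_TO_SMALL = "%Z%n%S%C%n%A%n%O%n%N"
SMALL_TO_LARGE = "%N%n%O%n%A%n%C, %S %Z"


def format_address(layout, fields):
    """Expand a layout string into lines of text using the given fields."""
    def expand(match):
        letter = match.group(1)
        if letter == "n":
            return "\n"
        key = FIELD_NAMES.get(letter)
        return fields.get(key, "") if key else ""

    text = re.sub(r"%([A-Za-z])", expand, layout)
    # Drop lines left empty because an optional field was not supplied.
    return "\n".join(line for line in text.split("\n") if line.strip())


# Placeholder values; a real LargeToSmall address would use the local script.
address = {
    "name": "Example Name",
    "organization": "Example Org",
    "address_lines": "1 Example Road",
    "city": "Da'an District",
    "state": "Taipei City",
    "postal_code": "106",
}
print(format_address(SMALL_TO_LARGE, address))
print(format_address(LARGE_TO_SMALL, address))
```

The same field values produce two different line orders, which is all the layout controls; the direction of the text inside a field is a separate question.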


 Yes, personal names are very important, which is why we don't want to get them wrong. At this moment, the name field in this design is a blank field, in which users can enter anything. By saying formatting of names is out of scope in the design doc, we meant we will output the name exactly as it is entered in the name field. No metadata support is included to help format names as nicknames or forms like "Mr. xxx".

That makes sense, but is it extensible later? If I start getting a bunch of underspecified input and want to be smarter about it later, then it’s going to be a problem to parse the names later.

I can understand your concern. However, the real question is: can we be smarter about names without messing them up? :)

 Similarly, department within an organization can have a big meaning. Ignoring someone’s department could be offensive in some cases.
 This could be entered into the organization field. No validation is done for this field.

Organization and department often have to be on different lines, so I think this needs more thought.

 This (carrier route) seems to be applicable only to the US, and it is optionally used by some businesses to do saturation mailing campaigns to get a discount on mailing, as you mentioned. When they do that, they are unlikely to use the ZIP+4 code at all, and the 9-digit carrier route information could possibly be stored in the ZIP field (it serves the same purpose anyway), as it might not be optimal to create a separate field for carrier route info that will only be used for some use cases in one country.

Actually, I think you had a similar example for France using the sorting code. CRRT sorted stuff REQUIRES zip+4, the field cannot be reused. It also goes in a special place when printed. X might work though, but this shows some of the missing complexity. Granted carrier routes would never be input by hand, however I think that the spec needs to consider this field.

The carrier route code in the US could be covered as %X: sorting code, like CEDEX in France.

However, as the carrier route code in the US is rarely used for address input, it is not included in the layout for the US. If you look at the layout for France, it includes %X.

 C/O John Smith

This is like department. It “could” be part of the 1st line, however the USPS and Zip+4/carrier route sorting software in the US is very strict about the format of the “lines” of the address, and that’s decidedly NOT part of the address. It might be better to treat this as a second line of the recipient. (That might work with organization/department as well). Brainstorming, you could have:

Recipient
Organization
Address

Where recipient could be:

Shawn Steele
c/o Someone Else

And organization could be:
Microsoft Corporation
Windows

And address could be

1 Microsoft Way

I’m not sure how well that works with the USPS form, but it’s worth thinking about. That could also handle the “legal garbage” that James mentioned.

-Shawn

comment:40 in reply to: ↑ 28 Changed 4 years ago by jia.shao.peng@…

Thanks for the comments, John! I will try to answer the first 3 of your questions here.

Replying to emmons:

IBM's position on the subject PRI is as follows:

IBM cannot support this proposal until the following has been satisfied:

  1. The scope of what this addressing schema is supposed to cover is defined.

I have updated the proposal to make the scope clearer. As stated in the beginning of the proposal now, the metadata we are proposing to be included in CLDR is:

"a format for layout of address entry fields in a UI for users to enter in address information"

It is NOT:

  • an interchange format for address format
  • a format for formatting postal labels
  • a standard for address

It says "This metadata is intended for presenting a form for users to fill in with address data." This seems to be a pretty limited definition, since address data can cover a wide variety of needs. We need a little more precision and possibly some articulation of what this doesn't cover.
We gather from reading the ISO19120 draft that there are many differing needs for addressing schemas: some for international postal addresses, some for local country postal addresses, some for identifying addresses for objects, and some for identifying addresses to send things to. (It appears that there is a distinction between those last two things. One distinction is between an address for a house for sending an emergency vehicle to and a post office box address for the same household for mailing purposes.) Others might be the address of a fire hydrant, water pipes or power line poles, addresses for map data, and addresses for other purposes. There seem to be clear needs for historical information or aliases to be maintained for certain purposes. Lots of differing needs. So what part of the problem is CLDR attempting to address?

Admittedly, we don't think our language has sufficient terminology to distinguish between all of the different needs for addresses of various sorts. Defining the scope will not be easy to articulate. The word "address" in itself is too broad a term since there are many types of addresses. Is this for postal addresses or for map addresses or IP addresses or object addresses, or what? If postal, Is this for international postal addresses or domestic postal addresses or both? etc. We need more precise scope.

2). The relationship to one or more existing addressing standards is outlined.
There should not be any need to invent something new.
The only standards mentioned here are the international phone standard and an Oasis addressing standard. UPU is never hinted at or any of the other multitude of international and local address standards that exist. We were particularly surprised about the UPU and local country standards being ignored.

Which existing schema or schemas is CLDR going to follow? Why this one versus others?
Are those schemas fully implemented or only partially implemented in CLDR?

As this is not a proposal for an address standard, nor an interchange format for addresses, there is no need to provide a detailed mapping to each of the address standards in this proposal. The proposal is about the layout of address entry fields, and the meaning of each entry field is clearly documented with the goal of making it easier to map to existing address standards if needed. If any entry field is not well documented, please let us know and we will improve that.

3) Are there any other open source implementations of addressing standards?
If others have already invented this wheel, should we simply use that other open source implementation?

This proposal is NOT proposing an address standard, given that many already exist and there is no need for one more. What we are proposing is how different address fields should be laid out for all the countries in the world, which doesn't yet exist as far as we know.

Should CLDR have some relationship to the other work similar to the CLDR relationship with the Olson TZ work group?

Please let us know if you know any existing work that contains the information I have just described above.

4). The role of the CLDR vetters is defined.
What role do the CLDR vetters have in populating this data versus the governments who are populating the UPU data or the governments that are defining local data standards within their countries or other ISO standards that are being maintained independently of CLDR? This is especially of concern since this data is outside of the normal scope of the CLDR vetting process and will only be modified through CLDR bugs.

5). We have a clear understanding of how and when this data would be incorporated into ICU.
If APIs that access this data become a part of ICU, then certainly our programmers will use them as part of the "ICU accepted solution" for address formatting. This has the potential for problems down the road if CLDR's data doesn't closely align with the addressing standards that are being implemented and in some cases required by various governments.

comment:41 Changed 4 years ago by jorgen.wyke@…

I am the secretary of the Swedish technical committee on Character representation and user interfaces, SIS/TK 445, which is a Unicode liaison member.

I am also involved with ISO/TC 211 Geographic information. TC 211 has a work item on Addressing, WI 19160. This comment is made in agreement with the TC secretariat and the ISO 19160 project leader, Ms Serena Coetzee.

The two tasks of ISO 19160 have been to investigate and formulate requirements in relation to addressing and to make recommendations on whether standards should be developed and if so, how this should be done.

The team has published a report on this which I think would be of interest to your intended work. I have been given permission to send it to you for use in this work. Could someone please e-mail me so I can send it?

/Best regards, Jörgen Wyke

SIS, Swedish Standards Institute

comment:42 Changed 4 years ago by jorgen.wyke@…

I have just received a URL where the ISO 19160 review summary on addressing standards is available: http://www.isotc211.org/address/iso19160.htm

comment:43 Changed 3 years ago by mark

  • Milestone changed from UNSCH to future

comment:44 Changed 3 years ago by mark

  • Owner changed from umesh to jchye

comment:45 Changed 3 years ago by info@…

<postalCountry iso3166="DE">
<layout order="SmallToLarge">%N%n%O%n%A%n%Z %C</layout>

In Germany, the preferred form is: %O%n%N%n%A%n%Z %C

Best regards,

Hasan H. Gürsoy (HHGAG)
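To make the difference between the two layouts concrete, here is a minimal sketch of how such a layout string could be expanded. It assumes that %N is the recipient name, %O the organization, %A the street address, %Z the postal code, %C the city, and %n a line break; these meanings are not spelled out in this comment. The function name `format_address`, the dictionary keys, and the sample address are hypothetical and not part of the patch.

```python
import re

# Assumed meanings of the field codes; not defined in this comment.
FIELD_NAMES = {
    "N": "name",
    "O": "organization",
    "A": "street_address",
    "Z": "postal_code",
    "C": "city",
}


def format_address(layout, fields):
    """Expand a layout string such as '%O%n%N%n%A%n%Z %C'."""

    def expand(match):
        code = match.group(1)
        if code == "n":
            return "\n"
        if code not in FIELD_NAMES:
            # Leave unknown codes untouched rather than guessing.
            return match.group(0)
        return fields.get(FIELD_NAMES[code], "")

    text = re.sub(r"%([A-Za-z])", expand, layout)
    # Drop lines that end up empty because a field was not supplied.
    lines = [line.strip() for line in text.split("\n")]
    return "\n".join(line for line in lines if line)


# Made-up sample data for illustration.
address = {
    "name": "Erika Mustermann",
    "organization": "Beispiel GmbH",
    "street_address": "Musterstr. 1",
    "postal_code": "12345",
    "city": "Berlin",
}

patch_layout = "%N%n%O%n%A%n%Z %C"      # layout in the patch
preferred_layout = "%O%n%N%n%A%n%Z %C"  # layout suggested above

print(format_address(patch_layout, address))
print()
print(format_address(preferred_layout, address))
```

With the suggested layout the organization line comes before the recipient name; the remaining lines are identical.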

comment:46 Changed 3 years ago by info@…

<uppercaseFields>C</uppercaseFields>

is also unneeded in Germany.

Best regards,
Hasan H. Gürsoy (HHGAG)

comment:47 Changed 2 years ago by fredrik

We had a request internally for this feature just today. Maybe it's time to put this back on deck?

comment:48 Changed 17 months ago by emmons

  • Milestone changed from future to UNSCH

Merging future and UNSCH

comment:49 Changed 15 months ago by uchuugaka@…

This would be an awesome feature, but it should include optional guidance or validation support for fields. Many countries' conventions may require, by regulation, that specific characters be used. Japanese, for example, may have some requirement to use full-width digits for postal codes. This would be a great piece to add to pre-localized values for form guidance. Some countries may not use it, but in locales where multiple scripts are potentially in play at the same time, it becomes quite customary.

Field order should also be distinguished in the data by display language, not only by country. For example, in many Asian countries it is common to write "English" or Romanized addresses, where the field order may follow the North American or European convention of small-to-large designations, while the field order in the native script may be the reverse, from large designations down to more granular ones.
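As a rough illustration of both suggestions, the sketch below keys the field order on country and script and converts ASCII digits to their full-width forms. Everything in it is hypothetical: the `LAYOUTS` table, its field names, and the script codes are made up for this example and do not come from the patch, and whether any postal authority actually requires full-width digits is not confirmed here.

```python
# Map ASCII digits 0-9 to the full-width digits U+FF10..U+FF19.
FULLWIDTH_DIGITS = {ord(str(i)): chr(0xFF10 + i) for i in range(10)}


def to_fullwidth_digits(text):
    """Replace ASCII digits with full-width digits; other characters are kept."""
    return text.translate(FULLWIDTH_DIGITS)


# Hypothetical field orders keyed by (country, script); not real CLDR data.
LAYOUTS = {
    # Native script: large designations first, recipient last.
    ("JP", "Jpan"): ["postal_code", "prefecture", "city", "street_address", "name"],
    # Romanized form: small designations first.
    ("JP", "Latn"): ["name", "street_address", "city", "prefecture", "postal_code"],
}


def field_order(country, script):
    """Return the field order for a country as displayed in a given script."""
    return LAYOUTS[(country, script)]


print(field_order("JP", "Jpan"))
print(field_order("JP", "Latn"))
print(to_fullwidth_digits("100-0001"))
```

Keying on the script rather than only on the country lets one country carry both a large-to-small and a small-to-large layout.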

comment:50 Changed 2 months ago by markus

  • Type changed from enhancement to data

comment:51 Changed 2 months ago by srl

  • Status changed from assigned to accepted

comment:52 Changed 7 weeks ago by markus

  • version 2.0 deleted