Re: CLDR from James Kass via Unicode on 2018-09-04 (Unicode Mail List Archive)

From: James Kass via Unicode <unicode_at_unicode.org>
Date: Tue, 4 Sep 2018 01:02:50 -0800

(This is the response from Janusz S. Bień which was sent to the public list.)

On Mon, Sep 03 2018 at 1:03 -0800, James Kass wrote:

> Janusz S. Bień wrote,
>
>> Thanks for the link. I found the Polish section of the following page
>> especially interesting:
>>
>> https://www.unicode.org/cldr/charts/34/subdivisionNames/other_indo_european.html
>>
>> It looks like complete rubbish, e.g.
>>
>> plpm = Federal Capital Territory (???) = Pomerania (Latin/English name of
>> Pomorze) transliterated into the Greek alphabet (and something in
>> Arabic).
>
> And nothing in Armenian, Albanian, or Pashto.
>
> If you click on the link at "plpm", it takes you right back to that
> same entry on that same page, which doesn't seem very helpful.
>
>> The header of the page says "The coverage depends on the availability of
>> data in wikidata for these names" but I was unable to find this rubbish
>> in Wikidata (but I was not looking very hard).
>
> I tried both "plpm" and "Πομερανία" in the Wikidata search box. For
> the latter, there were some pages which appeared to translate place
> names into various languages, for both Germany and Poland. I couldn't
> find the exact page, but it would be something like this page:
>
> https://www.wikidata.org/wiki/Q54180
>
> (Clicking "All Entered Languages" on that page gives a lengthy list.)

Thanks! Most data about Poland at

https://www.wikidata.org/wiki/Q36

seem to make sense, but I don't think anybody uses an abbreviation like
"plpm" (for Pomorze/Pomerania).

>
>>>> > and we really
>>>> > need to go through the data and correct the many many errors, please.
>>
>> But who is the right person or institution to do it?
>
> If the CLDR information is driven by Wikidata as the file header
> indicates, then Wikidata.

I hope not all CLDR data are driven by Wikidata...

On Mon, Sep 03 2018 at 12:28 +0200, Marcel Schneider wrote:

> On 03/09/18 09:53 Janusz S. Bień via Unicode wrote:
[...]
>> > These comments are designed for the Code Charts and as such must not be
>> > disproportionately exhaustive. E.g. we have lists of related languages ending
>> > in an ellipsis.
>>
>> Looks like we have different comments in mind.
>
> Then I’m sorry to be off-topic.

Let's say off the original topic. My primary concern is to somehow preserve
comments such as the one at the bottom of page 14 of

https://folk.uib.no/hnooh/mufi/specs/MUFI-CodeChart-4-0.pdf

>
> […]
>> >> > and we really
>> >> > need to go through the data and correct the many many errors, please.
>>
>> But who is the right person or institution to do it?
>
> Software vendors are committed to caring for the data, and may delegate the survey
> to service providers specializing in localization. I also think that public language
> offices should be among the reviewers. Beyond that, and especially in the absence of the
> latter, anybody is welcome to contribute as a guest. (Guest votes have a weight of 1 and
> don’t add up with one another.) That is consistent with the fact that Unicode relies on
> volunteers, too.
>
> I’m volunteering to personally welcome you to contribute to CLDR.

Thanks. The interesting question is who is/was already contributing from
Poland or about Polish language. I vaguely remember a post with this
information, but at that time I was not interested enough to make a
note of it.

>
> […]
>> > Further, you will see that while Polish uses the apostrophe
>> > https://slowodnia.tumblr.com/post/136492530255/the-use-of-apostrophe-in-polish
>> > CLDR does not have the correct apostrophe for Polish, as opposed to, e.g., French.
>>
>> I understand that by "the correct apostrophe" you mean U+2019 RIGHT
>> SINGLE QUOTATION MARK.
>
> Yes.
>
>>
>> > You may wish to note that from now on, both U+0027 APOSTROPHE and
>> > U+0022 QUOTATION MARK are ruled out in almost all locales, given that the
>> > preferred characters in publishing are U+2019 and, for Polish, the U+201E and
>> > U+201D that are already found in CLDR pl.
[...]
> It’s a bit confusing because there is a column for English and a column for Polish.
> The characters you retrieved are actually in the English column, while Polish has,
> consistently with By-Type, these quotation marks:
> ' " ” „ « »
> Hence the set is incomplete.

You are right, thanks. But what is the practical importance of it?
I noticed that sometimes in Emacs "forward-word" behaves strangely on
text with unusual characters, but had no motivation to investigate how
this is related to the current locale.
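To make the code points under discussion concrete, a small check can print the Unicode name of each character in the Polish quotation-mark set quoted above and confirm that U+2019, the character proposed as the apostrophe, is not among them. The set is copied from Marcel's message; the variable names are illustrative only:

```python
import unicodedata

# Polish quotation marks as listed from the CLDR chart in the message above.
polish_quotes = "'\"\u201D\u201E\u00AB\u00BB"

# Print code point and official Unicode name for each character in the set.
for ch in polish_quotes:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# U+2019 RIGHT SINGLE QUOTATION MARK, the preferred typographic apostrophe.
apostrophe = "\u2019"
print(unicodedata.name(apostrophe), "in set:", apostrophe in polish_quotes)
```

The membership test prints False, which is the gap Marcel points out: the set has the straight U+0027 but not U+2019.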

>>
>> >
>> > Note however that according to the information provided by English Wikipedia:
>> > https://en.wikipedia.org/wiki/Quotation_mark#Polish
>> > Polish also uses single quotes, which, by contrast, are still missing in CLDR.
>>
>> You are right, but who cares? It looks like this has no practical
>> importance. Nobody complains about the wrong use of quotation marks in
>> Polish by Word or OpenOffice, so it seems the software doesn't use
>> this information. So this is rather a matter of aesthetics...
>
> I’ve come to the position that to let a word processor “use” quotation marks
> is to miss the point. Quotation marks are definitely used by the user typing
> in his or her text, and are expected to be on the keyboard layout he or she
> is using. So-called smart quotes guessed algorithmically from the ASCII single
> and double quotes are but a hazardous workaround when the appropriate
> keyboard layout is not installed. At least that is my position :)
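Why guessing is hazardous can be illustrated with a deliberately naive sketch. The heuristic below and the function name smarten_polish are assumptions for illustration, not how Word, OpenOffice or any other program actually converts quotes: a double quote at the start of the text or after whitespace or an opening bracket is taken as opening, anything else as closing.

```python
def smarten_polish(text: str) -> str:
    """Naively replace ASCII quotes with Polish typographic ones (sketch only).

    Hypothetical heuristic: a double quote at the start of the text or after
    whitespace or an opening bracket opens a quotation (U+201E); any other
    double quote closes one (U+201D). Every ASCII apostrophe becomes U+2019.
    """
    out = []
    for i, ch in enumerate(text):
        # Treat the start of the text as if it were preceded by a space.
        prev = text[i - 1] if i > 0 else " "
        if ch == '"':
            opening = prev.isspace() or prev in "([{"
            out.append("\u201E" if opening else "\u201D")
        elif ch == "'":
            out.append("\u2019")
        else:
            out.append(ch)
    return "".join(out)


print(smarten_polish('Powiedział "tak".'))  # → Powiedział „tak”.
print(smarten_polish("Joyce'a"))            # → Joyce’a
print(smarten_polish('Tytuł:"Lalka"'))      # → Tytuł:”Lalka” (guessed wrong)
```

The last line shows the failure mode: with no space before the quote, the heuristic emits a closing mark where an opening „ was intended. Typing the intended character directly from a suitable keyboard layout avoids the guess altogether.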

The standard keyboard has a limited number of keys, so you have to make
compromises. It is generally accepted that Polish keyboard layouts
(there are primarily two of them) do not contain the apostrophe or single
quotation marks. There is a proposal by Marcin Woliński

http://marcinwolinski.pl/keyboard/

which is available in most Linux distributions but it does not seem
popular.
Received on Tue Sep 04 2018 - 04:03:21 CDT
