FW: Plane-2-only string i18nguy supplementary-test page

From: Tex via Unicode <unicode_at_unicode.org>
Date: Mon, 13 Nov 2017 21:45:53 -0800

I am the author of the supplementary-test page on i18nguy.com.

The method for choosing the characters is described on the page, so it isn’t a mystery. See below.

I do not believe any of the characters are offensive, although context matters greatly and languages evolve, so a character could acquire an offensive meaning or usage at any time.

Consider the humble eggplant…

The page was created to offer supplementary characters for testing, chosen so that any problems they uncover would clearly be worth fixing.

The values are probably not the best choice for demonstrating and marketing fonts, which is the use Peter has in mind.

Here is an excerpt from the page:

In 2005, the IRG (Ideographic Rapporteur Group) <http://www.cse.cuhk.edu.hk/~irg/index.htm> identified a set of ideographs called the Ideographic International Core (IICore) <http://appsrv.cse.cuhk.edu.hk/~irg/irg/IICore/IICore.htm>. The 10,000 ideographs in the IICore are the most frequently used characters and would cover the vast majority of modern texts in all locales where ideographs are used. This collection is intended for use in devices with limited resources, such as mobile phones.

Test Characters

To have characters that are good for testing software support for the Supplementary Plane, I extracted the 62 IICore characters that are in the Supplementary Plane. These characters have the following properties:

· Being in IICore, they are used frequently enough that supporting them is a minimum requirement for software that handles ideographs

· They are in the Supplementary Plane and will test support for code points above U+FFFF

· They are not "oddball" values. If using them uncovers a problem, fixing the problem is inherently justified.
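To make the point about code points above U+FFFF concrete, here is a minimal Python sketch (not taken from the i18nguy page) showing that a single Plane 2 character becomes a UTF-16 surrogate pair and a 4-byte UTF-8 sequence, so its length in code points, UTF-16 code units and bytes all differ. Software that assumes one code unit per character breaks on exactly this. U+20000 is used only as a convenient sample; it is not claimed to be one of the 62 IICore characters, and the function names are illustrative.

```python
from __future__ import annotations


def utf16_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a supplementary code point")
    v = cp - 0x10000              # 20-bit offset into the supplementary range
    high = 0xD800 + (v >> 10)     # high (lead) surrogate carries the top 10 bits
    low = 0xDC00 + (v & 0x3FF)    # low (trail) surrogate carries the bottom 10 bits
    return high, low


def describe(s: str) -> dict:
    """Report the length of s in code points, UTF-16 code units and UTF-8 bytes."""
    return {
        "code_points": len(s),                           # Python str length counts code points
        "utf16_units": len(s.encode("utf-16-le")) // 2,  # what Java String.length() and JavaScript .length count
        "utf8_bytes": len(s.encode("utf-8")),
    }


sample = "\U00020000"  # sample Plane 2 ideograph, chosen only for illustration
high, low = utf16_surrogates(ord(sample))
print(f"U+{ord(sample):X} -> {high:04X} {low:04X}")  # U+20000 -> D840 DC00
print(describe(sample))
```

A BMP ideograph would report 1 for both code points and UTF-16 units, which is why text limited to the BMP never exercises surrogate handling.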

 

 

Tex

From: Unicode [mailto:unicode-bounces_at_unicode.org] On Behalf Of Philippe Verdy via Unicode
Sent: Monday, November 13, 2017 12:58 PM
To: James Kass
Cc: Peter Constable; Unicode list
Subject: Re: Plane-2-only string

2017-11-13 21:48 GMT+01:00 James Kass <jameskasskrv_at_gmail.com>:

Peter Constable wrote,

>> Maybe this test page?
>>
>> http://www.i18nguy.com/unicode/supplementary-test.html
>
> Thanks. I’d need to know _at least something_ about what the characters
> signify, though, to have a sense of whether there’s anything potentially
> offensive.

The Plane 2 characters on that page appear to be random.

That's probable, but the authors claim these are common characters. It's possible they collected statistics from some corpus to find some of the most widely used characters in Plane 2, without needing to understand what the characters would mean when put side by side. (I had already noted that there was no punctuation at all; the collection shown is too long for a typical Chinese text, and I would in fact expect it to contain some CJK punctuation.)
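The corpus-statistics guess could look roughly like the following Python sketch: count every Plane 2 character in a body of text and keep the most frequent ones. Tex's reply says the list was actually extracted from IICore, so this only illustrates the alternative Philippe describes; the corpus and function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Plane 2 (the Supplementary Ideographic Plane) spans U+20000..U+2FFFF.
PLANE2_FIRST, PLANE2_LAST = 0x20000, 0x2FFFF


def is_plane2(ch: str) -> bool:
    """True if the single code point ch lies in Plane 2."""
    return PLANE2_FIRST <= ord(ch) <= PLANE2_LAST


def top_plane2_chars(corpus: Iterable[str], n: int) -> List[Tuple[str, int]]:
    """Count Plane 2 characters across all texts and return the n most frequent."""
    # Iterating a Python str yields code points, so no surrogate handling is needed here.
    counts = Counter(ch for text in corpus for ch in text if is_plane2(ch))
    return counts.most_common(n)


# Hypothetical corpus made of placeholder strings, not real Chinese text.
corpus = ["a\U00020000\U00020001", "\U00020000\u4e2d"]
for ch, count in top_plane2_chars(corpus, 2):
    print(f"U+{ord(ch):X}: {count}")
```

With a real corpus, the output would be a ranked list of code points; deciding whether any of them form an offensive or odd sequence when placed side by side would still need a human reader.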

Maybe we could compile a list of Chinese toponyms that use these characters, select those containing more than one Plane 2 character, then separate the names with CJK commas and end the string with a CJK full stop.
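The proposed procedure could be sketched in Python as below: keep only names with at least two Plane 2 characters, join them with a CJK comma and finish with a CJK full stop. The separator choice is an assumption: U+3001 IDEOGRAPHIC COMMA is the usual separator for enumerated items in Chinese, but U+FF0C FULLWIDTH COMMA would also be plausible. The "names" are hypothetical placeholders built from arbitrary code points, not real toponyms.

```python
from typing import Iterable

# Assumption: "CJK comma" means U+3001 IDEOGRAPHIC COMMA; U+FF0C is the other common choice.
IDEOGRAPHIC_COMMA = "\u3001"
IDEOGRAPHIC_FULL_STOP = "\u3002"


def count_plane2(name: str) -> int:
    """Number of code points in name that fall in Plane 2 (U+20000..U+2FFFF)."""
    return sum(1 for ch in name if 0x20000 <= ord(ch) <= 0x2FFFF)


def build_test_string(names: Iterable[str], min_plane2: int = 2) -> str:
    """Keep names with at least min_plane2 Plane 2 characters, join them with
    ideographic commas and end with an ideographic full stop."""
    selected = [name for name in names if count_plane2(name) >= min_plane2]
    if not selected:
        return ""
    return IDEOGRAPHIC_COMMA.join(selected) + IDEOGRAPHIC_FULL_STOP


# Hypothetical placeholder names; a real list would come from a source such as Wikidata or OSM.
names = [
    "\U00020000\U00020001",        # two Plane 2 characters: kept
    "\U00020002\u5c71",            # only one Plane 2 character: dropped
    "\U00020003\U00020004\u6751",  # two Plane 2 characters: kept
]
result = build_test_string(names)
# Print code points rather than glyphs so the output works on any console encoding.
print(" ".join(f"U+{ord(ch):04X}" for ch in result))
```

Unlike a bare run of ideographs, the resulting string mixes Plane 2 characters with BMP punctuation, which also exercises line breaking and cursor movement across surrogate boundaries.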

 

A search of Wikidata or OpenStreetMap (OSM) data could be used to compile such a list. I think these toponyms are more likely to be found in Cantonese- or Taiwanese-related sources using the zh-Hant variant. Note that Wikidata does not distinguish zh-Hans from zh-Hant, as Wikimedia wikis use a transliterator; however, I doubt this transliterator transforms Plane 2 characters, most of which should remain unchanged because they are used in both traditional and simplified text.
Received on Mon Nov 13 2017 - 23:46:38 CST

This archive was generated by hypermail 2.2.0 : Mon Nov 13 2017 - 23:46:39 CST