Concise term for non-ASCII Unicode characters
lists+unicode at seantek.com
Tue Sep 22 04:27:36 CDT 2015
On 9/21/2015 9:24 PM, Janusz S. Bien wrote:
> Quote/Cytat - Sean Leonard <lists+unicode at seantek.com> (Mon 21 Sep
> 2015 10:51:42 PM CEST):
>> Related question as I am researching this:
>> How can I acquire (cheaply or free) the latest and most official copy
>> of US-ASCII, namely, the version that Unicode references?
Thanks to all. I was able to locate a copy of ANSI X3.4-1986 (R1997)
[hereinafter ASCII]. (See my subsequent e-mail about the term "ASCII".)
> I've never seen the ASCII standard, but I think it is (almost?)
> identical to ISO/IEC 646, which in turn is identical to the freely
> available ECMA-6:
Having just read both standards documents in some detail, I can attest
that they are not the same. However, the practical effect for purposes
of Unicode is the same.
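As a quick sanity check of that practical equivalence, here is a minimal Python sketch. It is only an illustration and relies on Python's built-in "ascii" codec, not on either standards document. It shows that the 128 seven-bit code values decode one-to-one onto U+0000..U+007F, and that a byte above 0x7F is rejected. The function names are hypothetical.

```python
def ascii_matches_unicode_prefix():
    """Return True if every 7-bit byte decodes to the code point with the same value."""
    all_bytes = bytes(range(0x80))          # 0x00..0x7F, the full 7-bit code space
    decoded = all_bytes.decode("ascii")     # Python's built-in ASCII codec
    return len(decoded) == 0x80 and all(ord(ch) == b for ch, b in zip(decoded, all_bytes))


def rejects_8bit():
    """Return True if a byte outside the 7-bit range is refused by the ASCII codec."""
    try:
        b"\x80".decode("ascii")
    except UnicodeDecodeError:
        return True
    return False


if __name__ == "__main__":
    print(ascii_matches_unicode_prefix())  # True
    print(rejects_8bit())                  # True
```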
ECMA-6 (1991) is indeed identical to ISO/IEC 646 (as far as I can tell;
hereinafter ECMA-6). ECMA-6 "specifies a 7-bit coded character set with
a number of options" (Clause 1.2). Specifically, the following positions
are ambiguous or subject to national assignment:
2/3 NUMBER SIGN or POUND SIGN
2/4 DOLLAR SIGN or CURRENCY SIGN
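ECMA-6 writes code positions in column/row notation, so 2/3 is column 2, row 3, i.e. code value 0x23. The Python sketch below maps the two option positions listed above to code values and prints the Unicode names of both choices. It assumes the alternatives correspond to U+00A3 POUND SIGN and U+00A4 CURRENCY SIGN. The table covers only the two positions quoted here, and the helper names are hypothetical.

```python
import unicodedata


def position_to_code(column, row):
    """Convert ECMA-6 column/row notation (e.g. 2/3) to a 7-bit code value."""
    if not (0 <= column <= 7 and 0 <= row <= 15):
        raise ValueError("a 7-bit table has columns 0-7 and rows 0-15")
    return (column << 4) | row


# Hypothetical table of the two option positions: (ASCII/IRV choice, alternative).
OPTION_POSITIONS = {
    (2, 3): ("\u0023", "\u00a3"),  # NUMBER SIGN or POUND SIGN
    (2, 4): ("\u0024", "\u00a4"),  # DOLLAR SIGN or CURRENCY SIGN
}


def describe_options():
    """Return (position, hex code, ASCII char name, alternative char name) tuples."""
    rows = []
    for (col, row), (irv_char, alt_char) in OPTION_POSITIONS.items():
        code = position_to_code(col, row)
        # In ASCII, the character at a position has a code point equal to its code value.
        if ord(irv_char) != code:
            raise ValueError(f"unexpected ASCII character at {col}/{row}")
        rows.append((f"{col}/{row}", f"0x{code:02X}",
                     unicodedata.name(irv_char), unicodedata.name(alt_char)))
    return rows


if __name__ == "__main__":
    for entry in describe_options():
        print(*entry, sep="  ")
```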
ECMA-6 specifies an International Reference Version (IRV), which
exercises the "options". The IRV fills in the graphic characters
consistent with ASCII. However, ECMA-6 more or less leaves the C0 region
blank, and the IRV (in Annex A, normative) says "if the C0 set [...] is
used, it shall be the C0 set of Standard ECMA-48." A bit of a fudge.
Anyway, the IRV C0 set / ECMA-48 set is the same as ASCII.
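One place the shared C0 set shows up on the Unicode side is in character properties. The Python sketch below is illustrative only and uses the standard `unicodedata` module. It lists which 7-bit positions Unicode classifies as controls (general category Cc). The expected result is the 32 C0 positions 0x00..0x1F plus DEL at 0x7F, which sits outside the C0 range. The function names are hypothetical.

```python
import unicodedata


def control_positions_7bit():
    """Return the 7-bit code values whose Unicode general category is Cc (control)."""
    return [cp for cp in range(0x80) if unicodedata.category(chr(cp)) == "Cc"]


def c0_range_is_all_controls():
    """Return True if every position in the C0 range 0x00..0x1F is a control in Unicode."""
    return all(unicodedata.category(chr(cp)) == "Cc" for cp in range(0x20))


if __name__ == "__main__":
    controls = control_positions_7bit()
    print(len(controls))               # 33: the 32 C0 positions plus DEL
    print(hex(controls[-1]))           # 0x7f
    print(c0_range_is_all_controls())  # True
```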
Overall, the takeaway is that specifying ISO/IEC 646 / ECMA-6 is not
sufficient; you need to include "IRV" as well, or ISO-IR No. 6 for the
G0 set and ISO-IR No. 1 for the C0 set.
In contrast, if you say ASCII (ANSI X3.4-1986), all positions are fully specified.