Karl Williamson <public_at_khwilliamson.com> wrote:
|On 11/06/2013 03:43 AM, Steffen Daode Nurpmeso wrote:
|> Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:
|>|2013/11/5 Steffen Daode <sdaoden_at_gmail.com>
|>|> (The problem i'm facing is that _PRINT and _GRAPH cannot be set
|>|> for some properties from PropList.txt, say, _PRINT can't be set
|>|> for U+0009, CHARACTER TABULATION (ht), since it's a Cc, but in
|>|
|>|TAB is "printable" (for the isprint() macro in standard C libraries) because
|>
|> Nope, according to POSIX, Vol. 1: Base Definitions, 7.3.1. LC_CTYPE ([1]):
|
|The only vendor I'm aware of that makes TAB a printable is Microsoft.
|Thus Philippe is wrong about this except for MS products.
That made me curious, and it doesn't seem to be right [1].
isprint returns a nonzero value if c is a printable character—this
includes the space character (0x20 – 0x7E).
The behavior of isprint and _isprint_l is undefined if c is not
EOF or in the range 0 through 0xFF, inclusive. When a debug CRT
library is used and c is not one of these values, the functions
raise an assertion.
[1] <http://msdn.microsoft.com/en-us/library/ewx8s4kw(v=vs.110).aspx>
Well, i hope this is not a crashing assertion but only a loud log
entry... (I have no real insight into M$, but every codebase i have
used kept debug and shipping code completely separate, too.)
|under MS except the C locale. (MS also has other Posix violations, such
|as having isdigit() match superscript numbers.)
The isdigit() page itself says nothing about superscripts ([2]),
isdigit returns a nonzero value if c is a decimal digit (0 – 9).
iswdigit returns a nonzero value if c is a wide character that
corresponds to a decimal-digit character.
[2] <http://msdn.microsoft.com/en-us/library/fcc4ksh8(v=vs.110).aspx>
but following its links to the equivalent routines leads to
"Character Classification" [3]
[3] <http://msdn.microsoft.com/en-us/library/t9zea13t(v=vs.110).aspx>
and finally "Char.IsDigit Method (Char)" [4], where i've found:
This method determines whether a Char is a radix-10 digit. This
contrasts with IsNumber, which determines whether a Char is of any
numeric Unicode category. Numbers include characters such as
fractions, subscripts, superscripts, Roman numerals, currency
numerators, encircled numbers, and script-specific digits.
Valid digits are members of the UnicodeCategory.DecimalDigitNumber
category.
[4] <http://msdn.microsoft.com/en-us/library/7f0ddtxh(v=vs.110).aspx>
So, whew! Microsoft seems to get the carefully designed isXY(3)
series right. But i also came across _pipe(), and there you go.
--steffen
attached mail follows:
On 11/06/2013 03:43 AM, Steffen Daode Nurpmeso wrote:
> Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:
> |2013/11/5 Steffen Daode <sdaoden_at_gmail.com>
> |> (The problem i'm facing is that _PRINT and _GRAPH cannot be set
> |> for some properties from PropList.txt, say, _PRINT can't be set
> |> for U+0009, CHARACTER TABULATION (ht), since it's a Cc, but in
> |
> |TAB is "printable" (for the isprint() macro in standard C libraries) because
> |it has a whitespace property, even if its general category is very weakly
>
> Nope, according to POSIX, Vol. 1: Base Definitions, 7.3.1. LC_CTYPE ([1]):
>
> print
> Define characters to be classified as printable characters,
> including the <space>.
>
> In the POSIX locale, all characters in class graph shall be
> included; no characters in class cntrl shall be included.
>
> In a locale definition file, characters specified for the
> keywords upper, lower, alpha, digit, xdigit, punct, graph, and
> the <space> are automatically included in this class. No
> character specified for the keyword cntrl shall be specified.
>
> [1] <http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html#tag_07_03_01>
>
> Verifiable under LC_ALL=en_GB.UTF-8 on Mac OS X Snow Leopard
> (which admittedly uses very old Citrus data; i always wonder why all
> those gigabytes of «Software Update»s don't update that, not to
> mention GNU make 3.81 and all the other buggy or non-compliant
> stuff, but that is a different story):
>
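> #define _XOPEN_SOURCE 700  /* glibc needs this for wcwidth() */
> #include <locale.h>
> #include <stdio.h>
> #include <ctype.h>
> #include <wchar.h>         /* wcwidth() lives here, not in <wctype.h> */
> int main(void) {
>   setlocale(LC_ALL, "");   /* otherwise LC_ALL is ignored and "C" is used */
>   printf("%d %d\n", isprint('\t'), wcwidth(L'\t'));
>   return 0;
> }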
>
> ?0[steffen_at_sherwood tmp]$ cc -o zt t.c && ./zt
> 0 -1
>
> |The character mapping for the isprint() macro is defined by an expression
> |based on existing Unicode properties. Most C libraries optimize this
>
> But i agree that POSIX has to move towards Unicode definitions,
> and more byte- than bitwise.
>
> --steffen
>
The only vendor I'm aware of that makes TAB a printable is Microsoft.
Thus Philippe is wrong about this except for MS products.
MS makes TAB also a control, violating the Posix standard by having it
be both printable and a control. This is true in all locales I've seen
under MS except the C locale. (MS also has other Posix violations, such
as having isdigit() match superscript numbers.)
Received on Thu Nov 07 2013 - 07:00:40 CST