L2/02-380
JTC1/SC22
N3465
From: ISO/IEC JTC 1/SC22
Programming languages, their environments and system software interfaces
Secretariat: U.S.A. (ANSI)
ISO/IEC JTC 1/SC22 N3465
TITLE:
SC 22/WG 4 Convenor Contribution to the Coded Character Sets Workshop, 26
August 2002, Saariselkä, Finland
DATE ASSIGNED:
2002-08-12
SOURCE:
SC 22/WG 4 Convenor (A. Bennett)
BACKWARD POINTER:
DOCUMENT TYPE:
Other document (Open)
PROJECT NUMBER:
N/A
STATUS:
This contribution will be reviewed at the Character Sets ad hoc.
ACTION IDENTIFIER:
FYI
DUE DATE:
N/A
DISTRIBUTION:
Text
CROSS REFERENCE:
N/A
DISTRIBUTION FORM:
Open
Address reply to:
ISO/IEC JTC 1/SC22 Secretariat
Matt Deane
ANSI
25 West 43rd Street
New York, NY 10036
Telephone: (212) 642-4992
Fax: (212) 840-2298
Email: mdeane@ansi.org
____end of cover page, beginning of document__________
August 9, 2002
To: SC22 character set ad hoc
From: Ann Bennett, Convenor, ISO/IEC JTC 1/SC 22/WG4 - COBOL
Subject: Large character set and cultural adaptability support in
COBOL
WG4 has recently completed ISO/IEC FDIS 1989, which includes support for
cultural adaptability and large character sets (typically ISO/IEC 10646,
although no particular character set is mandated). The support is summarized
below. WG4 had significant help from WG20 in developing this support,
particularly in the early planning stage, when I attended WG20 meetings. WG4
is now starting to plan for future work and will be considering the
following:
(1) Handling of UTF-16 surrogate pairs as a single character unit (currently
the unit of processing for UTF-16 is a single 2-octet code).
(2) Handling of combining sequences. WG4 needs input on the industry
direction.
(3) Additional date and time formatting. Current support is minimal.
WG4 needs input on the requirements.
(4) Additional extended letters for identifiers, to accord with additions
to ISO/IEC 10646. WG4 would expect to adopt additional letters in a future
revision or amendment of COBOL.
WG4 needs stable normative references for the repertoire of extended letters
in identifiers and the associated case foldings. Both need to be provided
in normative references that ISO/IEC accepts.
WG4 is looking to the character set ad hoc and WG20 for continued support in
understanding character set and cultural adaptability requirements.
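
To illustrate items (1) and (2) above, the following Python sketch (not COBOL,
and not taken from the FDIS) contrasts counting by 2-octet UTF-16 units with
counting by code point and by base character. The function names are
hypothetical, and treating every code point with a nonzero combining class as
part of the preceding character is a simplifying assumption, not a full
definition of a combining sequence.

```python
import unicodedata


def utf16_units(text: str) -> int:
    """Number of 2-octet UTF-16 units needed to encode text."""
    return len(text.encode("utf-16-be")) // 2


def code_points(text: str) -> int:
    """Number of code points; a surrogate pair counts once."""
    return len(text)


def base_characters(text: str) -> int:
    """Number of code points that are not combining marks.

    Simplifying assumption: a combining mark is any code point whose
    canonical combining class is nonzero.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


supplementary = "\U00010400"  # one character outside the Basic Multilingual Plane
combined = "e\u0301"          # "e" followed by COMBINING ACUTE ACCENT

# Processing by 2-octet unit sees two units where a user sees one character.
print(utf16_units(supplementary), code_points(supplementary))

# Processing by code point sees two items where a user sees one accented letter.
print(code_points(combined), base_characters(combined))
```

Under these assumptions, an operation that works on fixed 2-octet units could
split either of the two sample strings in the middle of what a user regards as
one character.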
Summary of character set and cultural adaptability support in ISO/IEC FDIS
1989:2002:
- The COBOL FDIS adds a character data type for large character sets (such
as ISO/IEC 10646), but does not mandate any particular representation. In
COBOL terms, a USAGE NATIONAL clause has been added to data description
entries. Operations are performed on fixed-size units (called encoding
units in some implementations) with no recognition of surrogate pairs or
combining sequences. Character set conversion is provided by intrinsic
functions and features for conversion on input/output of file records.
- An intrinsic function is provided for comparisons using an implementation
of ISO/IEC 14651. Users can identify a table and specify a comparison
level.
- Cultural adaptability support lets the user choose a "locale" for
sorting, comparisons, monetary format, upper and lower casing, and date and
time formatting. The POSIX locale model is used for specification purposes,
but a POSIX implementation is not required. Users can select each category of
cultural conventions independently of the others. For example, users might
sort with one "locale" and use monetary format from another.
- Extended identifiers are provided using the repertoire of TR 10176:2001,
with the addition of the Catalan middle dot. Case foldings are in accordance
with the tolower specification of DTR 14652, with overrides for Greek sigma
(<U03C2>,<U03C3>) and Turkish small dotless i (<U0131>,<U0069>). The
tables were copied into the COBOL FDIS.
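
As an illustration of the case-folding overrides in the last item, the
following Python sketch folds an identifier one character at a time and then
applies the two overrides. It makes two assumptions that are not from the
FDIS: Python's built-in lowercasing stands in for the tolower table of DTR
14652, and each pair listed above is read as (source, target). The names
OVERRIDES, fold_identifier and same_identifier are hypothetical.

```python
# Assumption: each pair in the memo is read as (source, target).
OVERRIDES = {
    "\u03c2": "\u03c3",  # Greek small final sigma -> Greek small sigma
    "\u0131": "\u0069",  # Latin small dotless i   -> Latin small i
}


def fold_identifier(name: str) -> str:
    """Fold an identifier for comparison.

    Each character is lowercased on its own, so no context-dependent
    rules apply, and the overrides are then applied to the result.
    str.lower() is only a stand-in for the DTR 14652 tolower table.
    """
    folded = []
    for ch in name:
        lowered = ch.lower()
        folded.append("".join(OVERRIDES.get(c, c) for c in lowered))
    return "".join(folded)


def same_identifier(a: str, b: str) -> bool:
    """True if two spellings fold to the same identifier."""
    return fold_identifier(a) == fold_identifier(b)


# With the overrides, spellings that differ only in final sigma or
# dotless i are treated as the same identifier.
print(same_identifier("\u039f\u0394\u039f\u03a3", "\u03bf\u03b4\u03bf\u03c2"))
print(same_identifier("s\u0131ra", "SIRA"))
```

The overrides are applied after lowercasing so that an uppercase and a
lowercase spelling of the same name reach the same folded form.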