Re: OS 99

From: schererm@us.ibm.com
Date: Thu Nov 12 1998 - 09:26:46 EST


About classifying operating system support for Unicode -

how about having more specific attributes for "support of Unicode"?

For example:

- API functions taking or returning characters and strings
  - provide UTF-8 locale version for MBCS-type strings,
    or other mechanism to select UTF-8 as MBCS-format
  - accept strings in 16-bit UCS-2, ignoring surrogate pairs
  - accept strings in 16-bit UTF-16, supporting surrogate pairs
    (e.g. moving forward one character may mean moving forward two
     16-bit words; counting the characters in a string; testing
     whether a character is a digit; etc.)
  - accept strings in 32-bit UCS-4
  - functions for Unicode-based cultural sorting, upper-/lowercasing, etc.
    - limited to the BMP (UCS-2)
    - limited to planes 0-16 (UTF-16)
    - all UCS (UCS-4, UTF-8)

- conversion APIs
  - convert between MBCS codepages and UCS-2, but not UTF-16
  - convert between MBCS codepages and UTF-16
  - convert between MBCS codepages and UCS-4
  - convert between UTF-8 and UCS-2, but not UTF-16
  - convert between UTF-8 and UTF-16
  - convert between UTF-8 and UCS-4
  - encode/decode UTF-16 into/from SCSU

- file names, names for semaphores, pipes, etc.
  - can contain any BMP characters
  - can contain any characters on planes 0-16 (UTF-16 range)
  - can contain any UCS characters (UCS-4, UTF-8)

- UI system
  - integrated into base system or tightly integrated class library
    - accepts, displays, and allows entering characters from
      - the BMP
      - planes 0-16 (UTF-16 range)
      - all UCS (UCS-4, UTF-8)
    - has a "complete" font
    - has layout engines for which language groups?
    - can display characters that it does not have a
      layout engine for in some rudimentary manner
    - can display distinct codes for unknown characters/unassigned
      code points (e.g. hex digits in character cell)
  - uses (possibly) remote terminals
    - passes characters through transparently so that the terminal can
      accept, display, and allow entering characters from
      - the BMP
      - planes 0-16 (UTF-16 range)
      - all UCS (UCS-4, UTF-8)
    - there is a terminal with (some of) the
      above font and rendering capabilities
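The UCS-2 versus UTF-16 split in the API list above comes down to how a function walks 16-bit units. A minimal Python sketch, assuming the string is already available as a list of 16-bit code units; all helper names here are hypothetical illustrations, not any operating system's API:

```python
from typing import List

def is_high_surrogate(unit: int) -> bool:
    return 0xD800 <= unit <= 0xDBFF

def is_low_surrogate(unit: int) -> bool:
    return 0xDC00 <= unit <= 0xDFFF

def next_index(units: List[int], i: int) -> int:
    """Index of the character after the one starting at units[i] (UTF-16 rules)."""
    if (is_high_surrogate(units[i]) and i + 1 < len(units)
            and is_low_surrogate(units[i + 1])):
        return i + 2  # a surrogate pair encodes one supplementary character
    return i + 1      # a BMP character (or an unpaired surrogate)

def count_chars_utf16(units: List[int]) -> int:
    """Count characters, treating each valid surrogate pair as one character."""
    count, i = 0, 0
    while i < len(units):
        i = next_index(units, i)
        count += 1
    return count

def count_chars_ucs2(units: List[int]) -> int:
    """UCS-2 view: every 16-bit unit is a character, so a pair counts as two."""
    return len(units)

def to_utf16_units(text: str) -> List[int]:
    """Split a Python string into UTF-16 code units (little-endian, no BOM)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[k:k + 2], "little") for k in range(0, len(data), 2)]

units = to_utf16_units("a\U00010400b")  # 'a', U+10400 (a Deseret letter), 'b'
print(units)                     # [97, 55297, 56320, 98]
print(count_chars_utf16(units))  # 3
print(count_chars_ucs2(units))   # 4
```

A UCS-2-only function would report four characters here and could split the pair 0xD801 0xDC00 in the middle; a UTF-16-aware one steps over it as a single character.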

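For the conversion APIs in the list, the practical difference between "UCS-2 but not UTF-16" and full UTF-16 is what happens to characters beyond U+FFFF. A small sketch using Python's standard codecs; `utf8_to_ucs2` is a hypothetical stand-in for a UCS-2-only converter, and "UCS-4" here means only the values U+0000..U+10FFFF, because Python's `utf-32` codec rejects anything larger:

```python
def utf8_to_utf16(data: bytes) -> bytes:
    """UTF-8 -> UTF-16 (little-endian, no BOM); supplementary chars become surrogate pairs."""
    return data.decode("utf-8").encode("utf-16-le")

def utf8_to_ucs4(data: bytes) -> bytes:
    """UTF-8 -> 32-bit units; limited to U+0000..U+10FFFF by Python's utf-32 codec."""
    return data.decode("utf-8").encode("utf-32-le")

def utf8_to_ucs2(data: bytes) -> bytes:
    """Hypothetical UCS-2-only converter: refuses anything outside the BMP."""
    text = data.decode("utf-8")
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError(f"U+{ord(ch):04X} is outside the BMP; UCS-2 cannot hold it")
    # For BMP-only text, UCS-2 and UTF-16 produce identical bytes.
    return text.encode("utf-16-le")

sample = "A\u00e9\U0001F600".encode("utf-8")  # 'A', e-acute, U+1F600
print(len(sample))                 # 7  (1 + 2 + 4 bytes)
print(len(utf8_to_utf16(sample)))  # 8  (2 + 2 + 4 bytes, last one a surrogate pair)
print(len(utf8_to_ucs4(sample)))   # 12 (3 x 4 bytes)
```

Calling `utf8_to_ucs2(sample)` raises `ValueError`, which is one defensible policy; a real converter might substitute a replacement character instead.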
...and there is certainly more that is important for developers...
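The three tiers that recur in the list (BMP, planes 0-16, all of UCS) can be checked mechanically, for instance to decide whether a given file name fits what a system accepts. A sketch that works on integer code points, since a Python `str` cannot hold values above U+10FFFF; `required_tier` and its tier labels are hypothetical:

```python
from typing import Iterable

def required_tier(code_points: Iterable[int]) -> str:
    """Return the narrowest tier from the list that can hold every code point."""
    cps = list(code_points)
    if any(cp < 0 or cp > 0x7FFFFFFF for cp in cps):
        raise ValueError("not a UCS-4 code point")
    highest = max(cps, default=0)
    if highest <= 0xFFFF:
        return "BMP"
    if highest <= 0x10FFFF:
        return "planes 0-16 (UTF-16 range)"
    return "all UCS (UCS-4, UTF-8)"

print(required_tier(ord(c) for c in "report.txt"))  # BMP
print(required_tier([0x48, 0x1D11E]))                # planes 0-16 (UTF-16 range)
print(required_tier([0x200000]))                     # all UCS (UCS-4, UTF-8)
```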

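For the UI item about showing distinct codes for unassigned code points, a rough sketch of the idea: look each character up in a character database and, if it is unassigned, render its hex value instead of a glyph. Here Python's `unicodedata` (whose Unicode version depends on the Python release) stands in for the system's database, and the bracketed hex text is a hypothetical stand-in for hex digits drawn inside a character cell:

```python
import unicodedata

def render_with_fallback(text: str) -> str:
    """Replace unassigned code points with their hex value, as a UI fallback might."""
    out = []
    for ch in text:
        # General category "Cn" covers unassigned code points and noncharacters.
        if unicodedata.category(ch) == "Cn":
            out.append(f"[{ord(ch):04X}]")
        else:
            out.append(ch)
    return "".join(out)

print(render_with_fallback("ok\uffff"))  # ok[FFFF]
```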
hwsat?

markus

Markus Scherer IBM RTP +1 919 486 1135 Dept. Fax +1 919 254 6430
scherer@raleigh.ibm.com



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:42 EDT