ISO/IEC JTC1/SC2/WG2 N2235
Date: 2000-08-09
Title: Proposal for addition of ZERO WIDTH WORD JOINER
Source: Unicode Technical Committee
Status: Liaison Communication
Action: For consideration by JTC1/SC2/WG2
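The codepoint U+FEFF serves two very different purposes. At the start of a text stream it acts as a signature (byte order mark), and elsewhere it acts as ZERO WIDTH NO-BREAK SPACE (ZWNBSP).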
It is clear in retrospect that this was a grave mistake. If U+FEFF had only the semantics of a signature, it could be freely deleted from text without affecting the interpretation of the rest of the text. This matters because a signature does not always stay at the start of text: appending files together, for example, can leave a signature codepoint in the middle of the text. Unfortunately, U+FEFF also has significance as a character. As a ZWNBSP, it indicates that a line break is not allowed between the adjoining characters. Thus U+FEFF does affect the interpretation of text and cannot be freely deleted. The overloaded semantics of this codepoint have caused innumerable problems for programs, not least for the overall comprehensibility of Unicode/10646.
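The following Python sketch illustrates the concatenation problem described above. It assumes UTF-8 files that each begin with a U+FEFF signature; the helper `encode_with_signature` and the sample strings are hypothetical, while `codecs.BOM_UTF8` and the `utf-8-sig` codec are part of Python's standard library.

```python
import codecs

# U+FEFF: a signature (byte order mark) at the start of a stream,
# ZERO WIDTH NO-BREAK SPACE (ZWNBSP) anywhere else.
ZWNBSP = "\ufeff"


def encode_with_signature(text: str) -> bytes:
    """Hypothetical helper: UTF-8 encode `text` with a leading U+FEFF signature."""
    return codecs.BOM_UTF8 + text.encode("utf-8")


# Two files, each written with a signature at the start.
file_a = encode_with_signature("alpha")
file_b = encode_with_signature("beta")

# Appending the raw bytes leaves file_b's signature in the middle of the text.
joined = file_a + file_b

# The "utf-8-sig" codec strips only a *leading* signature, so the
# interior U+FEFF survives decoding.
decoded = joined.decode("utf-8-sig")
print(repr(decoded))  # 'alpha\ufeffbeta'

# The result is identical to text in which a writer deliberately placed a
# ZWNBSP between "alpha" and "beta"; nothing in the string tells them apart.
assert decoded == "alpha" + ZWNBSP + "beta"
```

Deleting every U+FEFF from `decoded` would be correct if it is a stray signature but would drop a line-break constraint if it is an intended ZWNBSP, which is why U+FEFF cannot be freely deleted.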
To ameliorate this situation, the UTC has approved the addition of a new character at U+2060, ZERO WIDTH WORD JOINER. This character would have the same semantics as U+FEFF in all respects, except that it cannot be used as a signature. The goal is to move implementations to this new character over the next few years, discouraging the use of U+FEFF as ZWNBSP. Eventually, the use of U+FEFF as ZWNBSP can be deprecated, leaving only its use as a signature. This will significantly simplify the programming model for Unicode/10646 and reduce the opportunity for error in countless implementations. The character should be encoded in the BMP, since it is similar to other format characters already encoded there.
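A minimal sketch of the migration described above, assuming a simple, hypothetical, non-normative policy: a U+FEFF at the very start of the text is a signature, and every other U+FEFF is an intended ZWNBSP to be rewritten as U+2060. The function names `migrate_zwnbsp` and `strip_signatures` are illustrative only.

```python
SIGNATURE = "\ufeff"  # U+FEFF
ZWWJ = "\u2060"       # U+2060 as proposed here (later standardized under the name WORD JOINER)


def migrate_zwnbsp(text: str) -> str:
    """Assumed policy: drop a leading U+FEFF (signature) and replace every
    other U+FEFF (ZWNBSP) with U+2060, keeping the no-break meaning."""
    if text.startswith(SIGNATURE):
        text = text[1:]
    return text.replace(SIGNATURE, ZWWJ)


def strip_signatures(text: str) -> str:
    """Once U+FEFF carries only signature meaning, every occurrence can be
    deleted without changing how the rest of the text is interpreted."""
    return text.replace(SIGNATURE, "")


print(repr(migrate_zwnbsp("\ufeffalpha\ufeffbeta")))  # 'alpha\u2060beta'
```

Note that `migrate_zwnbsp` cannot distinguish a stray signature left by concatenation from a deliberate ZWNBSP; it treats both as joiners. That residual ambiguity is what deprecating U+FEFF as ZWNBSP is meant to remove, after which deleting U+FEFF, as `strip_signatures` does, becomes safe.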
The UTC urges WG2 to also approve this character for addition to ISO 10646.
For instructions and guidance on filling in the form, please see the document "Principles and Procedures for Allocation of New Characters and Scripts" (http://www.dkuug.dk/JTC1/SC2/WG2/prot).
A. Administrative
1. Title: ZERO WIDTH WORD JOINER
2. Requester's name: Unicode Technical Committee
3. Requester type (Member body/Liaison/Individual contribution): Liaison
4. Submission date: 2000-08-10
5. Requester's reference (if applicable):
6. (Choose one of the following:)
This is a complete proposal: Yes
More information will be provided later:
B. Technical - General
1. (Choose one of the following:)
a. This proposal is for a new script (set of characters): No
Proposed name of script:
b. The proposal is for addition of character(s) to an existing block: Yes
Name of the existing block: General Punctuation (U+2000..U+206F)
2. Number of characters in proposal: One
3. Proposed category (see section II, Character Categories): Alternate Format Character (as with ZWJ)
4. Proposed Level of Implementation (see clause 15, ISO/IEC 10646-1): Any level is acceptable
Is a rationale provided for the choice? N/A
If YES, reference:
5. Is a repertoire including character names provided? Yes
a. If YES, are the names in accordance with the 'character naming guidelines' in Annex K of ISO/IEC 10646-1? Yes
b. Are the character shapes attached in a reviewable form? N/A
6. Who will provide the appropriate computerized font (ordered preference: TrueType, PostScript or 96x96 bit-mapped format) for publishing the standard? The Unicode Technical Committee
If available now, identify source(s) for the font (include address, e-mail, ftp-site, etc.) and indicate the tools used:
7. References:
a. Are references (to other character sets, dictionaries, descriptive texts etc.) provided? N/A
b. Are published examples (such as samples from newspapers, magazines, or other sources) of use of proposed characters attached? N/A
8. Special encoding issues:
Does the proposal address other aspects of character data processing (if applicable) such as input, presentation, sorting, searching, indexing, transliteration etc. (if yes please enclose information): Yes, see ISO/IEC JTC1/SC2/WG2 N2235
C. Technical - Justification
1. Has this proposal for addition of character(s) been submitted before? No
If YES explain
2. Has contact been made to members of the user community (for example: National Body, user groups of the script or characters, other experts, etc.)? Yes
If YES, with whom? Unicode member companies (see http://www.unicode.org/unicode/consortium/memblogo.html)
If YES, available relevant documents?
3. Information on the user community for the proposed characters (for example: size, demographics, information technology use, or publishing use) is included? Major IT industry leaders
Reference:
4. The context of use for the proposed characters (type of use; common or rare): Yes
Reference: see ISO/IEC JTC1/SC2/WG2 N2235
5. Are the proposed characters in current use by the user community? N/A
If YES, where? Reference:
6. After giving due consideration to the principles in N 1352, must the proposed characters be entirely in the BMP? Yes
If YES, is a rationale provided? Yes
If YES, reference: see ISO/IEC JTC1/SC2/WG2 N2235
7. Should the proposed characters be kept together in a contiguous range (rather than being scattered)? N/A
8. Can any of the proposed characters be considered a presentation form of an existing character or character sequence? No
If YES, is a rationale for its inclusion provided?
If YES, reference:
9. Can any of the proposed character(s) be considered to be similar (in appearance or function) to an existing character? No
If YES, is a rationale for its inclusion provided?
If YES, reference:
10. Does the proposal include use of combining characters and/or use of composite sequences (see clauses 4.11 and 4.13 in ISO/IEC 10646-1)? No
If YES, is a rationale for such use provided?
If YES, reference:
Is a list of composite sequences and their corresponding glyph images (graphic symbols) provided? No
If YES, reference:
11. Does the proposal contain characters with any special properties such as control function or similar semantics? Yes
If YES, describe in detail (include attachment if necessary): see ISO/IEC JTC1/SC2/WG2 N2235
D. SC2/WG2 Administrative (To be completed by SC2/WG2)
1. Relevant SC 2/WG 2 document numbers:
2. Status (list of meeting number and corresponding action or disposition):
3. Additional contact to user communities, liaison organizations etc:
4. Assigned category and assigned priority/time frame: