From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Tue Aug 19 2003 - 08:57:25 EDT
Compatibility characters:
The recommendations for compatibility characters are necessarily vague, 
since their use in legacy data (and legacy environments) is strongly 
dependent on what is (or was) customary in a given environment.
If a process merely warehouses text data (or parses only a very small 
subset of characters for a special purpose, as an HTML parser does), then 
simply preserving legacy characters is often the best strategy. However, 
take the opposite example: a process that actually scans the text for 
roman numerals. In that case, ignoring the compatibility characters would 
be a mistake, since legacy data of the kind for which these compatibility 
characters were added would *only* contain roman numerals in this form. 
They would *not* use the ASCII characters.
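To make the roman numeral case concrete, here is a minimal sketch in Python. It assumes that folding the text with NFKC before scanning is acceptable for the process in question; the function name find_roman_numerals and the strict pattern for values up to 3999 are illustrative choices, not anything prescribed by the Unicode Standard. The stored data is not modified, only the copy being scanned.

```python
import re
import unicodedata

# Strict pattern for upper-case roman numerals (1 to 3999). The lookahead
# requires the whole word to consist of numeral letters, so that ordinary
# words such as "Chapter" are not matched on their first letter.
ROMAN = re.compile(
    r"\b(?=[IVXLCDM]+\b)"
    r"M{0,3}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})\b"
)

def find_roman_numerals(text):
    """Return roman numerals found in text, in ASCII letters.

    NFKC maps compatibility characters such as U+2167 ROMAN NUMERAL EIGHT
    to their letter sequences ("VIII"), so legacy and ASCII spellings are
    found by the same scan.
    """
    folded = unicodedata.normalize("NFKC", text)
    return [m.group(0) for m in ROMAN.finditer(folded)]

print(find_roman_numerals("Chapter \u2167, Section XII"))
```

A scan that looked only for ASCII letters would miss the first numeral here entirely. Note that this sketch deliberately ignores lower-case numerals and will also report a stand-alone capital "I"; a real process would need rules suited to its own data.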
Processes that modify legacy data for re-export to a legacy system 
obviously need to be intimately familiar with the legacy conventions, in a 
way that could not possibly be documented in the Unicode Standard in all 
details for every character and every legacy system.
Documentation in the code charts:
I agree with several of the comments that "hiding" the information about 
special characters in running text makes it unnecessarily difficult to work 
with the information. On the other hand, not everything can be succinctly 
expressed in machine readable tables (some characters have complicated 
usages), and even annotations in the name list have limits. They are 
definitely not the place for lengthier discussions.
For Unicode 4.0 we have attempted to improve the situation by systematically
extracting the line-breaking related information into UAX#14, which at 
least allows task-focused access. Information about mathematical usage of 
characters is now collected in one place in UTR#25, partially duplicating, 
and partially extending the information in the text of the standard, but 
providing a single place of access. Further improvements are possible. 
Personally I'd be in favor of some icon in the character names list that 
simply indicates that a character is more fully discussed elsewhere - that 
would make the code charts more useful as an index into the description of 
the characters.
Mathematical operators:
Future extensions of programming languages should allow not only the MINUS 
sign as operator, but many other characters, for example LOGICAL AND and 
LOGICAL OR, and as many other operators as appropriate for the language.
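As a rough illustration of what such an extension could look like at the lexical level, the following Python sketch tokenizes a toy expression language in which U+2212 MINUS SIGN, U+2227 LOGICAL AND and U+2228 LOGICAL OR are accepted alongside their ASCII spellings. The token kinds and the tokenize function are hypothetical and do not describe any existing language.

```python
import re

# Each operator kind can be spelled in ASCII or with the Unicode character.
OPERATORS = {
    "-": "MINUS", "\u2212": "MINUS",   # U+2212 MINUS SIGN
    "&&": "AND",  "\u2227": "AND",     # U+2227 LOGICAL AND
    "||": "OR",   "\u2228": "OR",      # U+2228 LOGICAL OR
}

TOKEN = re.compile(
    r"\s*(?:(\d+)|([A-Za-z_]\w*)|(&&|\|\||[-\u2212\u2227\u2228]))"
)

def tokenize(source):
    """Split source into (kind, text) pairs; raise ValueError on junk."""
    source = source.rstrip()
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        number, name, op = m.groups()
        if number is not None:
            tokens.append(("NUMBER", number))
        elif name is not None:
            tokens.append(("NAME", name))
        else:
            tokens.append((OPERATORS[op], op))
        pos = m.end()
    return tokens

print(tokenize("a \u2212 1 \u2227 b"))
```

Because both spellings map to the same token kind, everything after the tokenizer (parser, code generator) is unaffected by which one the programmer typed.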
Input of the operators does not necessarily require a special-purpose 
keyboard. The use of input macros, editor substitution or similar 
input technologies (e.g. turning && into LOGICAL AND) would be more 
flexible. Some editors already support the display of highly formatted 
program source code even though the underlying text backbone uses the 
standard ASCII conventions of current programming languages. Just one 
example is Source Insight from www.sourceinsight.com, which not only 
represents >= etc. by single symbols, but can also correctly increase the 
size of outer parentheses for nested expressions.
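The substitution approach can be sketched in a few lines of Python. The table below and the names to_display and to_ascii are assumptions made for illustration; they do not describe how Source Insight or any other editor is implemented. The sketch is also naive in that it does not skip string literals or comments.

```python
import re

# ASCII operator spellings and the symbols used to display them.
DISPLAY_FORMS = {
    "&&": "\u2227",  # LOGICAL AND
    "||": "\u2228",  # LOGICAL OR
    ">=": "\u2265",  # GREATER-THAN OR EQUAL TO
    "<=": "\u2264",  # LESS-THAN OR EQUAL TO
    "!=": "\u2260",  # NOT EQUAL TO
}
ASCII_FORMS = {symbol: op for op, symbol in DISPLAY_FORMS.items()}

_ASCII_PATTERN = re.compile("|".join(re.escape(op) for op in DISPLAY_FORMS))
_SYMBOL_PATTERN = re.compile("|".join(re.escape(s) for s in ASCII_FORMS))

def to_display(source):
    """Replace ASCII operator sequences with single symbols for display."""
    return _ASCII_PATTERN.sub(lambda m: DISPLAY_FORMS[m.group(0)], source)

def to_ascii(displayed):
    """Map displayed symbols back to the ASCII text backbone."""
    return _SYMBOL_PATTERN.sub(lambda m: ASCII_FORMS[m.group(0)], displayed)

line = "if (a >= b && c != d)"
print(to_display(line))
assert to_ascii(to_display(line)) == line
```

Keeping the ASCII text as the stored form and applying the mapping only on display or on input means existing compilers and tools continue to work unchanged. The round trip holds only as long as the stored text does not itself already contain the display symbols.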
A./