L2/01-054

From: Mark Davis [mark.davis@us.ibm.com]
Sent: Tuesday, January 23, 2001 8:21 PM
Subject: Agenda Item: Ranges in UnicodeData

At the last UTC, we decided to change the format for files such as Blocks.txt so that any ranges would be expressed with "..". Thus we have the format in the new Blocks.txt (currently http://www.unicode.org/Public/3.1-Update/Blocks-4d3.beta.txt)

# Start Code..End Code; Block Name
0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
0100..017F; Latin Extended-A
...

This simplifies parsing and unifies our notation. When the first field in any of our files is parsed, ".." always indicates that there is a range. (Although at first I was against this change when Asmus proposed it, as I have upgraded my tools for supplementary code points, I have come to really see the value of it.)

I was discussing that with Markus Scherer today, and he mentioned that it would also be much cleaner to apply this approach to the main file, UnicodeData.txt (currently http://www.unicode.org/Public/3.1-Update/UnicodeData-3.1.0d5.beta.txt). We have quite a number of ranges, with a very clumsy mechanism for indicating those ranges. Example:

3400;;Lo;0;L;;;;;N;;;;;
4DB5;;Lo;0;L;;;;;N;;;;;

Parsers would be much cleaner and simpler if they could handle all ranges in all the Unicode data files the same way. So, the proposal is to change ranges like the above into:

3400..4DB5;;Lo;0;L;;;;;N;;;;;

Mark
___
Mark Davis, IBM GCoC, Cupertino
(408) 777-5850 [fax: 5891], mark.davis@us.ibm.com, president@unicode.org
http://maps.yahoo.com/py/maps.py?Pyt=Tmap&addr=10275+N.+De+Anza&csz=95014
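
For illustration, the uniform first-field handling that the proposal aims for could be sketched roughly as follows. This is only a sketch in Python, not code from any Unicode tool: the function names parse_code_range and parse_line are hypothetical, and it assumes semicolon-separated fields with optional "#" comments, as in the Blocks.txt excerpt.

```python
def parse_code_range(field):
    """Parse the first field of a data line into (first, last) code points.

    A field containing ".." is treated as an inclusive range; any other
    field is treated as a single code point, i.e. a range of length one.
    """
    start, sep, end = field.strip().partition("..")
    first = int(start, 16)
    last = int(end, 16) if sep else first
    if last < first:
        raise ValueError("range end precedes range start: %r" % field)
    return first, last


def parse_line(line):
    """Return (first, last, remaining_fields), or None for comment/blank lines.

    Assumption: "#" starts a comment and ";" separates fields.
    """
    content = line.split("#", 1)[0].strip()
    if not content:
        return None
    fields = [f.strip() for f in content.split(";")]
    first, last = parse_code_range(fields[0])
    return first, last, fields[1:]


if __name__ == "__main__":
    for sample in ("0000..007F; Basic Latin",
                   "3400..4DB5;;Lo;0;L;;;;;N;;;;;"):
        print(parse_line(sample))
```

With this shape, the same parse_line call would serve both a Blocks.txt line and a UnicodeData.txt line in the proposed format, which is the simplification argued for above.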