The ReadMe file for version 2.1.8 boldly states:
Note that as of the 2.1.8 update of the Unicode Character Database,
the decompositions in the UnicodeData.txt file can be used to recursively
derive the full decomposition in canonical order, without the need
to separately apply canonical reordering.
I've just found a bunch of Vietnamese characters for which this doesn't
seem to be the case, e.g.:
1EAC LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND DOT BELOW
  == 00C2 LATIN CAPITAL LETTER A WITH CIRCUMFLEX
     0323 COMBINING DOT BELOW
  == 0041 LATIN CAPITAL LETTER A
     0302 COMBINING CIRCUMFLEX ACCENT
     0323 COMBINING DOT BELOW
But the canonical order is, of course:
0041 LATIN CAPITAL LETTER A
0323 COMBINING DOT BELOW
0302 COMBINING CIRCUMFLEX ACCENT
This affects characters 1EAC, 1EAD, 1EB6, 1EB7, 1EC6, 1EC7, 1ED8, 1ED9.
Would it be worthwhile me knocking up an algorithmic check that this
assertion doesn't fail elsewhere, or is someone else already looking at it?
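
For what it's worth, such a check could be sketched roughly as follows.
This is only a minimal illustration, not a tested tool: the DECOMP table
is a hand-copied excerpt covering just the 1EAC example above (a real
check would parse all of UnicodeData.txt), the CCC table holds the
canonical combining classes of the two marks involved (0302 is 230,
0323 is 220), and the function names are made up for this sketch.

```python
# Hypothetical excerpt of the decomposition mappings, copied by hand from
# the 1EAC example; a real check would load every entry of UnicodeData.txt.
DECOMP = {
    0x1EAC: (0x00C2, 0x0323),
    0x00C2: (0x0041, 0x0302),
}

# Canonical combining classes for the marks used here; anything missing
# from this table is treated as class 0 (a starter).
CCC = {0x0302: 230, 0x0323: 220}


def decompose(cp):
    """Recursively expand cp using DECOMP, with no reordering applied."""
    mapping = DECOMP.get(cp)
    if mapping is None:
        return [cp]
    out = []
    for part in mapping:
        out.extend(decompose(part))
    return out


def canonical_reorder(cps):
    """Swap adjacent marks until no pair has ccc(a) > ccc(b) > 0."""
    out = list(cps)
    changed = True
    while changed:
        changed = False
        for i in range(len(out) - 1):
            a = CCC.get(out[i], 0)
            b = CCC.get(out[i + 1], 0)
            if a > b > 0:
                out[i], out[i + 1] = out[i + 1], out[i]
                changed = True
    return out


def find_misordered(table):
    """List code points whose recursive expansion is not in canonical order."""
    bad = []
    for cp in sorted(table):
        raw = decompose(cp)
        if raw != canonical_reorder(raw):
            bad.append(cp)
    return bad


for cp in find_misordered(DECOMP):
    print("%04X" % cp)
```

Run over the full database, anything this prints would be a character
for which the ReadMe's assertion does not hold.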
--
Kevin Bracey, Senior Software Engineer
Acorn Computers Ltd                   Tel: +44 (0) 1223 725228
Acorn House, 645 Newmarket Road       Fax: +44 (0) 1223 725328
Cambridge, CB5 8PB, United Kingdom    WWW: http://www.acorn.co.uk/