Re: UnicodeData-2.1.8 bug report

From: Mark Davis (marked@best.com)
Date: Wed Mar 17 1999 - 11:39:15 EST


Thanks for your bug report. We found these exceptions after 2.1.8 was released.
The corrections will be in the 2.1.9 version (which we were holding back
until the Unicode 3.0.0 data was final).

Mark

Kevin Bracey wrote:

> The ReadMe file for version 2.1.8 boldly states:
>
> Note that as of the 2.1.8 update of the Unicode Character Database,
> the decompositions in the UnicodeData.txt file can be used to recursively
> derive the full decomposition in canonical order, without the need
> to separately apply canonical reordering.
>
> I've just found a bunch of Vietnamese characters for which this doesn't
> seem to be the case, e.g.:
>
> 1EAC LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND DOT BELOW
>
> == 00C2 LATIN CAPITAL LETTER A WITH CIRCUMFLEX
>    0323 COMBINING DOT BELOW
>
> == 0041 LATIN CAPITAL LETTER A
>    0302 COMBINING CIRCUMFLEX ACCENT
>    0323 COMBINING DOT BELOW
>
> But the canonical order is, of course:
>
>    0041 LATIN CAPITAL LETTER A
>    0323 COMBINING DOT BELOW
>    0302 COMBINING CIRCUMFLEX ACCENT
>
> This affects characters 1EAC,1EAD,1EB6,1EB7,1EC6,1EC7,1ED8,1ED9.
>
> Would it be worthwhile me knocking up an algorithmic check that this
> assertion doesn't fail elsewhere, or is someone else already looking at it?
>
> --
> Kevin Bracey, Senior Software Engineer
> Acorn Computers Ltd Tel: +44 (0) 1223 725228
> Acorn House, 645 Newmarket Road Fax: +44 (0) 1223 725328
> Cambridge, CB5 8PB, United Kingdom WWW: http://www.acorn.co.uk/
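
The algorithmic check Kevin proposes could look roughly like the sketch below.
It is only an illustration, not anything from the Unicode distribution: the
function names are made up, and the two small tables are transcribed by hand
from the example in the message (the 2.1.8 mappings for 1EAC and 00C2, with
combining classes 230 for 0302 and 220 for 0323) rather than parsed from
UnicodeData.txt. The idea is to expand each mapping recursively, apply
canonical reordering to the result, and report any character where the two
differ.

```python
def full_decomposition(cp, decomp):
    """Recursively expand cp using the canonical mappings in decomp."""
    if cp not in decomp:
        return [cp]
    out = []
    for part in decomp[cp]:
        out.extend(full_decomposition(part, decomp))
    return out


def canonical_reorder(seq, ccc):
    """Stable-sort each run of combining marks (class > 0) by class."""
    out, run = [], []
    for cp in seq:
        if ccc.get(cp, 0) == 0:
            # A starter ends the current run of combining marks.
            out.extend(sorted(run, key=lambda c: ccc[c]))
            run = []
            out.append(cp)
        else:
            run.append(cp)
    out.extend(sorted(run, key=lambda c: ccc[c]))
    return out


def find_violations(decomp, ccc):
    """Code points whose recursive decomposition is not in canonical order."""
    bad = []
    for cp in sorted(decomp):
        expanded = full_decomposition(cp, decomp)
        if expanded != canonical_reorder(expanded, ccc):
            bad.append(cp)
    return bad


# Hand-copied sample data (assumption: mirrors the 2.1.8 entries quoted above).
DECOMP_2_1_8 = {
    0x1EAC: [0x00C2, 0x0323],  # A WITH CIRCUMFLEX AND DOT BELOW
    0x00C2: [0x0041, 0x0302],  # A WITH CIRCUMFLEX
}
CCC = {0x0302: 230, 0x0323: 220}  # circumflex (above), dot below

if __name__ == "__main__":
    for cp in find_violations(DECOMP_2_1_8, CCC):
        print(f"{cp:04X}")
```

To run this against a real UnicodeData.txt one would presumably fill the two
tables from the combining-class and decomposition fields of each line,
skipping the compatibility mappings (those marked with a tag in angle
brackets).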

--
business: medavis2@us.ibm.com, mark@unicode.org
personal: mark@macchiato.com, http://www.macchiato.com
--


