Normalization Form KC considered harmful

From: John Cowan (cowan@locke.ccil.org)
Date: Wed Aug 18 1999 - 16:40:41 EDT


I note that, under the current Unicode 3.0 beta mappings, text that uses
only the ASCII subset of UTF-8 is not in Normalization Form KC.

The ASCII characters 5E CIRCUMFLEX ACCENT, 5F LOW LINE, and 60 GRAVE ACCENT
all have compatibility decompositions, so Normalization Form KC will
change each of them into SPACE followed by a combining character.
This is intolerable for Linux or anyone else who depends on
ASCII compatibility.

Either these compatibility decompositions should be removed,
or Normalization Form KC should be avoided.
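The failure mode can be checked mechanically. What follows is a minimal sketch, assuming Python 3.8+ and its standard `unicodedata` module; the helper names `nfkc_changes` and `code_points` are illustrative only. Python ships Unicode data much newer than the 3.0 beta, and in that data 5E, 5F, and 60 have no decomposition mapping, so the ASCII check passes. The non-ASCII spacing accents U+00A8, U+00AF, and U+00B4 stand in to show the SPACE-plus-combining-character result that the beta mappings would have produced for ASCII.

```python
import unicodedata


def nfkc_changes(chars):
    """Map each character that NFKC alters to its NFKC form."""
    changed = {}
    for ch in chars:
        normalized = unicodedata.normalize("NFKC", ch)
        if normalized != ch:
            changed[ch] = normalized
    return changed


def code_points(s):
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# Every code point in the 7-bit ASCII range, U+0000..U+007F.
ASCII = [chr(cp) for cp in range(0x80)]

# Stand-ins, not the beta data itself: non-ASCII spacing accents that still
# carry <compat> decompositions to SPACE + combining mark in current Unicode.
SPACING_ACCENTS = ["\u00a8", "\u00af", "\u00b4"]  # DIAERESIS, MACRON, ACUTE ACCENT

if __name__ == "__main__":
    ascii_text = "".join(ASCII)
    # unicodedata.is_normalized() was added in Python 3.8.
    print("ASCII text is NFKC:", unicodedata.is_normalized("NFKC", ascii_text))
    print("ASCII characters altered:", len(nfkc_changes(ASCII)))
    for ch, form in nfkc_changes(SPACING_ACCENTS).items():
        print(code_points(ch), "->", code_points(form))
```

Run as a script, it should report that the ASCII text is in NFKC with no characters altered, then print U+00A8 -> U+0020 U+0308, U+00AF -> U+0020 U+0304, and U+00B4 -> U+0020 U+0301. An ASCII-dependent system can run the same check against whatever Unicode data it actually uses.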

-- 
	John Cowan	http://www.ccil.org/~cowan	cowan@ccil.org
Schlingt dreifach einen Kreis um dies! / Schliesst euer Aug vor heiliger Schau,
Denn er genoss vom Honig-Tau / Und trank die Milch vom Paradies.
			-- Coleridge / Politzer