From: Doug Ewell (dewell@adelphia.net)
Date: Fri Aug 25 2006 - 00:49:38 CDT
Oliver Block <lists at block dash online dot eu> wrote:
> Definition C12a of the Unicode Standard, Version 4.0, mentions so-called
> "mangled" text caused by folding (last paragraph of C12a).
>
> With the definition in mind (the italic text at the top of C12a), I
> understand mangled text to be ill-formed text, that is, text that does
> not conform to Table 3-6. Would you agree or disagree?
It is ill-formed text of a special type: it would have been well-formed
if not for an easily recognized, external process or layer -- the
example mentions inserting a CR/LF pair every 80 bytes -- that can
easily and unequivocally be reversed.
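To make the CR/LF example concrete, here is a minimal Python sketch. The helper names `fold`, `unfold`, and `is_well_formed_utf8` are hypothetical, and the sketch assumes the folding layer inserts CR LF after every 80 bytes with no regard for character boundaries. A two-byte character that straddles the fold point becomes ill-formed, and dropping the inserted pair restores the original bytes exactly:

```python
CRLF = b"\r\n"
WIDTH = 80  # fold width taken from the C12a example (assumed layout)


def fold(data: bytes, width: int = WIDTH) -> bytes:
    """Insert CR LF after every `width` bytes, blind to character boundaries."""
    return CRLF.join(data[i:i + width] for i in range(0, len(data), width))


def unfold(folded: bytes, width: int = WIDTH) -> bytes:
    """Undo fold() by dropping the two bytes that follow each full line.

    Assumes the input really came from fold(); it does not check that the
    dropped bytes are actually CR LF.
    """
    return b"".join(folded[i:i + width] for i in range(0, len(folded), width + 2))


def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as UTF-8 under Python's strict codec."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
        return True
    except UnicodeDecodeError:
        return False


raw = ("a" * 79 + "\u00e9").encode("utf-8")  # 81 bytes; U+00E9 is C3 A9
folded = fold(raw)                            # ... 61 C3 0D 0A A9

print(is_well_formed_utf8(raw))     # True
print(is_well_formed_utf8(folded))  # False: C3 is followed by 0D, not a trail byte
print(unfold(folded) == raw)        # True
```

The folding is purely positional, so reversing it needs no knowledge of UTF-8 at all; that is what makes the repair unambiguous.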
Definition C12a states that a process may interpret such data, but goes
on to say, "However, such repair of mangled data is a special case, and
it must not be used in circumstances where it would cause security
problems." I think it is clear that the intent of C12a is not to allow
a conformant process to interpret just any old random junk as if it were
well-formed UTF-8.
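One way to read that restriction is that a repair step should accept only input matching the known folding layout, and refuse everything else rather than guess. The sketch below is a hypothetical helper (nothing in the standard specifies it), again assuming CR LF after every 80 bytes. It checks each inserted pair, rejects a trailing CR LF, and decodes strictly afterward, so random junk raises an error instead of being treated as well-formed UTF-8:

```python
CRLF = b"\r\n"
WIDTH = 80  # assumed fold width


def repair_folded_utf8(folded: bytes, width: int = WIDTH) -> str:
    """Decode UTF-8 that may have been folded with CR LF every `width` bytes.

    Hypothetical helper. Accepts the input only if (a) every full line is
    followed by exactly CR LF, (b) no CR LF trails the last line, and
    (c) the unfolded bytes are well-formed UTF-8. Otherwise raises ValueError.
    """
    out = bytearray()
    pos = 0
    while pos < len(folded):
        line = folded[pos:pos + width]
        out += line
        pos += len(line)
        if pos == len(folded):
            break  # last (possibly short) line
        # More input remains, so a CR LF inserted by folding must come next.
        if folded[pos:pos + 2] != CRLF:
            raise ValueError(f"expected CR LF at byte {pos}")
        pos += 2
        if pos == len(folded):
            raise ValueError("trailing CR LF was not produced by folding")
    try:
        # Strict decoding: anything still ill-formed is not a folding artifact.
        return bytes(out).decode("utf-8")
    except UnicodeDecodeError as e:
        raise ValueError(f"ill-formed even after unfolding: {e}") from e


good = b"a" * 79 + b"\xc3\r\n\xa9"
print(repair_folded_utf8(good) == "a" * 79 + "\u00e9")  # True

for junk in (b"\xc3\xa9\xff", b"a" * 79 + b"\xc3XY\xa9"):
    try:
        repair_folded_utf8(junk)
    except ValueError as e:
        print("rejected:", e)
```

The design choice is to fail closed: the only thing the helper "interprets" is the one external layer it knows how to reverse.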
> Further, what about combining character sequences? Inserting a CRLF
> between a base character and a combining character, or between two of
> the combining characters, would not produce an ill-formed
> byte-sequence. Would you agree/disagree?
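A quick check supports that. In this sketch (same assumed 80-byte CR LF folding, done here by slicing), the break falls between "e" and U+0301 COMBINING ACUTE ACCENT. The bytes remain well-formed UTF-8, but after decoding, the accent follows LF instead of "e", so it forms a defective combining character sequence and NFC no longer composes it into U+00E9:

```python
import unicodedata

text = "a" * 79 + "e\u0301"            # "e" + COMBINING ACUTE ACCENT (CC 81)
raw = text.encode("utf-8")
folded = raw[:80] + b"\r\n" + raw[80:]  # CR LF lands between "e" and U+0301

decoded = folded.decode("utf-8")        # no UnicodeDecodeError: still well-formed
print(unicodedata.name(decoded[-1]))    # COMBINING ACUTE ACCENT

nfc_original = unicodedata.normalize("NFC", text)
nfc_folded = unicodedata.normalize("NFC", decoded)
print(len(nfc_original))                # 80: "e" + U+0301 composed to U+00E9
print("\u00e9" in nfc_folded)           # False: the accent now follows LF
```

Whatever damage remains here is at the level of characters, not of the encoding form.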
I would agree, but I have the feeling this was intended to be relevant
to the "mangled text" question above and I don't see the connection.
> (Since every specification that requires folding also requires
> unfolding, this would probably be more of a semantic issue.)
I do not agree that every specification that requires folding also
requires unfolding.
--
Doug Ewell
Fullerton, California, USA
http://users.adelphia.net/~dewell/