From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Wed Sep 13 2006 - 03:38:52 CDT
On Wed, 13 Sep 2006, Jose wrote:
> Unicode Technical Report #20 (Unicode in XML and other Markup
> Languages) http://www.Unicode.org/Unicode/reports/tr20/ specifies that
> Zero-width Joiners/ nonjoiners (ZWJ and ZWNJ) are suitable for use with
> in the markup.
Yes, for affecting ligature and joining behavior. I mention this because
there is a popular word processor that uses ZWJ and ZWNJ quite
inappropriately for line break control.
Of course, the statement is of general nature: those characters are in
principle suitable for use in marked-up text. It does not guarantee or
prescribe that a particular markup system allows them or that they will be
interpreted by their Unicode semantics.
> But when an xml file with the tags written in Malayalam
> using ZWJs (In Malayalam ZWJ is used to form certain characters) an
> error is reported that the tag contained an invalid character.
Reported by which program? I first suspected that you had tried to enter
these characters but that they were not represented correctly in the
declared or implied character encoding.
But reading again, I notice that you are referring to _tags_ and might
actually mean the use of characters in element or attribute names, as
opposed to their use in content between tags. UTR #20 discusses the
latter, i.e. what you can use in document content proper - together with
markup, not _inside_ markup (tags).
The use of characters in element and attribute names is governed by the
rules of each markup language, basically by its _identifier_ syntax.
Generally, and in XML 1.0, control characters are excluded in that syntax,
and ZWJ and ZWNJ are format control characters (General Category: Cf).
Thus, an attempt to use them in element names would violate
well-formedness constraints, and an XML parser would report an error - not
about an invalid character per se but about a syntax error.
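The General Category mentioned above can be checked quickly. The following
is only a minimal sketch using Python's standard unicodedata module; the
helper name general_category is made up for this illustration. Note that the
Unicode Standard names category Cf "Format", as distinct from Cc ("Control"),
which covers characters such as LINE FEED.

```python
import unicodedata


def general_category(ch: str) -> str:
    """Return the two-letter Unicode General Category of a single character."""
    return unicodedata.category(ch)


# ZWNJ and ZWJ, with LINE FEED added for comparison (a Cc character).
for label, ch in [("ZWNJ", "\u200c"), ("ZWJ", "\u200d"), ("LINE FEED", "\n")]:
    print(f"U+{ord(ch):04X} {label}: {general_category(ch)}")
```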
In XML 1.1, ZWJ and ZWNJ are allowed in identifiers, but this is probably
of little practical value.
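As a rough illustration of the XML 1.1 rule, the sketch below tests a string
against the NameStartChar and NameChar ranges as I read them in the XML 1.1
Recommendation. The function and constant names are hypothetical, and this
checks names only; it is not a parser and says nothing about what a
particular XML processor actually accepts.

```python
# Code point ranges for NameStartChar in XML 1.1 (inclusive bounds).
NAME_START_RANGES = [
    (0x3A, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# Additional ranges allowed after the first character (NameChar).
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]


def _in_ranges(cp: int, ranges) -> bool:
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_xml11_name(s: str) -> bool:
    """Return True if s matches the XML 1.1 Name production."""
    if not s:
        return False
    if not _in_ranges(ord(s[0]), NAME_START_RANGES):
        return False
    return all(
        _in_ranges(ord(ch), NAME_START_RANGES)
        or _in_ranges(ord(ch), NAME_EXTRA_RANGES)
        for ch in s[1:]
    )


# Malayalam NA + VIRAMA + ZWJ, one way of writing a chillu form.
candidate = "\u0d28\u0d4d\u200d"
print(is_xml11_name(candidate))
```

The range U+200C..U+200D appears explicitly in the NameStartChar list, which
is why ZWJ and ZWNJ pass this check.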
-- Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/