Re: character entities in UTF-8 files

From: Peter Kirk (peterkirk@qaya.org)
Date: Tue Jul 12 2005 - 19:36:14 CDT

Next message: Eric Muller: "Representing Armenian text"

Previous message: Gregg Reynolds: "Re: character entities in UTF-8 files"
In reply to: Gregg Reynolds: "Re: character entities in UTF-8 files"
Next in thread: Gregg Reynolds: "Re: character entities in UTF-8 files"
Reply: Gregg Reynolds: "Re: character entities in UTF-8 files"

On 13/07/2005 00:52, Gregg Reynolds wrote:

> ... From the Unicode perspective, a sequence of characters like
> é is just a sequence of 5 distinct characters with no further
> semantics. Interpreted in accordance with XML, however, such a
> sequence *must* (not "may") be interpreted as e acute. Note that (if
> I'm not mistaken) such interpretation logically precedes other
> parsing. That is, an XML parser will first interpret (i.e.
> substitute) character *entities*, and then parse the resulting text.
> So what gets passed from the XML parser to higher-level processors is
> e acute, not the five character sequence é. ...

I don't think you can be quite right, at least unless XML is quite
different from HTML here. For surely in both HTML and XML character
entities like &lt; can and should be used to replace the character "<"
when this is not to be interpreted as the start of a tag. This implies
that character entities are parsed not as the first stage of parsing,
but only after "<" is recognised as the start of a tag.
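
A minimal sketch of this point, assuming Python's standard
xml.etree.ElementTree parser and a made-up sample string (neither comes
from the thread). If references were substituted before any other
parsing, the escaped "<" would start a tag; here it comes through as
plain character data, and only the literal <i> markup becomes an element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a numeric character reference, an escaped "<b>",
# and one real child element written with literal markup.
doc = "<p>caf&#233; &lt;b&gt; <i>x</i></p>"

root = ET.fromstring(doc)

# Character data before the first child element, references resolved.
text = root.text

# Tags of the elements the parser actually recognised inside <p>.
child_tags = [child.tag for child in root]

print(repr(text))
print(child_tags)
```

The parser hands "café <b> " on as text and reports a single child
element, i, so the escaped angle brackets were never treated as markup.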

-- 
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/

This archive was generated by hypermail 2.1.5 : Tue Jul 12 2005 - 20:27:36 CDT