Re: Question regarding HTML 4.0 &#x??; syntax

From: Misha Wolf (misha.wolf@reuters.com)
Date: Fri Jul 18 1997 - 14:13:28 EDT


David wrote:
>
> Martin Duerst wrote:
>
> > - Besides encoding characters in the "charset" of the document
> > (i.e. directly as bits and bytes), HTML has other ways
> > of encoding characters. For u-Umlaut, for example, you
> > can use a character entity (&uuml;), a (decimal) numeric
> > character reference (&#252;), or soon in HTML 4.0 a
> > hexadecimal numeric character reference (&#xFC;). These
> > are part of the source and should be shown as such in
> > "view source".
>
> I assume &#x??, where ??=hex value, will display the character associated
> with the assigned charset with the ?? hex value.
>
> Do you know whether &#x????, where ????=hex value, will be accepted and
> display the character associated with the UNICODE ???? hex value regardless
> of charset?

All numeric character references (NCRs), whether decimal or hexadecimal, are
resolved in relation to the HTML "document character set", which is Unicode.
The actual "charset" used for a page has *no effect* on the resolution of NCRs.
Older browsers from the big two vendors used to get this wrong. No browser
that implements hexadecimal NCRs should get it wrong, as this is a brand new
feature in HTML 4.0 and is clearly defined to refer to the "document character
set" (i.e. Unicode).
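
To illustrate the point, here is a minimal sketch in Python. It uses the
standard library's html.unescape as a stand-in for a conforming browser's
reference resolution (an assumption made for illustration, not a statement
about any particular browser). The variable names are hypothetical. It shows
that the three spellings of u-Umlaut all resolve to U+00FC whatever charset the
page bytes are stored in, whereas a raw byte 0xFC does depend on the charset.

```python
import html

# Three ways of writing u-Umlaut in HTML source: a character entity,
# a decimal NCR, and a hexadecimal NCR.
refs = ["&uuml;", "&#252;", "&#xFC;"]
resolved = [html.unescape(r) for r in refs]

# Store the same markup under several charsets, decode it again, and
# resolve the reference. The reference is plain ASCII, so the charset
# cannot change what it means.
markup = "&#xFC;"
per_charset = {}
for charset in ("iso-8859-1", "iso-8859-5", "utf-8"):
    raw = markup.encode(charset)
    per_charset[charset] = html.unescape(raw.decode(charset))

# By contrast, the raw byte 0xFC is interpreted through the charset.
latin1_char = bytes([0xFC]).decode("iso-8859-1")
cyrillic_char = bytes([0xFC]).decode("iso-8859-5")

print([hex(ord(c)) for c in resolved])
print(per_charset)
print(hex(ord(latin1_char)), hex(ord(cyrillic_char)))
```

Only the directly encoded byte changes meaning between ISO-8859-1 and
ISO-8859-5. The numeric character reference always names the Unicode code point.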

Misha

------------------------------------------------------------------------
Any views expressed in this message are those of the individual sender,
except where the sender specifically states them to be the views of
Reuters Ltd.


