Re: SGML entities for Unicode characters

From: Timothy Partridge (timpart@perdix.demon.co.uk)
Date: Wed Jul 23 1997 - 15:12:00 EDT


In message <9707221939.AA14937@unicode.org> you recently said:

> Oliver Christ wrote:
>
> > I'm looking for mapping tables from the entity names defined in the
> > SGML ISO public entity sets to Unicode code points, i.e. something
> > like ISOlat1.{sgm,ent} with
> >
> > <!ENTITY Eacute CDATA "&#x00C9;" -- capital E, acute accent -->

John Cowan wrote:
> I know of no complete and definitive list. A partial list is
> available at http://www.w3.org/TR/WD-html40-970708/sgml/entities.html
> listing the entity names supported by draft HTML 4.0.
> This includes
...
>
> Unicode whitespace and control characters (&emsp; &zwnj;
> &rlm; etc.)

These aren't on the Unicode 1.0 list. But the 1.0 list has many
more characters than the HTML 4.0 list (mostly Latin letters with
accents).

Many of the ligatures like filig now have codings at U+FB01 onwards.
The 1.0 list also doesn't have the 2.0 names, which are slightly
different for many accented characters. I also noticed sfrown and
ssmile on the "no equivalent" list. There are Unicode characters
with these names, and also some smiley faces. Does anyone know what
the SGML characters look like?
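
For anyone who wants to check the candidates, here is a minimal
sketch in Python using the standard unicodedata module. The list of
code points is my own selection for illustration: U+FB01 is the
ligature mentioned above, U+2322 and U+2323 are the characters whose
Unicode names are FROWN and SMILE, and U+263A and U+2639 are two of
the face symbols. It only prints the Unicode names; it does not claim
that any of these is the correct match for the SGML sfrown or ssmile.

```python
import unicodedata

# Code points chosen for illustration (an assumption, not an official mapping).
CODE_POINTS = [0xFB01, 0x2322, 0x2323, 0x263A, 0x2639]


def describe(cp):
    """Return 'U+XXXX NAME' for a code point, using the Unicode character database."""
    return "U+%04X %s" % (cp, unicodedata.name(chr(cp)))


for cp in CODE_POINTS:
    print(describe(cp))
```

The helper name describe is hypothetical. The names printed come from
whatever Unicode version ships with the Python installation.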

Unicode 2.0 introduced extra Latin characters with accents - some of
these probably have SGML equivalents. (Is it possible / standard to
coin new letter accent combinations?)

   Tim

P.S. Since writing the above I have done a bit of digging around.
Take a look at
http://www.sil.org/sgml/sgml.html
The section "SGML (ISO 8879) Special Topics", subsection
"SGML Character Sets / Multilingual Text", contains a host of pages
about SGML character names. Anyone want to try mapping them to
Unicode?
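
As a starting point for such a mapping, here is a rough sketch that
emits declarations in the form Oliver asked for. It assumes the
name2codepoint table in Python's standard html.entities module, which
covers only the HTML 4 entity names, not the full ISO public entity
sets, so it is a partial illustration rather than a definitive list.
The function name entity_declaration is made up for this example.

```python
from html.entities import name2codepoint


def entity_declaration(name):
    """Format one SGML entity declaration for an HTML 4 entity name.

    Assumes the name is present in name2codepoint (HTML 4 names only).
    """
    cp = name2codepoint[name]
    return '<!ENTITY %s CDATA "&#x%04X;">' % (name, cp)


# Names taken from the messages quoted above.
for name in ("Eacute", "emsp", "zwnj", "rlm"):
    print(entity_declaration(name))
```

Names from the ISO sets that are missing from the table raise KeyError,
which is one quick way to see how much of the mapping is still to do.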

-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:36 EDT