Michael Everson wrote on 1999-04-06 12:10 UTC:
> I got one answer, what to paste in the header. The other question, what
> CAPITAL LETTER D WITH DOT ABOVE is, remains unanswered....
Michael,
Is it essential that you use UTF-8?
Without UTF-8 or any special headers, you can always write these
characters in HTML as decimal numeric character references.
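As a minimal sketch of what such a reference looks like, both the decimal and the hexadecimal form can be generated in Perl from the code point. The helper name ncr_pair is made up for illustration and is not part of any library.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name chosen for this example): build the decimal
# and hexadecimal numeric character references for an integer code point.
sub ncr_pair {
    my ($cp) = @_;
    return (sprintf("&#%d;", $cp), sprintf("&#x%x;", $cp));
}

# U+1E0A is LATIN CAPITAL LETTER D WITH DOT ABOVE.
my ($dec, $hex) = ncr_pair(0x1E0A);

# Both references are pure ASCII, so the HTML fragment survives any
# 8-bit editor or transport unchanged.
print "<p>$dec and $hex name the same character</p>\n";
```

Because the output is plain ASCII, it does not depend on the character encoding declared for the page.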
The following small table lists, for every Unicode character named
"LATIN * WITH DOT ABOVE", the decimal numeric character reference, the
hexadecimal numeric character reference, and the UTF-8 character itself:
&#266;    &#x10a;   Ċ  LATIN CAPITAL LETTER C WITH DOT ABOVE
&#267;    &#x10b;   ċ  LATIN SMALL LETTER C WITH DOT ABOVE
&#278;    &#x116;   Ė  LATIN CAPITAL LETTER E WITH DOT ABOVE
&#279;    &#x117;   ė  LATIN SMALL LETTER E WITH DOT ABOVE
&#288;    &#x120;   Ġ  LATIN CAPITAL LETTER G WITH DOT ABOVE
&#289;    &#x121;   ġ  LATIN SMALL LETTER G WITH DOT ABOVE
&#304;    &#x130;   İ  LATIN CAPITAL LETTER I WITH DOT ABOVE
&#379;    &#x17b;   Ż  LATIN CAPITAL LETTER Z WITH DOT ABOVE
&#380;    &#x17c;   ż  LATIN SMALL LETTER Z WITH DOT ABOVE
&#7682;   &#x1e02;  Ḃ  LATIN CAPITAL LETTER B WITH DOT ABOVE
&#7683;   &#x1e03;  ḃ  LATIN SMALL LETTER B WITH DOT ABOVE
&#7690;   &#x1e0a;  Ḋ  LATIN CAPITAL LETTER D WITH DOT ABOVE
&#7691;   &#x1e0b;  ḋ  LATIN SMALL LETTER D WITH DOT ABOVE
&#7710;   &#x1e1e;  Ḟ  LATIN CAPITAL LETTER F WITH DOT ABOVE
&#7711;   &#x1e1f;  ḟ  LATIN SMALL LETTER F WITH DOT ABOVE
&#7714;   &#x1e22;  Ḣ  LATIN CAPITAL LETTER H WITH DOT ABOVE
&#7715;   &#x1e23;  ḣ  LATIN SMALL LETTER H WITH DOT ABOVE
&#7744;   &#x1e40;  Ṁ  LATIN CAPITAL LETTER M WITH DOT ABOVE
&#7745;   &#x1e41;  ṁ  LATIN SMALL LETTER M WITH DOT ABOVE
&#7748;   &#x1e44;  Ṅ  LATIN CAPITAL LETTER N WITH DOT ABOVE
&#7749;   &#x1e45;  ṅ  LATIN SMALL LETTER N WITH DOT ABOVE
&#7766;   &#x1e56;  Ṗ  LATIN CAPITAL LETTER P WITH DOT ABOVE
&#7767;   &#x1e57;  ṗ  LATIN SMALL LETTER P WITH DOT ABOVE
&#7768;   &#x1e58;  Ṙ  LATIN CAPITAL LETTER R WITH DOT ABOVE
&#7769;   &#x1e59;  ṙ  LATIN SMALL LETTER R WITH DOT ABOVE
&#7776;   &#x1e60;  Ṡ  LATIN CAPITAL LETTER S WITH DOT ABOVE
&#7777;   &#x1e61;  ṡ  LATIN SMALL LETTER S WITH DOT ABOVE
&#7786;   &#x1e6a;  Ṫ  LATIN CAPITAL LETTER T WITH DOT ABOVE
&#7787;   &#x1e6b;  ṫ  LATIN SMALL LETTER T WITH DOT ABOVE
&#7814;   &#x1e86;  Ẇ  LATIN CAPITAL LETTER W WITH DOT ABOVE
&#7815;   &#x1e87;  ẇ  LATIN SMALL LETTER W WITH DOT ABOVE
&#7818;   &#x1e8a;  Ẋ  LATIN CAPITAL LETTER X WITH DOT ABOVE
&#7819;   &#x1e8b;  ẋ  LATIN SMALL LETTER X WITH DOT ABOVE
&#7822;   &#x1e8e;  Ẏ  LATIN CAPITAL LETTER Y WITH DOT ABOVE
&#7823;   &#x1e8f;  ẏ  LATIN SMALL LETTER Y WITH DOT ABOVE
&#7835;   &#x1e9b;  ẛ  LATIN SMALL LETTER LONG S WITH DOT ABOVE
You can try to cut and paste the characters from this table, in any of
the three forms, into your raw HTML document with any 8-bit plain text
editor.
I can easily send you the entire Unicode table in this form if that
would help.
I've just spent the last 3 minutes writing the following tiny Perl
program that produced this table. Perl is extremely useful for
transforming the Unicode database into anything in a few minutes.
#!/usr/bin/perl
# subroutine to convert an integer into a UTF-8 string
sub utf8 ($) {
    my $c = shift(@_);
    if ($c < 0x80) {
        return sprintf("%c", $c);
    } elsif ($c < 0x800) {
        return sprintf("%c%c", 0xc0 | ($c >> 6), 0x80 | ($c & 0x3f));
    } elsif ($c < 0x10000) {
        return sprintf("%c%c%c",
                       0xe0 | ($c >> 12),
                       0x80 | (($c >> 6) & 0x3f),
                       0x80 | ($c & 0x3f));
    } else {
        return utf8(0xfffd);
    }
}
# read list of all Unicode names (UnicodeData-Latest.txt) and
# output a list with NCRs (dec and hex) as well as UTF-8 and the name
while (<>) {
    # code point (4 hex digits), name, then the 13 remaining fields
    if (/^([0-9A-F]{4});([^;]*)(?:;[^;]*){13}$/) {
        next if ($2 eq "<control>");
        $ncr_dec = sprintf("&#%d;", hex($1));
        $ncr_hex = sprintf("&#x%x;", hex($1));
        printf("%s%s%s  %s\n",
               $ncr_dec . (" " x (10-length($ncr_dec))),
               $ncr_hex . (" " x (10-length($ncr_hex))),
               utf8(hex($1)), $2);
    } else {
        chomp;
        # $ARGV holds the name of the file currently being read via <>
        die("Syntax error in line '$_' in file '$ARGV'");
    }
}
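The script above prints a row for every character in the input, so a table restricted to the DOT ABOVE letters needs an extra filtering step. The following is a minimal sketch of one possible way to do that. The helper name dot_above_row and the two sample lines (written in the UnicodeData field layout) are assumptions for illustration, and the core Encode module stands in for the hand-written utf8 routine.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: turn one UnicodeData-style line into a table row,
# or return undef if the name is not of the form "LATIN * WITH DOT ABOVE".
sub dot_above_row {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                      # strip the line terminator
    my ($code, $name) = split /;/, $line;      # only the first two fields matter
    return unless defined($name)
        && $name =~ /^LATIN .* WITH DOT ABOVE$/;
    my $cp = hex($code);
    return sprintf("%-10s%-10s%s  %s",
                   sprintf("&#%d;", $cp),      # decimal NCR
                   sprintf("&#x%x;", $cp),     # hexadecimal NCR
                   encode("UTF-8", chr($cp)),  # UTF-8 bytes of the character
                   $name);
}

# Sample input (assumed for this example); in practice the lines would
# come from the UnicodeData file itself.
my @sample = (
    "1E0A;LATIN CAPITAL LETTER D WITH DOT ABOVE;Lu;0;L;0044 0307;;;;N;;;;1E0B;\n",
    "1E0C;LATIN CAPITAL LETTER D WITH DOT BELOW;Lu;0;L;0044 0323;;;;N;;;;1E0D;\n",
);

# Print only the rows that pass the name filter.
for my $line (@sample) {
    my $row = dot_above_row($line);
    print "$row\n" if defined $row;
}
```

Only the first sample line passes the filter. Anchoring the pattern at the end of the name keeps out characters whose names merely contain "WITH DOT ABOVE" followed by another modifier.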
Markus
-- Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>