Format A

From: Doug Ewell via Unicode <unicode_at_unicode.org>
Date: Thu, 30 May 2019 09:49:13 -0700

Apologies if this is a repeat of a (much) earlier inquiry.
 
The mapping tables that are available as part of the Unicode Standard
(http://www.unicode.org/Public/MAPPINGS/) are generally provided in a
text format called "Format A." Each line in the file defines a mapping
between a character in a legacy encoding and the Unicode equivalent,
with fields separated by tabs or sequences of spaces, like this:
 
0xA0 0x00A0 #NO-BREAK SPACE
0xA1 0x00A1 #INVERTED EXCLAMATION MARK
0xA2 0x00A2 #CENT SIGN
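To make the layout concrete, here is a minimal parser sketch based only on the sample lines above. It is not taken from any published Format A specification. It assumes two hex fields with a 0x prefix, separated by any whitespace, followed by an optional "#" comment. It also assumes that a line with only one hex field means the code has no Unicode equivalent. The name parse_format_a is hypothetical.

```python
from typing import Dict, Iterable

def parse_format_a(lines: Iterable[str]) -> Dict[int, int]:
    """Parse Format A-style lines into {legacy_code: unicode_scalar}.

    Hypothetical helper; the rules below are inferred from sample lines,
    not from an official spec.
    """
    mapping: Dict[int, int] = {}
    for raw in lines:
        # Drop the trailing comment (the character name after '#').
        data = raw.split("#", 1)[0].strip()
        if not data:
            continue  # blank line or comment-only line
        # str.split() with no argument treats tabs and runs of spaces alike.
        fields = data.split()
        if len(fields) < 2:
            continue  # assumed: legacy code with no Unicode mapping
        # int(..., 16) accepts the "0x" prefix used in the files.
        legacy, uni = (int(f, 16) for f in fields[:2])
        mapping[legacy] = uni
    return mapping
```

For example, the line "0xA0 0x00A0 #NO-BREAK SPACE" yields the entry {0xA0: 0xA0}. The same code handles a two-byte legacy code such as 0x8140 without any changes.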
 
The format supports double-byte character sets (DBCS) as well:
 
0x8140 0x4E02 #CJK UNIFIED IDEOGRAPH
0x8141 0x4E04 #CJK UNIFIED IDEOGRAPH
0x8142 0x4E05 #CJK UNIFIED IDEOGRAPH
 
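For the DBCS case, a table loaded from such lines could drive a simple decoder. The sketch below is one possible approach and is not described anywhere in the mapping files. It assumes that every code above 0xFF is a two-byte sequence, so the high bytes of those codes are treated as lead bytes. Unmapped or truncated sequences become U+FFFD. The function name decode_dbcs is hypothetical.

```python
from typing import Dict

def decode_dbcs(data: bytes, mapping: Dict[int, int]) -> str:
    """Decode bytes using a {legacy_code: unicode_scalar} table (hypothetical helper)."""
    # Assumption: any code > 0xFF is two bytes; its high byte is a lead byte.
    lead_bytes = {code >> 8 for code in mapping if code > 0xFF}
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in lead_bytes and i + 1 < len(data):
            # Lead byte followed by a trail byte: form the two-byte code.
            code = (b << 8) | data[i + 1]
            i += 2
        else:
            # Single-byte code, or a lead byte truncated at end of input.
            code = b
            i += 1
        # Unmapped codes fall back to U+FFFD REPLACEMENT CHARACTER.
        out.append(chr(mapping.get(code, 0xFFFD)))
    return "".join(out)
```

With the three DBCS lines above plus an ASCII entry 0x41 -> 0x0041, the bytes 81 40 41 81 42 decode to U+4E02, "A", U+4E05. Real legacy encodings can have lead-byte and trail-byte ranges that this heuristic does not capture.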
My questions are:
 
1. Is there a specification for this format anywhere, and if so, where?
 
2. Is there a "Format B" or similar? (I don't mean UCM, CharMapML, RFC
1345 format, etc., but something truly similar to and/or derivative of
Format A.)
 
Please reply on-list only if you think the list at large would benefit
from your reply. I'm hoping some of the Unicode elders might have some
insight here.
 

--
Doug Ewell | Thornton, CO, US | ewellic.org
 
Received on Thu May 30 2019 - 11:50:17 CDT