RE: SECS & VSECS: Small European Character Sets

From: Murray Sargent (murrays@microsoft.com)
Date: Fri Aug 14 1998 - 15:45:39 EDT


Aren't you asking that _fonts_ be made containing at least these characters?
I.e., no new character set is involved? If so, the implementation only
involves assembling the desired fonts (assuming the OS handles Unicode
fonts).

Thanks
Murray
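
If the proposal does reduce to font coverage, checking a font against the repertoire is a set difference. The following is a minimal Python sketch under stated assumptions: the font's coverage is assumed to be already available as a set of code points (extracting it from a real font file is outside this sketch), and the names `missing_code_points`, `required`, and `font_coverage` are illustrative, not part of any existing tool.

```python
def missing_code_points(required, font_coverage):
    """Return the required code points the font lacks, in ascending order."""
    return sorted(required - font_coverage)


# A handful of code points taken from the VSECS table quoted below.
required = set(range(0x20, 0x7F)) | {0x20AC, 0x2122, 0xFFFD}

# Hypothetical font coverage: ASCII plus the euro sign only.
font_coverage = set(range(0x20, 0x7F)) | {0x20AC}

missing = missing_code_points(required, font_coverage)
print([f"U+{cp:04X}" for cp in missing])
```

For this made-up font the sketch reports U+2122 and U+FFFD as missing.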

> -----Original Message-----
> From: Markus Kuhn [SMTP:Markus.Kuhn@cl.cam.ac.uk]
> Sent: Friday, August 14, 1998 12:20 PM
> To: unicode@unicode.org; iso10646@listproc.hcf.jhu.edu
> Cc: kosta@live.robin.de; Primoz Peterlin; Kaleb S. KEITHLEY
> Subject: SECS & VSECS: Small European Character Sets
>
>
> Small European Character Sets
> -----------------------------
>
> I have recently spent quite some time working out a proposal for two
> Unicode/ISO 10646 subsets that are so small that I hope they will become
> widely implemented in Europe and America. Both are specifically designed
> to be suitable for systems where characters are represented in
> low-resolution fixed-width fonts. This includes, for instance, your xterm
> and Emacs windows under Unix (or, more generally, VT100 emulators and
> source code editors), but also portable LCD devices (pagers, mobile
> phones), where it makes sense to implement only a small subset of Unicode
> and where no single 8-bit set can cover a reasonable number of
> languages. These subsets are not really intended for
> applications such as the publishing industry, where these display
> restrictions do not exist and larger Unicode subsets or even full
> implementations might be adequate.
>
> The two subsets are:
>
> - Very Simple European Character Set (VSECS)
> 345 characters, basically the superset of Latin 1-4,9,10,15 and CP1252
> plus a very few ISO 6937 characters
>
> Rows Positions (Cells)
> 00 20-7E A0-FF
> 01 00-13 16-2B 2E-31 34-3E 41-48 4A-4D 50-7E 92
> 02 C6-C7 D8-DD
> 20 13-15 18-1A 1C-1E 20-22 26 30 39-3A AC
> 21 22 26 5B-5E 90-93
> 26 6A
> FF FD
>
> - Simple European Character Set (SECS)
> 683 characters; in addition to VSECS, it covers Cyrillic, Greek,
> MS-DOS block graphics, and a moderate set of mathematical characters
> that are likely to be used in academic email and source code comments.
>
> Rows Positions (Cells)
> 00 20-7E A0-FF
> 01 00-13 16-2B 2E-31 34-3E 41-48 4A-4D 50-7E 92
> 02 BC-BD C6-C7 D8-DD
> 03 84-86 88-8A 8C 8E-A1 A3-CE D1 D5-D6 F1
> 04 01-0C 0E-4F 51-5C 5E-5F 90-91
> 20 13-15 17-1A 1C-1E 20-22 26 30 32-34 39-3A 70 7F-83 A7 AC
> 21 02 15-16 1A 1D 22 24 26 5B-5E 90-95 A4-A7 D0-D5
> 22 00-09 0B-0C 12-13 18-1A 1D-1E 24-2A 3C 43 45 48-49 58 5F-62 64-65
> 22 6A-6B 82-8B 95 97 A4-A7 C2-C3 C5
> 23 00 08-0B 10 15 20-21 29-2A
> 25 00 02 0C 10 14 18 1C 24 2C 34 3C 50-6C 80 84 88 8C 90-93 A0 B2
> 25 BA BC C4 CB
> 26 10-12 3A-3C 40 42 6A-6B 6D-6F
> 27 13 17
> FF FD
>
> VSECS is somewhat similar to ISO 6937 with some bugs fixed (e.g., the
> Euro symbol is included, as are the directed quotation marks).
>
> SECS is somewhat similar to Microsoft/Adobe WGL4. I think SECS is much
> better than WGL4, because WGL4 contains many letters whose usage I could
> not trace (I am sure that at least three of them never existed). SECS
> contains the following 91 characters that are not part of WGL4:
>
> Rows Positions (Cells)
> 02 BC-BD
> 03 D1 D5-D6 F1
> 20 34 70 80-83
> 21 02 15 1A 1D 24 A4-A7 D0-D5
> 22 00-01 03-05 07-09 0B-0C 13 18 1D 24-28 2A 3C 43 45 49 58 5F 62
> 22 6A-6B 82-8B 95 97 A4-A7 C2-C3 C5
> 23 00 08-0B 15 29-2A
> 26 10-12 6D-6F
> 27 13 17
> FF FD
>
> Almost all of these are basic mathematical characters that most
> high school students should be familiar with. They are very useful to
> have available in academic email discussions and source code comments.
> It would be nice if the authors of WGL4 seriously considered extending
> their Unicode subset by those few dozen elementary math symbols. Then
> SECS would become a subset of WGL4. VSECS is already a subset of WGL4
> except for U+FFFD.
>
> The mathematical symbols of SECS will hopefully give US developers
> who do not specialize in i18n issues some motivation to get
> interested in 16-bit character sets, as these symbols are more relevant
> to their personal use than the accented characters of crazy Europeans.
>
> My dream is that something like SECS soon becomes the common
> minimum repertoire in Unix X11 fonts and printer fonts. VSECS is
> intended as an intermediate step for applications where the size of the
> character set is critical and only Latin script support is required.
>
> I do not think SECS contains any useless symbol. I know for each letter
> and symbol why it is in there and in which languages or fields it is
> used. Just ask.
>
> Much more information on the two sets is available from
>
> http://www.cl.cam.ac.uk/~mgk25/ucs/vsecs.html
> http://www.cl.cam.ac.uk/~mgk25/ucs/secs.html
>
> Much better than just looking at these web pages is to download the
> database (Perl needed) that generated them from
>
> http://www.cl.cam.ac.uk/~mgk25/ucs/secs.tar.gz
>
> Then you can play around with them and test the subset properties with
> regard to other sets easily yourself.
>
> If you want to see example glyphs on the HTML output of this script,
> then you'll also need
>
> http://www.cl.cam.ac.uk/~mgk25/ucs/glyphs.zip
>
> The uniset Perl script allows you to comfortably build up your own
> database of character collections, to merge and subtract them and to
> generate Unicode subsets and study their relations with other subsets.
> The mapping files from the Unicode Consortium can be used directly as
> input.
>
> Please let me know what you think about SECS and VSECS and if this is
> something you would like to see widely implemented.
>
> Markus
>
> --
> Markus G. Kuhn, Security Group, Computer Lab, Cambridge University, UK
> email: mkuhn at acm.org, home page: <http://www.cl.cam.ac.uk/~mgk25/>
>
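
The row/cell tables in the quoted message can be checked mechanically. The following is a minimal Python sketch, not the uniset Perl script mentioned above: it assumes each table line is a two-digit hex row (the high byte) followed by hex cells or cell ranges (the low byte), and the names `parse_rows`, `VSECS`, `SECS`, and `NOT_IN_WGL4` are illustrative.

```python
def parse_rows(table):
    """Parse 'row cell-ranges' lines into a set of code points.

    Each line starts with a hex row (high byte), followed by hex cells
    or inclusive cell ranges (low byte), e.g. "20 13-15 AC".
    """
    code_points = set()
    for line in table.strip().splitlines():
        row, *cells = line.split()
        base = int(row, 16) << 8
        for cell in cells:
            first, _, last = cell.partition("-")
            lo = int(first, 16)
            hi = int(last, 16) if last else lo
            code_points.update(range(base + lo, base + hi + 1))
    return code_points


VSECS = parse_rows("""
00 20-7E A0-FF
01 00-13 16-2B 2E-31 34-3E 41-48 4A-4D 50-7E 92
02 C6-C7 D8-DD
20 13-15 18-1A 1C-1E 20-22 26 30 39-3A AC
21 22 26 5B-5E 90-93
26 6A
FF FD
""")

SECS = parse_rows("""
00 20-7E A0-FF
01 00-13 16-2B 2E-31 34-3E 41-48 4A-4D 50-7E 92
02 BC-BD C6-C7 D8-DD
03 84-86 88-8A 8C 8E-A1 A3-CE D1 D5-D6 F1
04 01-0C 0E-4F 51-5C 5E-5F 90-91
20 13-15 17-1A 1C-1E 20-22 26 30 32-34 39-3A 70 7F-83 A7 AC
21 02 15-16 1A 1D 22 24 26 5B-5E 90-95 A4-A7 D0-D5
22 00-09 0B-0C 12-13 18-1A 1D-1E 24-2A 3C 43 45 48-49 58 5F-62 64-65
22 6A-6B 82-8B 95 97 A4-A7 C2-C3 C5
23 00 08-0B 10 15 20-21 29-2A
25 00 02 0C 10 14 18 1C 24 2C 34 3C 50-6C 80 84 88 8C 90-93 A0 B2
25 BA BC C4 CB
26 10-12 3A-3C 40 42 6A-6B 6D-6F
27 13 17
FF FD
""")

NOT_IN_WGL4 = parse_rows("""
02 BC-BD
03 D1 D5-D6 F1
20 34 70 80-83
21 02 15 1A 1D 24 A4-A7 D0-D5
22 00-01 03-05 07-09 0B-0C 13 18 1D 24-28 2A 3C 43 45 49 58 5F 62
22 6A-6B 82-8B 95 97 A4-A7 C2-C3 C5
23 00 08-0B 15 29-2A
26 10-12 6D-6F
27 13 17
FF FD
""")

print(len(VSECS), len(SECS), len(NOT_IN_WGL4))
print(VSECS <= SECS, NOT_IN_WGL4 <= SECS)
```

Run against the tables as quoted, the counts come out as 345, 683, and 91, matching the figures in the message, and both subset checks hold.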


