Re: Latin w/ diacritics (was Re: benefits of unicode)

From: William Overington (WOverington@ngo.globalnet.co.uk)
Date: Fri Apr 20 2001 - 03:13:57 EDT


In the early 1990s I did a small piece of research on devising a method of
inputting text in the Esperanto language into a PC using an ordinary English
keyboard.

Some aspects of that research now appear to be relevant to the present
discussion of implementing Unicode 3.1 on older computer systems.

Esperanto uses most of the English alphabet plus twelve extra characters,
namely six uppercase and six lowercase. These are c circumflex, g
circumflex, h circumflex, j circumflex, s circumflex, u breve. The u breve
is, for purposes of identification within this discussion only, a letter u
with a mark above it that looks like a small open round bracket rotated
anticlockwise through ninety degrees.
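
For readers who want the code numbers, the twelve letters can be listed with
a few lines of Python. This is only an illustrative sketch: it builds each
letter from its base letter plus a combining circumflex or breve and lets
the standard unicodedata module compose and name it. The names BASES and
esperanto_extras are made up for this example.

```python
import unicodedata

# The six base letters that take a diacritic in Esperanto, each paired with
# the combining mark it uses (U+0302 circumflex, U+0306 breve).
BASES = [("c", "\u0302"), ("g", "\u0302"), ("h", "\u0302"),
         ("j", "\u0302"), ("s", "\u0302"), ("u", "\u0306")]

def esperanto_extras():
    """Return the twelve precomposed letters, uppercase before lowercase."""
    letters = []
    for base, mark in BASES:
        for form in (base.upper(), base):
            # NFC composes base letter + combining mark into one code point.
            letters.append(unicodedata.normalize("NFC", form + mark))
    return letters

if __name__ == "__main__":
    for ch in esperanto_extras():
        print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

All twelve code numbers come out between hexadecimal 108 and 16D, that is,
above 256 in decimal.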

The research was interesting and I later used the experience gained in part
of my creative writing hobby. The result is on our family webspace in
England in the document www.users.globalnet.co.uk/~ngo/euto0008.htm if
anyone is interested, yet reading it is not necessary to understanding this
posting. The part based on the research starts about a quarter of the way
into the document.

The program that I wrote for the research was in Pascal, using Borland Turbo
Pascal 4, or maybe 5. I know that Borland later reached version 6. I have
not programmed in Pascal for a long time so maybe Turbo Pascal is still
going strong with a later version. One could compile the program and
produce a stand alone executable program to run on a PC.

The PC that I used was a 386 based machine with a 40 Megabyte hard disc
running Windows. The Pascal was the DOS Pascal, not the Pascal for Windows,
that is, the programs produced were for running under DOS. I cannot
remember whether the Pascal Integrated Development Environment itself was a
Windows program. The Windows machine allowed a DOS program to be started
from an icon on the Windows screen of the PC.

I did not have to get into the operating system or anything like that, and I
did not have to put anything on the hard disc. My program was on a 3 1/4 inch
floppy disc which I removed from the machine after a session. The Pascal
Integrated Development Environment was on the hard disc.

I do not remember the precise details of how I did the character drawing,
but it was something along the lines of using VGA graphics and a putpixel
command (I may have the capitalisation of that name wrong), together with a
file called fontfile.txt or something similar, which consisted of about 300
times 23 lines of text.

Of each 23 lines of text, the first line would be an integer character
number and the next 22 lines would each consist of 15 characters, each
either a * or something else. At start up the program would read in the
contents of fontfile.txt and, for each of the 22 pattern lines, convert each
* to a one and anything else to a zero, code the result up as an integer and
store it in the program. When a character was needed, its 22 coded integers
would be decoded and some putpixeling would take place. Green letters on a
black background.

The use of "anything else" meant that the blank templates for the characters
need not be all spaces or all full stops but could use other characters as a
guide when manually editing fontfile.txt to devise the bit patterns for all
of the printing characters.

I used character codes just above 256 for the Esperanto characters, though
these were not arbitrary. It is quite likely that they were the codes used
by Unicode today, as some kind person provided them to me from the ISO
standard of the time when I enquired in the soc.culture.esperanto newsgroup.
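
The font file layout described above can be sketched in Python. This is not
the original Pascal program, which I no longer have to hand, only an
illustration of the same idea under stated assumptions: the leftmost column
is taken as the highest bit, short lines are padded as blank pixels, and the
names parse_font and lit_pixels are invented for the example.

```python
ROWS, COLS = 22, 15  # glyph cell size described in the post

def parse_font(text):
    """Parse font-file text into {character number: [22 row integers]}."""
    lines = text.splitlines()
    glyphs = {}
    # Each record is one number line followed by ROWS pattern lines.
    for start in range(0, len(lines) - ROWS, ROWS + 1):
        code = int(lines[start].strip())
        rows = []
        for line in lines[start + 1 : start + 1 + ROWS]:
            value = 0
            # Pad or trim to COLS so that short lines count as blank pixels.
            for ch in line.ljust(COLS)[:COLS]:
                value = (value << 1) | (1 if ch == "*" else 0)
            rows.append(value)
        glyphs[code] = rows
    return glyphs

def lit_pixels(rows):
    """Yield (x, y) for every set bit; the leftmost column is the high bit."""
    for y, value in enumerate(rows):
        for x in range(COLS):
            if value & (1 << (COLS - 1 - x)):
                yield (x, y)

if __name__ == "__main__":
    # A made-up record: character number 265 with one row of lit pixels.
    record = ["265", "..*****........"] + ["." * COLS] * (ROWS - 1)
    glyphs = parse_font("\n".join(record) + "\n")
    for x, y in lit_pixels(glyphs[265]):
        print(x, y)  # a real program would call its put-pixel routine here
```

Packing each row into one integer keeps the stored font small, which
mattered on the machines in question; 15 columns fit comfortably in a 16 bit
integer.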

I do not know whether the concept of VGA graphics still exists in PCs today
or whether Turbo Pascal is still going, but it appears to me that a stand
alone executable for a PC that performs wordprocessing using VGA graphics
and does not seek to use the fonts of the operating system or even the drawn
fonts of the Borland Turbo Pascal graphics system would be a feasible idea.

The question arises of portability of any text files produced should the
user later gain access to a more modern computer. I feel that if
there were some underlying standards for this type of work then maybe a
great deal could be achieved. So I am here putting forward some first
suggestions. I am happy for these to get modified as part of the discussion
process. I may well learn lots of things from such a discussion.

1. An application program has in the same directory as itself a file named
fontfold.txt, a one line ASCII text file with a return character at the
end of the line of text. The file fontfold.txt contains either the full
path of the directory on the hard disc where the font files (that is, the
font files for this type of work) are stored or else a full stop character
to indicate that the font files are in the same directory as the application
program.
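
As a rough sketch of suggestion 1, an application might find its font
directory like this. The function name font_directory is hypothetical, and
the sketch assumes the application already knows its own directory.

```python
import os

def font_directory(app_dir):
    """Return the font directory named by fontfold.txt inside app_dir."""
    path = os.path.join(app_dir, "fontfold.txt")
    with open(path, encoding="ascii") as f:
        # Only the first line matters; drop the return character(s).
        entry = f.readline().rstrip("\r\n")
    # A lone full stop means "the same directory as the application".
    return app_dir if entry == "." else entry
```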

2. The directory containing the font files has a file named fontlist.txt,
an ASCII text file of as many lines as one chooses, each line having a
return character at the end. Each such line has the format: a hexadecimal
number of 1 to 6 digits, a space, another hexadecimal number of 1 to 6
digits, a space, one ASCII character (hereinafter termed the selection
character), a space, and a file name of one to eight characters suitable
for a file name followed by a full stop and three more such characters.
The second hexadecimal number on a line is greater than or equal to the
first hexadecimal number on that line. The meaning is that the named file
contains the bit patterns for the characters in that Unicode code number
range for that selection character.

Selection characters are R for roman, I for italic, B for bold, J for bold
italic. (Others can be added as development of this standard proceeds.) R
is the default sought by an application.
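
To make suggestion 2 concrete, here is a minimal sketch of reading
fontlist.txt and looking up the font file for a character. The names
FontEntry, parse_fontlist and find_font are invented for the example, the
sketch does not check the digit counts or the 8.3 file name shape, and
taking the first matching line when ranges overlap is my assumption, not
part of the suggestion.

```python
from typing import NamedTuple

class FontEntry(NamedTuple):
    first: int       # first code number covered (inclusive)
    last: int        # last code number covered (inclusive)
    selection: str   # R, I, B or J
    filename: str    # 8.3 style file name

def parse_fontlist(text):
    """Parse the text of fontlist.txt into a list of FontEntry records."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        first_hex, last_hex, selection, filename = line.split(" ")
        first, last = int(first_hex, 16), int(last_hex, 16)
        if last < first:
            raise ValueError(f"range runs backwards: {line!r}")
        entries.append(FontEntry(first, last, selection, filename))
    return entries

def find_font(entries, code, selection="R"):
    """Return the file covering `code` for `selection`, or None if absent."""
    for entry in entries:
        if entry.selection == selection and entry.first <= code <= entry.last:
            return entry.filename
    return None
```

With a line such as "100 17F R LATINEXA.FNT" (a made-up file name), a
request for code number 108 hexadecimal in roman would return LATINEXA.FNT.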

3. The format for the font files needs to be defined. I suggest a text
format that can be edited using a text editor so that there is
opportunity for everyone to be able to conveniently define font files. In
practice hopefully there would be various standard font files that one could
use yet having a text file format that everyone can edit using a text editor
would mean that there would be the possibility for users to add an extra
file for an extra character on their own initiative.

Maybe the files for an R selection character would be a simpler format than
those for an I selection character so that font files for straightforward
roman character display into a rectangular cell on the screen could be
designed without needing to get involved in the intricacies of displaying
in italic in a parallelogram shaped cell on the screen.

4. Text files of Unicode could be stored using the Pascal file of byte
format, with two formats being specified: one using two bytes for each
character and one using three bytes for each character. A utility program
would allow conversion from one file type to another. Some of the file
formats may already exist as part of the Unicode standard. It would be great
to use those formats as that would enable any files produced to be used on a
modern system and also allow users of the older systems to read files
written on modern systems.
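
A minimal sketch of suggestion 4 follows. The byte order is not settled
above, so the sketch assumes the most significant byte comes first; the
names encode_fixed, decode_fixed and convert are made up for the example.

```python
def encode_fixed(text, width):
    """Encode text as `width` bytes per character, most significant first."""
    if width not in (2, 3):
        raise ValueError("width must be 2 or 3")
    out = bytearray()
    for ch in text:
        # to_bytes raises OverflowError if the code number does not fit.
        out += ord(ch).to_bytes(width, "big")
    return bytes(out)

def decode_fixed(data, width):
    """Decode bytes written by encode_fixed back into text."""
    if len(data) % width:
        raise ValueError("data length is not a multiple of the width")
    return "".join(
        chr(int.from_bytes(data[i : i + width], "big"))
        for i in range(0, len(data), width)
    )

def convert(data, old_width, new_width):
    """The conversion utility: re-encode from one width to the other."""
    return encode_fixed(decode_fixed(data, old_width), new_width)
```

With this byte order, the two byte form of a character up to hexadecimal
FFFF is the same pair of bytes as big-endian UTF-16 without a byte order
mark, which may help with exchanging files with modern systems. Converting
from three bytes to two fails for any character above FFFF, so a real
utility would need to decide how to report that.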

It would appear to me that programs written as stand alone executables for a
PC using such a standardization document would allow many PCs fitted with
Windows 95 and Windows 98 and earlier Windows systems and even DOS only
systems to be able to utilize Unicode 3.1 with the capability that if the
users later got to use modern equipment then any text files produced would
be fully useable.

Also, the standardization system could possibly be used on other types of
computers as well.

The system above might, in addition, be useful for people who do have the
latest computing equipment as it does allow for the possibility of using
straightforwardly produced custom fonts for private use area characters.

This idea is today just an idea, yet with a good discussion and lots of
enthusiasm it may be possible to produce standards and software and font
files that will allow Unicode 3.1 to become much more widely used more
quickly than would have otherwise been the case.

Please remember that many of the machines that cannot use Unicode 3.1 at
present are only a very few years old. I feel that it would be a most
regrettable situation if Unicode 3.1 only becomes widely available once most
people are using computers produced in or after the year 2000 or
thereabouts. That may well take many years. I am fortunate to be able to
use two PCs at times. One runs at ten times the speed of the other. For
some applications, such as fast graphics, that speed difference is critical.
For other applications, such as typing this document, that speed difference
is unimportant.

William Overington

20 April 2001

www.users.globalnet.co.uk/~ngo


