fdc>The final objective is to cover all human languages and
writing systems in the UCS. But it seems each one needs a
trial run in a simpler, self-contained character set. Perhaps,
then, it is not practical to avoid creating additional
single-byte sets. So...
fdc>Let's hope that these new 8-bit character sets can be
published and shared so the same work does not need to be
needlessly replicated, and to facilitate a review process that
might ensure a better result for the final UCS versions.
From this, I understand that you would like to see various
8-bit character sets developed by SIL published in some form.
I'm not sure whether you'd also like to see standards developed
around such character sets, or to see them included in some
registry.
I need to warn you that there are literally many hundreds of these
that have been developed in the 4+ decades in which SIL has
been working with language data electronically. (While working
on my MA thesis on a Mayan language in Mexico, I was fortunate
to be able to get a sizeable electronic corpus of text from a
closely related language. It came from the data fed into a
typesetter for a publication done in 1955, was on paper tape,
and was 7-bit; I don't recall to what extent escape sequences
were involved, but I'm pretty sure there were some. There's
*lots* more of this kind of stuff in our archives, though.)
Also, it may not all be considered pretty by current standards.
Often these have been developed by linguists who may not have
had as much knowledge and skill in IT as in linguistics.
Generally, all of these were created to get a job done, but
those jobs may have focussed on a particular process and used
proprietary systems. (In 1955, was there anything that wasn't
proprietary?) In some cases, though, there have been alternate
or competing encodings for a single language: alternates may
have been developed for different purposes, and different
researchers may, merely by historical accident, have developed
different encodings where a single encoding would have
sufficed.
I've been thinking for a while that there may be some value in
my starting to collect info on SIL-developed encodings, but it
would be an enormous undertaking, and in an organisation where
the work far exceeds the available personnel, I can't say for
sure how successful I'd be. If I have time, though, I may try
to make a start anyway. If I do, I can certainly discuss making
that info available to others who are interested.
fdc>Let's hope that they are not designed around some
particular proprietary architecture and that some consideration
has been given to interchange, so that users of these writing
systems have some choice about platforms (this might mean, for
example, following the guidelines of ISO 4873).
I'm still climbing the learning curve on all these IT
standards, and this is one I don't recall encountering yet. Can
you give a brief explanation of what 4873 is all about?
fdc>And let's hope we can avoid the term "legacy" in this
connection. There's nothing legacy about it. It's
groundbreaking work. It's nothing to be ashamed of.
I can accept that. I guess I use "legacy" from a feeling that I'd
rather give up 8-bit encodings for good, as I mentioned
earlier. But you're right, there's a lot that's still
ground-breaking and not a cause for shame.
fdc>I suspect that Michael's or SIL's website might be a good
place from which to coordinate this activity, to whatever
extent this is not being done already.
Peter