From: William Overington (WOverington@ngo.globalnet.co.uk)
Date: Tue Apr 15 2003 - 04:01:09 EDT
I posted the following to the DigitalTV group at the http://www.cenelec.org
webspace last Saturday, 12 April 2003.
I find it fascinating to consider to what extent this system could be used
for broadcasting information which can be easily and promptly translated
into the language of the end user and displayed upon the screen of his or
her DVB-MHP interactive television.
It is clear that some simple sentences can be used in this manner. Yet how
far does that capability go? Can the system be used for e-commerce? I am
finding that once the system is considered as a mathematical structure, the
question arises of the nature of language and of how it can be represented
using mathematics. This is fascinating. By using only preset sentences this
system avoids many of the problems of automated language translation, such
as those of parsing an input sentence which is to be translated.
The abbreviation DVB-MHP stands for Digital Video Broadcasting - Multimedia
Home Platform. Details of the system can be found at the http://www.mhp.org
webspace. The DVB-MHP system uses Java programs and Java uses Unicode.
William Overington
15 April 2003
----

Some possibilities regarding using many languages.

In 2002 I carried out some initial research into the possibility of using a special encoding of preset sentences so as to facilitate the sending of messages in a format which could be easily translated into a large number of other languages. I am now considering applying the experience gained in that initial research to producing a system specifically intended only for use upon a DVB-MHP channel under the carefully controlled conditions which are possible with a broadcast channel.

That research adapted, to a different application domain and using computerized methods, a system for sending messages by telegraph which was widely used on railway systems in the past. The following is an interesting web based documentation of a coding used with a telegraph system.

http://www.railpage.org.au/telecode/

Please consider the following.

http://www.railpage.org.au/telecode/tc05.gif.html

Some messages are complete in themselves. Some of the messages can be customized with one or more parameters, depending upon the particular message. A parameter could be the name of a place, a time or the index number of a locomotive.

The possibility arises of using such a system to convey messages to end users upon a DVB-MHP channel which is broadcast to a number of countries, where a variety of languages are spoken, by having a collection of preset messages, some of which may be customized using one or more parameters, which may be processed in the DVB-MHP television set of an end user and then displayed in the local language. The DVB-MHP television set would gather from the object carousel of the DVB-MHP channel the database files necessary to perform the translation from the transmitted codes into the natural language (for example, Finnish, German, Estonian) of the viewer of the television display.
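
The idea of preset messages with parameters, looked up in a per-language database, can be sketched in Java roughly as follows. This is only a minimal illustration under stated assumptions: the class name PresetMessageTable, the message code 17, the {0} and {1} slot notation and the example sentences are all hypothetical, and are not taken from the telegraph code book or from any DVB-MHP specification.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: the message codes, templates and class name are
// hypothetical and are not taken from the telegraph code book or from DVB-MHP.
class PresetMessageTable {
    // language tag -> (message code -> template with {0}, {1}, ... slots)
    private final Map<String, Map<Integer, String>> byLanguage = new HashMap<>();

    // Register the translation of one preset message in one language.
    void add(String language, int code, String template) {
        Map<Integer, String> table = byLanguage.get(language);
        if (table == null) {
            table = new HashMap<>();
            byLanguage.put(language, table);
        }
        table.put(code, template);
    }

    // Turn a transmitted code plus its parameters into display text in the
    // viewer's language. Returns null if no translation has been loaded.
    String render(String language, int code, String... params) {
        Map<Integer, String> table = byLanguage.get(language);
        String template = (table == null) ? null : table.get(code);
        if (template == null) {
            return null;
        }
        String text = template;
        for (int i = 0; i < params.length; i++) {
            text = text.replace("{" + i + "}", params[i]);
        }
        return text;
    }
}
```

For instance, after add("de", 17, "Der Zug aus {0} kommt um {1} an."), the call render("de", 17, "Mainz", "14:05") gives "Der Zug aus Mainz kommt um 14:05 an.". Only the code 17 and the two parameters would need to be transmitted; the sentence text comes from the table for the viewer's chosen language. A real system would also have to deal with languages in which a place name is inflected, which this sketch ignores.
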
The end user would simply have had to run an introductory program which asked for a choice of display language to be selected.

For a carefully chosen collection of sentences the usefulness of such a system could be enormous within the European Union. The selection available would need to be large enough to allow application for activities such as almost real-time encoding and translation of weather information and weather forecasts, road traffic information, some distance education applications and so on. The selection would also need to be small enough that having all of the sentences translated once into many languages would be a realistic possibility, that the database files could be broadcast upon a DVB-MHP channel, and that a European Union interactive television could handle the processing.

The capabilities of the telesoftware system to treat the object carousel as a read-only disc drive in the sky and only store part of the database in the end user television at any one time could be useful in allowing specialist sentence collections to be used easily. Yet that would need to be balanced against the speed required for producing the display, depending upon whether the translation needed to be fairly real-time or whether some delay would be acceptable. That balance might vary greatly from one application to another.

My initial research is available on the web. It will hopefully provide some idea of the possibilities. It is called the comet circumflex system. However, that system is initial research and has provided valuable experience upon which to build a system specifically intended only for use upon a DVB-MHP channel under the carefully controlled conditions which are possible with a broadcast channel; the new system is somewhat different and more advanced.

http://www.users.globalnet.co.uk/~ngo/c_c00000.htm

On the following web page.
http://www.unicode.org/charts/

There is a link to enable downloading of the following file.

http://www.unicode.org/charts/PDF/U100000.pdf

It is about Supplementary Private Use Area-B of the Unicode code space. This is for the 65534 characters in the range U+100000 through to U+10FFFD.

I am now starting to design a system where each whole preset sentence is represented by one character code from the range U+100000 through to U+10FFFD.

Thus, for example, a sentence such as "It is snowing." would be encoded using one character code. A sentence such as "It is snowing in Mainz." would have one character code for the sentence part "It is snowing in" and a method of encoding the name of the City of Mainz as a parameter. A name such as Mainz could be a literal name, as it would not be translated. A name such as Rome or Florence would be represented by a character code from a range of the U+100000 through to U+10FFFD code space assigned to a list of major cities whose names are translated into local languages.

A two parameter sentence, such as "The temperature in Mainz is 21 degrees Celsius.", would have one code point for the main structure of the sentence. The Java program in the DVB-MHP television would use whatever value it had in the locality register of its language system engine and whatever value it had in the numerical data register 1 of its language system engine to produce the text stream which is to be displayed upon the screen for the end user. The language system engine would be a software construct within a Java program, the Java program having been broadcast.

The system would appear to the end user as just providing detailed, up-to-date weather information in his or her own language. As there are many languages in use throughout the European Union, such a system might be very useful for fairly quick translation of fairly predictable types of information, such as the sentences required to produce a weather forecast.

William Overington
12 April 2003
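
The encoding described above, one Supplementary Private Use Area-B code point per preset sentence plus a locality register and a numerical data register 1, might be sketched in Java along the following lines. This is a minimal sketch under stated assumptions and not a definitive design: the class and method names, the particular code point assignments (U+100000, U+100002, U+101000) and the {locality} and {n1} slot notation are all hypothetical. Note that in Java a code point above U+FFFF occupies two char values in a String, so the sketch walks the text by code point.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a "language system engine" for one display language.
// All code point assignments and names here are assumptions for illustration.
class LanguageSystemEngine {
    static final int PUA_B_FIRST = 0x100000;
    static final int PUA_B_LAST = 0x10FFFD;

    private final Map<Integer, String> sentences = new HashMap<>();
    private final Map<Integer, String> cities = new HashMap<>();
    private String locality = "";   // locality register
    private int number1 = 0;        // numerical data register 1

    static boolean isPuaB(int codePoint) {
        return codePoint >= PUA_B_FIRST && codePoint <= PUA_B_LAST;
    }

    private static void check(int codePoint) {
        if (!isPuaB(codePoint)) {
            throw new IllegalArgumentException("not in U+100000..U+10FFFD");
        }
    }

    // Database entries, as they might be read from the broadcast files.
    void defineSentence(int codePoint, String template) {
        check(codePoint);
        sentences.put(codePoint, template);
    }

    void defineCity(int codePoint, String localName) {
        check(codePoint);
        cities.put(codePoint, localName);
    }

    // Register loads: a literal name (Mainz) or a translated city code (Rome).
    void loadLocalityLiteral(String name) {
        locality = name;
    }

    void loadLocalityCode(int codePoint) {
        String name = cities.get(codePoint);
        if (name == null) {
            throw new IllegalArgumentException("unknown city code");
        }
        locality = name;
    }

    void loadNumber1(int value) {
        number1 = value;
    }

    // Expand every known sentence code in the transmitted text; any other
    // code point is copied through unchanged.
    String expand(String transmitted) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < transmitted.length(); ) {
            int cp = transmitted.codePointAt(i);
            i += Character.charCount(cp);   // 2 chars for supplementary planes
            String template = sentences.get(cp);
            if (template == null) {
                out.appendCodePoint(cp);
            } else {
                out.append(template
                        .replace("{locality}", locality)
                        .replace("{n1}", Integer.toString(number1)));
            }
        }
        return out.toString();
    }
}
```

With a German database in which U+100002 is defined as "In {locality} sind es {n1} Grad Celsius.", loading "Mainz" into the locality register and 21 into numerical data register 1 and then expanding the single code point U+100002 gives "In Mainz sind es 21 Grad Celsius.". How the registers themselves would be loaded from the broadcast stream is left open here.
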