From: Hart, Edwin F.
To: 'Kenneth Whistler'
Cc:
Subject: Guidelines for deciding what to code
Sent: 10/9/00 11:38 AM
Importance: Normal

Ken,

Welcome back! You submitted a very nice summary of the meeting. I hope that you and Sonya had a pleasant vacation in Greece following the meetings.

I had lunch with Sato-San and we discussed his concerns with the character-glyph model. I'd like to run some thoughts by you. His main concern appeared to be to have a document that he could hand to linguists to educate them and to help guide them in selecting what to encode for minority Southern Asian and Southeastern Asian scripts that have not yet been computerized. I'm unsure whether he wanted this for guidance or for clout. He also appears to need to deliver such a document early next year as one of the tasks for his job. I am unaware of such a document. Since you are a linguist and somewhat familiar with the coding issues, I thought that you might be able to help clarify my understanding of some of the concerns. Here are some of my notes from the conversation.

The character-glyph model describes two separate domains, a character domain and a glyph domain, and the process that renders characters into glyphs for presentation. The Technical Report uses the following diagram to describe the model:

Character domain -> Glyph Selection/Rendering Process -> Glyph domain

Sato-San wants to augment the concepts in the character-glyph model to include (1) input methods (processes for converting keystrokes into a stream of character codes) and (2) guidelines for coding the elements of a writing system. While he did not necessarily want to revise the Technical Report to include this material, he really wanted an authoritative reference document with coding guidelines that he could use in his efforts with language experts who had no knowledge of computers and coding. He thought that input was a separate process and that deciding what should be coded should not depend on the input process.

He wanted to define a complete set of functions that a generalized input method would need to handle all writing systems. His concerns were:

1. In some languages, the display order of characters and the phonetic order of characters are different. How should the characters be ordered in character strings: display order or phonetic order? I do not recall this question being raised before. Also, how should they be entered? He answered his own question: the input method needs to be able to handle character entry in both display order and phonetic order for the same language, because people use both methods.

2. Some languages have writing elements where one element is a doubling of another. (In the Latin script, you can think of a “w” as a pair of “v” letters or an “m” as a pair of “n” letters. In some writing systems, a person normally enters the equivalent of a “w” as a pair of “v” elements.) Should the “w” be coded as a pair of “v” elements or as a separate element? What happens if the person enters “vvv” in the middle of a word? How does software decide which 2 of the 3 should be paired (assuming “vvv” does not occur)? Should it be “wv” or “vw” when either may be valid? Sato-San gave an example from Hangul syllables, but there the consonants with the "double" glyph have separate codes from the ones with a single glyph, so this example may provide one answer to the ambiguity of "vvv" or "nnn". I am just not sure whether we should generalize this into a principle.

As a first thought, the following diagram may form the basis for understanding the additions he is requesting. He appears to be asking to expand the model on the input (left) side (a) to decide what to code and how to code it, and (b) to decide the general processes that would be needed in a generalized input method.

Coding Guidelines -> Character Code -> Input Process -> Character domain -> Glyph Selection/Rendering Process -> Glyph domain

Thanks for your thoughts,

Ed

Edwin F. Hart
edwin.hart@jhuapl.edu
The Johns Hopkins University Applied Physics Laboratory
11100 Johns Hopkins Road
Laurel, MD 20723-6099 USA
+1-443-778-6926 (Baltimore area)
+1-240-228-6926 (Washington, DC area)
+1-443-778-1093 (fax)
+1-240-228-1093 (fax)
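
P.S. To make the pairing ambiguity in concern 2 concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function name `readings` and the Latin stand-ins "v" and "w" are hypothetical and not taken from any real input method. It enumerates every way a keystroke string could be read when two "v" keystrokes may stand for one "w".

```python
def readings(keys: str, single: str = "v", double: str = "w") -> list[str]:
    """List every reading of `keys` when two `single` keystrokes may mean one `double`.

    Hypothetical helper for illustration only; "v" and "w" stand in for
    a writing element and its doubled form.
    """
    if not keys:
        return [""]
    results = []
    # Reading A: take the first keystroke as a code on its own.
    for rest in readings(keys[1:], single, double):
        results.append(keys[0] + rest)
    # Reading B: take two adjacent `single` keystrokes as one `double` code.
    if keys.startswith(single * 2):
        for rest in readings(keys[2:], single, double):
            results.append(double + rest)
    return results


all_readings = readings("vvv")
# Drop the unpaired reading, following the assumption that "vvv" does not occur.
paired_only = [r for r in all_readings if "vvv" not in r]
print(all_readings)  # ['vvv', 'vw', 'wv']
print(paired_only)   # ['vw', 'wv']
```

Both "vw" and "wv" survive, so the keystrokes alone cannot settle the pairing. If the doubled element has its own code, as in the Hangul example, the stored text is unambiguous and the choice is left to the input method.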