From: African Oracle (oracle@africaservice.com)
Date: Mon May 03 2004 - 11:41:15 CDT
Thanks, Doug. All contributions are appreciated.
Regards
Dele
----- Original Message -----
From: "Doug Ewell" <dewell@adelphia.net>
To: "Unicode Mailing List" <unicode@unicode.org>
Cc: "African Oracle" <oracle@africaservice.com>; "Michael Everson"
<everson@evertype.com>
Sent: Monday, May 03, 2004 5:11 PM
Subject: Re: Nice to join this forum....
> Dele a.k.a. "African Oracle" <oracle at africaservice dot com> wrote:
>
> > GB is different from G+B. You do not pronounce the letters separately,
> > but people who do not know anything about the language do, which is
> > wrong. It is about correction and proper representation.
>
> What Michael and others have been trying to say is this:
>
> Unicode encodes characters, not languages. The word "character" means
> different things to ordinary people, depending on what language they
> speak and what script they write. "Characters" in Unicode do not always
> correspond 1-to-1 with "letters" in a given language's alphabet.
>
> Here are some quick and dirty definitions for our purposes:
>
> Character: the basic unit of text encoding.
> Letter: the basic unit of a language's orthography. Not necessarily the
> same as "character."
> Glyph: the visual representation of a character. Also not necessarily
> the same as "character."
>
> In Spanish, the combination "ch" is considered a distinct letter of the
> alphabet. It has its own name, "che." Children learn it as a letter
> that comes between "c" and "d". This is all good, but when it comes to
> representing text in computers, there is no separate "ch" character in
> any of the encodings that people have used for decades. Spanish text
> simply uses the two characters "c" and "h". This was true of those older
> encodings, and it is also true of Unicode.
>
> Likewise, in Yoruba, if there is no visual distinction between (1) the
> letter "GB" and (2) the two letters "G" and "B" that happen to appear
> together, as in your example, then the letter "GB" is encoded with the
> two characters "G" and "B". This does not deny the existence of a
> letter "GB" in the Yoruba language; it just dictates how that letter is
> encoded in computerized text.
>
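To make the encoding point concrete, here is a minimal Python sketch that lists the code points stored for the string "GB". The helper name code_points is made up for this illustration; ord() and unicodedata.name() are from the Python standard library.

```python
import unicodedata


def code_points(text: str) -> list[str]:
    """Describe each encoded character of text as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# The Yoruba letter "GB" is stored as two characters, not one.
print(len("GB"))  # 2
for description in code_points("GB"):
    print(description)
# U+0047 LATIN CAPITAL LETTER G
# U+0042 LATIN CAPITAL LETTER B
```

The count of 2 is a count of encoded characters. It says nothing about how many letters a Yoruba reader sees.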
> Now if you need to perform some other type of text processing, such as
> searching or sorting or spell-checking or line-breaking, then your
> software may need to understand the difference between the letter "GB"
> and the two letters "G" + "B". But this needs to be handled by the
> software, not the character encoding mechanism.
>
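As a rough illustration of handling the letter "GB" in software rather than in the encoding, the sketch below splits a word into Yoruba letters and sorts words in alphabet order. It rests on loud assumptions: every "g" followed by "b" is treated as the single letter "GB", and the sample words are written without tone marks. The names YORUBA_ALPHABET, split_letters and sort_key are hypothetical, and production software would more likely rely on a tailored collation library than on hand-written code like this.

```python
import unicodedata

# Yoruba alphabet order; "gb" is one letter that sorts after "g".
# Escapes are used so the dotted letters are the precomposed (NFC) forms.
YORUBA_ALPHABET = [
    "a", "b", "d", "e", "\u1eb9", "f", "g", "gb", "h", "i", "j", "k", "l",
    "m", "n", "o", "\u1ecd", "p", "r", "s", "\u1e63", "t", "u", "w", "y",
]
RANK = {letter: index for index, letter in enumerate(YORUBA_ALPHABET)}


def split_letters(word: str) -> list[str]:
    """Split a word into letters, treating every "gb" as one letter (assumption)."""
    text = unicodedata.normalize("NFC", word.lower())
    letters = []
    i = 0
    while i < len(text):
        if text.startswith("gb", i):
            letters.append("gb")
            i += 2
        else:
            letters.append(text[i])
            i += 1
    return letters


def sort_key(word: str) -> list[int]:
    """Rank each letter by alphabet position; unknown symbols sort last."""
    return [RANK.get(letter, len(RANK)) for letter in split_letters(word)]


words = ["gbogbo", "gun", "gele"]  # sample words, tone marks omitted
print(split_letters("gbogbo"))      # ['gb', 'o', 'gb', 'o']
print(sorted(words))                # ['gbogbo', 'gele', 'gun']
print(sorted(words, key=sort_key))  # ['gele', 'gun', 'gbogbo']
```

The stored text is the same in both sorts. Only the software's view of where one letter ends and the next begins differs.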
> > Here are a few letters of the Yoruba alphabet, which might not be new
> > to you. So how can you equate G+B with GB, even if you claim it is
> > significant? How significant is significant?
> >
> > A B D E Ẹ F G GB....
>
> Actually there are quite a few people on this list who are familiar with
> the letters of the Yoruba alphabet, and they are also familiar with the
> encoding principles of Unicode. That is why they are saying, yes, we
> know "GB" is a letter in Yoruba, but it is encoded as U+0047 "G" +
> U+0042 "B".
>
> -Doug Ewell
> Fullerton, California
> http://users.adelphia.net/~dewell/
This archive was generated by hypermail 2.1.5 : Fri May 07 2004 - 18:45:25 CDT