From: Richard Ishida (ishida@w3.org)
Date: Mon Jan 21 2008 - 13:12:04 CST
The things you describe at the beginning of your list below are what I would call transcriptions, rather than transliterations. There is no need to represent the case of the source in those, I agree. But equally, for many scripts there is no reliable way to easily reconstruct the source script from something like IPA.
What I'm talking about is what I called transliteration, and defined as a method of converting text that allows you to recreate the original source from the target (i.e. reversibility). If you want to do that for a source script that is multicameral, you would need some way of capturing whether the source contained upper or lower case characters.*
This discussion is exactly why I wrote earlier that I think the Transliteration Guidelines document should be more careful in separating, describing and labeling these two different approaches.
RI
* You could of course use ʃ in a 'transliteration scheme' if you included additional information, such as, say, an up-arrow immediately afterwards to indicate when it should be converted to an upper case character.
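To make the footnote concrete, here is a minimal sketch in Python of such a scheme. The mapping table, the function names and the choice of "↑" as the marker are all assumptions made up for illustration; none of this comes from the Transliteration Guidelines. It only shows that a caseless target symbol plus a marker is enough to get the source case back.

```python
# Hypothetical toy table: lowercase Cyrillic letter -> caseless target symbol.
TABLE = {"ш": "ʃ", "ж": "ʒ", "а": "a", "у": "u"}
REVERSE = {target: source for source, target in TABLE.items()}

# Assumed marker: "the preceding symbol was upper case in the source".
UP = "↑"


def transliterate(text):
    """Convert source text to the target, recording upper case with UP."""
    out = []
    for ch in text:
        lower = ch.lower()
        if lower not in TABLE:
            raise ValueError(f"no mapping for {ch!r}")
        out.append(TABLE[lower])
        if ch != lower:
            # The source letter was upper case, so flag it.
            out.append(UP)
    return "".join(out)


def restore(text):
    """Recreate the original source text from the target."""
    out = []
    for ch in text:
        if ch == UP:
            if not out:
                raise ValueError("marker with nothing before it")
            # Upper-case the source letter that was just restored.
            out[-1] = out[-1].upper()
        elif ch in REVERSE:
            out.append(REVERSE[ch])
        else:
            raise ValueError(f"no reverse mapping for {ch!r}")
    return "".join(out)
```

With this table, restore(transliterate(s)) gives back s for any string made only of the mapped letters in either case. Dropping the marker would turn the scheme into a transcription in the sense above, since the source case could no longer be recovered.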
============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)
http://www.w3.org/International/
http://rishida.net/blog/
http://rishida.net/
> -----Original Message-----
> From: Philippe Verdy [mailto:verdy_p@wanadoo.fr]
> Sent: 21 January 2008 18:43
> To: 'Richard Ishida'; 'Rick McGowan'; unicode@unicode.org
> Subject: RE: Unicode Transliteration Guidelines released
>
> Richard Ishida wrote:
> > Cautions
> >
> > Another thing to look out for when dealing with cased scripts is simply
> > that the characters in the target must always be capable of switching
> > case too - i.e. many IPA symbols such as ʃ cannot be used since they
> > cannot represent case distinctions.
>
> Why that?
>
> The target must first support multicameral orthographies.
>
> * If the target is IPA, no such requirement is necessary.
>
> * Same thing for transliteration to X-SAMPA: although it uses the basic
> Latin alphabet, it does so without case (lowercase and uppercase are used
> for distinct sounds).
>
> * Same thing for the transliteration to Hangul alphabet or Georgian (true
> most of the time with modern or old classic orthographies, but possibly
> false for classical religious texts), or Arabic, Hebrew, or syllabaries
> (Aboriginal Canadian, Cherokee, Japanese Kanas...), or the many Indic
> abugidas (including Tibetan).
>
> Multicameral scripts are the exception (even though they predominate in
> worldwide use), not the rule.
>