From: verdy_p (verdy_p@wanadoo.fr)
Date: Tue Aug 18 2009 - 14:39:30 CDT
"Andreas Stötzner" wrote:
> Last but not least, this is not a question of typographic *geekness*.
> It’s a systemic issue: are phonetic β, θ and χ the same characters as
> the Greek β, θ and χ?
> I think, beyond glyph shaping details, it all comes down to this simple
> question.
The question would not be complete without also asking yourself whether the phonetic d is the "same" as the Latin d.
By "same" I do not mean that they have identical semantics, given that they are in fact used in distinct
contexts (not really the same languages). From this discussion, it becomes clear that the IPA designers really
wanted their symbols to adopt a style that harmonizes very well with Roman letters. But they immediately created
exceptions, including restrictions on the "permitted" shapes of letters that would otherwise be ambiguous in Latin.
So yes, it seems that they created IPA with the intent of being a subset of the Latin script, even if this meant
that the script had to be extended to cover borderline cases.
The question of sorting IPA characters is independent. It does not matter if Greek and Latin symbols are sorted in
separate segments, given that the default sort order in the DUCET has absolutely no meaning in phonetic terms: IPA
should ideally already be sorted in another order, which already requires tailoring to match the near-phonetic
realizations of words that are unified in language-specific phonologies.
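Such a tailoring can be sketched with a custom sort key. This is a minimal illustration, assuming a toy phonetic ordering of my own invention (grouping stops, nasals, fricatives, liquids, then vowels); it is not a standard IPA collation:

```python
# Sketch: sorting IPA transcriptions by a custom phonetic order instead of
# DUCET/code-point order. PHONETIC_ORDER is purely illustrative.
PHONETIC_ORDER = "pbtdkɡmnŋfvθðszʃʒxχβlraeiouə"
RANK = {ch: i for i, ch in enumerate(PHONETIC_ORDER)}

def phonetic_key(transcription: str):
    # Symbols outside the toy order sort after all known ones, by code point.
    return [(0, RANK[c]) if c in RANK else (1, ord(c)) for c in transcription]

words = ["θin", "bæt", "pæt", "ɡoʊ"]
print(sorted(words, key=phonetic_key))  # → ['pæt', 'bæt', 'ɡoʊ', 'θin']
```

In a real system the same effect would be obtained with UCA tailoring rules rather than an ad-hoc key, but the principle is identical: the order is supplied by the application, not by the encoding.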
Once you start using collation tailoring, it absolutely does not matter if the symbols belong to distinct scripts
(given that all other characters without any defined meaning in IPA will not have to be sorted, or will be sorted
completely separately, either as encoding errors/ambiguities to be corrected, or as text not related to IPA phonetics
or phonology).
So this is already a non-issue for collation: using new separate characters for IPA, or using variation selectors,
will absolutely not prevent UCA collation tailoring from working correctly, including for phonological cases (where
many realizations are possible and are unified language by language).
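For instance, a minimal sketch, assuming a hypothetical VS-based disambiguation of IPA letters: a tailored key can simply skip variation selectors, so a base letter and its VS-qualified form sort identically:

```python
# Sketch: a collation key that ignores variation selectors (U+FE00..U+FE0F),
# so a hypothetical VS-disambiguated IPA letter sorts exactly like its base
# character. A real tailoring would be expressed in UCA/CLDR rules instead.
VARIATION_SELECTORS = {chr(cp) for cp in range(0xFE00, 0xFE10)}

def collation_key(s: str):
    # Code-point order stands in for real collation weights in this sketch.
    return [ord(c) for c in s if c not in VARIATION_SELECTORS]

plain = "βa"            # Greek beta used as an IPA symbol
qualified = "β\ufe00a"  # same letter with a (hypothetical) variation selector
assert collation_key(plain) == collation_key(qualified)
```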
Collation, anyway, is not a problem for ISO 10646, and collation tables are not direct components of the Unicode
Standard (only the UCA algorithm is standardized, in both Unicode and a separate ISO standard, and the DUCET is
partly standardized by reference only, but is not directly subject to the Unicode stability rules). Collation tables
are localization issues, but they do not affect how texts must be effectively represented and encoded, or how they
can be given semantics and handled in various transformation algorithms, or how they must be rendered.
In addition, even within the same language, collation orders are not unique; they are adapted to each usage. I don't
see why this would not also be the case for IPA (whether in "pure" phonetic representations, in extended academic
notations, or in language-specific phonologies, including dictionaries).
If you are still not convinced, consider that there already exist tools built on top of Wiktionary that allow word
searches to be performed phonetically or phonologically, including searches for rhymes, possibly in multiple
languages simultaneously. These tools work very well even if "incorrect" codes are used for some IPA symbols,
because the phonetic or phonological notations found in articles are easily detected by the templates with which
they are inserted consistently, and are then collated in a distinct database for each language.
This effectively avoids creating (and painfully maintaining by hand, with lots of corrections) many specific pages
for rhymes, and it can also be used to correct, almost automatically, most encoding errors (those using the
non-preferred characters).
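As a rough sketch of how such a rhyme index might work (the pronunciation data and the two-symbol rhyme key are my own simplifications, not Wiktionary's actual scheme, which would parse stress marks and syllable structure):

```python
# Sketch of a rhyme index: words are grouped by the tail of their IPA
# transcription. Toy data; the real tools harvest this from templates.
from collections import defaultdict

PRONUNCIATIONS = {
    "cat": "kæt",
    "bat": "bæt",
    "go": "ɡoʊ",
    "show": "ʃoʊ",
}

rhymes = defaultdict(list)
for word, ipa in PRONUNCIATIONS.items():
    rhymes[ipa[-2:]].append(word)  # key on the final two IPA symbols

print(rhymes["æt"])  # → ['cat', 'bat']
print(rhymes["oʊ"])  # → ['go', 'show']
```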
To this Wiktionary tool it absolutely does not matter whether you used, for example, a regular Latin 'g' or the
IPA-specific and really 'geeky' single-storey 'ɡ', which is unambiguous... So this clearly demonstrates that the
need for disunification between Latin/Greek and IPA is not justified by practical reasons, at least not for
searches.
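A normalization pass of the kind described above can be sketched like this (the mapping is deliberately tiny; a real tool would cover many more confusable pairs):

```python
# Sketch: map "non-preferred" look-alikes to the preferred IPA code points
# before indexing, e.g. the regular Latin g to the single-storey IPA ɡ.
PREFERRED = str.maketrans({
    "\u0067": "\u0261",  # LATIN SMALL LETTER G -> LATIN SMALL LETTER SCRIPT G (ɡ)
    "\u03B5": "\u025B",  # GREEK SMALL LETTER EPSILON -> LATIN SMALL LETTER OPEN E (ɛ)
})

def normalize_ipa(text: str) -> str:
    return text.translate(PREFERRED)

assert normalize_ipa("g\u03B5t") == "\u0261\u025Bt"  # "gεt" -> "ɡɛt"
```

With such a pass applied at indexing time, searches succeed regardless of which of the confusable code points an editor typed.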
Note that you may argue that Wiktionary is not really plain text (even if the tool that indexes it effectively
loads and indexes articles created and maintained in plain text only), given that template syntaxes are used and
detected to add extra semantics to these texts.
This archive was generated by hypermail 2.1.5 : Tue Aug 18 2009 - 14:41:35 CDT