Re: Specification of Encoding of Plain Text

From: Mark Davis ☕️ <mark_at_macchiato.com>
Date: Sat, 14 Jan 2017 11:56:10 +0100

Mark

On Fri, Jan 13, 2017 at 7:19 PM, Richard Wordingham <
richard.wordingham_at_ntlworld.com> wrote:

> On Fri, 13 Jan 2017 10:38:30 +0100
> Mark Davis ☕️ <mark_at_macchiato.com> wrote:
>
> > On Thu, Jan 12, 2017 at 10:26 PM, Richard Wordingham <
> > richard.wordingham_at_ntlworld.com> wrote:
>
> > > Using Script_Extensions to document the international
> > > combining characters that are used, for example, with Thai bases
> > > could have all sorts of undesirable knock-on effects.
>
> > If you know of combining marks whose scx values should include Thai,
> > please let us know.
>
> If you refer to the end of TUS 9.0 Section 16.1 you will find mention
> of U+0331 COMBINING MACRON BELOW and U+0303 COMBINING TILDE, which are
> thus candidates for scx ∍ Thai. One might also consider U+0359
> COMBINING ASTERISK BELOW; I have seen the combination ช͙ <U+0E0A THAI
> CHARACTER CHO CHANG, U+0359> used in a phonetic symbol for English,
> representing [ʒ].
>
> As their scx values are 'Inherited', should their values not be treated
> as though they already included Thai? I suppose, though, that they
> do not in fact match "\p{scx=Thai}". There does seem to be a view that
> scx=inherited is shorthand for some list of European scripts.
>

The distinction between sc=inherited and sc=common is an unfortunate one,
a remnant from when we first added the script data. The distinction for a
character C is purely derivable from whether gc(C) ∈ [[:mn:][:me:]] or not,
so it is of little value — and with the advantage of hindsight, mostly just
gets in the way.
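
As a rough illustration of that derivation, here is a minimal Python sketch. It assumes the character is already known to be either sc=inherited or sc=common, and it applies only the general-category rule stated above. The helper name shared_script_value is hypothetical, and the standard unicodedata module does not expose script data, so this does not consult the real Script property and may not agree with it for every character.

```python
import unicodedata

# General categories named in the message: [[:mn:][:me:]]
MARK_CATEGORIES = {"Mn", "Me"}


def shared_script_value(ch: str) -> str:
    """Hypothetical helper: for a character already known to be
    sc=Inherited or sc=Common, pick between the two using only its
    general category, per the rule described in the message."""
    if unicodedata.category(ch) in MARK_CATEGORIES:
        return "Inherited"
    return "Common"


if __name__ == "__main__":
    # U+0331, U+0303, U+0359 are the marks discussed in this thread;
    # U+20DD is an enclosing mark and U+0020 is SPACE.
    for cp in (0x0331, 0x0303, 0x0359, 0x20DD, 0x0020):
        ch = chr(cp)
        gc = unicodedata.category(ch)
        print(f"U+{cp:04X} gc={gc} -> sc={shared_script_value(ch)}")
```

The function is only meaningful for characters shared across scripts; calling it on, say, a Thai letter would return a value that says nothing about that letter's actual script.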

scx=inherited is *not* a shorthand for some list of European scripts.

Rather, C ∈ [:scx=inherited:] means that either

   1. we don't have enough information about usage to be able to list the
   scripts that C is used with, or
   2. C can be used with so many scripts that it is not particularly
   productive to list them all.

> Richard.
>
>
Received on Sat Jan 14 2017 - 04:57:03 CST
