User-perceived character (was: "textels")
Janusz S. Bień
jsbien at mimuw.edu.pl
Mon Sep 19 01:23:53 CDT 2016
On Sun, Sep 18 2016 at 22:02 CEST, asmusf at ix.netcom.com writes:
> On 9/18/2016 3:26 AM, Janusz S. Bien wrote:
>> From the Unicode glossary:
>> Grapheme. (1) A minimally distinctive unit of writing in the context
>> of a particular writing system.[...] (2) What a user thinks of as a
> "writing system" is vague enough to cover variations that might be
> regional or language dependent.
That is obvious to me.
>> As for (2), cf.
>> User-Perceived Character. What everyone thinks of as a character in
>> their script.
>> So we have "a user" versus "everyone...in their script" - is the
>> difference intentional? Probably not. Anyway the definitions are
>> language/locale dependent.
> The "everyone" here aims at a shared understanding.
That's also quite obvious to me.
"A user" in grapheme (2) is strange, to say the least.
> This becomes tricky in the case of Abugidas. There's certainly a
> shared understanding that the "unit of writing" is the syllable,
> rather than the individual marks, but the latter do have well-understood
> identities, not least for teaching. That's perhaps the reason why
> there's the handwaving about "minimally distinctive".
> In some scripts like that, users can enter multiple sequences of
> characters that resolve (for all practical purposes) into the same
> syllable. (A big part of that in some scripts is that Unicode does not
> always provide a means to normalize the order of subsidiary signs and
> marks, typically combining marks)
> For some tasks it would be great to have only well-formed syllables;
> but to do that, you would need to add additional interpretation on top
> of the Unicode definitions of a grapheme cluster.
> If you just wrap the raw combining sequences into textels, then some
> tasks might not actually get simpler. Instead of a simple rule that
> determines which alternate orderings of marks are equivalent (to
> account for users not typing them in the preferred order) you would
> have to exhaustively list all combinations and set up equivalent
I would like to know how Swift handles this. I still have the feeling
that Swift characters are almost exactly my textels.
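
To illustrate the point quoted above about alternate orderings of marks, here is a minimal Python sketch. It says nothing about what Swift does. The helper names clusters and same_cluster are hypothetical, and the grouping rule (a base character plus any following characters of general category M*) is a deliberate simplification, not the full grapheme cluster rules of UAX #29. The Devanagari sequences are chosen only for their combining classes, and no claim is made about how they render.

```python
import unicodedata


def clusters(text):
    """Group each character with the combining marks that follow it.

    Simplified stand-in for grapheme cluster segmentation: it looks only at
    the general category (Mn, Mc, Me) and ignores the rest of UAX #29.
    """
    out = []
    for ch in text:
        if out and unicodedata.category(ch).startswith("M"):
            out[-1] += ch  # attach the mark to the preceding cluster
        else:
            out.append(ch)  # start a new cluster
    return out


def same_cluster(a, b):
    """Compare two sequences under canonical equivalence (via NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# KA + NUKTA (combining class 7) + VIRAMA (combining class 9), in both orders.
# The classes differ, so normalization puts the marks into one canonical order.
ka_nukta_virama = "\u0915\u093C\u094D"
ka_virama_nukta = "\u0915\u094D\u093C"
print(same_cluster(ka_nukta_virama, ka_virama_nukta))  # True

# KA + VOWEL SIGN I + ANUSVARA, in both orders. Both marks have combining
# class 0, so normalization never reorders them and the sequences stay distinct.
ka_i_anusvara = "\u0915\u093F\u0902"
ka_anusvara_i = "\u0915\u0902\u093F"
print(same_cluster(ka_i_anusvara, ka_anusvara_i))  # False

# Either way, the simplified grouping wraps each sequence into a single unit.
print(len(clusters(ka_i_anusvara)), len(clusters(ka_anusvara_i)))  # 1 1
```

In the second case, any equivalence between the two orderings would have to be defined on top of normalization, which is the additional interpretation the quoted text refers to.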
Prof. dr hab. Janusz S. Bien - Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)
Prof. Janusz S. Bien - University of Warsaw (Formal Linguistics Department)
jsbien at uw.edu.pl, jsbien at mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/