I think the key phrase is "user-perceived". And you don't need to involve complex scripts either.
For instance, as an English speaker, I would perceive the "æ" in "encyclopædia" as two characters (albeit shoved together somewhat). The argument for this is that the word can equally well be rendered as "encyclopaedia".
A Danish or Norwegian speaker, on the other hand, would perceive "æ" (as in "ære" or "æsj!") as being a single indivisible character.
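To make the "æ" example concrete at the encoding level, here is a minimal Python sketch using only the standard `unicodedata` module. The helper name `fold_ae_english` is hypothetical and simply stands in for an English spelling convention; it is not part of any Unicode algorithm.

```python
import unicodedata

AE = "\u00e6"  # LATIN SMALL LETTER AE


def is_stable_under_normalization(ch):
    """True if none of the four Unicode normalization forms changes ch."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return all(unicodedata.normalize(form, ch) == ch for form in forms)


def fold_ae_english(text):
    """Hypothetical English-style respelling: write 'æ' as 'ae'.

    This is a language-level convention, not something the encoding provides.
    """
    return text.replace(AE, "ae")


# At the encoding level "æ" is a single code point with no decomposition,
# so normalization treats it the same way for every reader.
print(len(AE))                                # 1
print(unicodedata.name(AE))                   # LATIN SMALL LETTER AE
print(unicodedata.decomposition(AE) == "")    # True
print(is_stable_under_normalization(AE))      # True

# Whether it "is" two characters depends on the reader's language:
print(fold_ae_english("encyclop\u00e6dia"))   # encyclopaedia
```

The English reading has to be applied as a separate, language-specific step; a Danish or Norwegian reader would presumably not want "ære" respelled this way at all.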
Mark Dalley
-----Original Message-----
From: Unicode [mailto:unicode-bounces_at_unicode.org] On Behalf Of Janusz S. Bien
Sent: 19 September 2016 07:40
To: Christoph Päper
Cc: unicode Unicode Discussion
Subject: graphemes (was: "textels")
On Sun, Sep 18 2016 at 21:40 CEST, christoph.paeper_at_crissov.de writes:
> Janusz S. Bien <jsbien_at_mimuw.edu.pl>:
>>
>> From the Unicode glossary:
>>
>>> Grapheme. (1) A minimally distinctive unit of writing in the context of a particular writing system.[...] (2) What a user thinks of as a character.
>>
>>> User-Perceived Character. What everyone thinks of as a character in their script.
>>
>> […] the definitions are language/locale dependent.
>
> A writing system is (usually) language-dependent, a script is not,
> although some scripts have been used exclusively (or prominently) in a
> single writing system with a single language.
It depends, of course, on what exactly you mean by script, and on which meaning of the term is intended in the definition of User-Perceived Character. But "a user" is definitely language/locale dependent :-)
> So definition (1) of ‘grapheme’ would be appropriate for linguistics,
> (2) maybe for typography and computer science, but it’s extremely
> vague.
I think that 'grapheme' (2) in the present wording is simply incorrect. I suspect it is not used in the standard at all.
Searching the Unicode site I found only one use of 'grapheme' alone:
http://www.unicode.org/L2/L2000/00274-N2236-grapheme-joiner.htm
Graphemes are sequences of one or more encoded characters that
correspond to what users think of as characters.
I guess the intention of 'grapheme' (2) was to describe it without any reference to computer encoding, which is definitely an extremely difficult task.
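As a rough illustration of "sequences of one or more encoded characters", the following Python sketch groups each base code point with the combining marks that follow it. It is only an approximation under stated assumptions: the function name `approx_clusters` is made up for this example, it relies solely on `unicodedata.combining`, and it does not implement the grapheme cluster rules of UAX #29. It is also language-independent, so it cannot capture the locale dependence discussed above.

```python
import unicodedata


def approx_clusters(text):
    """Group each code point with any combining marks that follow it.

    Simplified approximation: a code point with a non-zero canonical
    combining class is attached to the preceding cluster.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch  # attach the mark to the previous base
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


# "résumé" spelled with U+0301 COMBINING ACUTE ACCENT after each "e"
word = "re\u0301sume\u0301"

print(len(word))                   # 8 encoded characters (code points)
print(len(approx_clusters(word)))  # 6 approximate user-perceived characters
```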
Best regards
Janusz
--
Prof. dr hab. Janusz S. Bien - Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)
Prof. Janusz S. Bien - University of Warsaw (Formal Linguistics Department)
jsbien@uw.edu.pl, jsbien@mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/
Received on Mon Sep 19 2016 - 03:47:08 CDT