From: spir (denis.spir@free.fr)
Date: Sun Jan 24 2010 - 05:30:44 CST
On Sat, 23 Jan 2010 20:57:31 +0100
spir <denis.spir@free.fr> wrote:
> Hello,
>
> What is the maximum number of codes, if any, that together compose a single character?
Thank you for your answers. There seems to be no definite limit, which is all I needed to know.
@ Michael d'Errico: Sorry for the ambiguity; I was referring to composed (or rather decomposed) characters, as "stacked" (grouped) into "user-perceived characters" or "grapheme clusters" by algorithms such as the first one defined in UAX29, Text Segmentation, at http://www.unicode.org/reports/tr29/.
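To make the idea of "code stacks" concrete, here is a minimal Python sketch. It is NOT the UAX29 algorithm: it is a rough approximation, assuming that every code of general category M* (combining marks) attaches to the preceding code, and ignoring Hangul syllable sequences, ZWJ sequences, regional indicators, CR LF and the other rules of UAX29. The function name `simple_clusters` is hypothetical.

```python
import unicodedata


def simple_clusters(text: str) -> list[str]:
    """Group each base code with the combining marks that follow it.

    Rough approximation of UAX29 grapheme clusters (hypothetical helper):
    only general categories Mn, Mc and Me are treated as attaching; the
    other UAX29 rules are ignored.
    """
    clusters: list[str] = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if is_mark and clusters:
            clusters[-1] += ch  # attach the mark to the current stack
        else:
            clusters.append(ch)  # start a new stack
    return clusters
```

For example, `simple_clusters("e\u0301a\u0323\u0302")` groups the five codes into two stacks, one of two codes and one of three.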
I'm surprised that compatibility decomposition produces more decomposed (longer) "code stacks" than canonical decomposition.
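One way to observe the difference is to compare NFD (canonical) and NFKD (compatibility) with Python's standard `unicodedata` module. A compatibility mapping may replace a single code with a whole string, which canonical mappings never do. The helper name `stack_lengths` is hypothetical; U+01D8 and U+FDFA are just illustrative inputs, not examples taken from this thread.

```python
import unicodedata


def stack_lengths(s: str) -> dict[str, int]:
    """Number of codes in the canonical (NFD) and compatibility (NFKD) decompositions of s."""
    return {form: len(unicodedata.normalize(form, s)) for form in ("NFD", "NFKD")}


# U+01D8 LATIN SMALL LETTER U WITH DIAERESIS AND ACUTE: canonical decomposition only
print(stack_lengths("\u01d8"))
# U+FDFA ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM: compatibility decomposition only
print(stack_lengths("\ufdfa"))
```

U+01D8 yields 3 codes in both forms, while U+FDFA stays a single code under NFD but expands to 18 codes (a whole phrase, spaces included) under NFKD.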
@ Mark Davis: It was precisely after reading UAX29 and starting to implement it that I asked this question. Indeed, "character" is problematic. Many documents from the Unicode Consortium use the word rather loosely compared to the definition of abstract character, often where I would rather expect "code" (especially in algorithm definitions; as I understand it, a program can only manipulate codes or groups of them, because a _character_ is an abstract notion).
@ Chris Fynn: Thank you for the example of HAKṢHMALAWARAYAṀ.
This question was triggered by an idea about the representation of (possibly composed) characters, considered as the "atomic unit" for text processing.
Basically (and in the Unicode docs), such a character is a kind of group/sequence/stack of codes, which will indeed often happen to be a singleton. Let's say a character is _formed_ by a stack of codes. One trick would be to represent a character formed by, e.g., (c1 c2 c3) as a single bigger int C = c1*N*N + c2*N + c3, where N > 0x10FFFF.
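A minimal Python sketch of this positional encoding (Python ints are arbitrary precision). It evaluates C = c1*N*N + c2*N + c3 in Horner form, with one tweak that is my assumption, not part of the idea above: each code is stored as code + 1, so N must be 0x110001. Without the shift, a stack starting with U+0000 would be ambiguous, since (0, c) and (c,) would give the same C. The names `pack` and `unpack` are hypothetical.

```python
N = 0x110001  # assumed base: digits run from 1 to 0x110000 (code + 1)


def pack(codes: list[int]) -> int:
    """Encode a stack of codes (c1, c2, ..., ck) as one int, c1 most significant."""
    c = 0
    for code in codes:
        if not 0 <= code <= 0x10FFFF:
            raise ValueError(f"not a code point: {code:#x}")
        c = c * N + (code + 1)  # Horner step; the +1 keeps a leading U+0000 visible
    return c


def unpack(c: int) -> list[int]:
    """Inverse of pack: peel off base-N digits, least significant first."""
    codes = []
    while c:
        c, digit = divmod(c, N)
        codes.append(digit - 1)
    codes.reverse()
    return codes
```

For instance, `unpack(pack([ord(ch) for ch in "e\u0301"]))` gives back `[0x65, 0x301]`. Because no digit is ever zero, the number of codes is recovered from C alone, and the empty stack maps to 0.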
This indeed makes big ints! (The width in bits is roughly max_code_count * log2(N), I guess.) I wanted to know the limit of this "bigness", if any. Since there is none, I need arbitrarily wide ints.
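The width estimate can be checked numerically. A small Python sketch, assuming plain base-N digits with N = 0x110000, the smallest integer above 0x10FFFF (which equals 17 * 2**16, so log2(N) is about 20.09). The largest value for a stack of k codes is N**k - 1, and its bit length is the width needed. The name `bits_for` is hypothetical.

```python
N = 0x110000  # smallest N > 0x10FFFF; equals 17 * 2**16


def bits_for(k: int) -> int:
    """Bits needed to hold the largest C for a stack of k codes, i.e. N**k - 1."""
    return (N**k - 1).bit_length()


for k in (1, 3, 4, 18):
    print(k, bits_for(k))
```

This gives 21, 61, 81 and 362 bits, matching ceil(k * log2(N)): a 64-bit integer holds at most three codes, so anything longer needs arbitrary-precision ints.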
I may send a post on the topic when the idea is clearer to me.
Denis
________________________________
la vita e estrany
This archive was generated by hypermail 2.1.5 : Sun Jan 24 2010 - 05:34:29 CST