Re: "A Programmer's Introduction to Unicode"

From: Alastair Houghton <alastair_at_alastairs-place.net>
Date: Mon, 13 Mar 2017 19:18:00 +0000

On 13 Mar 2017, at 17:55, J Decker <d3ck0r_at_gmail.com> wrote:
>
> I liked the Go implementation of a character type, the rune type, which is a code point, and strings that return runes by index.
> https://blog.golang.org/strings

IMO, returning code points by index is a mistake. It over-emphasises the importance of the code point, which helps to continue the notion in some developers’ minds that code points are somehow “characters”. It also leads to people unnecessarily using UCS-4 as an internal representation, which seems to have very few advantages in practice over UTF-16.
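
To make the distinction concrete, here is a minimal Go sketch (an illustration added for this point, not code from the thread; the helper name counts is hypothetical). It uses only the standard unicode/utf8 and unicode/utf16 packages to show that one user-perceived character can be several code points, and that the count of code points is not the count of UTF-8 bytes or UTF-16 code units either. Note that indexing a Go string with s[i] yields a byte; runes come from ranging over the string or converting to []rune.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// counts reports the length of s measured three ways:
// UTF-8 bytes, Unicode code points, and UTF-16 code units.
func counts(s string) (bytes, codePoints, utf16Units int) {
	bytes = len(s)                            // len on a string counts bytes
	codePoints = utf8.RuneCountInString(s)    // one rune per code point
	utf16Units = len(utf16.Encode([]rune(s))) // surrogate pairs count as two units
	return
}

func main() {
	samples := []string{
		"e\u0301",              // "e" followed by U+0301 COMBINING ACUTE ACCENT
		"\U0001F1EC\U0001F1E7", // two regional indicator symbols (G, B) forming one flag
	}
	for _, s := range samples {
		b, cp, u := counts(s)
		fmt.Printf("%q: %d bytes, %d code points, %d UTF-16 code units\n", s, b, cp, u)
	}
}
```

Both samples would normally be displayed as a single "character", yet each contains two code points, so indexing by code point can still split what the user sees as one unit. The standard library has no grapheme cluster segmentation, so that level is deliberately left out of the sketch.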

> Doesn't solve the problem for composited codepoints though...
>
> texel looks to be defined as a graphic element already. TEXture ELement.

Yes, but I thought the proposal was “textel”, with the extra “t”. Re-using “texel” would be quite inappropriate; there are certainly people who work on rendering software who would strongly object to that, for very good reasons.

I would caution, however, that there’s already a lot of terminology associated with Unicode, perhaps for understandable reasons. If the word “textel” is going to have a definition that differs from (say) an extended grapheme cluster, I think a great deal of consideration should be given to exactly what that definition should be. We already have “characters”, code units, code points, combining sequences, graphemes, grapheme clusters, extended grapheme clusters and probably other things I’ve left off that list. Merely adding yet another term isn’t going to fix the problem of developers misunderstanding, or simply not being aware of, the correct terminology or some aspect of Unicode’s behaviour.

Kind regards,

Alastair.

--
http://alastairs-place.net
Received on Mon Mar 13 2017 - 14:18:30 CDT
