Re: Unicode String Models from Daniel Bünzli via Unicode on 2018-09-09 (Unicode Mail List Archive)

From: Daniel Bünzli via Unicode <unicode_at_unicode.org>
Date: Sun, 9 Sep 2018 15:42:19 +0200

Hello,

I find your notion of "model", and the presentation, a bit confusing, since it conflates what I would call the internal representation and the API.

The internal representation defines how the Unicode text is stored and should not really matter to the end user of the string data structure. The API defines how the Unicode text is accessed, as expressed by what an indexing operation on the string returns. The latter is really what matters to the end user, and it is what I would call the "model".
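
To illustrate the separation described above, here is a minimal Python sketch. The class name `ScalarString` and its methods are hypothetical and not taken from any library or from the document under discussion. It keeps the indexing API fixed (indexing returns a Unicode scalar value as an integer) while the internal representation varies between UTF-8, UTF-16 and UTF-32. The per-access decode is deliberately naive; a real implementation would index more cleverly.

```python
class ScalarString:
    """Hypothetical string type: the API is scalar-value indexing,
    the internal representation is whatever encoding the caller picks."""

    def __init__(self, text: str, encoding: str = "utf-8"):
        # Internal representation: the encoded bytes. Python's UTF codecs
        # reject lone surrogates, so only scalar values can be stored.
        self._encoding = encoding
        self._data = text.encode(encoding)

    def __len__(self) -> int:
        # Length in scalar values, independent of the stored byte count.
        return len(self._data.decode(self._encoding))

    def __getitem__(self, i: int) -> int:
        # Naive O(n) access: decode everything, then pick the i-th value.
        return ord(self._data.decode(self._encoding)[i])

    def storage_size(self) -> int:
        # Number of bytes used by the internal representation.
        return len(self._data)


text = "a\U0001F600\u00e9"  # 'a', an emoji outside the BMP, 'é'
s8 = ScalarString(text, "utf-8")
s16 = ScalarString(text, "utf-16-le")
s32 = ScalarString(text, "utf-32-le")

# Same API result for every internal representation.
for s in (s8, s16, s32):
    print([hex(s[i]) for i in range(len(s))], s.storage_size())
```

The three objects answer indexing and length queries identically; only `storage_size()` differs, which is the part the end user should not need to care about.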

I think the presentation would benefit from making a clear distinction between the internal representation and the API; you could then easily summarize them in a table which would make a nice summary of the design space.

I also think you are missing one API, which is the one with ECG I would favour: indexing returns Unicode scalar values, whatever the internal representation, be it UTF-{8,16,32} or a custom encoding. Maybe that's what you intended by the "Code Point Model: Internal 8/16/32", but that's not what it says. The distinction between code point and scalar value is an important one, and I think it would be good to insist on it in such documents to make things clear.
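
As a small illustration of the code point versus scalar value distinction, the following Python sketch uses two hypothetical helper functions (`is_code_point` and `is_scalar_value` are names chosen here, not part of any library). It relies on the Unicode definitions: code points range over U+0000..U+10FFFF, and scalar values are the code points outside the surrogate range U+D800..U+DFFF. Python's own `str` happens to follow a code point model, so it can hold a lone surrogate that no UTF encoding can represent.

```python
def is_code_point(cp: int) -> bool:
    # Any integer in the Unicode codespace U+0000..U+10FFFF.
    return 0 <= cp <= 0x10FFFF


def is_scalar_value(cp: int) -> bool:
    # A code point that is not a surrogate (U+D800..U+DFFF).
    return is_code_point(cp) and not (0xD800 <= cp <= 0xDFFF)


lone = "\ud83d"  # a lone high surrogate, accepted by Python's str
cp = ord(lone)

print(is_code_point(cp))    # the surrogate is a code point
print(is_scalar_value(cp))  # but it is not a scalar value

# A code point API lets this value in; a UTF encoder has to reject it.
try:
    lone.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False
print(encodable)
```

An API that indexes by scalar value rules out this case by construction, whereas a code point API leaves the check to every consumer of the string.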

Best,

Daniel
Received on Sun Sep 09 2018 - 08:42:47 CDT
