Re: 3rd-party cross-platform UTF-8 support

From: Michael (michka) Kaplan (michka@trigeminal.com)
Date: Mon Sep 24 2001 - 15:50:56 EDT

Previous message: Michael (michka) Kaplan: "Re: UTF-8 <> UCS-2/UTF-16 conversion for library use"
In reply to: Tom Emerson: "RE: 3rd-party cross-platform UTF-8 support"
Next in thread: Tom Emerson: "Re: 3rd-party cross-platform UTF-8 support"
Next in thread: Michael (michka) Kaplan: "Re: 3rd-party cross-platform UTF-8 support"
Reply: Tom Emerson: "Re: 3rd-party cross-platform UTF-8 support"

From: "Tom Emerson" <tree@basistech.com>

> But if I have a text string, and that string is encoded in UTF-16, and
> I want to access Unicode character values, then I cannot index that
> string in constant time.
>
> To find character n I have to walk all of the 16-bit values in that
> string accounting for surrogates. If I use UTF-32 I don't need to do
> that. This very issue came up during the discussion of how to handle
> surrogates in Python.
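
The walk described in the quoted text can be sketched roughly as follows. This is only an illustration, not code from the thread: the helper names `utf16_units` and `code_point_at` are hypothetical, and the string is modeled as a plain list of 16-bit integers.

```python
def utf16_units(text):
    """Encode a str as a list of 16-bit UTF-16 code units (hypothetical helper)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def code_point_at(units, n):
    """Return the n-th code point (0-based) by walking the code units.

    The cost is linear in the position, because every surrogate pair
    before position n shifts the index of the unit we are looking for.
    """
    i = 0
    count = 0
    while i < len(units):
        u = units[i]
        is_pair = (
            0xD800 <= u <= 0xDBFF
            and i + 1 < len(units)
            and 0xDC00 <= units[i + 1] <= 0xDFFF
        )
        if is_pair:
            # Combine the high and low surrogate into one supplementary code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            step = 2
        else:
            # BMP code unit (or an unpaired surrogate, passed through as-is).
            cp = u
            step = 1
        if count == n:
            return cp
        count += 1
        i += step
    raise IndexError("code point index out of range")
```

For example, with `units = utf16_units("a\U00010400b")` the list holds four code units but only three code points, so `code_point_at(units, 2)` has to skip over the surrogate pair to reach `b`.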

Would this not be the same issue for composite characters, even *in* UTF-32?
If you truly mean to work with characters here then it seems this is a
problem you can always have.
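
A minimal sketch of that point, assuming "character" means a base code point plus any combining marks that follow it. The function name `combining_sequences` is hypothetical, and the grouping rule is a simplification: full grapheme cluster segmentation as defined in UAX #29 covers more cases than this.

```python
import unicodedata


def combining_sequences(text):
    """Group code points into base + combining-mark runs (simplified)."""
    clusters = []
    for ch in text:
        # A nonzero canonical combining class means the code point
        # attaches to the preceding base character.
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# "e" followed by U+0301 COMBINING ACUTE ACCENT, then "a".
s = "e\u0301a"
code_points = len(s)               # one entry per code point, as in UTF-32
clusters = combining_sequences(s)  # one entry per base + marks run
```

Here the string has three code points but only two such sequences, so finding the n-th "character" in this sense still requires a walk from the start, even when each code point is a fixed-width 32-bit value.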

MichKa

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/

This archive was generated by hypermail 2.1.2 : Mon Sep 24 2001 - 14:42:37 EDT