> On Oct 6, 2015, at 6:04 , Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:
>
> In those conditions, normalizing the Java string will leave those lone surrogates (and non-characters) as is, or will throw an exception, depending on the API used. Java strings do not have any implied encoding (their "char" members are also unrestricted 16-bit code units, they have some basic properties but only in BMP, defined in the builtin Character class API: properties for non-BMP characters require using a library to provide them, such as ICU4J).
The Java Character class was enhanced in J2SE 5.0 to support supplementary characters. The String class was specified to be based on UTF-16, and string processing throughout the platform was updated to support supplementary characters based on UTF-16. These changes have been available to the public since 2004. For a summary, see
http://www.oracle.com/technetwork/articles/java/supplementary-142654.html
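As a rough illustration of the J2SE 5.0+ code point APIs described above, here is a minimal sketch. The class and helper names (SupplementaryDemo, codeUnits, codePoints, hasLoneSurrogate) are hypothetical and exist only for this example. The String and Character methods it calls are standard. U+1D400 MATHEMATICAL BOLD CAPITAL A serves as the sample non-BMP character. The sketch also shows that a String can still hold an unpaired surrogate, which is the case the quoted message was concerned with.

```java
// Hypothetical demo class: exercises code-point-aware methods that
// java.lang.String and java.lang.Character have offered since J2SE 5.0.
public class SupplementaryDemo {

    // U+1D400 MATHEMATICAL BOLD CAPITAL A, a supplementary (non-BMP) character.
    static final int BOLD_A = 0x1D400;

    // Number of UTF-16 code units (chars) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of code points; an unpaired surrogate counts as one code point.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // True if the string contains a surrogate code unit that is not part of a pair.
    static boolean hasLoneSurrogate(String s) {
        for (int i = 0; i < s.length(); ) {
            // codePointAt combines a valid pair into one code point,
            // but returns an unpaired surrogate's own value unchanged.
            int cp = s.codePointAt(i);
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                return true;
            }
            i += Character.charCount(cp);
        }
        return false;
    }

    public static void main(String[] args) {
        // Character.toChars encodes the code point as a UTF-16 surrogate pair.
        String s = "a" + new String(Character.toChars(BOLD_A)) + "b";
        System.out.println(codeUnits(s));                           // 4
        System.out.println(codePoints(s));                          // 3
        System.out.println(Integer.toHexString(s.codePointAt(1)));  // 1d400

        // Character property methods accept int code points, including non-BMP ones.
        System.out.println(Character.isSupplementaryCodePoint(BOLD_A));               // true
        System.out.println(Character.isLetter(BOLD_A));                               // true
        System.out.println(Character.getType(BOLD_A) == Character.UPPERCASE_LETTER);  // true

        // A Java String can still contain an unpaired surrogate.
        String broken = "x\uD800y";
        System.out.println(hasLoneSurrogate(broken));  // true
        System.out.println(hasLoneSurrogate(s));       // false
    }
}
```

The explicit loop in hasLoneSurrogate is one straightforward way to perform the check. Code that needs to reject or repair malformed UTF-16 before normalization could be built on the same loop.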
Norbert
Received on Tue Oct 06 2015 - 12:40:11 CDT