RE: Unicode and Java Questions

From: Phillips, Addison (addison@amazon.com)
Date: Thu Oct 02 2008 - 19:46:39 CDT

Next message: John W Kennedy: "Re: Unicode and Java Questions"

Previous message: Matt Chu: "Re: Unicode and Java Questions"
In reply to: Matt Chu: "Re: Unicode and Java Questions"
Next in thread: John W Kennedy: "Re: Unicode and Java Questions"

1) There DOES exist language-dependent string equivalence, as well as Java's built-in language-independent string equivalence. That is, the following situation exists:

x = "\uXXXX";
y = "\uYYYY";
if (locale == A) then x == y else x != y

No. What we’re saying is you can have:

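import java.text.Collator;

Collator col = Collator.getInstance(locale);
if (x.equals(y)) {
    // Strings that are equals() always collate as equal
    assert col.compare(x, y) == 0 : "this never throws an assertion error";
} else {
    // Strings that are not equals() may still collate as equal
    assert col.compare(x, y) != 0 : "this can throw an assertion error because some unequal strings compare as equal";
}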
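To make the second assertion concrete, here is a minimal sketch (the class and helper names are illustrative, not from the thread) in which two strings that are not equals() still compare as equal under a Collator whose strength is set to Collator.PRIMARY, which ignores case differences:

```java
import java.text.Collator;
import java.util.Locale;

public class CollatorEquality {
    // Returns true when a collator for the given locale and strength treats a and b as equivalent.
    static boolean collatesEqual(String a, String b, Locale locale, int strength) {
        Collator col = Collator.getInstance(locale);
        col.setStrength(strength);
        return col.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String x = "apple";
        String y = "APPLE";
        // String.equals is an exact, locale-independent comparison
        System.out.println(x.equals(y)); // false
        // PRIMARY strength ignores case (a tertiary difference), so the strings collate as equal
        System.out.println(collatesEqual(x, y, Locale.ENGLISH, Collator.PRIMARY)); // true
        // TERTIARY strength (the default) does distinguish case
        System.out.println(collatesEqual(x, y, Locale.ENGLISH, Collator.TERTIARY)); // false
    }
}
```

The strength setting is what decides which differences "count" for the collator; equals() has no such knob.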

The “equals” and “compareTo” methods in String are NOT locale sensitive. See http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#compareTo(java.lang.String)
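As a rough illustration of that difference, the sketch below (names are illustrative) sorts the same words twice: once with String.compareTo, which compares UTF-16 code units, and once with a Collator for Locale.ENGLISH:

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class CompareToVsCollator {
    // Sorts with String.compareTo: a lexicographic comparison of UTF-16 code units.
    static String[] sortByCodeUnit(String[] words) {
        String[] copy = words.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Sorts with a locale-sensitive Collator instead.
    static String[] sortByCollator(String[] words, Locale locale) {
        String[] copy = words.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        String[] words = {"zebra", "Apple", "apple", "Zebra"};
        // Every uppercase ASCII letter precedes every lowercase one in code-unit order
        System.out.println(Arrays.toString(sortByCodeUnit(words))); // [Apple, Zebra, apple, zebra]
        // The collator keeps both "apple" variants ahead of both "zebra" variants
        System.out.println(Arrays.toString(sortByCollator(words, Locale.ENGLISH)));
    }
}
```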

2) Given that (1) is true and .equals changes based on locale, then doesn't that mean I have to override .hashCode in order to maintain the Java equals/hashCode contract?

(1) is false...

Map<String, Boolean> map = new HashMap<String, Boolean>(Locale.GERMAN);
map.put("STRASSE", true);
map.put("STRAßE", true);
System.out.println("size = " + map.size()); // I want this to print ONE, not two

Those strings are not equal. :-)
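If what you actually want is a map whose keys follow locale-sensitive collation rather than String.equals, one option (a sketch, not something proposed in the thread; names are illustrative) is a TreeMap that uses a Collator as its comparator. A sorted map decides key identity with compare, so no hashCode override is involved. The example uses a case-only difference; whether a given collator treats "STRAßE" and "STRASSE" as equal depends on its rules and strength and is not asserted here.

```java
import java.text.Collator;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class CollatorKeyedMap {
    // A TreeMap identifies keys with its comparator rather than equals()/hashCode(),
    // so keys that the collator considers equal share a single entry.
    static Map<String, Boolean> newCollatorMap(Locale locale, int strength) {
        Collator col = Collator.getInstance(locale);
        col.setStrength(strength);
        return new TreeMap<String, Boolean>(col);
    }

    public static void main(String[] args) {
        Map<String, Boolean> map = newCollatorMap(Locale.GERMAN, Collator.PRIMARY);
        map.put("Strasse", true);
        // Collates equal at PRIMARY strength, so this replaces the value under the existing key
        map.put("STRASSE", true);
        System.out.println("size = " + map.size()); // size = 1
    }
}
```

Because the comparator is inconsistent with equals, such a map does not obey the general Map contract (which is phrased in terms of equals), although TreeMap's behavior is still well-defined.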

3) So I know that there exist some values locale1, locale2, and s such that:

Locale locale1 = ...;
Locale locale2 = ...;
String s = "...";
!s.toLowerCase(locale1).equals(s.toLowerCase(locale2))

is true.
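One well-known pair of values, shown here as a small sketch (class and helper names are illustrative): Turkish lowercases "I" to dotless i (U+0131) while English lowercases it to "i". Note the comparison uses equals(), since != on Strings only compares object references.

```java
import java.util.Locale;

public class LocaleCaseDemo {
    // True when lowercasing s gives different text in the two locales.
    // Uses equals(): != would only compare object references.
    static boolean lowercasesDiffer(String s, Locale a, Locale b) {
        return !s.toLowerCase(a).equals(s.toLowerCase(b));
    }

    public static void main(String[] args) {
        Locale locale1 = new Locale("tr"); // Turkish: I lowercases to dotless i (U+0131)
        Locale locale2 = Locale.ENGLISH;   // English: I lowercases to i (U+0069)
        String s = "TITLE";
        System.out.println(lowercasesDiffer(s, locale1, locale2)); // true
        System.out.println(s.toLowerCase(locale2)); // title
    }
}
```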

And I know that .toLowerCase()/.toUpperCase() are inherently language-dependent, where the locale is inferred from the JVM/environment.
That’s correct, although we would say “locale-dependent”.

I'm trying to ask if *language-independent* case *conversions* (not case-folding) exist. That is:

s.toLowerCase(Locale.NULL)

or something like that. I guess I'm not sure how to use the algorithms for case-folding with case conversion, and whether or not it's even appropriate. If case conversion is not appropriate, would I be correct that the right way to do it is to wrap the string in ICU4J's CaseInsensitiveString class?
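To show why case conversion and case folding are not the same thing, here is a rough sketch (an approximation chosen for illustration, not the Unicode case-folding algorithm and not ICU4J; names are illustrative). String.equalsIgnoreCase works char by char, so it cannot match a sharp s against "SS", whereas uppercasing and then lowercasing in the root locale expands the sharp s first. It uses Locale.ROOT, so on JDK 5 substitute new Locale("", "").

```java
import java.util.Locale;

public class CaseFoldSketch {
    // Rough stand-in for full case folding: uppercase, then lowercase, both in the root locale.
    // This is NOT the Unicode case-folding algorithm; results can differ for some characters.
    static String foldApprox(String s) {
        return s.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String a = "stra\u00dfe"; // contains sharp s (U+00DF)
        String b = "STRASSE";
        // equalsIgnoreCase compares char by char, so one sharp s cannot match "SS"
        System.out.println(a.equalsIgnoreCase(b)); // false
        // toUpperCase expands sharp s to "SS", so both fold to "strasse"
        System.out.println(foldApprox(a).equals(foldApprox(b))); // true
    }
}
```

For real case-insensitive matching, a library that implements Unicode case folding is the safer choice; this sketch only illustrates the idea.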
Sure: there is a default case-folding. Locale.ROOT (in Java 1.6 or later) or new Locale("", "") gives you this default case mapping.

Also, I'm on JDK5, so I don't have Locale.ROOT, but I don't fully understand what new Locale("") does in toUpperCase/toLowerCase; is this the language-independent case conversion I'm looking for?
The locale with the empty strings in the constructor (colloquially called the root locale) is something like the C/POSIX locale in C/C++. It is an English-like locale with mostly default behavior (where default behavior exists). It is also the locale where your root (source) resource bundles live.
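A small sketch of the JDK 5 workaround, assuming only java.util.Locale (the class name and ROOT field are illustrative): it builds the root locale from empty strings and, on a Java 6 or later runtime, checks it against the Locale.ROOT constant. The Locale constructors were deprecated in Java 19 in favor of Locale.of, but they still work.

```java
import java.util.Locale;

public class RootLocaleDemo {
    // JDK 5 has no Locale.ROOT; empty language and country strings build the same locale.
    static final Locale ROOT = new Locale("", "");

    public static void main(String[] args) {
        // On Java 6 and later, both spellings equal the Locale.ROOT constant
        System.out.println(new Locale("").equals(Locale.ROOT)); // true
        System.out.println(ROOT.equals(Locale.ROOT));           // true
        // The root locale applies the default (untailored) case mappings, so I lowercases to i
        System.out.println("TITLE".toLowerCase(ROOT));         // title
    }
}
```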
You also might want to take this thread to i18n-prog@yahoogroups.com (where internationalization programming is the topic of the day).
Regards,
Addison

Addison Phillips
Globalization Architect -- Lab126
Chair -- W3C Internationalization Core WG

Internationalization is not a feature.
It is an architecture.

--
Naoto Sato


This archive was generated by hypermail 2.1.5 : Thu Oct 02 2008 - 19:49:53 CDT