Transform Rule Syntax

Philippe Verdy verdy_p at
Wed Dec 16 18:54:31 CST 2015

When a dash-hyphen "-" appears as the first character within an inclusive
(or negative) character class, just after "[" (or after "[^" in a negative
class), it does not act as a range separator: it stands for itself, as a
literal member of the inclusive character class (or as a character excluded
from the negative class).
This is how most regexp engines treat it, and you don't need to escape it
(with a "\").

So "[-\ ]" is the character class containing only the dash-hyphen and the
space (which needs to be escaped in CLDR rules because unescaped whitespace
is not significant there, as you noted), and it contains NO range.
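
As a rough illustration of the leading-hyphen behaviour described above, here
is a minimal sketch using Python's "re" module. This is only one regexp
engine, not the CLDR/ICU rule parser, and the helper name "matches" is made up
for this example. It only shows that a "-" placed right after "[" or "[^" is
taken literally, and that escaping the space is harmless.

```python
import re

# "-" directly after "[" is a literal hyphen, not a range separator.
# "\ " is an escaped space; Python's re accepts it as a plain space.
leading_hyphen = re.compile(r"[-\ ]")

# The same holds directly after "[^" in a negated class.
negated = re.compile(r"[^-\ ]")

# For contrast: here "-" sits between two characters and denotes a range.
ranged = re.compile(r"[a-z]")


def matches(pattern, ch):
    """Return True if the single character ch is matched by pattern."""
    return pattern.fullmatch(ch) is not None


if __name__ == "__main__":
    for ch in ["-", " ", "a"]:
        print(repr(ch),
              "in [-\\ ]:", matches(leading_hyphen, ch),
              "| in [^-\\ ]:", matches(negated, ch),
              "| in [a-z]:", matches(ranged, ch))
```

The class "[-\ ]" accepts only the hyphen and the space, "[^-\ ]" rejects
exactly those two, and "[a-z]" does not accept a hyphen at all.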

2015-12-17 1:25 GMT+01:00 Cameron Dutro <cameron at>:

> Hey cldr-users,
> I'm working with the CLDR transform rules and finding myself flummoxed.
> Specifically I'm looking at this rule
> <>
> in the es-es_FONIPA transform rule set. In this rule, we see what appears
> to be a Unicode set or character class from a regular expression: [-\ ]
> Either way, this does not appear to be valid syntax. Hyphens are used in
> character classes to denote ranges of characters, for example [a-z].
> Literal hyphens must be escaped. The hyphen in question is neither part of
> a range nor escaped. Why is this? Finally, it appears the character class
> contains an escaped space character. Space characters are not required to
> be escaped in character classes.
> My suspicion is that this syntax is to be treated in a special way since
> it is used in the context of transformation rules. Please let me know if
> this is the case. I have been unable to find any documentation regarding
> the special treatment of hyphens in UTS #35 or other documents.
> Thanks!
> -Cameron