L2/09-348
Source: Mark Davis
Subject: Recommended Unicode escaping mechanism
Date: October 20, 2009
====
Martin Duerst talked about some nice syntax for escaping
Unicode characters that is used in Ruby, to wit:
- \uXXXX works as is common in Java, UTS #18, etc.
- \u{...} takes any sequence of space-delimited hex values, so
  \u{61 308} == \u0061\u0308
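To make the two forms concrete, here is a minimal sketch of a decoder for them in Python. It is not Ruby's implementation and not proposed UTS #18 text; the name unescape and the exact grammar (single spaces between values, no surrounding whitespace inside the braces) are assumptions made for illustration.

```python
import re

# Assumed grammar: \u{h h ...} with single-space-delimited hex values,
# or \uXXXX with exactly four hex digits.
_ESCAPE = re.compile(
    r"\\u\{([0-9A-Fa-f]+(?: [0-9A-Fa-f]+)*)\}|\\u([0-9A-Fa-f]{4})"
)

def unescape(text):
    # Hypothetical helper: expand both escape forms found in text.
    def expand(match):
        braced, fixed = match.groups()
        if braced is not None:
            # One character per hex value inside the braces.
            return "".join(chr(int(h, 16)) for h in braced.split(" "))
        return chr(int(fixed, 16))
    return _ESCAPE.sub(expand, text)

# The equivalence stated above: \u{61 308} == \u0061\u0308
print(unescape(r"\u{61 308}") == unescape(r"\u0061\u0308"))  # True
```

A value above 10FFFF makes chr raise ValueError in this sketch; how a real implementation should report that is left open here.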
This has a number of good features: it can be more compact than the plain
\uXXXX notation, and it handles supplementary characters consistently. For
example, take the string containing the two characters:
U+12000 ( 𒀀 ) CUNEIFORM SIGN A
U+12001 ( 𒀁 ) CUNEIFORM SIGN A TIMES A
This can be represented in Ruby's notation as \u{12000 12001} instead of
resorting to other notation like \U00012000\U00012001.
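The size difference can be seen by generating both forms for the same string. The following Python sketch is illustrative only; escape_braced and escape_fixed are hypothetical names, and the fixed-width form assumes the common convention of \uXXXX for BMP code points and \UXXXXXXXX (eight hex digits) for supplementary ones.

```python
def escape_braced(text):
    # Hypothetical helper: one \u{...} group holding every code point,
    # written in hex with no zero padding.
    return "\\u{" + " ".join(format(ord(ch), "X") for ch in text) + "}"

def escape_fixed(text):
    # Hypothetical helper: \uXXXX for BMP code points and
    # \UXXXXXXXX (eight hex digits) for supplementary ones.
    return "".join(
        "\\u%04X" % ord(ch) if ord(ch) <= 0xFFFF else "\\U%08X" % ord(ch)
        for ch in text
    )

cuneiform = "\U00012000\U00012001"
print(escape_braced(cuneiform))  # \u{12000 12001}
print(escape_fixed(cuneiform))   # \U00012000\U00012001
```

For this two-character string the braced form takes 15 characters against 20 for the fixed-width form, and the saving grows with each additional character.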
I'd like to discuss recommending this notation in UTS #18 and other
appropriate places.
Mark