L2/09-348
Source: Mark Davis
Subject: Recommended Unicode escaping mechanism
Date: October 20, 2009

====

Martin Duerst described a convenient syntax for escaping Unicode characters that is used in Ruby, to wit:

\uXXXX works as is common in Java, UTS #18, etc.
\u{...} takes any sequence of space-delimited hex code points (each 1 to 6 hex digits), so \u{61 308} == \u0061\u0308
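
A quick check in Ruby (assuming a recent interpreter) that the two spellings produce the same two-code-point string:

```ruby
# Ruby: \u{...} may hold several space-separated code points.
combined = "\u{61 308}"   # U+0061 LATIN SMALL LETTER A, U+0308 COMBINING DIAERESIS
separate = "\u0061\u0308" # the same two code points in \uXXXX form

puts combined == separate  # prints true
p combined.codepoints.map { |cp| format("U+%04X", cp) }  # prints ["U+0061", "U+0308"]
```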

This has a number of good features: it can be more compact than repeating the \u prefix for every character, and it handles supplementary characters consistently. For example, take the string containing these two characters:

U+12000 ( 𒀀 ) CUNEIFORM SIGN A
U+12001 ( 𒀁 ) CUNEIFORM SIGN A TIMES A

This can be represented in Ruby's notation as \u{12000 12001}, instead of resorting to UTF-16 surrogate pairs (\uD808\uDC00\uD808\uDC01) or to another notation such as \U00012000\U00012001.
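
To make the size comparison concrete, here is a minimal sketch in Ruby. The helper names u_brace_escape, u_brace_unescape, big_u_escape and utf16_escape are hypothetical (not part of Ruby or of any standard); they only illustrate how the brace notation compares with the fixed-width \UXXXXXXXX form used in C and Python literals, and with the surrogate-pair form that a \uXXXX-only syntax such as Java's requires.

```ruby
# Hypothetical helpers for illustration only; not part of Ruby's standard library.

# Escape a string as a single \u{...} group of space-separated hex code points.
def u_brace_escape(str)
  hex = str.each_codepoint.map { |cp| cp.to_s(16).upcase }
  "\\u{#{hex.join(' ')}}"
end

# Expand each \u{...} group in plain text back into characters.
# This sketch does not validate the values (e.g. above 10FFFF, or surrogates).
def u_brace_unescape(text)
  text.gsub(/\\u\{([0-9A-Fa-f]{1,6}(?: +[0-9A-Fa-f]{1,6})*)\}/) do
    Regexp.last_match(1).split(" ").map { |h| h.to_i(16) }.pack("U*")
  end
end

# Fixed-width 8-digit form, as used by C and Python string literals.
def big_u_escape(str)
  str.each_codepoint.map { |cp| format("\\U%08X", cp) }.join
end

# \uXXXX-only form: supplementary characters need UTF-16 surrogate pairs.
def utf16_escape(str)
  str.encode("UTF-16BE").unpack("n*").map { |u| format("\\u%04X", u) }.join
end

cuneiform = "\u{12000 12001}"
puts u_brace_escape(cuneiform)   # prints \u{12000 12001}
puts big_u_escape(cuneiform)     # prints \U00012000\U00012001
puts utf16_escape(cuneiform)     # prints \uD808\uDC00\uD808\uDC01
puts u_brace_unescape(u_brace_escape(cuneiform)) == cuneiform  # prints true
```

The unescaper does no range checking; a fuller implementation would also need to reject values above 10FFFF and surrogate code points.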

I'd like to discuss recommending this notation in UTS #18 and other appropriate places.

Mark