Unicode Sets in 'Unicode Regular Expressions'
addison at lab126.com
Tue May 27 17:36:04 CDT 2014
A "Unicode set" in this context means "a set of code points". This is discussed in section 1.2:
This is done by providing syntax for sets of characters based on the Unicode character properties, and allowing them to be mixed with lists and ranges of individual code points.
More generally, there is no term "Unicode set" defined, although it is referred to in places such as RL1.3 as a shorthand. It merely means "the set of all code points selected" (by whatever selection, subtraction, intersection, or differencing has been applied beginning from the Universal Character Set as a whole). Or at least this is how I have already read it.
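To illustrate that reading, here is a rough sketch in Python. It is not anything UTS #18 prescribes, only one way to model "a set of code points" that is built from a property, mixed with an explicit range, and then subtracted or intersected. The helper name `code_points_with_category` and the restriction of the universe to U+0000..U+00FF are my own assumptions, made to keep the example small.

```python
import unicodedata

# Assumption for illustration: limit the "universe" to U+0000..U+00FF so the
# sets stay small. A real engine would start from the whole code space.
UNIVERSE = range(0x0000, 0x0100)


def code_points_with_category(prefix):
    """Hypothetical helper: code points whose General_Category starts with prefix."""
    return {
        cp for cp in UNIVERSE
        if unicodedata.category(chr(cp)).startswith(prefix)
    }


# Sets selected by a Unicode character property.
letters = code_points_with_category("L")     # roughly \p{L}
uppercase = code_points_with_category("Lu")  # roughly \p{Lu}

# A set given as a range of individual code points: U+0041..U+007A ("A".."z").
a_to_z_range = set(range(0x41, 0x7B))

# Subtraction: letters that are not uppercase.
non_upper_letters = letters - uppercase

# Intersection: letters that also fall inside the explicit range.
ascii_letters = letters & a_to_z_range
```

Each result is again just a set of code points, which is all that "Unicode set" seems to mean in RL1.3 under this reading.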
> -----Original Message-----
> From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Richard
> Sent: Tuesday, May 27, 2014 3:18 PM
> To: unicode at unicode.org
> Subject: Unicode Sets in 'Unicode Regular Expressions'
> UTS#18 'Unicode Regular Expressions' Version 17 Requirement RL1.3
> 'Subtraction and Intersection' talks of Unicode sets. What is the relevant
> definition of a 'Unicode set'? Is it a finite set of non-empty strings? Other
> possibilities that occur to me, depending on context, include sets of codepoints
> and sets of indecomposable codepoints.