RegExp set notation: stage 1?

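Add RegExp pattern syntax and semantics for these set operations: difference/subtraction (in A but not in B) and intersection (in both A and B), together with nested character classes.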

Note that union (in A or in B) is already supported in limited form (only within a single character class).

RegExp set notation

// Matching non-ASCII digits, to convert them to ASCII digits:

[\p{Decimal_Number}--[0-9]]

// → difference/subtraction + nested character class
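For comparison, here is a minimal sketch of how the same difference can be approximated in engines without the proposed syntax, using a negative lookahead in front of the property escape. The names `nonAsciiDigit`, `zeroDigits`, and `toAsciiDigits` are hypothetical, and the table of zero digits is only an illustrative subset of scripts, not a complete mapping.

```javascript
// Workaround sketch: emulate [\p{Decimal_Number}--[0-9]] with a negative
// lookahead, since the proposed set-difference syntax is assumed unavailable.
const nonAsciiDigit = /(?![0-9])\p{Decimal_Number}/gu;

// Illustrative subset (assumption): code points of the digit zero in a few
// scripts. Each block holds the ten digits 0..9 contiguously.
const zeroDigits = [
  0x0660, // Arabic-Indic
  0x06f0, // Extended Arabic-Indic
  0x0966, // Devanagari
  0x0e50, // Thai
  0x17e0, // Khmer
  0xff10, // Fullwidth
];

function toAsciiDigits(text) {
  return text.replace(nonAsciiDigit, (ch) => {
    const cp = ch.codePointAt(0);
    const zero = zeroDigits.find((z) => cp >= z && cp <= z + 9);
    // Digits from scripts missing in the table are left unchanged.
    return zero === undefined ? ch : String(cp - zero);
  });
}
```

The lookahead form works here because each match is a single code point; it is harder to read and to compose than the set notation shown above, which is part of the motivation for dedicated syntax.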

// Matching spans of “word/identifier letters” of specific scripts:

[\p{Script=Khmer}&&[\p{Letter}\p{Mark}\p{Number}]]

// → intersection + nested character class

// Matching non-script-specific combining marks:

[\p{Nonspacing_Mark}&&[\p{Script=Inherited}\p{Script=Common}]]

// → intersection + nested character class
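The two intersection examples can be approximated in the same way, as a sketch that assumes only the existing `u` flag with Unicode property escapes: a positive lookahead checks one set and the character class that follows checks the other. The variable and function names are hypothetical.

```javascript
// Workaround sketch: emulate
//   [\p{Script=Khmer}&&[\p{Letter}\p{Mark}\p{Number}]]+
// by requiring both conditions on every code point of the span.
const khmerWordSpan =
  /(?:(?=\p{Script=Khmer})[\p{Letter}\p{Mark}\p{Number}])+/gu;

// Workaround sketch: emulate
//   [\p{Nonspacing_Mark}&&[\p{Script=Inherited}\p{Script=Common}]]
const genericCombiningMark =
  /(?=\p{Nonspacing_Mark})[\p{Script=Inherited}\p{Script=Common}]/u;

function findKhmerSpans(text) {
  // match() with the g flag returns all matches, or null when there are none.
  return text.match(khmerWordSpan) ?? [];
}
```

For example, `findKhmerSpans` applied to a string that mixes Latin and Khmer text returns only the Khmer runs, and `genericCombiningMark.test("\u0301")` accepts U+0301 COMBINING ACUTE ACCENT, a nonspacing mark whose script is Inherited.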

/…\UnicodeSet{…}…/u

ICU, Java, Perl (experimental), Python regex module, .NET, XML Schema, Xerces, Ruby

language/implementation	union	subtraction	intersection	nested classes	symmetric difference
ICU regex	✅	✅	✅	✅	❌
java.util.regex.Pattern	✅	🤷 *	✅	✅	❌
Perl (“experimental feature available starting in 5.18”)	✅	✅	✅	✅	✅
.NET	✅	✅	❌	✅	❌
XML Schema	✅	✅	❌	✅	❌
Apache Xerces2 XPath regex	✅	✅	✅	✅	❌
Python regex	✅	✅	✅	✅	✅
Ruby	✅	❌	✅	❌	❌
ECMAScript prior to this proposal	✅	❌	❌	❌	❌
ECMAScript with this proposal	✅	✅	✅	✅	❌