Author mrabarnett
Recipients akitada, akoumjian, alex, amaury.forgeotdarc, belopolsky, davide.rizzo, eric.snow, ezio.melotti, georg.brandl, giampaolo.rodola, gregory.p.smith, jacques, jaylogan, jhalcrow, jimjjewett, loewis, mark, moreati, mrabarnett, nneonneo, pitrou, r.david.murray, ronnix, rsc, sjmachin, steven.daprano, stiv, timehorse, vbr, zdwiel
Date 2011-09-01.17:50:49
Message-id <1314899450.27.0.345702317902.issue2636@psf.upfronthosting.co.za>
In-reply-to
Content
The regex module supports nested sets and set operations, e.g. r"[[a-z]--[aeiou]]" (the letters from 'a' to 'z', except the vowels). This means that a literal '[' in a set needs to be escaped.
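
For comparison, the re module has no set operations, so the same class (lowercase letters except the vowels) has to be spelled differently there. A minimal sketch, assuming a negative lookahead is an acceptable stand-in for the set difference; the names `consonant` and `found` are only illustrative:

```python
import re

# Stand-in for regex's r"[[a-z]--[aeiou]]" using only the stdlib re module:
# the lookahead rejects vowels, then [a-z] matches any lowercase ASCII letter.
consonant = re.compile(r'(?![aeiou])[a-z]')

found = consonant.findall('nested sets')
print(found)  # ['n', 's', 't', 'd', 's', 't', 's']
```

This is a different technique from a set operation, but it matches the same characters one at a time.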

For example, the re module sees "[][()]..." as:

    [      start of set
     ]     literal ']'
     [()   literals '[', '(', ')'
    ]      end of set
    ...   ...

but the regex module sees it as:

    [      start of set
     ]     literal ']'
     [()]  nested set [()]
     ...   ...

Thus:

>>> s = u'void foo ( type arg1 [, type arg2 ] )'
>>> regex.sub(r'(?<=[][()]) |(?!,) (?!\[,)(?=[][(),])', '', s)
u'void foo ( type arg1 [, type arg2 ] )'
>>> regex.sub(r'(?<=[]\[()]) |(?!,) (?!\[,)(?=[]\[(),])', '', s)
u'void foo(type arg1 [, type arg2])'

If the regex module can't parse a set as a nested set, it tries again as a non-nested set (like re), but there are bound to be regexes where either reading is valid.
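
As a quick check of the escaped form, here is a minimal sketch that uses only the standard re module (an assumption made so it runs without installing regex). With '[' escaped inside the set there is no nested-set reading left, so the pattern should not depend on how an unescaped '[' would be parsed:

```python
import re

s = 'void foo ( type arg1 [, type arg2 ] )'

# '[' is escaped inside both sets, so it can only be a literal.
# First alternative: a space that follows ']', '[', '(' or ')'.
# Second alternative: a space that precedes ']', '[', '(', ')' or ',',
# unless that space is immediately followed by '[,'.
pattern = r'(?<=[]\[()]) |(?!,) (?!\[,)(?=[]\[(),])'

result = re.sub(pattern, '', s)
print(result)  # void foo(type arg1 [, type arg2])
```

The space before '[,' is kept because of the (?!\[,) lookahead, and the space after ',' is kept because ',' is not in the lookbehind set.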