Issue9179
Created on 2010-07-06 10:23 by acooke, last changed 2010-07-06 16:02 by acooke.
| Messages (8) | |||
|---|---|---|---|
| msg109382 | Author: andrew cooke (acooke) | Date: 2010-07-06 10:23 | |
from re import compile
# these work as expected
assert compile('(a)b(?<=b)(c)').match('abc')
assert not compile('(a)b(?<=c)(c)').match('abc')
assert compile('(a)b(?=c)(c)').match('abc')
assert not compile('(a)b(?=b)(c)').match('abc')
# but when you add groups, you get bugs
assert not compile('(?:(a)|(x))b(?<=(?(2)x|c))c').match('abc') # matches!
assert not compile('(?:(a)|(x))b(?<=(?(2)b|x))c').match('abc')
assert compile('(?:(a)|(x))b(?<=(?(2)x|b))c').match('abc') # fails!
assert not compile('(?:(a)|(x))b(?<=(?(1)c|x))c').match('abc') # matches!
assert compile('(?:(a)|(x))b(?<=(?(1)b|x))c').match('abc') # fails!
# but lookahead works as expected
assert compile('(?:(a)|(x))b(?=(?(2)x|c))c').match('abc')
assert not compile('(?:(a)|(x))b(?=(?(2)c|x))c').match('abc')
assert compile('(?:(a)|(x))b(?=(?(2)x|c))c').match('abc')
assert not compile('(?:(a)|(x))b(?=(?(1)b|x))c').match('abc')
assert compile('(?:(a)|(x))b(?=(?(1)c|x))c').match('abc')
# these are similar but, in my opinion, shouldn't even compile
# (group used before defined)
assert not compile('(a)b(?<=(?(2)x|c))(c)').match('abc') # matches!
assert not compile('(a)b(?<=(?(2)b|x))(c)').match('abc')
assert not compile('(a)b(?<=(?(1)c|x))(c)').match('abc') # matches!
assert compile('(a)b(?<=(?(1)b|x))(c)').match('abc') # fails!
assert compile('(a)b(?=(?(2)x|c))(c)').match('abc')
assert not compile('(a)b(?=(?(2)b|x))(c)').match('abc')
assert compile('(a)b(?=(?(1)c|x))(c)').match('abc')
# this is the error we should see above
from re import error
try:
    compile('(a)\\2(b)')
except error:
    pass  # compile-time error, as expected
else:
    raise AssertionError('expected error')
|
|||
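The tests above rely on the conditional construct `(?(id)yes|no)`. Here is a minimal sketch of how it behaves outside any lookbehind, using the same `(?:(a)|(x))b` prefix. It is not part of the original report and assumes a current CPython `re`. The yes-branch applies when group `id` took part in the match, and the no-branch applies otherwise.

```python
import re

# Group 1 is (a), group 2 is (x); exactly one of them participates.
pattern = re.compile(r'(?:(a)|(x))b(?(1)c|d)')

print(bool(pattern.match('abc')))  # True: group 1 matched, so 'c' is required
print(bool(pattern.match('abd')))  # False
print(bool(pattern.match('xbd')))  # True: group 1 did not match, so 'd' is required
print(bool(pattern.match('xbc')))  # False
```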
| msg109383 | Author: andrew cooke (acooke) | Date: 2010-07-06 10:30 | |
I hope the above is clear enough (you need to stare at the regexps for a while). Basically, lookbehind with a group conditional does not behave as expected (it appears to be evaluated as lookahead?). Also, some patterns compile that probably shouldn't.
The re package only supports (according to the docs) lookbehind on expressions whose length is fixed. So I guess it's also possible that (?(n)pat1|pat2) should always fail that, even when len(pat1) = len(pat2)?
Also, the generally excellent unit tests for the re package don't have much coverage for lookbehind (I am writing my own regexp lib and it passes all the re unit tests but had a similar bug; that's how I found this one...). |
|||
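Here is a minimal sketch of the fixed-width rule for lookbehind mentioned above. It assumes a current CPython `re`, and the helper `compiles` is hypothetical. Alternatives of equal width are accepted. Variable-width or unbounded lookbehind is rejected with `re.error` when the pattern is compiled.

```python
import re

def compiles(pattern: str) -> bool:
    """Hypothetical helper: True if `pattern` compiles with the stdlib re module."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

# Fixed width: both alternatives in the lookbehind are 2 characters long.
print(compiles(r'(?<=ab|cd)e'))  # True
# Variable width: the alternatives have lengths 1 and 2.
print(compiles(r'(?<=a|bc)e'))   # False
# Unbounded width.
print(compiles(r'(?<=a*)e'))     # False
```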
| msg109387 | Author: andrew cooke (acooke) | Date: 2010-07-06 13:08 | |
If it's any help, these are the equivalent tests as I think they should be (you'll need to translate engine(parse(... to compile(...) http://code.google.com/p/rxpy/source/browse/rxpy/src/rxpy/engine/backtrack/_test/engine.py?r=fc52f6959a0cfabdddc6960f47d7380128bb3584#284 |
|||
| msg109388 | Author: Mark Dickinson (mark.dickinson) | Date: 2010-07-06 13:30 | |
Thanks very much for the reports.
> So I guess it's also possible that (?(n)pat1|pat2) should always fail
> that, even when len(pat1) = len(pat2)?
Yes, this seems likely to me. Possibly even the compile stage should fail, though I've no idea how feasible it is to make that happen.
Unfortunately I'm not sure that any of the currently active Python developers is particularly well versed in the intricacies of the re module. The most realistic option here may be just to document the restrictions on lookbehind assertions more clearly. Unless you're able to provide a patch? |
|||
| msg109389 | Author: andrew cooke (acooke) | Date: 2010-07-06 13:47 | |
I thought someone was working on the re module these days? I thought I'd seen some issues with patches, etc.?
Anyway, short term, sorry: no patch. Medium/long term, yes, it's possible, but please don't rely on it.
The simplest way to document it is, I think, as you suggest: just extend the fixed-length qualifier on lookbehind to exclude references to groups (lookbehind does seem to *bind* groups correctly, so there's no need to exclude groups completely). |
|||
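This small sketch illustrates the remark that lookbehind still *binds* groups. It assumes a current CPython `re`. A capturing group inside a positive lookbehind is reported by `groups()` even though the assertion itself consumes no input.

```python
import re

m = re.match(r'(a)b(?<=(b))c', 'abc')
# At position 2 the lookbehind steps back one character, sees 'b',
# and binds it to group 2 without consuming any input.
print(m.groups())  # ('a', 'b')
print(m.span(2))   # (1, 2)
print(m.group(0))  # abc
```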
| msg109390 | Author: Mark Dickinson (mark.dickinson) | Date: 2010-07-06 13:56 | |
> I thought someone was working on the re module these days?
Well, there's issue 2636. It doesn't seem likely that that work will land in core Python any time soon, though. |
|||
| msg109399 | Author: Matthew Barnett (mrabarnett) | Date: 2010-07-06 15:52 | |
Should a regex compile if a group is referenced before it's defined?
Consider this:
(?:(?(2)(a)|(b)))+
Other regex implementations permit forward references to groups.
BTW, I had a look at the re module, found it too difficult, and so started on my own implementation of the matching engine (available on PyPI).
|
|||
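This sketch shows the behaviour mrabarnett describes. It has been checked only against a current CPython `re`, so nothing is claimed about the 2010-era releases discussed here. A conditional may refer to a group that is defined later in the pattern, but a plain backreference to a later group is still a compile-time error. The pattern is a balanced version of the example in the message above.

```python
import re

# Group 2 is (b), but the conditional (?(2)...) refers to it
# before it appears in the pattern.
pattern = re.compile(r'(?:(?(2)(a)|(b)))+')

# First iteration: group 2 has not matched yet, so the "no" branch (b) runs.
# Later iterations: group 2 is set, so the "yes" branch (a) runs instead.
m = pattern.match('baab')
print(m.group(0))           # baa
print(pattern.match('ab'))  # None: the first iteration needs a 'b'

# A plain backreference to a group that is not defined yet is rejected
# when the pattern is compiled.
try:
    re.compile(r'(a)\2(b)')
    backref_rejected = False
except re.error:
    backref_rejected = True
print(backref_rejected)     # True
```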
| msg109400 | Author: andrew cooke (acooke) | Date: 2010-07-06 16:02 | |
Ah good point, thanks. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2010-07-06 16:02:41 | acooke | set | messages: + msg109400 |
| 2010-07-06 15:52:05 | mrabarnett | set | messages: + msg109399 |
| 2010-07-06 13:56:16 | mark.dickinson | set | messages: + msg109390 |
| 2010-07-06 13:47:53 | acooke | set | messages: + msg109389 |
| 2010-07-06 13:31:43 | mark.dickinson | set | versions: + Python 3.1, Python 2.7, Python 3.2 |
| 2010-07-06 13:30:20 | mark.dickinson | set | nosy: + mark.dickinson, mrabarnett messages: + msg109388 |
| 2010-07-06 13:08:29 | acooke | set | messages: + msg109387 |
| 2010-07-06 10:30:28 | acooke | set | messages: + msg109383 |
| 2010-07-06 10:23:32 | acooke | create | |
