msg29112
Author: Robert Xiao (nneonneo) *
Date: 2006-07-09 18:34
Using sre.sub or sre.subn, an "unmatched group" error can occur when the replacement template refers to a group that did not participate in the match.
The test I used is this pattern:
sre.sub("foo(?:b(ar)|baz)","\\1","foobaz")
This will cause the following backtrace to occur:
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "lib/python2.4/sre.py", line 142, in sub
    return _compile(pattern, 0).sub(repl, string, count)
  File "lib/python2.4/sre.py", line 260, in filter
    return sre_parse.expand_template(template, match)
  File "lib/python2.4/sre_parse.py", line 782, in expand_template
    raise error, "unmatched group"
sre_constants.error: unmatched group
Python version 2.4.3, Mac OS X (the behaviour has also been verified with
Python 2.4.3 on Windows).
This behaviour, while by design, is unwanted, because code that references
an optional group in the replacement usually expects an unmatched group to
expand to an empty string (i.e. the example should return '').
The example that I was trying resembles the following:
sre.sub(r"User: (?:Registered User #(\d+)|Guest)", r"%USERID|\1%", data)
The intended behaviour is that \1 expands to "" when the user is a guest
and to the user number when the user is a registered member.
However, when this call encounters a Guest, it raises an exception
and terminates, which is not what is wanted.
Perl and other regex engines behave as I have described, substituting
empty strings for unmatched groups. The code fix is relatively simple
and would help considerably in cases like these.
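For comparison with the behaviour requested here, the sketch below runs the report's two examples on Python 3.5 or later, where re.sub expands unmatched groups to empty strings (the change eventually made in this issue). The function names `foo_example` and `user_tag` are hypothetical; on earlier versions, the calls whose group is unmatched raise the "unmatched group" error instead.

```python
import re

# Requires Python 3.5+: unmatched groups in the template expand to ''.
# On earlier versions, calls whose group is unmatched raise
# re.error("unmatched group").

def foo_example(text):
    return re.sub(r"foo(?:b(ar)|baz)", r"\1", text)

def user_tag(data):
    # Raw strings keep \d and \1 as regex/template escapes rather than
    # Python string escapes ("\1" in a plain string is chr(1)).
    return re.sub(r"User: (?:Registered User #(\d+)|Guest)", r"%USERID|\1%", data)

print(repr(foo_example("foobar")))            # 'ar'
print(repr(foo_example("foobaz")))            # ''
print(user_tag("User: Registered User #42"))  # %USERID|42%
print(user_tag("User: Guest"))                # %USERID|%
```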
|
msg29113
Author: Matt Chaput (mchaput)
Date: 2007-02-15 18:35
The current behavior also makes the "sub" function useless when you need to backreference a group that might not capture, since you have no chance to deal with the exception.
|
msg29114
Author: Robert Xiao (nneonneo) *
Date: 2007-02-17 02:56
AFAIK the findall function already behaves as desired in this respect: unmatched groups are returned as empty strings.
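A quick way to see the difference described above, assuming any Python 3 version: findall reports a group that did not participate as '', while the match object itself reports it as None.

```python
import re

PATTERN = r"foo(?:b(ar)|baz)"
TEXT = "foobar foobaz"

# findall fills in '' for a group that did not participate...
print(re.findall(PATTERN, TEXT))  # ['ar', '']

# ...whereas Match.group() returns None for the same group.
print([m.group(1) for m in re.finditer(PATTERN, TEXT)])  # ['ar', None]
```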
|
msg58672
Author: Brandon Mintern (BMintern)
Date: 2007-12-16 12:24
This is still a problem which has just given me a headache, because
using re.sub now requires gymnastics instead of just using a simple
string as I did in Perl.
|
msg69541
Author: Gerard (gerardjp)
Date: 2008-07-11 08:17
Hi All,
I found a workaround for the re.sub method so that it does not raise an
exception but returns an empty string when backreferencing an empty group.
This is the nutshell:
When doing a search and replace with sub, replace the optional group
with an alternation that has one empty subexpression. So instead of
“(.+?)?” use “(|.+?)” (without the double quotes).
If nothing else is matched by this group, the empty subexpression
matches. The group then yields an empty string instead of None, and the sub
method runs normally instead of raising the “unmatched group” error.
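A minimal sketch of the alternation workaround, using a hypothetical `Name: ...;` format rather than the pattern from the linked post. Because the empty branch is listed first, the engine tries it first; here the `;` that follows forces backtracking into `\w+` whenever a name is present. Writing the empty branch last, as in `(\w+|)`, would keep the greedy preference of `(\w+)?`.

```python
import re

# Optional group: when no name is present, group 1 does not participate (None).
OPTIONAL = re.compile(r"Name: (\w+)?;")
# Workaround: an alternation with an empty branch always participates,
# so group 1 is '' instead of None when nothing is captured.
ALTERNATION = re.compile(r"Name: (|\w+);")

def captured(pattern, text):
    return pattern.search(text).group(1)

def bracket_name(text):
    # Safe even before Python 3.5: group 1 always participates,
    # so the template never references an unmatched group.
    return ALTERNATION.sub(r"[\1]", text)

print(captured(OPTIONAL, "Name: ;"))           # None
print(repr(captured(ALTERNATION, "Name: ;")))  # ''
print(bracket_name("Name: Bob;"))              # [Bob]
print(bracket_name("Name: ;"))                 # []
```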
A complete description is in my post:
http://www.gp-net.nl/2008/07/11/solved-python-regex-raising-exception-unmatched-group/
Regards,
Gerard.
|
msg69558
Author: Brandon Mintern (BMintern)
Date: 2008-07-11 16:52
Looking at your code example, that solution seems quite obvious now, and
I wouldn't even call it a "workaround". Thanks for figuring this out.
Now if I could only remember what code I was using that for...
|
msg78272
Author: Robert Xiao (nneonneo) *
Date: 2008-12-24 21:30
How would I apply that workaround to my example?
re.sub("foo(?:b(ar)|baz)","\\1","foobaz")
|
msg79830
Author: Gerard (gerardjp)
Date: 2009-01-14 05:21
Dear Bobby,
I don't see which part would generate the empty string here.
Regards,
Gerard.
|
msg79853
Author: Robert Xiao (nneonneo) *
Date: 2009-01-14 14:34
Well, in this example the group (ar) is unmatched, so sre throws the
error, and because of the alternation, the workaround you mentioned
doesn't seem to directly apply.
A better example is probably
re.sub("foo(?:b(ar)|foo)","\\1","foofoo")
because this can't be simply repaired by refactoring the regex.
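One way to get the requested behaviour without restructuring the pattern is a callable replacement. The sketch below uses a hypothetical helper, `sub_with_default`, that takes a str.format template instead of a `\1`-style one; `Match.groups(default)` supplies `default` for every group that did not participate, so the alternation case expands to an empty string on any Python 3 version. Literal braces in the template must be doubled (`{{`, `}}`).

```python
import re

def sub_with_default(pattern, fmt, string, default=""):
    """Hypothetical helper: like re.sub, but `fmt` is a str.format template
    in which {0} is the whole match and {1}, {2}, ... are the groups.
    Groups that did not participate are filled with `default`."""
    def replace(m):
        return fmt.format(m.group(0), *m.groups(default))
    return re.sub(pattern, replace, string)

print(repr(sub_with_default(r"foo(?:b(ar)|baz)", "{1}", "foobar")))  # 'ar'
print(repr(sub_with_default(r"foo(?:b(ar)|foo)", "{1}", "foofoo")))  # ''
```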
The correct behaviour, as I have observed in other regex
implementations, is to replace the group by the empty string; for
example, in Javascript:
>>> 'foobar'.replace(/foo(?:b(ar)|baz)/,'$1')
"ar"
>>> 'foobaz'.replace(/foo(?:b(ar)|baz)/,'$1')
""
|
msg81064
Author: Gerard (gerardjp)
Date: 2009-02-03 15:59
Bobby,
Can you post the actual text you need this for? The backreference does
indeed return None. I'm wondering if the regex can be simplified and if a
positive lookbehind could solve this.
Semantically speaking ... if there's a "b", then return the "ar", because
then an empty alternative might again be of help.
Kind regards,
Gerard.
|
msg81118
Author: Robert Xiao (nneonneo) *
Date: 2009-02-04 00:36
It was so long ago, I've since redone half my codebase (the hack is
still there, but I can't remember what it was meant to replace now :( ).
Sorry about that.
|
msg81220
Author: Matthew Barnett (mrabarnett) *
Date: 2009-02-05 19:32
This has been addressed in issue #2636.
|
msg81462
Author: Gerard (gerardjp)
Date: 2009-02-09 16:44
Matthew,
Thanx for the heads-up!
Regards,
Gerard.
|
msg108662
Author: Terry J. Reedy (terry.reedy) *
Date: 2010-06-26 00:30
If I understand "This has been addressed in issue #2636." correctly, this issue should be closed as out of date or as a duplicate, with #2636 as the superseder. Correct?
|
msg108669
Author: Matthew Barnett (mrabarnett) *
Date: 2010-06-26 00:58
Issue #2636 resulted in the new regex module (also available on PyPI), so this issue is addressed by that, but there's no patch for the re module.
|
msg108670
Author: Ezio Melotti (ezio.melotti) *
Date: 2010-06-26 01:09
It would be nice if you could port 'pieces' of #2636 to Python, in order to fix this and other bugs (and possibly add more features too).
|
msg155967
Author: Nikki DelRosso (Nikker)
Date: 2012-03-15 22:02
I'm having the same issue as the original author of this issue was. The workaround does not apply to the situation where the captured text is on one side of an "or" grouping, rather than just being optional.
I'm trying to remove parenthesized groups of text at the end of a string, but if the content of a pair of parentheses is a number, I want to keep it. My regular expression is the one used in the examples below.
These work:
>>> re.sub(r'(?:\((?:(\d+)|.*?)\)\s*)+$','\\1','avatar (2009)')
'avatar 2009'
>>> re.sub(r'(?:\((?:(\d+)|.*?)\)\s*)+$','\\1','avatar (2009) (special edition)')
'avatar 2009'
This doesn't:
>>> re.sub(r'(?:\((?:(\d+)|.*?)\)\s*)+$','\\1','avatar (special edition)')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/re.py", line 151, in sub
    return _compile(pattern, 0).sub(repl, string, count)
  File "/usr/lib/python2.6/re.py", line 278, in filter
    return sre_parse.expand_template(template, match)
  File "/usr/lib/python2.6/sre_parse.py", line 793, in expand_template
    raise error, "unmatched group"
sre_constants.error: unmatched group
Is there some way I can apply this workaround to this situation?
|
msg155969
Author: Nikki DelRosso (Nikker)
Date: 2012-03-15 22:04
Sorry, the non-working command should look as follows:
re.sub(r'(?:\((?:(\d+)|.*?)\)\s*)+$','\\1','avatar (special edition)')
|
msg155982
Author: Matthew Barnett (mrabarnett) *
Date: 2012-03-16 00:59
The replacement can be a callable, so you could do this:
re.sub(r'(?:\((?:(\d+)|.*?)\)\s*)+$', lambda m: m.group(1) or '', 'avatar (special edition)')
|
msg155983
Author: Nikki DelRosso (Nikker)
Date: 2012-03-16 01:08
Perfect; thank you!
|
msg227037
Author: Serhiy Storchaka (serhiy.storchaka) *
Date: 2014-09-18 10:54
Here is a patch which makes unmatched groups be replaced by an empty string. These changes look more like a new feature than a bug fix, and therefore can only be applied to 3.5.
|
msg228966
Author: Roundup Robot (python-dev)
Date: 2014-10-10 08:16
New changeset bd2f1ea04025 by Serhiy Storchaka in branch 'default':
Issue 1519638: Now unmatched groups are replaced with empty strings in re.sub()
https://hg.python.org/cpython/rev/bd2f1ea04025
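After this change (Python 3.5 and later), re.subn and Match.expand use the same template expansion as re.sub, so they also substitute empty strings, while Match.group() still returns None for the unmatched group. A short check, assuming Python 3.5+:

```python
import re

pattern = re.compile(r"foo(?:b(ar)|baz)")

# subn returns (new_string, number_of_substitutions); "\1" expands to ''.
result = pattern.subn(r"\1", "foobaz")
print(result)  # ('', 1)

m = pattern.search("foobaz")
# The group itself is still reported as unmatched...
print(m.group(1))  # None
# ...but template expansion treats it as an empty string.
print(repr(m.expand(r"<\1>")))  # '<>'
```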
|
msg228969
Author: Serhiy Storchaka (serhiy.storchaka) *
Date: 2014-10-10 08:45
Thank you for your review Antoine.
|
Date | User | Action | Args
2022-04-11 14:56:18 | admin | set | github: 43640
2014-10-10 08:45:02 | serhiy.storchaka | set | status: open -> closed; resolution: fixed; messages: + msg228969; stage: patch review -> resolved
2014-10-10 08:16:35 | python-dev | set | nosy: + python-dev; messages: + msg228966
2014-10-10 07:50:01 | serhiy.storchaka | set | assignee: serhiy.storchaka
2014-10-08 20:32:20 | pitrou | set | assignee: effbot -> (no value)
2014-09-18 10:54:53 | serhiy.storchaka | set | files: + re_sub_unmatched_group.patch; type: enhancement; components: + Library (Lib); versions: + Python 3.5, - Python 2.6, Python 2.7; keywords: + patch; nosy: + serhiy.storchaka; messages: + msg227037; stage: patch review
2013-09-16 14:39:27 | THRlWiTi | set | nosy: + THRlWiTi
2012-03-16 01:08:10 | Nikker | set | messages: + msg155983
2012-03-16 00:59:59 | mrabarnett | set | messages: + msg155982
2012-03-15 22:04:12 | Nikker | set | messages: + msg155969
2012-03-15 22:02:49 | Nikker | set | nosy: + Nikker; messages: + msg155967
2010-06-26 01:09:57 | ezio.melotti | set | nosy: + ezio.melotti; messages: + msg108670
2010-06-26 00:58:24 | mrabarnett | set | messages: + msg108669
2010-06-26 00:30:53 | terry.reedy | set | nosy: + terry.reedy; messages: + msg108662; versions: - Python 2.5, Python 3.0
2009-02-09 16:44:49 | gerardjp | set | messages: + msg81462
2009-02-05 19:32:55 | mrabarnett | set | nosy: + mrabarnett; messages: + msg81220
2009-02-04 00:36:38 | nneonneo | set | messages: + msg81118
2009-02-03 15:59:47 | gerardjp | set | messages: + msg81064
2009-01-14 14:34:02 | nneonneo | set | messages: + msg79853; versions: + Python 2.6, Python 2.5, Python 3.0
2009-01-14 05:21:40 | gerardjp | set | messages: + msg79830
2008-12-24 21:30:42 | nneonneo | set | messages: + msg78272
2008-09-27 14:39:08 | timehorse | set | versions: + Python 2.7, - Python 2.5
2008-09-27 14:36:36 | timehorse | set | nosy: + timehorse
2008-07-11 16:52:19 | BMintern | set | messages: + msg69558
2008-07-11 08:17:20 | gerardjp | set | nosy: + gerardjp; messages: + msg69541; title: Unmatched Group issue -> Unmatched Group issue - workaround
2007-12-16 12:24:50 | BMintern | set | nosy: + BMintern; messages: + msg58672
2006-07-09 18:34:12 | nneonneo | create |