classification
Title: Unmatched Group issue - workaround
Type: enhancement
Stage: resolved
Components: Library (Lib), Regular Expressions
Versions: Python 3.5
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: serhiy.storchaka
Nosy List: BMintern, Nikker, effbot, ezio.melotti, gerardjp, irdb, mchaput, mrabarnett, nneonneo, python-dev, serhiy.storchaka, terry.reedy, timehorse
Priority: normal
Keywords: patch

Created on 2006-07-09 18:34 by nneonneo, last changed 2014-10-10 08:45 by serhiy.storchaka. This issue is now closed.

Files
File name: re_sub_unmatched_group.patch
Uploaded: serhiy.storchaka, 2014-09-18 10:54
Messages (23)
msg29112 - (view) Author: Robert Xiao (nneonneo) * Date: 2006-07-09 18:34
Using sre.sub[n], an "unmatched group" error can occur.

The test I used is this pattern:

sre.sub("foo(?:b(ar)|baz)","\\1","foobaz")

This will cause the following backtrace to occur:

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "lib/python2.4/sre.py", line 142, in sub
    return _compile(pattern, 0).sub(repl, string, count)
  File "lib/python2.4/sre.py", line 260, in filter
    return sre_parse.expand_template(template, match)
  File "lib/python2.4/sre_parse.py", line 782, in expand_template
    raise error, "unmatched group"
sre_constants.error: unmatched group

Python Version 2.4.3, Mac OS X (behaviour has been verified on 
Windows 2.4.3 as well).

This behaviour, while by design, is unwanted because this type of 
matching usually requests that a blank match be returned (i.e. the 
example should return '')

The example that I was trying resembles the following:

sre.sub(r"User: (?:Registered User #(\d+)|Guest)", r"%USERID|\1%", data)

The intended behaviour is that the function returns "" when the user is 
a guest and the user number if the user is a registered member.

However, when this function encounters a Guest, it raises an exception 
and terminates, which is not what is wanted.

Perl and other regex engines behave as I have described, substituting 
empty strings for unmatched groups. The code fix is relatively simple, 
and would really help out for these types of things.
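
A minimal sketch of what the report describes, assuming the standard re module (sre in the report is its older name). It only inspects the match object, so it does not depend on how a given interpreter version expands the replacement template. The helper name group_or_empty is hypothetical and only illustrates the requested "unmatched group becomes an empty string" behaviour.

```python
import re

# Pattern and input taken from the report above.
match = re.search(r"foo(?:b(ar)|baz)", "foobaz")

whole = match.group(0)     # the full match: 'foobaz'
captured = match.group(1)  # None: the (ar) branch took no part in the match


def group_or_empty(m, index):
    """Hypothetical helper: treat a non-participating group as ''."""
    value = m.group(index)
    return "" if value is None else value


result = group_or_empty(match, 1)
print(repr(whole), repr(captured), repr(result))
```

The None value of group 1 is what the "\\1" template has nothing to substitute for in the traceback above.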
msg29113 - (view) Author: Matt Chaput (mchaput) Date: 2007-02-15 18:35
The current behavior also makes the "sub" function useless when you need to backreference a group that might not capture, since you have no chance to deal with the exception.
msg29114 - (view) Author: Robert Xiao (nneonneo) * Date: 2007-02-17 02:56
AFAIK the findall function works as desired in this respect: groups that did not match are returned as empty strings.
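
A short sketch of the findall behaviour mentioned here, reusing the pattern from the original report. The input string "foobar foobaz" is an assumption chosen so that one match captures the group and the other does not.

```python
import re

# With exactly one capturing group, findall returns that group for each match;
# a group that did not participate shows up as '' rather than None.
found = re.findall(r"foo(?:b(ar)|baz)", "foobar foobaz")
print(found)  # ['ar', '']
```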
msg58672 - (view) Author: Brandon Mintern (BMintern) Date: 2007-12-16 12:24
This is still a problem which has just given me a headache, because
using re.sub now requires gymnastics instead of just using a simple
string as I did in Perl.
msg69541 - (view) Author: Gerard (gerardjp) Date: 2008-07-11 08:17
Hi All,

I found a workaround for the re.sub method so it does not raise an
exception but returns an empty string when backref-ing an empty group.

This is the nutshell:

When doing a search and replace with sub, replace the group written as
optional with a group written as an alternation with one empty
subexpression. So instead of this “(.+?)?” use this “(|.+?)” (without
the double quotes).

If there’s nothing matched by this group the empty subexpression
matches. Then an empty string is returned instead of a None and the sub
method is executed normally instead of raising the “unmatched group” error.
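
A minimal sketch of this workaround. The patterns a(b)?c and a(|b)c and the replacement [\1] are illustrative assumptions, not taken from the linked post; they only contrast an optional group with an alternation that has an empty branch.

```python
import re

optional = re.compile(r"a(b)?c")     # group 1 may take no part in the match
alternation = re.compile(r"a(|b)c")  # group 1 always matches, possibly ''

none_case = optional.match("ac").group(1)      # None
empty_case = alternation.match("ac").group(1)  # ''

# Because group 1 always participates, the backreference is always defined.
replaced_empty = alternation.sub(r"[\1]", "ac")  # '[]'
replaced_b = alternation.sub(r"[\1]", "abc")     # '[b]'

print(repr(none_case), repr(empty_case), replaced_empty, replaced_b)
```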

A complete description is in my post:
http://www.gp-net.nl/2008/07/11/solved-python-regex-raising-exception-unmatched-group/


Regards,

Gerard.
msg69558 - (view) Author: Brandon Mintern (BMintern) Date: 2008-07-11 16:52
Looking at your code example, that solution seems quite obvious now, and
I wouldn't even call it a "workaround". Thanks for figuring this out.
Now if I could only remember what code I was using that for...
msg78272 - (view) Author: Robert Xiao (nneonneo) * Date: 2008-12-24 21:30
How would I apply that workaround to my example?

re.sub("foo(?:b(ar)|baz)","\\1","foobaz")
msg79830 - (view) Author: Gerard (gerardjp) Date: 2009-01-14 05:21
Dear Bobby,

I don't see what would be the part that generates the empty string?

Regards,

Gerard.
msg79853 - (view) Author: Robert Xiao (nneonneo) * Date: 2009-01-14 14:34
Well, in this example the group (ar) is unmatched, so sre throws the
error, and because of the alternation, the workaround you mentioned
doesn't seem to directly apply.

A better example is probably
re.sub("foo(?:b(ar)|foo)","\\1","foofoo")
because this can't be simply repaired by refactoring the regex.

The correct behaviour, as I have observed in other regex
implementations, is to replace the group by the empty string; for
example, in Javascript:
>>> 'foobar'.replace(/foo(?:b(ar)|baz)/,'$1')
"ar"
>>> 'foobaz'.replace(/foo(?:b(ar)|baz)/,'$1')
""
msg81064 - (view) Author: Gerard (gerardjp) Date: 2009-02-03 15:59
Bobby,

Can you post the actual text you need this for? The back ref indeed
returns a None. I'm wondering if the regex can be simplified and if a
positive lookbehind could solve this.

Semantically speaking ... If there's a "b" then return the "ar", because
then an empty alternate might again be of help.

Kind regards,

Gerard.
msg81118 - (view) Author: Robert Xiao (nneonneo) * Date: 2009-02-04 00:36
It was so long ago, I've since redone half my codebase (the hack is
still there, but I can't remember what it was meant to replace now :( ).

Sorry about that.
msg81220 - (view) Author: Matthew Barnett (mrabarnett) * Date: 2009-02-05 19:32
This has been addressed in issue #2636.
msg81462 - (view) Author: Gerard (gerardjp) Date: 2009-02-09 16:44
Matthew,

Thanx for the heads-up!

Regards,

Gerard.
msg108662 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-06-26 00:30
If I understand "This has been addressed in issue #2636.", this issue should be closed as, perhaps, out-of-date or duplicate, with 2636 as superseder. Correct?
msg108669 - (view) Author: Matthew Barnett (mrabarnett) * Date: 2010-06-26 00:58
Issue #2636 resulted in the new regex module (also available on PyPI), so this issue is addressed by that, but there's no patch for the re module.
msg108670 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2010-06-26 01:09
It would be nice if you could port 'pieces' of #2636 to Python, in order to fix this and other bugs (and possibly add more features too).
msg155967 - (view) Author: Nikki DelRosso (Nikker) Date: 2012-03-15 22:02
I'm having the same issue as the original author of this issue was.  The workaround does not apply to the situation where the captured text is on one side of an "or" grouping, rather than just being optional. 

I'm trying to remove groups of text in parentheses that come at the end of a string, but if the content in a pair of parentheses is a number, I want to retain it.  My regular expression looks like so:

These work:
>>> re.sub(r'(?:\((?:(\d+)|.*?)\)\s*)+$','\\1','avatar (2009)')
'avatar 2009'
>>> re.sub(r'(?:\((?:(\d+)|.*?)\)\s*)+$','\\1','avatar (2009) (special edition)')
'avatar 2009'

This doesn't:
>>> re.sub(r'(?:\((?:(\d+)|.*?)\)\s*)+$','\\1','avatar (special Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/re.py", line 151, in sub
    return _compile(pattern, 0).sub(repl, string, count)
  File "/usr/lib/python2.6/re.py", line 278, in filter
    return sre_parse.expand_template(template, match)
  File "/usr/lib/python2.6/sre_parse.py", line 793, in expand_template
    raise error, "unmatched group"
sre_constants.error: unmatched groupedition)')


Is there some way I can apply this workaround to this situation?
msg155969 - (view) Author: Nikki DelRosso (Nikker) Date: 2012-03-15 22:04
Sorry, the non-working command should look as follows:

re.sub(r'(?:\((?:(\d+)|.*?)\)\s*)+$','\\1','avatar (special edition)')
msg155982 - (view) Author: Matthew Barnett (mrabarnett) * Date: 2012-03-16 00:59
The replacement can be a callable, so you could do this:

re.sub(r'(?:\((?:(\d+)|.*?)\)\s*)+$', lambda m: m.group(1) or '', 'avatar (special edition)')
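
A runnable sketch of the callable replacement, using the pattern and inputs from this thread. The names PAREN_TAIL and keep_number are hypothetical. Note that the second result keeps the space before the removed parentheses.

```python
import re

# Pattern from the thread: trailing parenthesised groups, capturing digits.
PAREN_TAIL = re.compile(r'(?:\((?:(\d+)|.*?)\)\s*)+$')


def keep_number(m):
    # group(1) is None when no numeric group took part in the match,
    # so fall back to an empty string instead of a template backreference.
    return m.group(1) or ''


with_year = PAREN_TAIL.sub(keep_number, 'avatar (2009)')
without_year = PAREN_TAIL.sub(keep_number, 'avatar (special edition)')

print(repr(with_year))     # 'avatar 2009'
print(repr(without_year))  # 'avatar '
```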
msg155983 - (view) Author: Nikki DelRosso (Nikker) Date: 2012-03-16 01:08
Perfect; thank you!
msg227037 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-09-18 10:54
Here is a patch which makes unmatched groups be replaced by an empty string. These changes look more like a new feature than a bug fix and therefore can be applied only to 3.5.
msg228966 - (view) Author: Roundup Robot (python-dev) Date: 2014-10-10 08:16
New changeset bd2f1ea04025 by Serhiy Storchaka in branch 'default':
Issue 1519638: Now unmatched groups are replaced with empty strings in re.sub()
https://hg.python.org/cpython/rev/bd2f1ea04025
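
A small sketch of the behaviour after this change, assuming an interpreter of version 3.5 or later. The patterns and inputs are the ones quoted earlier in this issue; on older versions the first two calls raise the "unmatched group" error instead.

```python
import re

# Assumes Python 3.5+, where an unmatched group in the replacement
# template is substituted with an empty string.
first = re.sub(r"foo(?:b(ar)|baz)", r"\1", "foobaz")    # group 1 unmatched
second = re.sub(r"foo(?:b(ar)|foo)", r"\1", "foofoo")   # group 1 unmatched
matched = re.sub(r"foo(?:b(ar)|baz)", r"\1", "foobar")  # group 1 == 'ar'

print(repr(first), repr(second), repr(matched))  # '' '' 'ar'
```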
msg228969 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-10-10 08:45
Thank you for your review, Antoine.
History
Date  User  Action  Args
2014-10-10 08:45:02  serhiy.storchaka  set  status: open -> closed
    resolution: fixed
    messages: + msg228969
    stage: patch review -> resolved
2014-10-10 08:16:35  python-dev  set  nosy: + python-dev
    messages: + msg228966
2014-10-10 07:50:01  serhiy.storchaka  set  assignee: serhiy.storchaka
2014-10-08 20:32:20  pitrou  set  assignee: effbot -> (no value)
2014-09-18 10:54:53  serhiy.storchaka  set  files: + re_sub_unmatched_group.patch
    type: enhancement
    components: + Library (Lib)
    versions: + Python 3.5, - Python 2.6, Python 2.7
    keywords: + patch
    nosy: + serhiy.storchaka
    messages: + msg227037
    stage: patch review
2013-09-16 14:39:27  irdb  set  nosy: + irdb
2012-03-16 01:08:10  Nikker  set  messages: + msg155983
2012-03-16 00:59:59  mrabarnett  set  messages: + msg155982
2012-03-15 22:04:12  Nikker  set  messages: + msg155969
2012-03-15 22:02:49  Nikker  set  nosy: + Nikker
    messages: + msg155967
2010-06-26 01:09:57  ezio.melotti  set  nosy: + ezio.melotti
    messages: + msg108670
2010-06-26 00:58:24  mrabarnett  set  messages: + msg108669
2010-06-26 00:30:53  terry.reedy  set  nosy: + terry.reedy
    messages: + msg108662
    versions: - Python 2.5, Python 3.0
2009-02-09 16:44:49  gerardjp  set  messages: + msg81462
2009-02-05 19:32:55  mrabarnett  set  nosy: + mrabarnett
    messages: + msg81220
2009-02-04 00:36:38  nneonneo  set  messages: + msg81118
2009-02-03 15:59:47  gerardjp  set  messages: + msg81064
2009-01-14 14:34:02  nneonneo  set  messages: + msg79853
    versions: + Python 2.6, Python 2.5, Python 3.0
2009-01-14 05:21:40  gerardjp  set  messages: + msg79830
2008-12-24 21:30:42  nneonneo  set  messages: + msg78272
2008-09-27 14:39:08  timehorse  set  versions: + Python 2.7, - Python 2.5
2008-09-27 14:36:36  timehorse  set  nosy: + timehorse
2008-07-11 16:52:19  BMintern  set  messages: + msg69558
2008-07-11 08:17:20  gerardjp  set  nosy: + gerardjp
    messages: + msg69541
    title: Unmatched Group issue -> Unmatched Group issue - workaround
2007-12-16 12:24:50  BMintern  set  nosy: + BMintern
    messages: + msg58672
2006-07-09 18:34:12  nneonneo  create