classification
Title: Unmatched Group issue - workaround
Type:
Stage:
Components: Regular Expressions
Versions: Python 2.7, Python 2.6
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: effbot
Nosy List: BMintern, effbot, ezio.melotti, gerardjp, mchaput, mrabarnett, nneonneo, terry.reedy, timehorse
Priority: normal
Keywords:

Created on 2006-07-09 18:34 by nneonneo, last changed 2010-06-26 01:09 by ezio.melotti.

Messages (16)
msg29112 - (view) Author: Robert Xiao (nneonneo) Date: 2006-07-09 18:34
Using sre.sub[n], an "unmatched group" error can occur.

The test I used is this pattern:

sre.sub("foo(?:b(ar)|baz)","\\1","foobaz")

This will cause the following backtrace to occur:

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "lib/python2.4/sre.py", line 142, in sub
    return _compile(pattern, 0).sub(repl, string, count)
  File "lib/python2.4/sre.py", line 260, in filter
    return sre_parse.expand_template(template, match)
  File "lib/python2.4/sre_parse.py", line 782, in expand_template
    raise error, "unmatched group"
sre_constants.error: unmatched group

Python Version 2.4.3, Mac OS X (behaviour has been verified on 
Windows 2.4.3 as well).

This behaviour, while by design, is unwanted because this type of 
matching usually requests that a blank match be returned (i.e. the 
example should return '')

The example that I was trying resembles the following:

sre.sub(r"User: (?:Registered User #(\d+)|Guest)", r"%USERID|\1%", data)

The intended behaviour is that the function returns "" when the user is 
a guest and the user number if the user is a registered member.

However, when this function encounters a Guest, it raises an exception 
and terminates, which is not what is wanted.

Perl and other regex engines behave as I have described, substituting 
empty strings for unmatched groups. The code fix is relatively simple, 
and would really help out for these types of things.
msg29113 - (view) Author: Matt Chaput (mchaput) Date: 2007-02-15 18:35
The current behavior also makes the "sub" function useless when you need to backreference a group that might not capture, since you have no chance to deal with the exception.
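
One way to get a chance to handle the unmatched group is to pass a function as the replacement instead of a template string. The following is only a sketch based on the "User:" example above; the names USER_PATTERN and userid_or_blank are made up for illustration and do not come from the report.

```python
import re

# Pattern from the original report: group 1 only participates
# in the match for registered users.
USER_PATTERN = re.compile(r"User: (?:Registered User #(\d+)|Guest)")


def userid_or_blank(match):
    """Build the replacement text, treating an unmatched group 1 as ''."""
    user_id = match.group(1)  # None when the "Guest" branch matched
    if user_id is None:
        user_id = ""
    return "%USERID|" + user_id + "%"


registered = USER_PATTERN.sub(userid_or_blank, "User: Registered User #42")
guest = USER_PATTERN.sub(userid_or_blank, "User: Guest")
```

Because the function receives the match object, it can check for None itself, so no "unmatched group" error is raised for the guest case.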
msg29114 - (view) Author: Robert Xiao (nneonneo) Date: 2007-02-17 02:56
AFAIK the findall function works as desired in this respect: empty matches will return empty strings.
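
A small sketch of the difference described here, using the pattern from the first message: findall reports the unmatched group as an empty string, while the match object itself reports None. The variable names are illustrative only.

```python
import re

pattern = r"foo(?:b(ar)|baz)"

# findall returns one entry per match; with a single group, each entry
# is that group's text, and an unmatched group shows up as ''.
found = re.findall(pattern, "foobar foobaz")

# The match object, by contrast, reports the unmatched group as None.
unmatched = re.search(pattern, "foobaz").group(1)
```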
msg58672 - (view) Author: Brandon Mintern (BMintern) Date: 2007-12-16 12:24
This is still a problem which has just given me a headache, because
using re.sub now requires gymnastics instead of just using a simple
string as I did in Perl.
msg69541 - (view) Author: Gerard (gerardjp) Date: 2008-07-11 08:17
Hi All,

I found a workaround for the re.sub method so it does not raise an
exception but returns an empty string when backreferencing an unmatched group.

This is the nutshell:

When doing a search and replace with sub, replace the optional group
with a group that is an alternation containing one empty
subexpression. So instead of “(.+?)?” use “(|.+?)” (without
the double quotes).

If there’s nothing matched by this group the empty subexpression
matches. Then an empty string is returned instead of a None and the sub
method is executed normally instead of raising the “unmatched group” error.
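
As a rough illustration of the rewrite described above (the surrounding "x...y" pattern and the variable names are invented for this sketch, and \d+ stands in for the .+? of the description):

```python
import re

# Optional group: when nothing is captured, the group is unmatched (None).
optional_group = re.match(r"x(\d+)?y", "xy").group(1)

# Alternation with an empty branch: the group always participates,
# so it captures '' instead of being unmatched.
empty_branch = re.match(r"x(|\d+)y", "xy").group(1)

# With the empty branch, a backreference in the template is always safe.
without_digits = re.sub(r"x(|\d+)y", r"[\1]", "xy")
with_digits = re.sub(r"x(|\d+)y", r"[\1]", "x12y")
```

The empty branch is tried first, and the engine backtracks to the non-empty branch only when the rest of the pattern fails to match, so text that is present is still captured.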

A complete description is in my post:
http://www.gp-net.nl/2008/07/11/solved-python-regex-raising-exception-unmatched-group/


Regards,

Gerard.
msg69558 - (view) Author: Brandon Mintern (BMintern) Date: 2008-07-11 16:52
Looking at your code example, that solution seems quite obvious now, and
I wouldn't even call it a "workaround". Thanks for figuring this out.
Now if I could only remember what code I was using that for...
msg78272 - (view) Author: Robert Xiao (nneonneo) Date: 2008-12-24 21:30
How would I apply that workaround to my example?

re.sub("foo(?:b(ar)|baz)","\\1","foobaz")
msg79830 - (view) Author: Gerard (gerardjp) Date: 2009-01-14 05:21
Dear Bobby,

I don't see what would be the part that generates the empty string?

Regards,

Gerard.
msg79853 - (view) Author: Robert Xiao (nneonneo) Date: 2009-01-14 14:34
Well, in this example the group (ar) is unmatched, so sre throws the
error, and because of the alternation, the workaround you mentioned
doesn't seem to directly apply.

A better example is probably
re.sub("foo(?:b(ar)|foo)","\\1","foofoo")
because this can't be simply repaired by refactoring the regex.
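
For cases like this, where the regex itself cannot easily be rewritten, a wrapper around re.sub can expand the template by hand. The helper below, sub_empty_unmatched, is a hypothetical name and only a sketch: it handles numeric backreferences of the form \N and nothing else (no \g<name>, no escape sequences).

```python
import re


def sub_empty_unmatched(pattern, template, string, count=0):
    """Like re.sub, but \\N references to unmatched groups expand to ''.

    Sketch only: just the numeric \\N form of backreference is handled.
    """
    def expand(match):
        def group_or_empty(ref):
            # ref matches a backslash followed by a group number
            value = match.group(int(ref.group(1)))
            return "" if value is None else value
        # A function replacement is inserted literally, so the
        # captured text is not processed for escapes a second time.
        return re.sub(r"\\(\d+)", group_or_empty, template)

    return re.sub(pattern, expand, string, count=count)


blank = sub_empty_unmatched(r"foo(?:b(ar)|foo)", r"\1", "foofoo")
captured = sub_empty_unmatched(r"foo(?:b(ar)|foo)", r"\1", "foobar")
```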

The correct behaviour, as I have observed in other regex
implementations, is to replace the group by the empty string; for
example, in Javascript:
>>> 'foobar'.replace(/foo(?:b(ar)|baz)/,'$1')
"ar"
>>> 'foobaz'.replace(/foo(?:b(ar)|baz)/,'$1')
""
msg81064 - (view) Author: Gerard (gerardjp) Date: 2009-02-03 15:59
Bobby,

Can you post the actual text you need this for? The back ref indeed
returns a None. I'm wondering if the regex can be simplified and if a
positive lookbehind could solve this.

Semantically speaking ... If there's a "b" then return the "ar", because
then an empty alternate might again be of help.

Kind regards,

Gerard.
msg81118 - (view) Author: Robert Xiao (nneonneo) Date: 2009-02-04 00:36
It was so long ago, I've since redone half my codebase (the hack is
still there, but I can't remember what it was meant to replace now :( ).

Sorry about that.
msg81220 - (view) Author: Matthew Barnett (mrabarnett) Date: 2009-02-05 19:32
This has been addressed in issue #2636.
msg81462 - (view) Author: Gerard (gerardjp) Date: 2009-02-09 16:44
Matthew,

Thanx for the heads-up!

Regards,

Gerard.
msg108662 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-06-26 00:30
If I understand "This has been addressed in issue #2636.", this issue should be closed as, perhaps, out-of-date or duplicate, with 2636 as superseder. Correct?
msg108669 - (view) Author: Matthew Barnett (mrabarnett) Date: 2010-06-26 00:58
Issue #2636 resulted in the new regex module (also available on PyPI), so this issue is addressed by that, but there's no patch for the re module.
msg108670 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2010-06-26 01:09
It would be nice if you could port 'pieces' of #2636 to Python, in order to fix this and other bugs (and possibly add more features too).
History
Date                 User          Action  Args
2010-06-26 01:09:57  ezio.melotti  set     nosy: + ezio.melotti; messages: + msg108670
2010-06-26 00:58:24  mrabarnett    set     messages: + msg108669
2010-06-26 00:30:53  terry.reedy   set     nosy: + terry.reedy; messages: + msg108662; versions: - Python 2.5, Python 3.0
2009-02-09 16:44:49  gerardjp      set     messages: + msg81462
2009-02-05 19:32:55  mrabarnett    set     nosy: + mrabarnett; messages: + msg81220
2009-02-04 00:36:38  nneonneo      set     messages: + msg81118
2009-02-03 15:59:47  gerardjp      set     messages: + msg81064
2009-01-14 14:34:02  nneonneo      set     messages: + msg79853; versions: + Python 2.6, Python 2.5, Python 3.0
2009-01-14 05:21:40  gerardjp      set     messages: + msg79830
2008-12-24 21:30:42  nneonneo      set     messages: + msg78272
2008-09-27 14:39:08  timehorse     set     versions: + Python 2.7, - Python 2.5
2008-09-27 14:36:36  timehorse     set     nosy: + timehorse
2008-07-11 16:52:19  BMintern      set     messages: + msg69558
2008-07-11 08:17:20  gerardjp      set     nosy: + gerardjp; messages: + msg69541; title: Unmatched Group issue -> Unmatched Group issue - workaround
2007-12-16 12:24:50  BMintern      set     nosy: + BMintern; messages: + msg58672
2006-07-09 18:34:12  nneonneo      create