Unmatched Group issue - workaround #43640

Closed
nneonneo mannequin opened this issue Jul 9, 2006 · 23 comments
Labels: stdlib (Python modules in the Lib dir), topic-regex, type-feature (A feature request or enhancement)

Comments

@nneonneo
Mannequin

nneonneo mannequin commented Jul 9, 2006

BPO 1519638
Nosy @terryjreedy, @ezio-melotti, @serhiy-storchaka
Files
  • re_sub_unmatched_group.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2014-10-10.08:45:02.200>
    created_at = <Date 2006-07-09.18:34:12.000>
    labels = ['expert-regex', 'type-feature', 'library']
    title = 'Unmatched Group issue - workaround'
    updated_at = <Date 2014-10-10.08:45:02.198>
    user = 'https://bugs.python.org/nneonneo'

    bugs.python.org fields:

    activity = <Date 2014-10-10.08:45:02.198>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2014-10-10.08:45:02.200>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)', 'Regular Expressions']
    creation = <Date 2006-07-09.18:34:12.000>
    creator = 'nneonneo'
    dependencies = []
    files = ['36650']
    hgrepos = []
    issue_num = 1519638
    keywords = ['patch']
    message_count = 23.0
    messages = ['29112', '29113', '29114', '58672', '69541', '69558', '78272', '79830', '79853', '81064', '81118', '81220', '81462', '108662', '108669', '108670', '155967', '155969', '155982', '155983', '227037', '228966', '228969']
    nosy_count = 13.0
    nosy_names = ['effbot', 'terry.reedy', 'mchaput', 'nneonneo', 'timehorse', 'BMintern', 'ezio.melotti', 'mrabarnett', 'gerardjp', 'THRlWiTi', 'python-dev', 'serhiy.storchaka', 'Nikker']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue1519638'
    versions = ['Python 3.5']

    @nneonneo
    Mannequin Author

    nneonneo mannequin commented Jul 9, 2006

    Using sre.sub[n], an "unmatched group" error can occur.

    The test I used is this pattern:

    sre.sub("foo(?:b(ar)|baz)","\\1","foobaz")

    This will cause the following backtrace to occur:

    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "lib/python2.4/sre.py", line 142, in sub
        return _compile(pattern, 0).sub(repl, string, count)
      File "lib/python2.4/sre.py", line 260, in filter
        return sre_parse.expand_template(template, match)
      File "lib/python2.4/sre_parse.py", line 782, in expand_template
        raise error, "unmatched group"
    sre_constants.error: unmatched group

    Python Version 2.4.3, Mac OS X (behaviour has been verified on
    Windows 2.4.3 as well).

    This behaviour, while by design, is unwanted because this type of
    matching usually requests that a blank match be returned (i.e. the
    example should return '')

    The example that I was trying resembles the following:

    sre.sub(r"User: (?:Registered User #(\d+)|Guest)",r"%USERID|\1%",data)

    The intended behaviour is that the function returns "" when the user is
    a guest and the user number if the user is a registered member.

    However, when this function encounters a Guest, it raises an exception
    and terminates, which is not what is wanted.

    Perl and other regex engines behave as I have described, substituting
    empty strings for unmatched groups. The code fix is relatively simple,
    and would really help out for these types of things.
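For readers following along, here is a minimal sketch of why the template fails. It uses the pattern from the report; the sample strings and the name `USER_RE` are made up for illustration. When the "Guest" branch matches, group 1 never takes part in the match, so there is nothing for `\1` to expand to.

```python
import re

# Pattern from the report above; USER_RE and the sample strings are illustrative.
USER_RE = re.compile(r"User: (?:Registered User #(\d+)|Guest)")

registered = USER_RE.search("User: Registered User #42")
guest = USER_RE.search("User: Guest")

print(registered.group(1))  # group 1 took part in the match
print(guest.group(1))       # group 1 did not take part, so it is None
```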

    @nneonneo nneonneo mannequin assigned effbot Jul 9, 2006
    @nneonneo nneonneo mannequin added the topic-regex label Jul 9, 2006
    @mchaput
    Mannequin

    mchaput mannequin commented Feb 15, 2007

    The current behavior also makes the "sub" function useless when you need to backreference a group that might not capture, since you have no chance to deal with the exception.

    @nneonneo
    Mannequin Author

    nneonneo mannequin commented Feb 17, 2007

    AFAIK the findall function works as desired in this respect: empty matches will return empty strings.

    @bmintern
    Mannequin

    bmintern mannequin commented Dec 16, 2007

    This is still a problem which has just given me a headache, because
    using re.sub now requires gymnastics instead of just using a simple
    string as I did in Perl.

    @gerardjp
    Mannequin

    gerardjp mannequin commented Jul 11, 2008

    Hi All,

    I found a workaround for the re.sub method so that it does not raise an
    exception but returns an empty string when backreferencing an unmatched group.

    In a nutshell:

    When doing a search and replace with sub, replace the optional group
    with a group written as an alternation that has one empty
    subexpression. So instead of this “(.+?)?” use this “(|.+?)” (without
    the double quotes).

    If nothing else is matched by this group, the empty subexpression
    matches. The group then yields an empty string instead of None, and the sub
    method runs normally instead of raising the “unmatched group” error.

    A complete description is in my post:
    http://www.gp-net.nl/2008/07/11/solved-python-regex-raising-exception-unmatched-group/

    Regards,

    Gerard.
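A small sketch of the workaround described above, using simplified patterns that are not from the linked post. The names `optional` and `alternation` are illustrative. The optional group can be skipped entirely, while the alternation with an empty branch always participates in the match.

```python
import re

optional = re.compile(r"a(b)?c")     # group 1 may be skipped entirely
alternation = re.compile(r"a(|b)c")  # group 1 always participates

skipped = optional.match("ac").group(1)       # None: the group never matched
matched = alternation.match("ac").group(1)    # "": the empty branch matched

# With the empty alternative, a backreference in the template is always defined.
result = alternation.sub(r"[\1]", "ac abc")
print(repr(skipped), repr(matched), result)
```

This only helps when the group itself is optional. It does not cover a group that sits in one branch of a larger alternation.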

    @gerardjp gerardjp mannequin changed the title Unmatched Group issue Unmatched Group issue - workaround Jul 11, 2008
    @bmintern
    Mannequin

    bmintern mannequin commented Jul 11, 2008

    Looking at your code example, that solution seems quite obvious now, and
    I wouldn't even call it a "workaround". Thanks for figuring this out.
    Now if I could only remember what code I was using that for...

    @nneonneo
    Mannequin Author

    nneonneo mannequin commented Dec 24, 2008

    How would I apply that workaround to my example?

    re.sub("foo(?:b(ar)|baz)","\\1","foobaz")

    @gerardjp
    Mannequin

    gerardjp mannequin commented Jan 14, 2009

    Dear Bobby,

    I don't see what would be the part that generates the empty string?

    Regards,

    Gerard.

    @nneonneo
    Mannequin Author

    nneonneo mannequin commented Jan 14, 2009

    Well, in this example the group (ar) is unmatched, so sre throws the
    error, and because of the alternation, the workaround you mentioned
    doesn't seem to directly apply.

    A better example is probably
    re.sub("foo(?:b(ar)|foo)","\\1","foofoo")
    because this can't be simply repaired by refactoring the regex.

    The correct behaviour, as I have observed in other regex
    implementations, is to replace the group by the empty string; for
    example, in Javascript:
    >>> 'foobar'.replace(/foo(?:b(ar)|baz)/,'$1')
    "ar"
    >>> 'foobaz'.replace(/foo(?:b(ar)|baz)/,'$1')
    ""
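One way to get the JavaScript-like result in Python without touching the pattern is sketched below. It assumes a callable replacement is acceptable, and the helper name `first_group_or_blank` is hypothetical. It relies on `Match.groups(default=...)`, which fills in a chosen value for groups that did not participate.

```python
import re

pattern = re.compile(r"foo(?:b(ar)|foo)")

def first_group_or_blank(match):
    # groups(default="") substitutes "" for any group that did not participate.
    return match.groups(default="")[0]

with_group = pattern.sub(first_group_or_blank, "foobar")
without_group = pattern.sub(first_group_or_blank, "foofoo")
print(repr(with_group), repr(without_group))
```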

    @gerardjp
    Mannequin

    gerardjp mannequin commented Feb 3, 2009

    Bobby,

    Can you post the actual text you need this for? The back ref indeed
    returns a None. I'm wondering if the regex can be simplified and if a
    positive lookbehind could solve this.

    Semantically speaking ... If there's a "b" then return the "ar", because
    then an empty alternate might again be of help.

    Kind regards,

    Gerard.

    @nneonneo
    Mannequin Author

    nneonneo mannequin commented Feb 4, 2009

    It was so long ago, I've since redone half my codebase (the hack is
    still there, but I can't remember what it was meant to replace now :( ).

    Sorry about that.

    @mrabarnett
    Mannequin

    mrabarnett mannequin commented Feb 5, 2009

    This has been addressed in issue bpo-2636.

    @gerardjp
    Mannequin

    gerardjp mannequin commented Feb 9, 2009

    Matthew,

    Thanx for the heads-up!

    Regards,

    Gerard.

    @terryjreedy
    Member

    If I understand "This has been addressed in issue bpo-2636.", this issue should be closed as, perhaps, out-of-date or duplicate, with 2636 as superseder. Correct?

    @mrabarnett
    Mannequin

    mrabarnett mannequin commented Jun 26, 2010

    Issue bpo-2636 resulted in the new regex module (also available on PyPI), so this issue is addressed by that, but there's no patch for the re module.

    @ezio-melotti
    Member

    It would be nice if you could port 'pieces' of bpo-2636 to Python, in order to fix this and other bugs (and possibly add more features too).

    @Nikker
    Mannequin

    Nikker mannequin commented Mar 15, 2012

    I'm having the same issue as the original author of this issue was. The workaround does not apply to the situation where the captured text is on one side of an "or" grouping, rather than just being optional.

    I'm trying to remove groups of text in parentheses that come at the end of a string, but if the content in a pair of parentheses is a number, I want to retain it. My regular expression looks like so:

    These work:
    >>> re.sub(r'(?:\((?:(\d+)|.*?)\)\s*)+$','\\1','avatar (2009)')
    'avatar 2009'
    >>> re.sub(r'(?:\((?:(\d+)|.*?)\)\s*)+$','\\1','avatar (2009) (special edition)')
    'avatar 2009'
    
    This doesn't:
    >>> re.sub(r'(?:\((?:(\d+)|.*?)\)\s*)+$','\\1','avatar (special edition)')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.6/re.py", line 151, in sub
        return _compile(pattern, 0).sub(repl, string, count)
      File "/usr/lib/python2.6/re.py", line 278, in filter
        return sre_parse.expand_template(template, match)
      File "/usr/lib/python2.6/sre_parse.py", line 793, in expand_template
        raise error, "unmatched group"
    sre_constants.error: unmatched group

    Is there some way I can apply this workaround to this situation?

    @Nikker
    Mannequin

    Nikker mannequin commented Mar 15, 2012

    Sorry, the non-working command should look as follows:

    re.sub(r'(?:\((?:(\d+)|.*?)\)\s*)+$','\\1','avatar (special edition)')

    @mrabarnett
    Mannequin

    mrabarnett mannequin commented Mar 16, 2012

    The replacement can be a callable, so you could do this:

    re.sub(r'(?:\((?:(\d+)|.*?)\)\s*)+$', lambda m: m.group(1) or '', 'avatar (special edition)')
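The callable suggestion above could be wrapped up roughly as follows. This is only a sketch: the names `TRAILING_PARENS`, `keep_number`, and `strip_trailing_parens` are hypothetical, and the pattern is the one from the question.

```python
import re

# Pattern from the question: trailing parenthesised groups, capturing a number if present.
TRAILING_PARENS = re.compile(r'(?:\((?:(\d+)|.*?)\)\s*)+$')

def keep_number(match):
    # group(1) is None when no numeric group took part; fall back to "".
    return match.group(1) or ''

def strip_trailing_parens(title):
    return TRAILING_PARENS.sub(keep_number, title)

print(repr(strip_trailing_parens('avatar (2009)')))
print(repr(strip_trailing_parens('avatar (special edition)')))
```

Because the callable receives the match object, it can decide what an unmatched group should become, which a plain template string cannot do on the Python versions discussed in this thread.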

    @Nikker
    Mannequin

    Nikker mannequin commented Mar 16, 2012

    Perfect; thank you!

    @serhiy-storchaka
    Member

    Here is a patch which makes unmatched groups be replaced by an empty string. These changes look more like a new feature than a bug fix and therefore can be applied only to 3.5.

    @serhiy-storchaka serhiy-storchaka added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Sep 18, 2014
    @serhiy-storchaka serhiy-storchaka self-assigned this Oct 10, 2014
    @python-dev
    Mannequin

    python-dev mannequin commented Oct 10, 2014

    New changeset bd2f1ea04025 by Serhiy Storchaka in branch 'default':
    bpo-1519638: Now unmatched groups are replaced with empty strings in re.sub()
    https://hg.python.org/cpython/rev/bd2f1ea04025
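For code that has to run both before and after this change, a version check along these lines may be enough. This is a sketch, assuming the new behaviour is present from Python 3.5 onward as stated in this issue; the helper name `sub_first_group` is hypothetical.

```python
import re
import sys

PATTERN = r"foo(?:b(ar)|baz)"  # example pattern from the original report

def sub_first_group(text):
    if sys.version_info >= (3, 5):
        # Unmatched groups in the template are replaced with empty strings.
        return re.sub(PATTERN, r"\1", text)
    # Older versions raise "unmatched group", so use a callable instead.
    return re.sub(PATTERN, lambda m: m.group(1) or "", text)

print(repr(sub_first_group("foobar")))
print(repr(sub_first_group("foobaz")))
```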

    @serhiy-storchaka
    Member

    Thank you for your review Antoine.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022