re.finditer hangs on final empty match #39362

Closed
kevinbutler mannequin opened this issue Oct 3, 2003 · 4 comments

Comments

@kevinbutler
Mannequin

kevinbutler mannequin commented Oct 3, 2003

BPO 817234
Files
  • sre.patch: Applied patch.

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2004-09-03.18:13:42.000>
    created_at = <Date 2003-10-03.15:01:52.000>
    labels = ['expert-regex']
    title = 're.finditer hangs on final empty match'
    updated_at = <Date 2004-09-03.18:13:42.000>
    user = 'https://bugs.python.org/kevinbutler'

    bugs.python.org fields:

    activity = <Date 2004-09-03.18:13:42.000>
    actor = 'niemeyer'
    assignee = 'niemeyer'
    closed = True
    closed_date = None
    closer = None
    components = ['Regular Expressions']
    creation = <Date 2003-10-03.15:01:52.000>
    creator = 'kevinbutler'
    dependencies = []
    files = ['1066']
    hgrepos = []
    issue_num = 817234
    keywords = []
    message_count = 4.0
    messages = ['18533', '18534', '18535', '18536']
    nosy_count = 3.0
    nosy_names = ['effbot', 'niemeyer', 'kevinbutler']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue817234'
    versions = ['Python 2.3']

    @kevinbutler
    Mannequin Author

    kevinbutler mannequin commented Oct 3, 2003

    The iterator returned by re.finditer does not appear to terminate
    if the final match is empty; instead it keeps returning that final
    (empty) match.

    Is this a bug in _sre? If so, I'll be happy to file it, though
    fixing it is a bit beyond my _sre experience level at this point.
    The solution would appear to be either to check for a duplicate
    match in iterator.next(), or to increment the position by one after
    returning an empty match (which should be OK, because if a
    non-empty match started at that location, we would have returned it
    instead of the empty match).

    Code to illustrate the failure:

    from re import finditer
    
    last = None
    for m in finditer( ".*", "asdf" ):
        if last == m.span():
            print "duplicate match:", last
            break
        print m.group(), m.span()
        last = m.span()
       

    asdf (0, 4)
    (4, 4)
    duplicate match: (4, 4)
    ---

    findall works:

    import re
    print re.findall( ".*", "asdf" )
    ['asdf', '']

    The workaround is to explicitly check for a duplicate span, as I
    did above, or to check for a duplicate end(), which avoids the
    final empty match.
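
As a rough illustration of the workaround described above, here is a minimal Python 3 sketch. The wrapper name `finditer_guarded` and its `by_end` flag are hypothetical, introduced only for this example; on interpreters that already contain the fix the guard simply never triggers when comparing spans.

```python
import re

def finditer_guarded(pattern, string, by_end=False):
    """Yield matches from re.finditer, stopping at the first repeat.

    by_end=False compares whole spans (keeps the final empty match);
    by_end=True compares only end(), which also drops the final
    empty match that follows a match ending at the same position.
    """
    last = None
    for m in re.finditer(pattern, string):
        key = m.end() if by_end else m.span()
        if key == last:
            break  # same position seen twice: stop instead of looping
        last = key
        yield m

spans_by_span = [m.span() for m in finditer_guarded(".*", "asdf")]
spans_by_end = [m.span() for m in finditer_guarded(".*", "asdf", by_end=True)]
print(spans_by_span)
print(spans_by_end)
```

Comparing spans keeps the trailing empty match, while comparing end() discards it, so which variant to use depends on whether the caller wants that empty match.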

    Seo Sanghyeon sent the following fix to the python-dev list:

    The attached one-line patch fixes the re.finditer bug reported by
    Kevin J. Butler. I read the cvs log to find out why this code was
    introduced, and it seems to be related to SF bug bpo-581080.

    But that bug didn't reappear after my patch, so I wonder
    why the code was introduced in the first place. It seems beyond
    my understanding. Please enlighten me.

    To test:

    # python/cpython#36890
    import re
    list(re.finditer(r'\s', 'a b'))
    # expected: one item list
    # bug: hang

    # Kevin J. Butler
    import re
    list(re.finditer('.*', 'asdf'))
    # expected: two item list (?)
    # bug: hang

    Seo Sanghyeon
    -------------- next part --------------
    ? patch
    Index: Modules/_sre.c
    ===================================================================
    RCS file: /cvsroot/python/python/dist/src/Modules/_sre.c,v
    retrieving revision 2.99
    diff -c -r2.99 _sre.c
    *** Modules/_sre.c 26 Jun 2003 14:41:08 -0000 2.99
    --- Modules/_sre.c 2 Oct 2003 03:48:55 -0000
    ***************
    *** 3062,3069 ****
          match = pattern_new_match((PatternObject*) self->pattern,
                                    state, status);

    !     if ((status == 0 || state->ptr == state->start) &&
    !         state->ptr < state->end)
              state->start = (void*) ((char*) state->ptr + state->charsize);
          else
              state->start = state->ptr;
    --- 3062,3068 ----
          match = pattern_new_match((PatternObject*) self->pattern,
                                    state, status);

    !     if (status == 0 || state->ptr == state->start)
              state->start = (void*) ((char*) state->ptr + state->charsize);
          else
              state->start = state->ptr;
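
To make the effect of the changed condition easier to follow, here is a toy Python 3 model of the scanner loop. It is only a sketch under stated assumptions, not the code in _sre.c: the function `scan_spans`, its `guard_end` flag and the `limit` cap are hypothetical names for this example, the search itself is delegated to `re`, and the `status == 0` branch is not modelled because the loop simply stops when no match is found.

```python
import re

def scan_spans(pattern, string, guard_end, limit=5):
    """Toy model of the scanner's position update (not the real C code).

    guard_end=True models the pre-patch condition (step past an empty
    match only while ptr < end); guard_end=False models the patched
    condition. `limit` caps the result so the pre-patch model stops.
    """
    regex = re.compile(pattern)
    start, end = 0, len(string)
    spans = []
    while len(spans) < limit:
        # A search starting beyond the end of the string finds nothing.
        m = regex.search(string, start) if start <= end else None
        if m is None:
            break
        spans.append(m.span())
        ptr = m.end()
        advance = m.start() == m.end()  # the match was empty
        if guard_end:
            advance = advance and ptr < end
        start = ptr + 1 if advance else ptr
    return spans

print(scan_spans(".*", "asdf", guard_end=True))   # repeats (4, 4) up to the cap
print(scan_spans(".*", "asdf", guard_end=False))  # stops after the empty match
```

In the pre-patch model the empty match at the end of the string never moves the start position, so the same match is produced until the cap is reached; without the `ptr < end` guard the start moves past the end and the scan finishes.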

    @kevinbutler kevinbutler mannequin closed this as completed Oct 3, 2003
    @kevinbutler kevinbutler mannequin assigned niemeyer Oct 3, 2003
    @kevinbutler kevinbutler mannequin added the topic-regex label Oct 3, 2003
    @kevinbutler
    Mannequin Author

    kevinbutler mannequin commented Oct 3, 2003

    The above patch does resolve the problem.

    The code was introduced in rev 2.85
    http://cvs.sourceforge.net/viewcvs.py/python/python/dist/src/Modules/_sre.c
    to resolve bug 581080
    http://sourceforge.net/tracker/index.php?func=detail&aid=581080&group_id=5470&atid=105470
    but removing this line does not re-introduce that bug.

    Thanks, and kudos to Seo...

    @effbot
    Mannequin

    effbot mannequin commented Sep 3, 2004

    Still there in 2.4a3, as the following revised example shows:

    import re
    
    m = re.finditer(".*", "asdf")

    print m.next().span()
    print m.next().span()
    print m.next().span() # this should raise an exception
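
For readers on Python 3, the same check can be written with the built-in `next()`. This is a sketch of the expected behaviour on an interpreter that includes the fix, not a reproduction of the 2.4a3 failure; the variable names are illustrative.

```python
import re

it = re.finditer(".*", "asdf")

first = next(it).span()    # the whole string
second = next(it).span()   # the empty match at the end

# With the fix applied, the iterator is exhausted at this point.
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, exhausted)
```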

    Gustavo, can you look at this patch too?

    @niemeyer
    Mannequin

    niemeyer mannequin commented Sep 3, 2004

    Patch applied, and test cases added for both this bug and
    bpo-581080.

    Kevin and Seo, thanks for the bug report and the fix.

    Fredrik, thanks for pointing me to the issue.

    Applied as:

    Lib/test/test_re.py: 1.52
    Modules/_sre.c: 2.108

    Patch attached for reference.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022