SRE bugs with capturing groups in negative assertions #38345

Closed
glchapman mannequin opened this issue Apr 21, 2003 · 7 comments
Labels
build The build process and cross-build topic-regex

Comments

@glchapman
Mannequin

glchapman mannequin commented Apr 21, 2003

BPO 725149
Nosy @birkenfeld
Files
  • neg_groups_patch.txt
  • assert_patch.txt: new patch replacing previous with added fix for pos. assertions
  • ftp_eme.py: Transferring Files via FTP
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2008-02-07.20:50:26.991>
    created_at = <Date 2003-04-21.18:22:00.000>
    labels = ['expert-regex', 'build']
    title = 'SRE bugs with capturing groups in negative assertions'
    updated_at = <Date 2009-02-23.21:48:45.776>
    user = 'https://bugs.python.org/glchapman'

    bugs.python.org fields:

    activity = <Date 2009-02-23.21:48:45.776>
    actor = 'georg.brandl'
    assignee = 'niemeyer'
    closed = True
    closed_date = <Date 2008-02-07.20:50:26.991>
    closer = 'draghuram'
    components = ['Regular Expressions']
    creation = <Date 2003-04-21.18:22:00.000>
    creator = 'glchapman'
    dependencies = []
    files = ['852', '853', '13158']
    hgrepos = []
    issue_num = 725149
    keywords = []
    message_count = 7.0
    messages = ['15565', '15566', '15567', '15568', '62179', '82636', '82644']
    nosy_count = 5.0
    nosy_names = ['georg.brandl', 'glchapman', 'niemeyer', 'draghuram', 'daniel_py']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'compile error'
    url = 'https://bugs.python.org/issue725149'
    versions = ['Python 2.6']

    @glchapman
    Mannequin Author

    glchapman mannequin commented Apr 21, 2003

    SRE is broken in some subtle ways when you combine
    capturing groups with assertions. For example:

    >>> re.match('((?!(a)c)[ab])*', 'abc').groups()
    ('b', '')

    In the above '(a)' has matched an empty string. Or
    worse:

    >>> re.match('(a)((?!(b)*))*', 'abb').groups()
    ('b', None, None)

    Here '(a)' matches 'b'.

    Although Perl reports matches for groups in negative
    assertions, I think it is better to adopt the PCRE rule
    that these groups are always reported as unmatched
    outside the assertion (inside the assertion, if used with
    backreferences, they should behave as normal). This
    would make the handling of subpatterns in negative
    assertions consistent with that of subpatterns in
    branches:

    >>> re.match('(a)c|ab', 'ab').groups()
    (None,)

    In the above, although '(a)' matches before the branch
    fails, the failure of the branch means '(a)' is considered
    not to have matched.
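
The branch behaviour described above can be checked directly. This is a minimal sketch; the helper name `branch_groups` is made up for illustration and is not part of the report or the patch.

```python
import re

def branch_groups(pattern=r'(a)c|ab', text='ab'):
    """Return (whole match, groups) for the branch example from the report."""
    m = re.match(pattern, text)
    return m.group(0), m.groups()

whole, groups = branch_groups()
print(whole, groups)
```

The first alternative consumes 'a' into group 1 and then fails on 'c'. The overall match comes from the second alternative, so group 1 is reported as unmatched.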

    Anyway, the attached patch is an effort to fix this
    problem by saving the values of marks before calling the
    assertion, and then restoring them afterwards (thus
    undoing whatever might have been done in the assertion).
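
The save-and-restore idea can be illustrated with a toy model. This is only a sketch in Python, not the C code from the attached patch: the function `run_negative_assertion`, the flat list of marks, and the `body` callback are all hypothetical names chosen for the illustration.

```python
from typing import Callable, List, Optional

# Toy representation: marks[2*i] and marks[2*i + 1] hold the start and end
# offsets of group i + 1, or None when the group has not matched.
Marks = List[Optional[int]]

def run_negative_assertion(marks: Marks, body: Callable[[Marks], bool]) -> bool:
    """Run an assertion body, then undo any marks it set (hypothetical helper)."""
    saved = list(marks)      # snapshot the marks before entering the assertion
    try:
        matched = body(marks)
    finally:
        marks[:] = saved     # restore, discarding marks set inside the assertion
    return not matched       # a negative assertion succeeds when its body fails

def body_a_then_c(marks: Marks) -> bool:
    """Stand-in for '(a)c' on input 'ab': the group matches, then 'c' fails."""
    marks[2], marks[3] = 0, 1
    return False

marks: Marks = [0, 1, None, None]
ok = run_negative_assertion(marks, body_a_then_c)
print(ok, marks)
```

In this model the group inside the assertion is visible while the body runs (so backreferences inside the assertion would still work) but is unmatched again once the assertion returns.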

    @glchapman glchapman mannequin added the topic-regex label Apr 21, 2003
    @glchapman glchapman mannequin assigned niemeyer Apr 21, 2003
    @glchapman
    Mannequin Author

    glchapman mannequin commented Apr 22, 2003

    In thinking further, I realized that positive assertions are also
    affected by the second problem. E.g.:

    >>> re.match('(a)(?:(?=(b)*)c)*', 'abb').groups()
    ('b', None)

    The problem here is that a successful match in an assertion
    can leave marks at the top of the mark stack which then get
    popped in the wrong place. Attaching a new patch which
    should catch this problem for both kinds of assertions (and
    which also should "unmark" groups in negative assertions).
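
To see how a given interpreter handles the three patterns reported so far, they can be re-run in one go. This is a rough sketch; `REPORTED_CASES` and `probe` are names invented here. No expected group values are shown, because they depend on the SRE version in use.

```python
import re

# Patterns and subject strings copied from the reports above.
REPORTED_CASES = [
    (r'((?!(a)c)[ab])*', 'abc'),      # group inside a negative assertion
    (r'(a)((?!(b)*))*', 'abb'),       # marks popped in the wrong place
    (r'(a)(?:(?=(b)*)c)*', 'abb'),    # same problem with a positive assertion
]

def probe(cases=REPORTED_CASES):
    """Return (pattern, text, whole match, groups) for each reported case."""
    results = []
    for pattern, text in cases:
        m = re.match(pattern, text)
        results.append((pattern, text, m.group(0), m.groups()))
    return results

for pattern, text, whole, groups in probe():
    print(pattern, repr(text), repr(whole), groups)
```

All three patterns match at position 0, so `re.match` returns a match object in each case. The interesting part is which groups are reported as set.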

    @niemeyer
    Mannequin

    niemeyer mannequin commented Apr 27, 2003

    Greg, I think there are two different issues here.

    One of them is related to a wrong behavior from
    mark_save/restore(), which don't restore the stackbase
    before restoring the marks. Asserts were affected because
    they're the only recursive ops that can continue the loop,
    but the problem would happen to any operation with the same
    behavior. So, rather than hardcoding this into asserts, I
    have changed mark_save/restore() to always restore the
    stackbase before restoring the marks. This should fix these
    two cases you presented:

    >>> re.match('(a)(?:(?=(b)*)c)*', 'abb').groups()
    >>> re.match('(a)((?!(b)*))*', 'abb').groups()

    And was applied as:

    Modules/_sre.c: 2.95
    Lib/test/test_re.py: 1.41

    The other issue is related to the asserts which are leaving
    half-marked groups. While your solution does work, it
    changes the current behavior, which is also compatible with
    how Perl works. I understand that individual groups matching
    when the whole string doesn't match is atypical. OTOH, a
    negative assertion *is* atypical, and IMO denying external
    group access won't help the user to understand how it works.
    In other words, I think it's better to have incomplete
    support for the moment, than having none.
    This way, we can think further about this, and look for an
    elegant solution to fix that support, certainly including
    some algorithm to check for half-marked groups.

    Thank you very much for spotting these bugs, and submitting
    a solution for them.

    @glchapman
    Mannequin Author

    glchapman mannequin commented Apr 28, 2003

    Gustavo,

    Just a quick note on compatibility. As I mentioned, the PCRE
    rule is that groups in negative asserts do not match. So, with
    Python 2.2, you get this result using pre:

    >>> pre.match('((?!(a)c)[ab])*', 'abc').groups()
    ('b', None)

    Although I understand your argument, I'll just say that I
    personally think it would be better to move toward compatibility
    with pre (which after all was the official Python regex engine
    before sre), rather than to remain partly compatible with Perl.

    Thanks for reviewing my patches and fixing the bugs!

    @draghuram
    Mannequin

    draghuram mannequin commented Feb 7, 2008

    looks to have been fixed.

    @draghuram draghuram mannequin closed this as completed Feb 7, 2008
    @danielpy
    Mannequin

    danielpy mannequin commented Feb 23, 2009

    I need help resolving a problem that occurs when I run a scheduled task
    with crontab. It produces the following result:

    Traceback (most recent call last):
      File "ftp_eme.py", line 12, in ?
        import datetime, os, ftplib
      File "/usr/local/lib/python2.6/ftplib.py", line 46, in ?
        import socket
      File "/usr/local/lib/python2.6/socket.py", line 50, in ?
        import _ssl
      File "/usr/local/lib/python2.6/_ssl.py", line 58, in ?
        import textwrap
      File "/usr/local/lib/python2.6/textwrap.py", line 10, in ?
        import string, re
      File "/usr/local/lib/python2.6/string.py", line 81, in ?
        import re as _re
      File "/usr/local/lib/python2.6/re.py", line 105, in ?
        import sre_compile
      File "/usr/local/lib/python2.6/sre_compile.py", line 17, in ?
        assert _sre.MAGIC == MAGIC, "SRE module mismatch"
    AssertionError: SRE module mismatch
    No message, no subject; hope that's ok
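
A hedged note on diagnosing this kind of traceback: the frames are shown as `in ?` although the files live under `python2.6`, which may suggest that cron launched a different, older interpreter than the one the library belongs to. The sketch below (the function name `describe_interpreter` is made up here) prints which interpreter is actually running. It deliberately imports only `sys`, so it still works when `import re` fails.

```python
import sys

def describe_interpreter():
    """Collect basic facts about the running interpreter (hypothetical helper)."""
    return {
        "executable": sys.executable,        # path of the interpreter binary
        "version": sys.version.split()[0],   # e.g. '2.6.1'
        "prefix": sys.prefix,                # installation prefix
        "path": list(sys.path),              # module search path
    }

if __name__ == "__main__":
    info = describe_interpreter()
    for key in ("executable", "version", "prefix"):
        print("%s: %s" % (key, info[key]))
    for entry in info["path"]:
        print("path entry: %s" % entry)
```

Running the script once from an interactive shell and once from the scheduled job, then comparing the two outputs, would show whether both use the same interpreter and search path.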

    @danielpy danielpy mannequin added OS-windows and removed topic-regex labels Feb 23, 2009
    @danielpy danielpy mannequin changed the title from "SRE bugs with capturing groups in negative assertions" to "_sre.MAGIC" Feb 23, 2009
    @danielpy danielpy mannequin added the build The build process and cross-build label Feb 23, 2009
    @birkenfeld
    Member

    Please do not hijack existing issues. In the case of this problem, do
    not open an issue at all, but ask in a Python mailing list or newsgroup.

    @birkenfeld birkenfeld changed the title from "_sre.MAGIC" to "SRE bugs with capturing groups in negative assertions" Feb 23, 2009
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022