fullmatch isn't matching correctly under re.IGNORECASE #65197

Closed
Lucretiel mannequin opened this issue Mar 20, 2014 · 10 comments
Assignees
Labels
topic-regex type-bug An unexpected behavior, bug, or error

Comments

@Lucretiel
Mannequin

Lucretiel mannequin commented Mar 20, 2014

BPO 20998
Nosy @ezio-melotti, @serhiy-storchaka
Files
  • sre_fullmatch_repeated_ignorecase.patch
  • issue20998.patch
  • issue20998_2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2014-05-14.18:57:45.640>
    created_at = <Date 2014-03-20.18:40:40.406>
    labels = ['expert-regex', 'type-bug']
    title = "fullmatch isn't matching correctly under re.IGNORECASE"
    updated_at = <Date 2014-05-14.18:57:45.639>
    user = 'https://bugs.python.org/Lucretiel'

    bugs.python.org fields:

    activity = <Date 2014-05-14.18:57:45.639>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2014-05-14.18:57:45.640>
    closer = 'serhiy.storchaka'
    components = ['Regular Expressions']
    creation = <Date 2014-03-20.18:40:40.406>
    creator = 'Lucretiel'
    dependencies = []
    files = ['34537', '34538', '34799']
    hgrepos = []
    issue_num = 20998
    keywords = ['patch']
    message_count = 10.0
    messages = ['214257', '214272', '214287', '215546', '215549', '215667', '216019', '216022', '218566', '218567']
    nosy_count = 6.0
    nosy_names = ['ezio.melotti', 'mrabarnett', 'python-dev', 'serhiy.storchaka', 'Lucretiel', 'Gareth.Gouldstone']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue20998'
    versions = ['Python 3.4', 'Python 3.5']

    @Lucretiel
    Mannequin Author

    Lucretiel mannequin commented Mar 20, 2014

    I have the following regular expression:

    In [2]: regex = re.compile("ME IS \w+", re.I)

    For some reason, when using fullmatch, it doesn't match substrings longer than 1 for the '\w+':

    In [3]: regex.fullmatch("ME IS L")
    Out[3]: <_sre.SRE_Match object; span=(0, 7), match='ME IS L'>

    In [4]: regex.fullmatch("me is l")
    Out[4]: <_sre.SRE_Match object; span=(0, 7), match='me is l'>

    In [5]: regex.fullmatch("ME IS Lucretiel")

    In [6]: regex.fullmatch("me is lucretiel")

    I have no idea why this is happening. Using match works fine:

    In [7]: regex.match("ME IS L")
    Out[7]: <_sre.SRE_Match object; span=(0, 7), match='ME IS L'>

    In [8]: regex.match("ME IS Lucretiel")
    Out[8]: <_sre.SRE_Match object; span=(0, 15), match='ME IS Lucretiel'>

    In [9]: regex.match("me is lucretiel")
    Out[9]: <_sre.SRE_Match object; span=(0, 15), match='me is lucretiel'>

    Additionally, using fullmatch WITHOUT using the re.I flag causes it to work:

    In [10]: regex = re.compile("ME IS \w+")

    In [11]: regex.fullmatch("ME IS L")
    Out[11]: <_sre.SRE_Match object; span=(0, 7), match='ME IS L'>

    In [12]: regex.fullmatch("ME IS Lucretiel")
    Out[12]: <_sre.SRE_Match object; span=(0, 15), match='ME IS Lucretiel'>

    My platform is Ubuntu 12.04, using Python 3.4 installed from Felix Krull's deadsnakes PPA (https://launchpad.net/~fkrull/+archive/deadsnakes).
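For anyone who wants to rerun the report outside IPython, the session above can be condensed into a standalone script. This is only a sketch: the helper name `check` is hypothetical, and the pattern is written as a raw string so that newer interpreters do not warn about the `\w` escape. Going by the report, an affected 3.4 build should show fullmatch failing for the multi-character names under re.I; a fixed interpreter should report a match in every row.

```python
import re

# Same pattern as in the report, written as a raw string.
PATTERN = r"ME IS \w+"


def check(flags=0):
    """Return {subject: (match_ok, fullmatch_ok)} for the strings from the report.

    `check` is a hypothetical helper, not part of the re module.
    """
    regex = re.compile(PATTERN, flags)
    subjects = ["ME IS L", "ME IS Lucretiel"]
    if flags & re.IGNORECASE:
        # Lower-case subjects can only match when case is ignored.
        subjects += ["me is l", "me is lucretiel"]
    return {
        s: (regex.match(s) is not None, regex.fullmatch(s) is not None)
        for s in subjects
    }


if __name__ == "__main__":
    for label, flags in (("re.I", re.IGNORECASE), ("no flags", 0)):
        for subject, (m, fm) in check(flags).items():
            # str.format keeps the script usable on Python 3.4.
            print("{:8} {!r:20} match={} fullmatch={}".format(label, subject, m, fm))
```

Comparing the match and fullmatch columns side by side makes it easy to see that only fullmatch combined with re.I is affected.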

    @Lucretiel Lucretiel mannequin added topic-regex type-bug An unexpected behavior, bug, or error labels Mar 20, 2014
    @serhiy-storchaka
    Member

    Here is a patch.

    @mrabarnett
    Mannequin

    mrabarnett mannequin commented Mar 20, 2014

    FWIW, here's my own attempt at a patch.

    @serhiy-storchaka
    Member

    Both patches are almost equivalent (my patch is much simpler, but Matthew's
    approach is perhaps more correct in the long run).

    Unfortunately Rietveld doesn't work with Matthew's patch, so I have added my
    comments here.

    -            (!ctx->match_all || ctx->ptr == state->end)) {
    +            ctx->ptr == state->end) {

    Why this check is not needed anymore?

    -                status = SRE(match)(state, pattern + 2*prefix_skip);
    +                status = SRE(match)(state, pattern + 2*prefix_skip,
    +                                    state->match_all);

    -        status = SRE(match)(state, pattern + 2);
    +        status = SRE(match)(state, pattern + 2, state->match_all);

    state->match_all is used but it is never initialized.

    @mrabarnett
    Mannequin

    mrabarnett mannequin commented Apr 4, 2014

    > - (!ctx->match_all || ctx->ptr == state->end)) {
    > + ctx->ptr == state->end) {

    > Why this check is not needed anymore?

    After stepping through the code for that regex that fails, I concluded
    that the condition shouldn't depend on ctx->match_all at that point
    after all.

    > - status = SRE(match)(state, pattern + 2*prefix_skip);
    > + status = SRE(match)(state, pattern + 2*prefix_skip,
    > state->match_all);

    > - status = SRE(match)(state, pattern + 2);
    > + status = SRE(match)(state, pattern + 2, state->match_all);

    > state->match_all is used but it is never initialized.

    I thought I'd initialised it in all the places it's used.

    I admit that I find the code a little hard to follow at times... :-(

    @GarethGouldstone
    Mannequin

    GarethGouldstone mannequin commented Apr 6, 2014

    fullmatch() is not yet implemented on the regex scanner object SRE_Scanner (bpo-21002). Is it possible to adapt this patch to fix this omission?

    @serhiy-storchaka
    Member

    > After stepping through the code for that regex that fails, I concluded
    > that the condition shouldn't depend on ctx->match_all at that point
    > after all.

    The tests pass without this check, but I'm not sure that it is unneeded. At
    least, without this check the code is not equivalent to the code before
    support for fullmatch() was added. So I prefer to leave it as is.

    > I thought I'd initialised it in all the places it's used.
    >
    > I admit that I find the code a little hard to follow at times... :-(

    Indeed, it is initialized in Modules/_sre.c, and it is always 0. Perhaps it
    would be more consistent to get rid of the match_all field in the SRE_STATE
    structure and pass it as an argument.

    @serhiy-storchaka
    Member

    Gareth, this is an unrelated issue.

    @serhiy-storchaka serhiy-storchaka self-assigned this Apr 13, 2014
    @python-dev
    Mannequin

    python-dev mannequin commented May 14, 2014

    New changeset 6267428afbdb by Serhiy Storchaka in branch '3.4':
    Issue bpo-20998: Fixed re.fullmatch() of repeated single character pattern
    http://hg.python.org/cpython/rev/6267428afbdb

    New changeset bcf64c1c92f6 by Serhiy Storchaka in branch 'default':
    Issue bpo-20998: Fixed re.fullmatch() of repeated single character pattern
    http://hg.python.org/cpython/rev/bcf64c1c92f6
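The changeset message describes the fix as covering fullmatch() of a repeated single-character pattern. A quick check in that spirit might look like the sketch below. It is not the test added in the changesets; the case list and the `failing_cases` helper are assumptions made up for illustration, chosen so that each pattern ends in a repeat of a single item (`\w+`, `a+`, `[a-z]*`, `\d{2,}`) and is matched under re.IGNORECASE.

```python
import re

# Hypothetical regression cases: (pattern, subject) pairs that should fully
# match under re.IGNORECASE. Each pattern ends in a repeated single item.
CASES = [
    (r"ME IS \w+", "me is lucretiel"),
    (r"a+", "AAAA"),
    (r"[a-z]*", "Lucretiel"),
    (r"x\d{2,}", "X2014"),
]


def failing_cases(cases=CASES, flags=re.IGNORECASE):
    """Return the cases where fullmatch() does not span the whole subject."""
    failures = []
    for pattern, subject in cases:
        m = re.fullmatch(pattern, subject, flags)
        if m is None or m.span() != (0, len(subject)):
            failures.append((pattern, subject))
    return failures


if __name__ == "__main__":
    print(failing_cases() or "all cases fully matched")
```

On an interpreter that includes the fix, the returned list should be empty; any pair it does return is a pattern and subject worth reporting.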

    @serhiy-storchaka
    Member

    Thank you Matthew for your contribution.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022