classification
Title: fullmatch isn't matching correctly under re.IGNORECASE
Type: behavior
Stage: resolved
Components: Regular Expressions
Versions: Python 3.5, Python 3.4
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: serhiy.storchaka
Nosy List: Gareth.Gouldstone, Lucretiel, ezio.melotti, mrabarnett, python-dev, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2014-03-20 18:40 by Lucretiel, last changed 2014-05-14 18:57 by serhiy.storchaka. This issue is now closed.

Files
File name | Uploaded | Review
sre_fullmatch_repeated_ignorecase.patch | serhiy.storchaka, 2014-03-20 20:26 | review
issue20998.patch | mrabarnett, 2014-03-20 21:37 |
issue20998_2.patch | serhiy.storchaka, 2014-04-13 15:28 | review
Messages (10)
msg214257 - Author: Nathan West (Lucretiel) Date: 2014-03-20 18:40
I have the following regular expression:

In [2]: regex = re.compile(r"ME IS \w+", re.I)

For some reason, `fullmatch` fails whenever '\w+' has to match more than one character:

In [3]: regex.fullmatch("ME IS L")
Out[3]: <_sre.SRE_Match object; span=(0, 7), match='ME IS L'>

In [4]: regex.fullmatch("me is l")
Out[4]: <_sre.SRE_Match object; span=(0, 7), match='me is l'>

In [5]: regex.fullmatch("ME IS Lucretiel")

In [6]: regex.fullmatch("me is lucretiel")


I have no idea why this is happening. Using `match` works fine:

In [7]: regex.match("ME IS L")
Out[7]: <_sre.SRE_Match object; span=(0, 7), match='ME IS L'>

In [8]: regex.match("ME IS Lucretiel")
Out[8]: <_sre.SRE_Match object; span=(0, 15), match='ME IS Lucretiel'>

In [9]: regex.match("me is lucretiel")
Out[9]: <_sre.SRE_Match object; span=(0, 15), match='me is lucretiel'>
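The transcript already shows the behavior is inconsistent: whenever match() returns a match that spans the whole string, a full match exists, so fullmatch() on the same input must not return None. (The converse does not hold: match() returns the first match the engine finds, which can stop short even when a full match exists, as with "a|ab" on "ab".) A small consistency check along those lines, using a hypothetical helper name, might look like this:

```python
import re

regex = re.compile(r"ME IS \w+", re.IGNORECASE)

def has_whole_string_match(compiled, s):
    # match() anchors only at the start. If the match it returns happens to
    # span the whole string, a full match exists, so fullmatch() must not
    # return None for the same input. (Hypothetical helper name.)
    m = compiled.match(s)
    return m is not None and m.end() == len(s)

for s in ["ME IS L", "me is l", "ME IS Lucretiel", "me is lucretiel"]:
    if has_whole_string_match(regex, s):
        # On an interpreter where this issue is fixed, this never fails.
        assert regex.fullmatch(s) is not None, "fullmatch disagrees with match for %r" % s
```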

Additionally, `fullmatch` works as expected when the pattern is compiled WITHOUT the `re.I` flag:

In [10]: regex = re.compile(r"ME IS \w+")

In [11]: regex.fullmatch("ME IS L")
Out[11]: <_sre.SRE_Match object; span=(0, 7), match='ME IS L'>

In [12]: regex.fullmatch("ME IS Lucretiel")
Out[12]: <_sre.SRE_Match object; span=(0, 15), match='ME IS Lucretiel'>
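Since match() behaves correctly in the report, one possible workaround on an affected interpreter, assuming the problem is specific to fullmatch()'s code path, is to anchor the pattern yourself with \Z, which in Python's re matches only at the very end of the string. A minimal sketch (the helper name fullmatch_compat is hypothetical) that tries both flag settings from the report:

```python
import re

def fullmatch_compat(pattern, string, flags=0):
    # Hypothetical helper: emulate fullmatch() with match() plus \Z.
    # The non-capturing group makes \Z apply to the whole pattern, so an
    # alternation like "a|ab" is not parsed as "a" or "ab\Z".
    return re.match(r"(?:%s)\Z" % pattern, string, flags)

pattern = r"ME IS \w+"
for flags in (0, re.IGNORECASE):
    for s in ("ME IS L", "ME IS Lucretiel"):
        m = fullmatch_compat(pattern, s, flags)
        print(s, m.span() if m else None)
```

The group around the pattern matters: without it, \Z would bind only to the last alternative of a pattern containing "|".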

My platform is Ubuntu 12.04, using Python 3.4 installed from Felix Krull's deadsnakes PPA (https://launchpad.net/~fkrull/+archive/deadsnakes).
msg214272 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-03-20 20:26
Here is a patch.
msg214287 - Author: Matthew Barnett (mrabarnett) * Date: 2014-03-20 21:37
FWIW, here's my own attempt at a patch.
msg215546 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-04-04 18:22
Both patches are almost equivalent (mine is much simpler, but perhaps
Matthew's approach is more correct in the long run).

Unfortunately Rietveld doesn't work with Matthew's patch, so I have added my
comments here.

> -                (!ctx->match_all || ctx->ptr == state->end)) {
> +                ctx->ptr == state->end) {

Why is this check no longer needed?

> -                    status = SRE(match)(state, pattern + 2*prefix_skip);
> +                    status = SRE(match)(state, pattern + 2*prefix_skip, state->match_all);

> -            status = SRE(match)(state, pattern + 2);
> +            status = SRE(match)(state, pattern + 2, state->match_all);

state->match_all is used but it is never initialized.
msg215549 - Author: Matthew Barnett (mrabarnett) * Date: 2014-04-04 18:49
> > -                (!ctx->match_all || ctx->ptr == state->end)) {
> > +                ctx->ptr == state->end) {
> 
> Why is this check no longer needed?
> 
After stepping through the code for that regex that fails, I concluded 
that the condition shouldn't depend on ctx->match_all at that point 
after all.

> > -                    status = SRE(match)(state, pattern + 2*prefix_skip);
> > +                    status = SRE(match)(state, pattern + 2*prefix_skip, state->match_all);
> 
> > -            status = SRE(match)(state, pattern + 2);
> > +            status = SRE(match)(state, pattern + 2, state->match_all);
> 
> state->match_all is used but it is never initialized.

I thought I'd initialised it in all the places it's used.

I admit that I find the code a little hard to follow at times... :-(
msg215667 - Author: Gareth Gouldstone (Gareth.Gouldstone) Date: 2014-04-06 20:32
fullmatch() is not yet implemented on the regex scanner object SRE_Scanner (issue 21002). Is it possible to adapt this patch to fix this omission?
msg216019 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-04-13 15:28
> After stepping through the code for that regex that fails, I concluded
> that the condition shouldn't depend on ctx->match_all at that point
> after all.

Tests pass without this check, but I'm not sure it is unneeded. At least,
without this check the code is not equivalent to the code before support for
fullmatch() was added, so I prefer to leave it as is.

> I thought I'd initialised it in all the places it's used.
> 
> I admit that I find the code a little hard to follow at times... :-(

Indeed, it is initialized in Modules/_sre.c, and it is always 0. Perhaps it
would be more consistent to get rid of the match_all field in the SRE_STATE
structure and pass it as an argument.
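One reading consistent with the reported symptoms (fullmatch succeeds only when \w+ consumes a single, final character) is that fullmatch()'s "must end at the end of the string" requirement was being applied to each character consumed by the repeat instead of once to the whole pattern. The toy model below illustrates that failure mode only; it is a hypothetical sketch, not the actual sre_lib.h code, the helper names match_one, count_repeat and fullmatch_word_plus are made up, and ASCII-only word characters are assumed:

```python
import string

# Hypothetical toy model, not _sre's real code. ASCII-only word characters assumed.
WORD_CHARS = frozenset(string.ascii_letters + string.digits + "_")

def match_one(text, pos, match_all):
    """Try to match one word character at pos; return the end index or None.

    With match_all=True the match is only accepted if it ends exactly at
    the end of the string, the requirement fullmatch() places on the
    pattern as a whole.
    """
    if pos < len(text) and text[pos] in WORD_CHARS:
        end = pos + 1
        if match_all and end != len(text):
            return None
        return end
    return None

def count_repeat(text, pos, match_all):
    """Count consecutive word characters from pos, as a repeat like \\w+ would."""
    n = 0
    while match_one(text, pos + n, match_all) is not None:
        n += 1
    return n

def fullmatch_word_plus(text, pos, leak_flag):
    """Does \\w+ starting at pos run exactly to the end of text?

    leak_flag=True models the failure mode: the caller's match_all is handed
    down to every per-character check. leak_flag=False passes False to the
    per-character check and applies the end-of-string test once, to the
    whole repeat.
    """
    n = count_repeat(text, pos, match_all=leak_flag)
    return n >= 1 and pos + n == len(text)

# "ME IS " is 6 characters, so the \w+ part starts at index 6.
for text in ("ME IS L", "ME IS Lucretiel"):
    print(text, fullmatch_word_plus(text, 6, True), fullmatch_word_plus(text, 6, False))
# ME IS L True True
# ME IS Lucretiel False True
```

Passing the flag as an explicit argument, as suggested in the message above, makes each inner call site state the value it needs instead of inheriting whatever the outer call stored.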
msg216022 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-04-13 15:50
Gareth, this is an unrelated issue.
msg218566 - Author: Roundup Robot (python-dev) Date: 2014-05-14 18:52
New changeset 6267428afbdb by Serhiy Storchaka in branch '3.4':
Issue #20998: Fixed re.fullmatch() of repeated single character pattern
http://hg.python.org/cpython/rev/6267428afbdb

New changeset bcf64c1c92f6 by Serhiy Storchaka in branch 'default':
Issue #20998: Fixed re.fullmatch() of repeated single character pattern
http://hg.python.org/cpython/rev/bcf64c1c92f6
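The changesets describe the fix as concerning fullmatch() of a repeated single-character pattern. A minimal regression-test sketch along those lines, written with unittest, is below; the class and method names are hypothetical, and this is not the test that was actually committed:

```python
import re
import unittest

class FullmatchIgnoreCaseTest(unittest.TestCase):
    # Hypothetical regression tests: a repeated single-character pattern
    # under IGNORECASE must be able to consume more than one character
    # when fullmatch() is used.

    def test_repeated_word_class(self):
        m = re.fullmatch(r"ME IS \w+", "me is lucretiel", re.IGNORECASE)
        self.assertIsNotNone(m)
        self.assertEqual(m.span(), (0, 15))

    def test_repeated_literal_and_set(self):
        self.assertEqual(re.fullmatch("a+", "AAA", re.IGNORECASE).span(), (0, 3))
        self.assertEqual(re.fullmatch("[a-c]+", "ABC", re.IGNORECASE).span(), (0, 3))

    def test_still_rejects_trailing_text(self):
        # The fix must not turn fullmatch() into match(): trailing text still fails.
        self.assertIsNone(re.fullmatch("a+", "AAB", re.IGNORECASE))
```

Saved as a module, it can be run with `python -m unittest` from the directory containing the file.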
msg218567 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-05-14 18:57
Thank you Matthew for your contribution.
History
Date | User | Action | Args
2014-05-14 18:57:45 | serhiy.storchaka | set | status: open -> closed; resolution: fixed; messages: + msg218567; stage: patch review -> resolved
2014-05-14 18:52:07 | python-dev | set | nosy: + python-dev; messages: + msg218566
2014-04-13 17:57:17 | serhiy.storchaka | set | assignee: serhiy.storchaka
2014-04-13 15:50:27 | serhiy.storchaka | set | messages: + msg216022
2014-04-13 15:28:32 | serhiy.storchaka | set | files: + issue20998_2.patch; messages: + msg216019
2014-04-06 20:32:44 | Gareth.Gouldstone | set | nosy: + Gareth.Gouldstone; messages: + msg215667
2014-04-04 18:49:34 | mrabarnett | set | messages: + msg215549
2014-04-04 18:22:59 | serhiy.storchaka | set | messages: + msg215546
2014-03-20 21:37:52 | mrabarnett | set | files: + issue20998.patch; messages: + msg214287
2014-03-20 20:26:25 | serhiy.storchaka | set | files: + sre_fullmatch_repeated_ignorecase.patch; keywords: + patch; messages: + msg214272; stage: needs patch -> patch review
2014-03-20 18:57:45 | serhiy.storchaka | set | nosy: + serhiy.storchaka; stage: needs patch; versions: + Python 3.5
2014-03-20 18:43:09 | Lucretiel | set | type: behavior
2014-03-20 18:40:40 | Lucretiel | create |