fullmatch isn't matching correctly under re.IGNORECASE #65197

Lucretiel · 2014-03-20T18:40:40Z

BPO	20998
Nosy	@ezio-melotti, @serhiy-storchaka
Files	sre_fullmatch_repeated_ignorecase.patch issue20998.patch issue20998_2.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = 'https://github.com/serhiy-storchaka'
closed_at = <Date 2014-05-14.18:57:45.640>
created_at = <Date 2014-03-20.18:40:40.406>
labels = ['expert-regex', 'type-bug']
title = "fullmatch isn't matching correctly under re.IGNORECASE"
updated_at = <Date 2014-05-14.18:57:45.639>
user = 'https://bugs.python.org/Lucretiel'

bugs.python.org fields:

activity = <Date 2014-05-14.18:57:45.639>
actor = 'serhiy.storchaka'
assignee = 'serhiy.storchaka'
closed = True
closed_date = <Date 2014-05-14.18:57:45.640>
closer = 'serhiy.storchaka'
components = ['Regular Expressions']
creation = <Date 2014-03-20.18:40:40.406>
creator = 'Lucretiel'
dependencies = []
files = ['34537', '34538', '34799']
hgrepos = []
issue_num = 20998
keywords = ['patch']
message_count = 10.0
messages = ['214257', '214272', '214287', '215546', '215549', '215667', '216019', '216022', '218566', '218567']
nosy_count = 6.0
nosy_names = ['ezio.melotti', 'mrabarnett', 'python-dev', 'serhiy.storchaka', 'Lucretiel', 'Gareth.Gouldstone']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue20998'
versions = ['Python 3.4', 'Python 3.5']

Lucretiel · 2014-03-20T18:40:40Z

I have the following regular expression:

In [2]: regex = re.compile("ME IS \w+", re.I)

For some reason, when using fullmatch, it doesn't match substrings longer than 1 for the '\w+':

In [3]: regex.fullmatch("ME IS L")
Out[3]: <_sre.SRE_Match object; span=(0, 7), match='ME IS L'>

In [4]: regex.fullmatch("me is l")
Out[4]: <_sre.SRE_Match object; span=(0, 7), match='me is l'>

In [5]: regex.fullmatch("ME IS Lucretiel")

In [6]: regex.fullmatch("me is lucretiel")

I have no idea why this is happening. Using match works fine:

In [7]: regex.match("ME IS L")
Out[7]: <_sre.SRE_Match object; span=(0, 7), match='ME IS L'>

In [8]: regex.match("ME IS Lucretiel")
Out[8]: <_sre.SRE_Match object; span=(0, 15), match='ME IS Lucretiel'>

In [9]: regex.match("me is lucretiel")
Out[9]: <_sre.SRE_Match object; span=(0, 15), match='me is lucretiel'>

Additionally, using fullmatch WITHOUT using the re.I flag causes it to work:

In [10]: regex = re.compile("ME IS \w+")

In [11]: regex.fullmatch("ME IS L")
Out[11]: <_sre.SRE_Match object; span=(0, 7), match='ME IS L'>

In [12]: regex.fullmatch("ME IS Lucretiel")
Out[12]: <_sre.SRE_Match object; span=(0, 15), match='ME IS Lucretiel'>

My platform is Ubuntu 12.04, using Python 3.4 installed from Felix Krull's deadsnakes PPA (https://launchpad.net/~fkrull/+archive/deadsnakes).
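
For anyone who wants to check this outside IPython, the session above can be condensed into a small script. This is only a sketch: the helper name `check` and the list of subjects are illustrative and not part of the original report, and the pattern is written as a raw string so that `\w` does not trigger an invalid-escape warning on newer interpreters.

```python
import re

# Same pattern as in the report, as a raw string.
PATTERN = r"ME IS \w+"

# Subjects taken from the session above.
SUBJECTS = ["ME IS L", "me is l", "ME IS Lucretiel", "me is lucretiel"]


def check(flags=0, subjects=SUBJECTS):
    """Return {subject: (match_ok, fullmatch_ok)} for the given flags.

    `check` is a hypothetical helper used only for this illustration.
    """
    regex = re.compile(PATTERN, flags)
    return {
        s: (regex.match(s) is not None, regex.fullmatch(s) is not None)
        for s in subjects
    }


if __name__ == "__main__":
    # With re.I, match() and fullmatch() should agree for every subject.
    for subject, (m, fm) in check(re.I).items():
        print(f"{subject!r:20} match={m} fullmatch={fm}")
```

On an affected 3.4 build, the `fullmatch` column is what differs for the longer subjects; on an interpreter that includes the fix, both columns should agree.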

serhiy-storchaka · 2014-03-20T20:26:25Z

Here is a patch.

mrabarnett · 2014-03-20T21:37:52Z

FWIW, here's my own attempt at a patch.

serhiy-storchaka · 2014-04-04T18:22:59Z

Both patches are almost equivalent (my patch is much simpler, but perhaps
Matthew's approach is more correct in the long run).

Unfortunately Rietveld doesn't work with Matthew's patch, so I have added my
comments here.

> - (!ctx->match_all || ctx->ptr == state->end)) {
> + ctx->ptr == state->end) {

Why is this check not needed anymore?

> - status = SRE(match)(state, pattern + 2*prefix_skip);
> + status = SRE(match)(state, pattern + 2*prefix_skip, state->match_all);

> - status = SRE(match)(state, pattern + 2);
> + status = SRE(match)(state, pattern + 2, state->match_all);

state->match_all is used but it is never initialized.

mrabarnett · 2014-04-04T18:49:34Z

> - (!ctx->match_all || ctx->ptr == state->end)) {
> + ctx->ptr == state->end) {

> Why is this check not needed anymore?

After stepping through the code for that regex that fails, I concluded
that the condition shouldn't depend on ctx->match_all at that point
after all.

> - status = SRE(match)(state, pattern + 2*prefix_skip);
> + status = SRE(match)(state, pattern + 2*prefix_skip, state->match_all);

> - status = SRE(match)(state, pattern + 2);
> + status = SRE(match)(state, pattern + 2, state->match_all);

> state->match_all is used but it is never initialized.

I thought I'd initialised it in all the places it's used.

I admit that I find the code a little hard to follow at times... :-(

GarethGouldstone · 2014-04-06T20:32:43Z

fullmatch() is not yet implemented on the regex scanner object SRE_Scanner (bpo-21002). Is it possible to adapt this patch to fix this omission?

serhiy-storchaka · 2014-04-13T15:28:32Z

> After stepping through the code for that regex that fails, I concluded
> that the condition shouldn't depend on ctx->match_all at that point
> after all.

The tests pass without this check, but I'm not sure that it is not needed. At
least without this check, the code is not equivalent to the code before support
for fullmatch() was added. So I prefer to leave it as is.

> I thought I'd initialised it in all the places it's used.
>
> I admit that I find the code a little hard to follow at times... :-(

Indeed, it is initialized in Modules/_sre.c, and it is always 0. Perhaps it
would be more consistent to get rid of the match_all field in the SRE_STATE
structure and pass it as an argument.
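
As a rough Python-level picture of the end-of-subject condition being discussed (`ctx->ptr == state->end`), fullmatch can be emulated by anchoring the pattern at the end of the string. This is only an illustration and not how `_sre.c` implements it; the helper name `fullmatch_via_match` is hypothetical, and it assumes the pattern contains no global inline flags such as `(?i)`.

```python
import re


def fullmatch_via_match(pattern, string, flags=0):
    # Hypothetical helper: wrap the pattern in a non-capturing group so that
    # alternations stay intact, then require the end of the subject.
    return re.match(r"(?:%s)\Z" % pattern, string, flags)


# Checking the end position after a plain match() is not enough, because the
# engine stops at the first alternative that succeeds:
first = re.match(r"a|ab", "ab")
print(first.end())  # position where the plain match stopped

# With the end-of-subject requirement inside the pattern, the engine
# backtracks into the second alternative and covers the whole subject.
full = fullmatch_via_match(r"a|ab", "ab")
print(full.span())
```

The second call shows why the "must reach the end" condition has to take part in the matching itself rather than being verified afterwards, which is the role the match_all check plays in the diff above.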

serhiy-storchaka · 2014-04-13T15:50:27Z

Gareth, this is an unrelated issue.

python-dev · 2014-05-14T18:52:08Z

New changeset 6267428afbdb by Serhiy Storchaka in branch '3.4':
Issue bpo-20998: Fixed re.fullmatch() of repeated single character pattern
http://hg.python.org/cpython/rev/6267428afbdb

New changeset bcf64c1c92f6 by Serhiy Storchaka in branch 'default':
Issue bpo-20998: Fixed re.fullmatch() of repeated single character pattern
http://hg.python.org/cpython/rev/bcf64c1c92f6
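
A quick way to confirm that a given interpreter includes this fix is to run a few repeated single-character patterns through `re.fullmatch()` with `re.IGNORECASE`. The cases and the helper name `failing_cases` below are illustrative assumptions, not the test that was committed with the changesets.

```python
import re

# Illustrative (pattern, subject) pairs: each pattern ends in a repeat of a
# single-character item, which is the situation named in the commit message.
CASES = [
    (r"\w+", "Lucretiel"),
    (r"[a-c]+", "ABCabc"),
    (r"a*", "AAAA"),
    (r"x{2,}", "XxXx"),
    (r"ME IS \w+", "me is lucretiel"),
]


def failing_cases(cases=CASES):
    """Return the cases where fullmatch() does not cover the whole subject."""
    failures = []
    for pattern, subject in cases:
        m = re.fullmatch(pattern, subject, re.IGNORECASE)
        if m is None or m.span() != (0, len(subject)):
            failures.append((pattern, subject))
    return failures


if __name__ == "__main__":
    bad = failing_cases()
    print("all cases matched" if not bad else f"failing cases: {bad}")
```

An empty list from `failing_cases()` suggests the interpreter has the fix; any entry in the list points at a pattern that still stops short of the end of the subject.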

serhiy-storchaka · 2014-05-14T18:57:46Z

Thank you Matthew for your contribution.

Lucretiel mannequin added the topic-regex and type-bug labels Mar 20, 2014

serhiy-storchaka self-assigned this Apr 13, 2014

serhiy-storchaka closed this as completed May 14, 2014

ezio-melotti transferred this issue from another repository Apr 10, 2022
