classification
Title: regular expression regression in python 3.7
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.7

process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: doughellmann, serhiy.storchaka, zzzeek
Priority: normal
Keywords:

Created on 2018-03-05 15:40 by zzzeek, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (9)
msg313251 - (view) Author: mike bayer (zzzeek) * Date: 2018-03-05 15:40
demo:

import re

inner = 'VARCHAR(30) COLLATE "en_US"'

result = re.sub(
    r'((?: COLLATE.*)?)$',
    r'FOO\1',
    inner
)

print(inner)
print(result)


in all Python versions prior to 3.7:

    VARCHAR(30) COLLATE "en_US"
    VARCHAR(30)FOO COLLATE "en_US"

in Python 3.7.0b2:

    VARCHAR(30) COLLATE "en_US"
    VARCHAR(30)FOO COLLATE "en_US"FOO

platform: Fedora 27 
python build:
Python 3.7.0b2 (default, Mar  5 2018, 09:37:32) 
[GCC 7.2.1 20170915 (Red Hat 7.2.1-2)] on linux
msg313252 - (view) Author: mike bayer (zzzeek) * Date: 2018-03-05 15:42
correction, that's fedora 26, not 27
msg313255 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-03-05 16:09
This is an intentional change.

Prior to 3.7, re.sub() didn't replace empty matches adjacent to a previous non-empty match. In 3.7 it does. Together with other changes, this made all four functions that search for multiple matches of the pattern (re.findall(), re.finditer(), re.split() and re.sub()) consistent.

In your example the pattern matches not only from " COLLATE" to the end of the input string, but also an empty string at the end of the input string. If you do not want to match an empty string, just remove the '?' qualifier.
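
For readers following along, the two matches described above can be listed with re.finditer(). This is a minimal sketch, not part of the original message; the names `optional` and `required` are made up for illustration. It also shows the side effect of dropping the '?': a string without a COLLATE clause no longer matches at all.

```python
import re

inner = 'VARCHAR(30) COLLATE "en_US"'

# Pattern from the report: the COLLATE clause is optional.
optional = re.compile(r'((?: COLLATE.*)?)$')
# Same pattern without the '?', so an empty string can no longer match.
required = re.compile(r'((?: COLLATE.*))$')

optional_matches = [m.group(0) for m in optional.finditer(inner)]
required_matches = [m.group(0) for m in required.finditer(inner)]

print(optional_matches)  # the COLLATE clause, then an empty match at the end
print(required_matches)  # only the COLLATE clause

# Without the '?', a string that has no COLLATE clause is not matched,
# so re.sub() returns it unchanged.
unchanged = required.sub(r'FOO\1', 'VARCHAR(30)')
print(unchanged)
```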
msg313256 - (view) Author: mike bayer (zzzeek) * Date: 2018-03-05 16:10
can you point me to the documentation?
msg313257 - (view) Author: mike bayer (zzzeek) * Date: 2018-03-05 16:17
Also, removing the "?" is not an option for me. I need the brackets to be placed prior to the "COLLATE" subsection, but unconditionally, even if the "COLLATE" section is not present. Looking at the change, the behavior seems wrong to me. The regexp states: match the end of the string, plus an optional "COLLATE" clause, into a capturing expression; then replace everything here, i.e. the capturing part as well as the dollar sign part, with a single instance of FOO plus the captured part. It is entirely unintuitive to me how a second replacement would be occurring here. I cannot prove it, but I think this change is wrong.

I will try to rewrite the expression.
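
One possible rewrite, offered only as a sketch and not taken from this thread: anchor the pattern at the start of the string so that it can match only once, and capture the leading part in its own group. The pattern, the helper name `add_foo`, and the assumption that the input is a single line without embedded newlines are all illustrative.

```python
import re

# Hypothetical rewrite: '^' forces the single match to start at position 0,
# so no second, empty match can be found at the end of the string.
pattern = re.compile(r'^(.*?)((?: COLLATE.*)?)$')


def add_foo(type_string):
    """Insert FOO before the optional COLLATE clause (or at the end)."""
    return pattern.sub(r'\1FOO\2', type_string)


with_collate = add_foo('VARCHAR(30) COLLATE "en_US"')
without_collate = add_foo('VARCHAR(30)')

print(with_collate)
print(without_collate)
```

Because the lazy `(.*?)` stops as soon as the optional clause plus `$` can match, FOO is placed before " COLLATE" when the clause is present and at the end when it is not.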
msg313258 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-03-05 16:17
Just see the re.sub() documentation for 3.7. There is also a note in the What's New document, in the "Changes in the Python API" section.
msg313260 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-03-05 16:26
In your case you can just pass 1 as the fourth parameter of re.sub().
msg313262 - (view) Author: mike bayer (zzzeek) * Date: 2018-03-05 16:29
for now the quickest solution is to add "count=1" so that it only replaces once.
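
As a quick illustration of the workaround, here is the demo from the first message with count=1 added (count is the fourth parameter of re.sub()). This is a sketch; the `results` list is only there to collect the output.

```python
import re

pattern = r'((?: COLLATE.*)?)$'

results = []
for inner in ['VARCHAR(30) COLLATE "en_US"', 'VARCHAR(30)']:
    # count=1 stops after the first match, so the trailing empty match
    # at the end of the string is never replaced.
    results.append(re.sub(pattern, r'FOO\1', inner, count=1))

for result in results:
    print(result)
```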
msg313264 - (view) Author: mike bayer (zzzeek) * Date: 2018-03-05 16:32
for those watching, this would be the findall() case, which is consistent between Python versions:

import re

for reg in [
    'VARCHAR(30) COLLATE "en_US"',
    'VARCHAR(30)'
]:

    print(re.findall(r'(?: COLLATE.*)?$', reg))


output (all pythons):

[' COLLATE "en_US"', '']
['']

so yes, there are two matches for the first string and only one for the second.
History
Date                 User              Action  Args
2022-04-11 14:58:58  admin             set     github: 77179
2018-03-05 19:43:59  doughellmann      set     nosy: + doughellmann
2018-03-05 16:32:48  zzzeek            set     messages: + msg313264
2018-03-05 16:29:06  zzzeek            set     messages: + msg313262
2018-03-05 16:26:30  serhiy.storchaka  set     messages: + msg313260
2018-03-05 16:17:39  serhiy.storchaka  set     messages: + msg313258
2018-03-05 16:17:18  zzzeek            set     messages: + msg313257
2018-03-05 16:10:54  zzzeek            set     messages: + msg313256
2018-03-05 16:09:35  serhiy.storchaka  set     status: open -> closed; nosy: + serhiy.storchaka; messages: + msg313255; resolution: not a bug; stage: resolved
2018-03-05 15:42:05  zzzeek            set     messages: + msg313252
2018-03-05 15:40:37  zzzeek            create