classification
Title: re.sub() different behavior in 3.7
Type: Stage: resolved
Components: Regular Expressions Versions: Python 3.7, Python 3.6
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: Stephan Bergmann, ezio.melotti, mrabarnett, purificant, serhiy.storchaka, xtreak
Priority: normal Keywords:

Created on 2018-10-14 13:32 by purificant, last changed 2018-11-07 15:22 by serhiy.storchaka. This issue is now closed.

Messages (6)
msg327707 - Author: purificant (purificant) Date: 2018-10-14 13:32
A call to re.sub() returns different results in Python 3.7 compared to versions 3.6 / 3.5 and 2.7

Example behavior in 2.7 / 3.5 and 3.6:
>>> re.sub(r'(([^/]*)(/.*)?)', r'\2.zip/\1/', 'example')
'example.zip/example/'

Example in 3.7.0 and 3.7.1rc2:
>>> re.sub(r'(([^/]*)(/.*)?)', r'\2.zip/\1/', 'example')
'example.zip/example/.zip//'

As you can see the returned string is different for the same regex. re.subn() confirms that 2 replacements are made instead of 1.

Is it intended to have different behaviour in 3.7+ or is this a bug?
Thanks
msg327710 - Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2018-10-14 13:58
Thanks for the report. git bisect tells me this change was introduced with fbb490fd2f38bd817d99c20c05121ad0168a38ee (issue32308)

# ../backups/bpo34982.py
import re

print(re.sub(r'(([^/]*)(/.*)?)', r'\2.zip/\1/', 'example'))

# Running script at fbb490fd2f38bd817d99c20c05121ad0168a38ee

➜  cpython git:(fbb490fd2f) ./python.exe ../backups/bpo34982.py
example.zip/example/.zip//

# Script at fbb490fd2f38bd817d99c20c05121ad0168a38ee~1

➜  cpython git:(fbb490fd2f) git checkout -q fbb490fd2f38bd817d99c20c05121ad0168a38ee~1
➜  cpython git:(0cc99c8cd7) make > /dev/null
➜  cpython git:(0cc99c8cd7) ./python.exe ../backups/bpo34982.py
example.zip/example/

I think this is an intended change. As noted in msg308229, it might break third-party code. Adding Serhiy for thoughts.
msg327711 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-10-14 14:08
Yes, this is an intended change. Your pattern matches an empty string at the end of the input string. It was a bug in earlier Python versions that re.sub() didn't replace empty matches adjacent to a previous non-empty match.
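
The two matches described above can be made visible with re.finditer() and re.subn(). This is a minimal sketch, assuming Python 3.7 or later; the variable names are illustrative and not part of the original report.

```python
import re

pattern = r'(([^/]*)(/.*)?)'
text = 'example'

# Collect the (start, end) span of every match found in the input.
spans = [m.span() for m in re.finditer(pattern, text)]
print(spans)  # [(0, 7), (7, 7)]: 'example', then an empty match at the end

# re.subn() returns the new string together with the replacement count.
result, count = re.subn(pattern, r'\2.zip/\1/', text)
print(result, count)  # on 3.7+: example.zip/example/.zip// 2
```

The second span, (7, 7), is the empty match at the end of the string; on 3.7+ re.sub() replaces it as well, which accounts for the trailing '.zip//'.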

It is not clear what the purpose of your code is, but adding anchors or replacing * with + usually helps.
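
Both suggestions can be sketched against the pattern from the report. This assumes the intent is a single replacement over the whole input, which the thread does not confirm; the names repl, plus and anchored are hypothetical.

```python
import re

repl = r'\2.zip/\1/'

# Option 1: replace * with + so the first component must be non-empty.
# The empty match at the end of the string is then impossible.
plus = re.subn(r'(([^/]+)(/.*)?)', repl, 'example')

# Option 2: anchor the pattern so it can only match once, starting at
# the beginning of the string and running to the end.
anchored = re.subn(r'^(([^/]*)(/.*)?)$', repl, 'example')

print(plus)      # ('example.zip/example/', 1)
print(anchored)  # ('example.zip/example/', 1)
```

The two options differ on empty input: the + version leaves '' unchanged, while the anchored version still matches once and produces '.zip//'.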
msg327712 - Author: purificant (purificant) Date: 2018-10-14 14:28
Great, thank you for explaining. My specific use case can be fixed by replacing * with + as per your suggestion.
msg329418 - Author: Stephan Bergmann (Stephan Bergmann) Date: 2018-11-07 14:55
So, just to make sure, that also means that

  re.sub('a*$', 'b', 'a')

returning 'bb' instead of 'b' is intended behavior?
msg329420 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-11-07 15:22
Yes, it is.
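
The 'a*$' case follows the same rule as the original report. A short sketch, assuming Python 3.7 or later (the variable names are illustrative):

```python
import re

# 'a*$' matches 'a' at position 0, then the empty string at position 1.
matches = re.findall(r'a*$', 'a')
print(matches)  # ['a', '']

# Each of the two matches is replaced, hence two 'b' characters.
star = re.sub(r'a*$', 'b', 'a')
print(star)  # bb

# Requiring at least one 'a' leaves only the non-empty match.
plus = re.sub(r'a+$', 'b', 'a')
print(plus)  # b
```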
History
Date                 User              Action  Args
2018-11-07 15:22:57  serhiy.storchaka  set     messages: + msg329420
2018-11-07 14:55:15  Stephan Bergmann  set     nosy: + Stephan Bergmann; messages: + msg329418
2018-10-14 14:28:47  purificant        set     messages: + msg327712
2018-10-14 14:08:48  serhiy.storchaka  set     status: open -> closed; resolution: not a bug; messages: + msg327711; stage: resolved
2018-10-14 13:58:42  xtreak            set     nosy: + serhiy.storchaka
2018-10-14 13:58:27  xtreak            set     messages: + msg327710
2018-10-14 13:40:26  xtreak            set     nosy: + xtreak
2018-10-14 13:32:52  purificant        create