This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Unexpected behavior re.sub() with raw f-strings
Type: behavior
Stage: resolved
Components: Regular Expressions, Windows
Versions: Python 3.7
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: dkreeft, eric.smith, ezio.melotti, mrabarnett, paul.moore, steve.dower, tim.golden, zach.ware
Priority: normal
Keywords:

Created on 2020-09-29 15:00 by dkreeft, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (3)
msg377669 - Author: dkreeft (dkreeft) Date: 2020-09-29 15:00
Steps to reproduce (Windows/Python 3.7.7):

1. Define a replacement string that starts with a digit:

REPLACEMENT = '12345'

2. Use re.sub() as follows:

re.sub(r'([a-z]+)', fr"\1{REPLACEMENT}", 'something')

3. The outcome is not 'something12345' as expected, but 'J345'.
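Steps 1 to 3 can be combined into a minimal reproduction script. This is a sketch assuming Python 3.7 or later; the name `template` is introduced here only to show the intermediate string that re.sub() actually receives.

```python
import re

REPLACEMENT = '12345'

# The f-string is evaluated before re.sub() is called, so the template
# that re.sub() receives is the plain text \112345.
template = fr"\1{REPLACEMENT}"
print(template)  # \112345

result = re.sub(r'([a-z]+)', template, 'something')
print(result)  # J345
```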

Note that I am referencing group 1 in the replacement argument, which is a raw f-string.

Trying other replacement strings produces similarly unexpected behavior:

REPLACEMENT = '1': leads to re.error (invalid group reference 11 at position 1)
REPLACEMENT = '13': 'K'
etc.
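The other cases can be surveyed in one loop. This is a minimal sketch (the `results` dictionary is only for illustration); it catches `re.error` so that the '1' case is recorded instead of aborting the loop.

```python
import re

results = {}
for replacement in ['12345', '13', '1']:
    template = fr"\1{replacement}"
    try:
        results[replacement] = re.sub(r'([a-z]+)', template, 'something')
    except re.error as exc:
        # '\11' asks for group 11, but the pattern defines only one group.
        results[replacement] = f"re.error: {exc}"

for replacement, outcome in results.items():
    print(f"{replacement!r}: {outcome}")
```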

So it seems the f-string is evaluated first, yielding a string in which '\1' is immediately followed by the digits of REPLACEMENT. re then reads '\1' together with the following digits ('\11', '\112', and so on) rather than treating '\1' alone as a reference to group 1, which leads to the behavior described above. Even if this is by design, it is confusing and makes using groups with re.sub() cumbersome when the replacement f-string starts with a digit.
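The digit handling in the replacement template can be seen in isolation. Roughly, re reads a backslash followed by digits greedily: three octal digits form an octal character escape, otherwise up to two digits form a group number. The sketch below (the eleven-group pattern is a made-up example) shows both readings; with '12345', the '\112' prefix is an octal escape for 'J', which explains 'J345'.

```python
import re

# Two digits after the backslash: read as a reference to group 11.
eleven_groups = r'(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)'
group_ref = re.sub(eleven_groups, r'\11', 'abcdefghijk')
print(group_ref)  # k

# Three octal digits after the backslash: read as an octal character escape.
octal_j = re.sub(r'x', r'\112', 'x')  # 0o112 == 74 == ord('J')
octal_k = re.sub(r'x', r'\113', 'x')  # 0o113 == 75 == ord('K')
print(octal_j, octal_k)  # J K
```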

msg377670 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-09-29 15:19
f-strings are indeed evaluated when the value of the string is needed. Your example is equivalent to:

>>> re.sub(r'([a-z]+)', fr"\112345", 'something')
'J345'
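The equivalence can also be checked directly: the f-string is fully evaluated to an ordinary str before any function sees it. A minimal sketch, with variable names chosen only for illustration:

```python
REPLACEMENT = '12345'

# Both spellings produce the same str object contents, so re.sub()
# cannot tell which one was written in the source.
from_fstring = fr"\1{REPLACEMENT}"
literal = r"\112345"
print(from_fstring == literal)  # True
```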

As always with regexes, you need to be careful when dynamically composing them.
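One way to compose the replacement dynamically without the inserted text being parsed as a template is to pass a callable as `repl`; re.sub() inserts its return value as-is. This is a sketch of that alternative, not something proposed in the thread; `append_literal` is a hypothetical helper.

```python
import re

def append_literal(suffix: str):
    """Build a repl callable that appends `suffix` verbatim after group 1."""
    def repl(match: re.Match) -> str:
        # A callable's return value is inserted as-is: re does not scan it
        # for backslash escapes or group references.
        return match.group(1) + suffix
    return repl

print(re.sub(r'([a-z]+)', append_literal('12345'), 'something'))  # something12345
print(re.sub(r'([a-z]+)', append_literal(r'\1'), 'something'))    # something\1
```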

msg377671 - Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2020-09-29 15:35
Arguments are evaluated first and then the results are passed to the function. That's true throughout the language.

In this instance, you can use \g<1> in the replacement string to refer to group 1:

re.sub(r'([a-z]+)', fr"\g<1>{REPLACEMENT}", 'something')
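The \g<1> form can be checked against all the values from the original report. A minimal sketch; the `results` list exists only for illustration:

```python
import re

results = []
for replacement in ['12345', '13', '1']:
    # \g<1> ends the group number at '>', so the digits that follow
    # are copied literally.
    result = re.sub(r'([a-z]+)', fr"\g<1>{replacement}", 'something')
    results.append(result)
    print(result)
```

Note that the interpolated text is still part of the template, so a backslash inside REPLACEMENT would still be interpreted by re.sub().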

History
Date                 User        Action  Args
2022-04-11 14:59:36  admin       set     github: 86051
2020-09-29 15:40:21  eric.smith  set     status: open -> closed
                                         resolution: not a bug
                                         stage: resolved
2020-09-29 15:35:02  mrabarnett  set     messages: + msg377671
2020-09-29 15:19:39  eric.smith  set     nosy: + eric.smith
                                         messages: + msg377670
2020-09-29 15:00:37  dkreeft     create