classification
Title: re.sub treats * incorrectly?
Type: behavior Stage: resolved
Components: Regular Expressions Versions: Python 3.10, Python 3.8, Python 3.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder: Replace empty matches adjacent to a previous non-empty match in re.sub() (bpo-32308)
Assigned To: Nosy List: Yujiri, ezio.melotti, mrabarnett
Priority: normal Keywords:

Created on 2020-06-22 17:28 by Yujiri, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (3)
msg372104 - Author: Ryan Westlund (Yujiri) Date: 2020-06-22 17:28
```
>>> import re
>>> re.sub('a*', '-', 'a')
'--'
>>> re.sub('a*', '-', 'aa')
'--'
>>> re.sub('a*', '-', 'aaa')
'--'
```

Shouldn't it return one dash, not two, since the greedy quantifier matches all the a's? I understand why substituting on 'b' returns '-b-', but shouldn't this count as only one match? In Python 2.7, it behaves as I expect:

```
>>> re.sub('a*', '-', 'a')
'-'
>>> re.sub('a*', '-', 'aa')
'-'
>>> re.sub('a*', '-', 'aaa')
'-'
```
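The two dashes come from two separate matches. The minimal sketch below assumes Python 3.7 or later and lists the match spans with `re.finditer`. Each input produces one non-empty match covering all the a's, followed by an empty match at the end of the string, and 3.7+ replaces both.

```python
import re

# Python 3.7+: show the match spans and the substitution result for each
# of the inputs above.
for text in ["a", "aa", "aaa"]:
    spans = [m.span() for m in re.finditer("a*", text)]
    print(repr(text), spans, re.sub("a*", "-", text))
# 'a' [(0, 1), (1, 1)] --
# 'aa' [(0, 2), (2, 2)] --
# 'aaa' [(0, 3), (3, 3)] --
```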

The original case that led me to this was trying to normalize a path so that it ends in exactly one slash. I used `re.sub('/*$', '/', path)`, but any path that already ended in one or more slashes came out ending in two.
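Two possible workarounds for the path case follow. Neither comes from the thread itself, and the helper names are hypothetical. The first passes `count=1`, so only the first match (the run of trailing slashes, or the empty match at the end) is replaced. The second avoids regexes entirely with `str.rstrip`. The regex version also swaps `$` for `\Z`, because `$` can match just before a trailing newline.

```python
import re

def ensure_trailing_slash_re(path: str) -> str:
    # Hypothetical helper. count=1 stops after the first match, so the empty
    # match that Python 3.7+ also finds right after the trailing slashes is
    # never replaced. \Z matches only at the very end of the string.
    return re.sub(r"/*\Z", "/", path, count=1)

def ensure_trailing_slash_strip(path: str) -> str:
    # Hypothetical helper: remove every trailing slash, then append one.
    return path.rstrip("/") + "/"

# The original pattern doubles the slash on Python 3.7+:
print(re.sub("/*$", "/", "usr/lib//"))           # usr/lib//
print(ensure_trailing_slash_re("usr/lib//"))     # usr/lib/
print(ensure_trailing_slash_strip("usr/lib//"))  # usr/lib/
```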
msg372105 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2020-06-22 17:54
This behavior was changed in 3.7: "Empty matches for the pattern are replaced only when not adjacent to a previous empty match, so sub('x*', '-', 'abxd') returns '-a-b--d-'." [0]
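The documentation's example can be checked the same way. In this sketch (Python 3.7+), the `(3, 3)` span is the empty match immediately after the non-empty match of `'x'`. It is adjacent to a non-empty match rather than an empty one, so it is replaced, and that produces the `--` before `d`.

```python
import re

# The documentation's example, run on Python 3.7+.
text = "abxd"
spans = [m.span() for m in re.finditer("x*", text)]
result = re.sub("x*", "-", text)
print(spans)   # [(0, 0), (1, 1), (2, 3), (3, 3), (4, 4)]
print(result)  # -a-b--d-
```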

See also bpo-32308 and bpo-25054.


[0]: https://docs.python.org/3/library/re.html#re.sub
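For comparison, here is one rough way to get the one-dash-per-run result the reporter expected (as in the Python 2.7 transcript) on Python 3.7+. The idea is to walk the matches from `re.finditer` and skip any empty match that starts exactly where the previous replaced match ended. `sub_skip_adjacent_empty` is a hypothetical helper, not part of the `re` module. It treats `repl` as a literal string. Because it is built on the 3.7+ match sequence, it is not guaranteed to reproduce Python 2.7 output for every pattern.

```python
import re

def sub_skip_adjacent_empty(pattern: str, repl: str, string: str) -> str:
    """Hypothetical helper: replace matches of `pattern` with the literal
    `repl`, skipping empty matches adjacent to the previous replaced match."""
    out = []
    pos = 0          # end of the text already copied to `out`
    prev_end = None  # end of the last match that was replaced
    for m in re.finditer(pattern, string):
        start, end = m.span()
        if start == end and start == prev_end:
            continue  # empty match right after the previous match: skip it
        out.append(string[pos:start])
        out.append(repl)
        pos = prev_end = end
    out.append(string[pos:])
    return "".join(out)

print(sub_skip_adjacent_empty("a*", "-", "aaa"))   # -
print(sub_skip_adjacent_empty("x*", "-", "abxd"))  # -a-b-d-
```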
msg372106 - Author: Ryan Westlund (Yujiri) Date: 2020-06-22 18:04
Sorry, I forgot that the pydoc documentation doesn't have as much information as the online docs.

Python tracker <https://bugs.python.org/issue41080>

History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:59:32 | admin | set | github: 85252 |
| 2020-06-22 18:04:10 | Yujiri | set | messages: + msg372106 |
| 2020-06-22 17:54:14 | ezio.melotti | set | status: open -> closed; superseder: Replace empty matches adjacent to a previous non-empty match in re.sub(); messages: + msg372105; resolution: not a bug; stage: resolved |
| 2020-06-22 17:28:11 | Yujiri | create | |