
classification
Title: re.sub replaces twice
Type: behavior
Stage: resolved
Components:
Versions: Python 3.8, Python 3.7
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: malin, spz1st
Priority: normal
Keywords:

Created on 2020-08-14 22:28 by spz1st, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (6)
msg375436 - Author: S. Zhang (spz1st) Date: 2020-08-14 22:28
The following command produces "name.tsvtsv" on Python 3.7.1 and 3.8.5, whereas Python 2.7.5, 3.5.6, and 3.6.7 give the expected "name.tsv". Changing * to + in the pattern produces the expected "name.tsv".

python -c 'import re; v="name.txt";v = re.sub("[^\.]*$", "tsv", v);print(v)'
msg375444 - Author: Ma Lin (malin) * Date: 2020-08-15 02:37
The re.sub() documentation says:
Changed in version 3.7: Empty matches for the pattern are replaced when adjacent to a previous non-empty match.

IMO the 3.7+ behavior is more reasonable, and it fixed a bug; see issue25054.
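
To illustrate the documented change, here is a minimal sketch, assuming Python 3.7 or later (the variable names are only for illustration). Listing the matches shows why the replacement is applied twice: the pattern first matches "txt", and then matches the empty string at the very end of the input.

```python
import re

# Same pattern as in the report, written as a raw string.
pattern = re.compile(r"[^.]*$")
text = "name.txt"

# Collect (start, end, matched text) for every match the engine finds.
spans = [(m.start(), m.end(), m.group()) for m in pattern.finditer(text)]
print(spans)   # [(5, 8, 'txt'), (8, 8, '')] on Python 3.7+

# sub() replaces each of those matches, including the trailing empty one.
result = pattern.sub("tsv", text)
print(result)  # name.tsvtsv on Python 3.7+
```

The second tuple is the empty match adjacent to the previous non-empty match, which older versions did not replace.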
msg375459 - Author: S. Zhang (spz1st) Date: 2020-08-15 12:10
Thanks. But if we are talking about empty matches, there would be endless empty
matches at the end in such cases. So in my opinion, [^\.]*$ should match
"txt" plus the empty match, because the greedy rule applies here.

msg375463 - Author: Ma Lin (malin) * Date: 2020-08-15 13:05
There can be at most one empty match at a given position. IIRC, Perl's regex engine has very similar behavior.
If you don't want the empty match, using + instead of * is fine.
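
A minimal sketch of two ways to get a single replacement on Python 3.7+, assuming the goal is to swap the text after the last dot. The `+` variant is the one suggested above; the `count=1` variant is not mentioned in this thread and is offered only as one possible alternative.

```python
import re

text = "name.txt"

# Option 1: require at least one character, so the empty match
# at the end of the string can no longer occur.
with_plus = re.sub(r"[^.]+$", "tsv", text)
print(with_plus)   # name.tsv

# Option 2 (an assumption, not from the thread): keep "*" but stop
# after the first replacement with the count keyword argument.
with_count = re.sub(r"[^.]*$", "tsv", text, count=1)
print(with_count)  # name.tsv

# The two options differ when nothing follows the last dot:
# "+" finds no match, while "*" still matches the empty string at the end.
print(re.sub(r"[^.]+$", "tsv", "name."))           # name.
print(re.sub(r"[^.]*$", "tsv", "name.", count=1))  # name.tsv
```

Which option fits depends on whether an input ending in a dot should be left alone or extended.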
msg375475 - Author: S. Zhang (spz1st) Date: 2020-08-15 16:33
Okay.  Thanks.

msg376429 - Author: S. Zhang (spz1st) Date: 2020-09-05 12:21
Thanks.

History
Date                 User       Action  Args
2022-04-11 14:59:34  admin      set     github: 85727
2020-09-05 12:21:11  spz1st     set     messages: + msg376429
2020-08-17 23:40:00  pablogsal  set     status: open -> closed; resolution: not a bug; stage: resolved
2020-08-15 16:33:48  spz1st     set     messages: + msg375475
2020-08-15 13:05:27  malin      set     messages: + msg375463
2020-08-15 12:10:48  spz1st     set     messages: + msg375459
2020-08-15 02:37:39  malin      set     nosy: + malin; messages: + msg375444
2020-08-14 22:28:33  spz1st     create