Author terry.reedy
Recipients alegrigoriev, ezio.melotti, mrabarnett, serhiy.storchaka, terry.reedy
Date 2021-04-10.02:54:19
Message-id <1618023260.12.0.433889907062.issue43714@roundup.psfhosted.org>
Content
Python regexes match slices of a Python string s.  These slices include the len(s)+1 empty slices of s.  An re Match object gives both the matched slice itself (the match) and its slice coordinates (the span) in the searched string.
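
To make the slice view concrete, here is a small sketch. The sample string 'abc' and the variable names are chosen only for illustration.

```python
import re

s = "abc"

# One empty slice s[i:i] for every index i in 0..len(s): len(s) + 1 of them.
empty_slices = [(i, i) for i in range(len(s) + 1)]

# The empty pattern matches at each of those positions.
empty_match_spans = [m.span() for m in re.finditer(r"", s)]

# For a non-empty match, the span gives the slice coordinates of the match.
m = re.search(r"b", s)
start, end = m.span()
print(empty_match_spans)
print(m.group(), s[start:end])
```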

https://docs.python.org/3/library/re.html says "\Z  Matches only at the end of the string." There are two possible interpretations:
1. '\Z', by itself, matches the final empty slice s[n:n] of search string s, where n = len(s).  
2. '\Z' modifies the (preceding) re to match "only at the end of the string", where the preceding re can be empty.

For a single left to right search, I believe there is no difference.  (I use '$' instead of '\Z'.  Without the re.MULTILINE flag the two behave the same on a string that does not end in a newline, as is the case here.)

>>> re.search(r'', 'a')
<re.Match object; span=(0, 0), match=''>
>>> re.search(r'$', 'a')
<re.Match object; span=(1, 1), match=''>

Either interpretation explains and is consistent with the second result.
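
As a quick check of the '$' versus '\Z' substitution, the following sketch compares the two on 'a' and, as an added illustration not taken from the examples here, on 'a\n'.  The variable names are arbitrary.

```python
import re

# On a string with no trailing newline, '\Z' and '$' match the same final empty slice.
span_Z = re.search(r"\Z", "a").span()
span_dollar = re.search(r"$", "a").span()

# With a trailing newline, '$' can also match just before that newline; '\Z' cannot.
span_Z_nl = re.search(r"\Z", "a\n").span()
span_dollar_nl = re.search(r"$", "a\n").span()

print(span_Z, span_dollar, span_Z_nl, span_dollar_nl)
```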

The issue is functions that look for multiple sequential matches.  re.sub and re.split are based on re.finditer, whose matches are listed by re.findall.  The latter two return all non-overlapping matches (slices), including empty slices.  Hence, with a regex that matches a final '/' or '',

>>> re.findall(r'/?$', '/')
['/', '']

I believe Alexander proposes that the 2nd member should not be there, but it is a match starting after '/' and does not overlap.
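
The spans make the non-overlap visible.  This is a sketch assuming Python 3.7 or later, where an empty match adjacent to a previous non-empty match is also reported and replaced.  The replacement string '-' is only an illustration.

```python
import re

pattern = r"/?$"

# Two non-overlapping matches: the final '/' and then the final empty slice.
spans = [m.span() for m in re.finditer(pattern, "/")]
matches = re.findall(pattern, "/")

# re.sub replaces both matches, so the replacement appears twice.
replaced = re.sub(pattern, "-", "a/b/c/d/")

print(spans, matches, replaced)
```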

The word 'consume' only appears in the current doc once  -- "(?=...)    Matches if ... matches next, but doesn’t consume any of the string."  If we consider 'end of string' to be the final null slice, it does seem to be 'consumed' in that the final empty slice is only matched and added to the list once.

I think that this should be closed as 'not a bug'.

As for the desired results for the examples, they involve manipulating the result of deleting a final '/' if there is one (and re is not even needed for that).

>>> [re.sub('/$', '', 'a/b/c/d/'), '']
['a/b/c/d', '']
>>> re.sub('/$', '', 'a/b/c/d/') + '-'
'a/b/c/d-'
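
For the remark that re is not needed to delete a final '/', one possible sketch uses plain string methods.  The helper name strip_final_slash is hypothetical, not something proposed in this issue.

```python
def strip_final_slash(path: str) -> str:
    """Return path without its final '/', if it has one (hypothetical helper)."""
    return path[:-1] if path.endswith("/") else path


# Same results as the re.sub('/$', '', ...) examples.
as_list = [strip_final_slash("a/b/c/d/"), ""]
with_dash = strip_final_slash("a/b/c/d/") + "-"
print(as_list, with_dash)
```

On Python 3.9 or later, 'a/b/c/d/'.removesuffix('/') should give the same result.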