This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Quantifier and Expanded Regex Expression Gives Different Results
Type: behavior
Stage:
Components: Regular Expressions
Versions: Python 3.8
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: ezio.melotti, mrabarnett, veky, vmd3.14
Priority: normal
Keywords:

Created on 2022-03-07 13:19 by vmd3.14, last changed 2022-04-11 14:59 by admin.

Files
File name | Uploaded | Description
ComandPrompt.pdf | vmd3.14, 2022-03-07 13:19 | Issue Recreation
Messages (2)
msg414668 - Author: Vivian D (vmd3.14) Date: 2022-03-07 13:19
Here are the steps that I went through to test my regular expressions in my command prompt (a file attachment shows this as well). I am using Windows 11, version 21H2:

>>> import re
>>> regex = r"(((\w)+\w*\3){2}|(\w)+(?=\w*\4)\w*(?!\4)(\w)\w*\5)\w*"
>>> testString = "Alabama and Mississippi are next to each other"
>>> re.findall(regex,testString,re.IGNORECASE)
[('Mississipp', 'ipp', 'p', '', '')]
>>> testString = "alabama and Mississippi are next to each other"
>>> re.findall(regex,testString,re.IGNORECASE)
[('Mississipp', 'ipp', 'p', '', '')]
>>> regex = r"((\w)+\w*\2(\w)+\w*\3|(\w)+(?=\w*\4)\w*(?!\4)(\w)\w*\5)\w*"
>>> re.findall(regex,testString,re.IGNORECASE)
[('alabama', 'a', 'a', '', ''), ('Mississipp', 's', 'p', '', '')]
>>> testString = "Alabama and Mississippi are next to each other"
>>> re.findall(regex,testString,re.IGNORECASE)
[('Alabama', 'A', 'a', '', ''), ('Mississipp', 's', 'p', '', '')]

I created a regular expression to match any word with two sets of the same vowel, including words with four of the same vowel, ignoring case. My first regular expression, "(((\w)+\w*\3){2}|(\w)+(?=\w*\4)\w*(?!\4)(\w)\w*\5)\w*", matched “Mississippi” but failed to match “Alabama”, which it should have. To make sure that this wasn’t caused by a case sensitivity issue, I retested the regex with “alabama” instead of “Alabama”, but still got no match. Then I tried replacing the quantifier {2} with a second copy of the expression that was supposed to be repeated. This gave me the regex "((\w)+\w*\2(\w)+\w*\3|(\w)+(?=\w*\4)\w*(?!\4)(\w)\w*\5)\w*". This version matched both “alabama” and “Alabama”, as shown above, and continued to match “Mississippi” as expected. However, this result seems to contradict my understanding of regular expressions, because all I did to get the different results was write out the expression that the quantifier was supposed to apply twice.
msg414671 - Author: Vedran Čačić (veky) * Date: 2022-03-07 14:24
Confirmed. On Python 3.10.2,

    >>> re.findall(r"(((\w)+\w*\3){2}|(\w)+(?=\w*\4)\w*(?!\4)(\w)\w*\5)\w*",'alabama')
    []

yet https://regex101.com/r/uT8gag/1 (with "Python" selected) says it should match.
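
The comparison discussed in this issue can be scripted so that it is easy to rerun on different interpreters. The following is only a minimal sketch: the two patterns and the test sentence are copied from the report, while the helper `matched_words` and the variable names are hypothetical and introduced here for illustration. It prints the results instead of asserting them, since the behavior of the quantified pattern may depend on the Python version.

```python
import re
import sys

# Patterns copied from the report. EXPANDED writes the repeated group out
# twice, so its backreferences are numbered \2 and \3 instead of reusing \3.
QUANTIFIED = r"(((\w)+\w*\3){2}|(\w)+(?=\w*\4)\w*(?!\4)(\w)\w*\5)\w*"
EXPANDED = r"((\w)+\w*\2(\w)+\w*\3|(\w)+(?=\w*\4)\w*(?!\4)(\w)\w*\5)\w*"


def matched_words(pattern, text):
    """Return the full text of every match (hypothetical helper, not from the report)."""
    return [m.group(0) for m in re.finditer(pattern, text, re.IGNORECASE)]


text = "Alabama and Mississippi are next to each other"
quantified_words = matched_words(QUANTIFIED, text)
expanded_words = matched_words(EXPANDED, text)

# Report the interpreter version alongside the results for easy comparison.
print("Python", ".".join(map(str, sys.version_info[:3])))
print("quantified:", quantified_words)
print("expanded:  ", expanded_words)
print("same result:", quantified_words == expanded_words)
```

On the versions mentioned in this issue (3.8 and 3.10.2), the quantified pattern is reported to miss "Alabama" while the expanded one finds it. Using `group(0)` rather than `re.findall` keeps the output to whole matches instead of tuples of capture groups.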
History
Date | User | Action | Args
2022-04-11 14:59:57 | admin | set | github: 91101
2022-03-07 18:54:52 | eric.smith | set | nosy: + ezio.melotti, mrabarnett; components: + Regular Expressions, - Library (Lib)
2022-03-07 14:24:47 | veky | set | nosy: + veky; messages: + msg414671
2022-03-07 13:19:26 | vmd3.14 | create |