Message 331910 - Python tracker


Author	steven.daprano
Recipients	steve.newcomb, steven.daprano
Date	2018-12-16.00:58:44
Message-id	<1544921926.07.0.788709270274.issue35496@psf.upfronthosting.co.za>
In-reply-to

Content
> See attached script, which is self-explanatory.

I'm glad one of us thinks so, because I find it clear as mud.

I spent *way* longer on this than I should have, but I simplified your sample code to the best of my ability. (See attached.) As far as I can tell, your code and mine do roughly the same thing, but please check that you agree.

I agree that with the IPV6 portion of the regex removed, it matches on "208.123.4.22", but with the IPV6 portion included, it matches on "::ffff:208.123.4.22". But I'm not sure that's a bug. I think it is working as designed. For example:


py> import re
py> text = 'green pepper'
py> re.search('pepper|green pepper', text).group(0)
'green pepper'


seems to be analogous to your example, but simpler. Do you agree? If not, it would also help a lot if you could find a simpler regex that demonstrates the issue. See http://www.sscce.org/

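In your case, I believe that the rightmost alternative matches from position 1 of the text, while the leftmost alternative doesn't match until position 8. The search tries every alternative at one starting position before it moves on to the next position, so the order of the alternatives only matters when more than one of them matches at the same position. So starting from position 0, the IPV6 check matches first, and so wins.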
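To make the positions concrete, here is a small sketch using the 'green pepper' text rather than the IP address regex (the variable names left, right and both are mine, purely for illustration). It searches for each alternative on its own to see where it first matches, then for the combined pattern:

```python
import re

text = 'green pepper'

# Each alternative on its own: where does it first match?
left = re.search('pepper', text)          # leftmost alternative
right = re.search('green pepper', text)   # rightmost alternative
print(left.start(), right.start())        # 6 0

# Combined pattern: the scan stops at the earliest starting position
# where *any* alternative matches, which here is position 0.
both = re.search('pepper|green pepper', text)
print(both.start(), both.group(0))        # 0 green pepper
```

The leftmost alternative only gets priority when both alternatives can match at the same starting position, which is not the case here.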

It is possible you were expecting that the IPV4 check would be tested against position 0, then position 1, then position 2, then ... and so on until the end of the string, and only then the IPV6 check tested against position 0, then 1 etc.
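If that second behaviour is what you want, one way to get it is to run the searches separately, in priority order, instead of combining them with "|". This is only a rough sketch; search_by_priority is a hypothetical helper I made up for illustration, not something in the re module:

```python
import re

def search_by_priority(patterns, text):
    """Try each pattern against the whole text, in order.

    Returns the match for the first pattern that matches anywhere,
    or None if no pattern matches. (Hypothetical helper, not part of re.)
    """
    for pattern in patterns:
        m = re.search(pattern, text)
        if m is not None:
            return m
    return None

text = 'green pepper'
m = search_by_priority(['pepper', 'green pepper'], text)
print(m.start(), m.group(0))   # 6 pepper
```

With the same text and the same two alternatives, this returns 'pepper' rather than 'green pepper', because the first pattern is tried at every position before the second pattern is tried at all.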
