Issue 14924: re.finditer() oddity


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/59129

classification

Title:	re.finditer() oddity
Type:	behavior	Stage:	resolved
Components:	Regular Expressions	Versions:	Python 3.2, Python 3.3, Python 2.7

process

Status:	closed	Resolution:	not a bug
Dependencies:		Superseder:
Assigned To:	ezio.melotti	Nosy List:	ezio.melotti, fgracia, mrabarnett, rhettinger
Priority:	normal	Keywords:

Created on 2012-05-27 09:13 by fgracia, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (4)
msg161705 - (view)	Author: Francisco Gracia (fgracia)	Date: 2012-05-27 09:13
I find the following behaviour of re.finditer() baffling:

Python 3.2 (r32:88445, Feb 20 2011, 21:29:02) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import re
>>> m = re.finditer( '123', 'abc' )
>>> m
<callable_iterator object at 0x00BF09B0>
>>> if m : 'I am Napoleon'
'I am Napoleon'

No other way of formulating the condition that I have tried has worked either. Apparently m is always true, although all efforts to test its value indicate the contrary:

>>> m == True
False
>>>

This does not happen with any of the related methods (findall, match, search), which no doubt is the correct and logical behaviour:

>>> n = re.findall( '123', 'abc' )
>>> n
[]
>>> if n : 'I am Napoleon'
>>>

I have not seen any warning or explanation for this fact in the official or third-party documentation that I have consulted. Perhaps it is not a bug but, as the preceding lines show, it makes it impossible to test the result of the operation and direct the subsequent program flow. If this is an unavoidable feature of re.finditer, it should at least be clearly documented and, if possible, accompanied by indications of how to circumvent its undesirable consequences.

Thanks for your attention and efforts.
msg161707 - (view)	Author: Ezio Melotti (ezio.melotti) *	Date: 2012-05-27 10:07
All iterators are always true, since you cannot know how many elements they will give you until you consume them. This is generally known; however, it doesn't seem to be well documented.
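Because the iterator has to be consumed to learn whether it holds anything, one common workaround is to pull the first match with next() and a default value. The following is a minimal sketch and not something proposed in this thread; the helper name first_and_rest is hypothetical.

```python
import re
from itertools import chain


def first_and_rest(pattern, text):
    """Hypothetical helper: return (first_match, iterator_over_all_matches).

    If nothing matches, return (None, empty_iterator) so that the first
    element can be tested with an ordinary `if`.
    """
    matches = re.finditer(pattern, text)
    first = next(matches, None)  # None is the default when the iterator is empty
    if first is None:
        return None, iter(())
    # Put the consumed match back in front of the remaining ones.
    return first, chain([first], matches)


no_first, no_all = first_and_rest('123', 'abc')
yes_first, yes_all = first_and_rest(r'\d+', 'a1b22c333')
found = [m.group() for m in yes_all]

if no_first is None:
    print('no match for 123 in abc')
print(found)
```

When all matches are needed anyway and the input is small, list(re.finditer(...)) gives a list whose emptiness can be tested directly, at the cost of building every match object up front.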
msg161712 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2012-05-27 16:52
> All iterators are always true,

More generally, all objects are true by default. The only false objects in Python are None; container-like objects with a __len__ that returns zero; and number-like objects with a __nonzero__ method that returns False.

Guido decided that iterators should not be treated like containers and should not have a __len__ method.
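The rules above can be checked with a short sketch. The class names Plain, EmptyBox and Zeroish are made up for illustration. Note that in Python 3 the hook called __nonzero__ in the message is spelled __bool__; the sketch defines both names so the same class works on either version.

```python
import re


class Plain:
    """No __len__ and no __bool__: instances are always true."""


class EmptyBox:
    """Container-like: a __len__ that returns zero makes the instance false."""

    def __len__(self):
        return 0


class Zeroish:
    """Number-like: the truth hook returns False, so the instance is false."""

    def __bool__(self):
        return False

    __nonzero__ = __bool__  # Python 2 spelling of the same hook


truthiness = {
    'None': bool(None),
    'plain object': bool(Plain()),
    'zero-length container': bool(EmptyBox()),
    'number-like zero': bool(Zeroish()),
    'empty list from findall': bool(re.findall('123', 'abc')),
    'empty iterator from finditer': bool(re.finditer('123', 'abc')),
}

for name, value in truthiness.items():
    print(name, '->', value)
```

Only the plain object and the finditer() iterator come out true: neither defines a length or a truth hook, so the default applies.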
msg161723 - (view)	Author: Francisco Gracia (fgracia)	Date: 2012-05-27 20:40
Thank you both for your quick and clear explanations. However, I regret that I still consider the situation rather unsatisfactory.

I can well understand that all objects are true, and even that the convention that applies to some of them, such as containers, of being considered false when they are empty does not apply to iterators; no doubt there are good reasons for that. What seems illogical to me is that a method like re.finditer() returns something that can only be interpreted as implying the exact opposite of what really happened. Why does it have to return anything other than the standard None when the match fails, like any other operation that fails? What use is an iterator that leads nowhere?

It would be clarifying if the documentation at least stated unambiguously that it is unwise to submit iterators (and specifically the ones returned by methods like finditer(), whose very names imply the opposite) to logical testing, and that their only valid use is as arguments in for loops.

I think the same vagueness in this respect shows in other places in the documentation. For instance, paragraph 7.2.5 of re.html says:

Match objects always have a boolean value of True, so that you can test whether e.g. match() resulted in a match with a simple if statement.

As formulated, this clause is clearly contradictory: if something always points in the same direction, it cannot be used as the basis for any decision about which of two roads to take. Match objects can very well be always true (as in fact all objects are in this sense, as paragraph 5.1 of stdtypes.html comes close to stating); what happens is that re.match() and re.search() do not return them when they fail, but the familiar and well-behaved None value.
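The distinction drawn in the last paragraph can be sketched as follows, using only the standard re module; the function name describe is made up for illustration. re.match() signals failure by returning None instead of a match object, whereas an iterator from re.finditer() with no matches simply makes a for loop run zero times.

```python
import re

failed = re.match('123', 'abc')     # no match: the call returns None
found = re.match('abc', 'abcdef')   # match: the call returns a match object


def describe(result):
    """Hypothetical helper: report what a match()/search() call returned."""
    if result is None:
        return 'no match'
    return 'matched ' + repr(result.group())


# With finditer() there is no None to test; an empty result just means
# the loop body never executes.
loop_runs = 0
for m in re.finditer('123', 'abc'):
    loop_runs += 1

print(describe(failed))
print(describe(found))
print(loop_runs)
```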

History
Date	User	Action	Args
2022-04-11 14:57:30	admin	set	github: 59129
2012-05-27 20:40:02	fgracia	set	messages: + msg161723
2012-05-27 16:52:34	rhettinger	set	nosy: + rhettinger messages: + msg161712
2012-05-27 10:07:07	ezio.melotti	set	status: open -> closed versions: + Python 2.7, Python 3.3 messages: + msg161707 assignee: ezio.melotti resolution: not a bug stage: resolved
2012-05-27 09:13:17	fgracia	create