classification
Title: re.finditer() oddity
Type: behavior Stage: resolved
Components: Regular Expressions Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: ezio.melotti Nosy List: ezio.melotti, fgracia, mrabarnett, rhettinger
Priority: normal Keywords:

Created on 2012-05-27 09:13 by fgracia, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (4)
msg161705 - Author: Francisco Gracia (fgracia) Date: 2012-05-27 09:13
I find the following behaviour of *re.finditer()* baffling:

    Python 3.2 (r32:88445, Feb 20 2011, 21:29:02) [MSC v.1500 32 bit (Intel)] on win32
    Type "copyright", "credits" or "license()" for more information.
    >>> import re
    >>> m = re.finditer( '123', 'abc' )
    >>> m
    <callable_iterator object at 0x00BF09B0>
    >>> if m : 'I am Napoleon'
    
    'I am Napoleon'

No other way of formulating the condition that I have tried has worked either. Apparently *m* is always true, although all efforts to test its value indicate the contrary:

    >>> m == True
    False
    >>>

This does not happen with any other of the related methods (*findall*, *match*, *search*), which no doubt is the correct and logical behaviour:

    >>> n = re.findall( '123', 'abc' )
    >>> n
    []
    >>> if n : 'I am Napoleon'
    
    >>> 

I have not seen any warning or explanation for this fact in the official or third-party documentation that I have consulted. Perhaps it is not a bug but, as the preceding lines show, it makes it impossible to test the result of the operation and direct the subsequent program flow.

If this were an unavoidable feature of *re.finditer*, it should at least be clearly stated and, if possible, accompanied by indications of how to circumvent its undesirable consequences.

Thanks for your attention and efforts.
msg161707 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2012-05-27 10:07
All iterators are always true, since you cannot know how many elements they will give you until you consume them.  This is generally known; however, it doesn't seem to be well documented.
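
One common workaround, sketched here as an illustration rather than as an official recommendation, is to pull the first item out of the iterator and then chain it back in front of the rest. The helper name `peek` is hypothetical; `next()` with a default and `itertools.chain` are standard.

```python
import re
from itertools import chain

def peek(iterator):
    """Hypothetical helper: return (first_item, full_iterator),
    or (None, None) when the iterator yields nothing."""
    sentinel = object()              # unique marker, so a legitimate None item is not misread
    first = next(iterator, sentinel)
    if first is sentinel:
        return None, None
    # Put the consumed item back in front of the remaining ones.
    return first, chain([first], iterator)

# No match: the truth value of the iterator says nothing, but peek() does.
_, matches = peek(re.finditer('123', 'abc'))
no_match_result = matches is None

# Some matches: the rebuilt iterator still yields all of them, in order.
first, matches = peek(re.finditer(r'\d+', 'a1b22c333'))
found = [m.group() for m in matches]
```

If the matches are going to be stored anyway, `list(re.finditer(...))` gives an ordinary list, which is false when empty.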
msg161712 - Author: Raymond Hettinger (rhettinger) (Python committer) Date: 2012-05-27 16:52
> All iterators are always true,

More generally, all objects are true by default.  The only false objects in Python are None; container-like objects with a __len__ that returns zero; and number-like objects with a __nonzero__ method (spelled __bool__ in Python 3) that returns False.

Guido decided that iterators should not be treated like containers and should not have a __len__ method.
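
The rules above can be checked with a small sketch. The class names are made up for illustration, and the `__nonzero__` alias is only there so the same class behaves identically under Python 2.

```python
class Plain(object):
    """Defines neither __len__ nor __bool__: true by default."""

class EmptyContainer(object):
    """Container-like: false because __len__ returns zero."""
    def __len__(self):
        return 0

class ZeroLike(object):
    """Number-like: false because __bool__ returns False."""
    def __bool__(self):
        return False
    __nonzero__ = __bool__   # Python 2 looks up this name instead

truth = {
    'plain object': bool(Plain()),
    'empty container': bool(EmptyContainer()),
    'zero-like number': bool(ZeroLike()),
    'None': bool(None),
    'empty list': bool([]),
    'iterator over empty list': bool(iter([])),  # no __len__, so true by default
}
```

Only the plain object and the iterator come out true, even though the iterator will never yield anything.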
msg161723 - Author: Francisco Gracia (fgracia) Date: 2012-05-27 20:40
Thank you both for your quick and clear explanations. However, I regret that I still consider the situation rather unsatisfactory.

I can well understand that all objects are true by default, and even that the convention by which some of them, such as containers, are considered false when empty does not apply to iterators; no doubt there are good reasons for this. What seems illogical to me is that a function like *re.finditer()* returns something that can only be interpreted as implying exactly the opposite of what really happened. Why does it have to return anything other than the standard *None* when the match fails, like any other operation that fails? What use is there in an *iterator* that leads nowhere?

I would find it clarifying if the documentation at least stated unambiguously that it is unwise to submit iterators (and specifically the ones returned by functions like *finditer()*, whose very names imply the opposite) to logical testing, and that their only valid use is as arguments in *for* loops.

I think that the same vagueness in this respect shows through in other places in the documentation. For instance, paragraph 7.2.5 of *re.html* says:

   Match objects always have a boolean value of *True*, so that you can test whether e.g. *match()* resulted in a match with a simple *if* statement.

As it is formulated, this clause is clearly contradictory: if something always points in the same direction, it cannot be used as the basis for any decision about which of two roads to take. Match objects may very well always be true (as, in this sense, all objects are, which paragraph 5.1 of *stdtypes.html* comes close to stating); what happens is that *re.match()* and *re.search()* do not return them when they fail, but the familiar and well-behaved *None* value.
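
The distinction drawn in this last paragraph can be seen in a short sketch using the same pattern and string as the original report. The variable names are illustrative only.

```python
import re

hit = re.search('b', 'abc')      # a match object, which is always true
miss = re.search('123', 'abc')   # None, which is always false

# The "if" test works because the two outcomes are different kinds of object,
# not because a match object is sometimes false.
outcomes = ['matched' if result else 'no match' for result in (hit, miss)]

# finditer() returns an iterator in both cases; with no matches it simply
# yields nothing, so a loop or comprehension over it produces no items.
spans = [m.span() for m in re.finditer('123', 'abc')]
```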
History
Date                 User          Action  Args
2022-04-11 14:57:30  admin         set     github: 59129
2012-05-27 20:40:02  fgracia       set     messages: + msg161723
2012-05-27 16:52:34  rhettinger    set     nosy: + rhettinger; messages: + msg161712
2012-05-27 10:07:07  ezio.melotti  set     status: open -> closed; versions: + Python 2.7, Python 3.3; messages: + msg161707; assignee: ezio.melotti; resolution: not a bug; stage: resolved
2012-05-27 09:13:17  fgracia       create