classification
Title: improve xrange.__contains__
Type: resource usage Stage: needs patch
Components: Interpreter Core Versions: Python 2.7
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: belopolsky Nosy List: abacabadabacaba, belopolsky, benjamin.peterson, hpesoj, lehmannro, mark.dickinson, pitrou, rhettinger, stargaming, taleinat
Priority: low Keywords: patch

Created on 2007-08-02 16:14 by stargaming, last changed 2010-06-03 17:32 by benjamin.peterson. This issue is now closed.

Files
File name Uploaded Description Edit
rangeobject.c.diff stargaming, 2007-08-02 16:14 patch for Objects/rangeobject.c
rangeobject.c.diff stargaming, 2007-08-06 11:37 patch for Objects/rangeobject.c, p3yk branch
range.patch lehmannro, 2009-09-20 16:37 patch for Objects/rangeobject.c and docs, py3k branch
range.patch lehmannro, 2009-09-21 22:27 patch #2 for Objects/rangeobject.c & docs & tests, py3k branch
Messages (35)
msg52979 - (view) Author: Stargaming (stargaming) Date: 2007-08-02 16:14
Membership tests for xrange result in loss of xrange's laziness by iterating through every single item. This yields a complexity of O(n) and an unbearable runtime when handling larger ranges.

This issue can be fixed by using arithmetic decision instead of generic lookup/iteration. This will boil down the complexity to O(1) and make tests near the start of the range as fast as those near the end. The same technique is already used in item lookups.

The fix has been inspired by the "optimizing [x]range" thread in the python-3000-devel mailing list (see http://article.gmane.org/gmane.comp.python.python-3000.devel/8732).

A patch to Objects/rangeobject.c is attached. It modifies tp_as_sequence in PyRange_Type to include the sq_contains field and implement the procedure range_contains right there.
msg52980 - (view) Author: Stargaming (stargaming) Date: 2007-08-06 11:37
The Python 3000 patch required a rewrite from scratch since we can't issue PyIntObject->ob_ival directly any longer. Instead, we use the PyNumber-API now.
File Added: rangeobject.c.diff
msg59501 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-01-07 22:14
The 2.6 patch should probably use long instead of int.
msg62955 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2008-02-25 00:57
> Do we really need another way to spell a <= x < b? Have you got a
> real-world use case in mind for the version with step > 1?

> I'm at most lukewarm; I'd be willing to look at a patch to the C code
> in the py3k-struni branch, plus unit tests though.

-- GvR at http://thread.gmane.org/gmane.comp.python.python-
3000.devel/8732

I read this as a rejection for 2.x series.  For py3k, this is a 
premature optimization.  Current py3k implementation is likely to be 
optimized for regular size integers at some point.  This patch will only 
introduce more code to be changed then.

If this is not ready to be rejected for 2.x, let's wait for resolution 
of issue1540617 since it may result in backporting of py3k range 
implementation.
msg63143 - (view) Author: Stargaming (stargaming) Date: 2008-02-29 19:53
Later on, when Greg mentions the current for/if performance asymmetry,
Guido states: 

> Fair enough. So maybe *you* can contribute a patch?

I don't read this as a rejection, but of course you're right -- this
patch is purely an optimization. As already written in my initial
comment (and discussed on the mailing list), there would be no change in
behaviour: The change from generic iterator behaviour to specific range
performance is transparent to the end-user.

I don't see how the other patches interfere with this one. Waiting until
the development of the range object itself has a stabilized and we
decided upon a member type/API seems sensible. Still, this patch would 
be valid on its own.

Now, your impulse is right: having the performance enhancement in the
bug tracker doesn't help much -- we need a resolution for this. If
someone could review the patch, tell me about the critical parts or
point me to references on how to improve it, I'd be really glad!

On the mentioned unit tests: I'm unsure how to verify this behaviour
since I expect it to affect runtime *only*.

PS. With the new tracker, wouldn't the "resource usage" type fit best?
msg63146 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2008-02-29 21:33
Here are my comments on the py3k patch:

1. Sign of a PyLong object o is the same as the sign of Py_SIZE(o).  I 
would think it is safe to use this fact within python core.  (User code 
that may need to work across multiple versions of python may need to be 
more conservative.)

2. If you choose to use zero in the code, there is no need to create it 
twice per call.

3. Py_INCREF(zero); is unnecessary and creates a reference leak.

4. range_contains should return -1 when error is detected, not 0.

2.
msg92185 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-09-02 21:10
Closed issue 6826 as a duplicate of this issue.
msg92187 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-09-02 21:46
+1 for an O(1) contains-test.
msg92188 - (view) Author: Joseph Thomson (hpesoj) Date: 2009-09-02 21:47
Seeing as the range type has most likely stabalised by now, I would like
to renew this issue.  The current behaviour of range/xrange could
introduce unforeseen performance issues if users are not aware of its
inner workings.

Also, as I said in my closed duplicate issue, 'if value in range(lower,
upper)' to me looks far more Pythonic than 'if value >= lower and value
< upper'.  Although this may be a minor point, I think it is noteworthy.

As for the prospect of producing a patch myself, I have checked out and
taken a look at the code, but it is somewhat alien to me, and I thought
it'd probably be more sensible to leave it to the people who know what
they're doing.
msg92195 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-09-03 09:18
The py3k patch no longer works:  it makes use of PyObject_Cmp, which no
longer exists.
msg92197 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-09-03 10:03
The trunk patch is also unacceptable in its current form:

 1. there are no docs or tests
 2. keyval, start, step and end should have type long, not type int
    (as Antoine already mentioned)
 3. the expression ((keyval - start) % step) can overflow, leading to
    undefined behaviour (e.g., wrong results, segfaults, strange
    effects from gcc optimizations that assume no overflow).  For
    example, with the patch, on Linux/x86-64, I get:

      >>> x = xrange(-2000000000, 2000000000, 5)
      >>> 1000000000 in x
      False

    This should be relatively easy to fix:  e.g., if you already
    know that step > 0 and start <= keyval and keyval < stop, then
    '(unsigned long)keyval - (unsigned long)start'  is safe from
    overflow.

 4. the containment check only works for ints:  with the patch, I get:

      >>> x = xrange(10)
      >>> 4 in x
      True
      >>> 4L in x
      False
      >>> 4.0 in x
      False

    but without the patch applied, all these return True.  It's possible
    that it's worth special-casing integer inputs for the sake of
    speed, but I don't think the behaviour should change like this for
    other types.
msg92198 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-09-03 10:06
[Joseph Thomson]
> Also, as I said in my closed duplicate issue, 'if value in range(lower,
> upper)' to me looks far more Pythonic than 'if value >= lower and value
> < upper'.

Note that the Pythonic spelling would be:  'if lower <= value < upper'.
 (Though that's not quite the same thing if value is not an integer.)
msg92899 - (view) Author: Robert Lehmann (lehmannro) * Date: 2009-09-20 16:37
I revised the patch for Python 3.1 and added notices to Misc/NEWS and
the range documentation.

(Changing Type to resource usage.)
msg92926 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-09-21 09:39
Robert,

The patch looks good: thank you.

Please use C89-style comments (/* ... */).

I'd like to see a few more tests covering the various combinations of
start less-than/equal-to/greater-than stop, step positive/negative, tested 
value within/without/at endpoints of the range interval, etc.
msg92927 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-09-21 09:49
Also, it would be good to add a test or two for non-integers, e.g. to make 
explicit that the following behaviour hasn't changed:

>>> class C:
...     def __int__(self):  return 3
...     def __index__(self): return 5
...     def __eq__(self, other): return other == 4
... 
>>> C() in range(0, 10, 2)
True
msg92934 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2009-09-21 11:42
As a small style note, I would prefer if the patch assigned in
conditionals less and split them out to the line before. I see that
rangeobject.c has a mixed style with regards to this, so the clearer one
should win! :)
msg92967 - (view) Author: Robert Lehmann (lehmannro) * Date: 2009-09-21 22:27
Thanks for your feedback. I added a few tests and changed the bits you
criticized.
msg93014 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-09-22 19:49
Great---thanks!

Would it be worth using PyObject_IsTrue instead of comparing with zero 
using PyObject_RichCompareBool?  (Sorry;  should have spotted this last 
time around.)
msg93019 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-09-22 21:48
Answering myself:  probably not worth it:  I failed to notice that you 
already need zero for the 'step > 0' comparison.

Applied in r75028.

Leaving open for consideration of 2.x xrange.
msg93091 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-09-24 20:05
Just to be on the safe side, I changed the PyLong_Check(ob) check to PyLong_CheckExact(ob) || PyBool_Check(ob), in r75051.  Without this, one 
can get different results for 'x in range(10)' and 'x in list(range(10))' 
if x is an instance of a subclass of int:

Python 3.2a0 (py3k:75050, Sep 24 2009, 20:56:13) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> class C(int):
...     def __eq__(self, other): return True
... 
>>> C(11) in range(10)
False
>>> C(11) in list(range(10))
True
msg93092 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-09-24 20:15
Unassigning myself since I don't intend to do anything myself about 
xrange.__contains__ in 2.x.  (But I'll happily review patches.)
msg103301 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-04-16 08:00
I suggest closing this:  it's been implemented (for range) in Python 3.x, and I think it's too late for 2.x now.
msg103336 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-04-16 16:20
As an implementation detail, it isn't too late to put this into 2.x if it is something we care about.
msg106884 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2010-06-02 14:33
I'd like to implement this for 2.x, if there's any chance of this being accepted. Is there still a chance of getting this into 2.7?

This will be my first patch for the Python core, so please tell me if I'm missing something. Currently I'm thinking of:
1. starting with the original patch
2. applying the required changes as mentioned Mark Dickinson, looking at the the accepted 3.x patch for guidance
3. back-porting the tests
4. checking for later additional changes in the source history, and back-porting any such relevant changes
msg106890 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-02 15:02
A rapidly diminishing chance for 2.7, IMO.  I wouldn't want to try to add this after the first release candidate, which is scheduled for June 5 (and I'm not sure I'll have time to review a patch between now and then).

On the other hand, since this is pure optimization it might be acceptable for 2.7.1.
msg106891 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2010-06-02 15:17
I'd really like to have this in 2.7. Is there no chance of getting it into 2.7 after rc1 is released?

If so, I can have the patch ready by Friday 14:00 GMT, but if nobody will have time to review (and possibly commit) in time for RC1 I guess I'll take my time with this.
msg106893 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-02 15:34
Benjamin?

> I'd really like to have this in 2.7. Is there no chance of getting it
> into 2.7 after rc1 is released?
msg106899 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-02 18:02
On Wed, Jun 2, 2010 at 11:17 AM, Tal Einat <report@bugs.python.org> wrote:
..
> If so, I can have the patch ready by Friday 14:00 GMT, but if nobody will have time to review
> (and possibly commit) in time for RC1 I guess I'll take my time with this.

I'll review your patch, but why are you so eager to get this into 2.7?
 You realize that for integer x, x in xrange(a, b) is just another way
to spell a <= x < b.  What is your use case?
msg106901 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-06-02 18:07
> > If so, I can have the patch ready by Friday 14:00 GMT, but if nobody will have time to review
> > (and possibly commit) in time for RC1 I guess I'll take my time with this.
> 
> I'll review your patch, but why are you so eager to get this into 2.7?
>  You realize that for integer x, x in xrange(a, b) is just another way
> to spell a <= x < b.  What is your use case?

I really think this shouldn't go into 2.7.  We certainly have better
things to do than potentially break something for an almost useless
non-feature.
msg106902 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-06-02 18:07
Rescheduling to 3.2, if applicable (otherwise I suggest closing).
msg106903 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2010-06-02 18:09
I don't really see the point. I would be more inclined towards it if there was a patch already, but patching this doesn't strike me as a key feature.
msg106904 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-02 18:10
This is already in 3.x.  The only reason I can think of to get this in
2.7 is to have fewer performance surprises between 2.x and 3.x.
msg106907 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-06-02 18:18
Le mercredi 02 juin 2010 à 18:10 +0000, Alexander Belopolsky a écrit :
> This is already in 3.x.  The only reason I can think of to get this in
> 2.7 is to have fewer performance surprises between 2.x and 3.x.

Since you are supposed to forward-port from 2.x and 3.x, the surprise
will be a good one anyway.
msg106932 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2010-06-03 07:30
In my mind, the reason for this patch is that xrange/range can be thought of as a lazy list of integers. However without this patch, membership checking was done trivially instead of in a "smart/lazy" manner, which is unexpected for users. Finally, conditions such as "num in xrange(3, 1000, 5)" are not trivial to express correctly otherwise, and even more so for negative steps.

This patch is already implemented and accepted for 3.2, I just wish to back-port it to 2.7 which should be fairly straightforward.

I'll just have a patch ready by tomorrow, and hope that someone finds the time to review it and possibly commit it in time for rc1. I realize that this is a minor change at the last minute. I will certainly understand if the people responsible for preparing rc1 are too busy for this.
msg106966 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2010-06-03 17:28
2010/6/3 Tal Einat <report@bugs.python.org>:
>
> Tal Einat <taleinat@users.sourceforge.net> added the comment:
>
> In my mind, the reason for this patch is that xrange/range can be thought of as a lazy list of integers. However without this patch, membership checking was done trivially instead of in a "smart/lazy" manner, which is unexpected for users. Finally, conditions such as "num in xrange(3, 1000, 5)" are not trivial to express correctly otherwise, and even more so for negative steps.
>
> This patch is already implemented and accepted for 3.2, I just wish to back-port it to 2.7 which should be fairly straightforward.
>
> I'll just have a patch ready by tomorrow, and hope that someone finds the time to review it and possibly commit it in time for rc1. I realize that this is a minor change at the last minute. I will certainly understand if the people responsible for preparing rc1 are too busy for this.

xrange has behaved like this for such a long time that I don't see
what it buys us to commit the patch this late.
History
Date User Action Args
2010-06-03 17:32:05benjamin.petersonsetstatus: open -> closed
resolution: accepted
2010-06-03 17:28:45benjamin.petersonsetmessages: + msg106966
2010-06-03 07:30:14taleinatsetmessages: + msg106932
versions: + Python 2.7, - Python 3.2
2010-06-02 18:18:40pitrousetmessages: + msg106907
2010-06-02 18:10:16belopolskysetmessages: + msg106904
2010-06-02 18:09:09benjamin.petersonsetmessages: + msg106903
2010-06-02 18:07:29pitrousetmessages: + msg106902
versions: + Python 3.2, - Python 2.7
2010-06-02 18:07:01pitrousetmessages: + msg106901
2010-06-02 18:03:23belopolskysetassignee: belopolsky
2010-06-02 18:02:59belopolskysetmessages: + msg106899
2010-06-02 15:34:33mark.dickinsonsetmessages: + msg106893
2010-06-02 15:17:31taleinatsetmessages: + msg106891
2010-06-02 15:02:42mark.dickinsonsetmessages: + msg106890
2010-06-02 14:33:46taleinatsetnosy: + taleinat
messages: + msg106884
2010-04-16 16:20:07rhettingersetpriority: normal -> low

messages: + msg103336
2010-04-16 08:00:37mark.dickinsonsetversions: - Python 3.2
2010-04-16 08:00:13mark.dickinsonsetmessages: + msg103301
2010-04-16 07:54:11abacabadabacabasetnosy: + abacabadabacaba
2009-09-24 20:16:02mark.dickinsonsetassignee: mark.dickinson -> (no value)
2009-09-24 20:15:44mark.dickinsonsetmessages: + msg93092
2009-09-24 20:05:44mark.dickinsonsetmessages: + msg93091
2009-09-22 21:48:00mark.dickinsonsetmessages: + msg93019
2009-09-22 19:49:49mark.dickinsonsetmessages: + msg93014
2009-09-21 22:27:11lehmannrosetfiles: + range.patch

messages: + msg92967
2009-09-21 11:42:35benjamin.petersonsetnosy: + benjamin.peterson
messages: + msg92934
2009-09-21 09:49:33mark.dickinsonsetassignee: mark.dickinson
messages: + msg92927
2009-09-21 09:39:11mark.dickinsonsetmessages: + msg92926
2009-09-20 16:37:41lehmannrosetfiles: + range.patch

nosy: + lehmannro
messages: + msg92899

type: enhancement -> resource usage
2009-09-03 10:06:55mark.dickinsonsetmessages: + msg92198
2009-09-03 10:03:16mark.dickinsonsetmessages: + msg92197
2009-09-03 09:18:18mark.dickinsonsetmessages: + msg92195
2009-09-02 21:47:11hpesojsetmessages: + msg92188
2009-09-02 21:46:11rhettingersetnosy: + rhettinger
messages: + msg92187
2009-09-02 21:18:42mark.dickinsonsetnosy: + hpesoj
2009-09-02 21:12:12mark.dickinsonsetstage: needs patch
type: enhancement
versions: + Python 2.7, Python 3.2, - Python 2.6, Python 3.0
2009-09-02 21:10:45mark.dickinsonsetnosy: + mark.dickinson
messages: + msg92185
2009-09-02 21:09:58mark.dickinsonlinkissue6826 superseder
2008-02-29 21:33:16belopolskysetmessages: + msg63146
2008-02-29 19:53:15stargamingsetmessages: + msg63143
2008-02-25 00:57:26belopolskysetnosy: + belopolsky
messages: + msg62955
2008-01-07 22:14:34pitrousetnosy: + pitrou
messages: + msg59501
2008-01-06 22:29:45adminsetkeywords: - py3k
versions: Python 2.6, Python 3.0
2007-08-30 00:24:23gvanrossumsetversions: + Python 3.0
2007-08-23 22:00:55georg.brandlsetkeywords: + py3k
2007-08-02 16:14:47stargamingcreate