msg52979 - (view) |
Author: Stargaming (stargaming) |
Date: 2007-08-02 16:14 |
Membership tests for xrange result in loss of xrange's laziness by iterating through every single item. This yields a complexity of O(n) and an unbearable runtime when handling larger ranges.
This issue can be fixed by using arithmetic decision instead of generic lookup/iteration. This will boil down the complexity to O(1) and make tests near the start of the range as fast as those near the end. The same technique is already used in item lookups.
The fix has been inspired by the "optimizing [x]range" thread in the python-3000-devel mailing list (see http://article.gmane.org/gmane.comp.python.python-3000.devel/8732).
A patch to Objects/rangeobject.c is attached. It modifies tp_as_sequence in PyRange_Type to include the sq_contains field and implement the procedure range_contains right there.
|
msg52980 - (view) |
Author: Stargaming (stargaming) |
Date: 2007-08-06 11:37 |
The Python 3000 patch required a rewrite from scratch since we can't issue PyIntObject->ob_ival directly any longer. Instead, we use the PyNumber-API now.
File Added: rangeobject.c.diff
|
msg59501 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2008-01-07 22:14 |
The 2.6 patch should probably use long instead of int.
|
msg62955 - (view) |
Author: Alexander Belopolsky (belopolsky) * |
Date: 2008-02-25 00:57 |
> Do we really need another way to spell a <= x < b? Have you got a
> real-world use case in mind for the version with step > 1?
> I'm at most lukewarm; I'd be willing to look at a patch to the C code
> in the py3k-struni branch, plus unit tests though.
-- GvR at http://thread.gmane.org/gmane.comp.python.python-
3000.devel/8732
I read this as a rejection for 2.x series. For py3k, this is a
premature optimization. Current py3k implementation is likely to be
optimized for regular size integers at some point. This patch will only
introduce more code to be changed then.
If this is not ready to be rejected for 2.x, let's wait for resolution
of issue1540617 since it may result in backporting of py3k range
implementation.
|
msg63143 - (view) |
Author: Stargaming (stargaming) |
Date: 2008-02-29 19:53 |
Later on, when Greg mentions the current for/if performance asymmetry,
Guido states:
> Fair enough. So maybe *you* can contribute a patch?
I don't read this as a rejection, but of course you're right -- this
patch is purely an optimization. As already written in my initial
comment (and discussed on the mailing list), there would be no change in
behaviour: The change from generic iterator behaviour to specific range
performance is transparent to the end-user.
I don't see how the other patches interfere with this one. Waiting until
the development of the range object itself has a stabilized and we
decided upon a member type/API seems sensible. Still, this patch would
be valid on its own.
Now, your impulse is right: having the performance enhancement in the
bug tracker doesn't help much -- we need a resolution for this. If
someone could review the patch, tell me about the critical parts or
point me to references on how to improve it, I'd be really glad!
On the mentioned unit tests: I'm unsure how to verify this behaviour
since I expect it to affect runtime *only*.
PS. With the new tracker, wouldn't the "resource usage" type fit best?
|
msg63146 - (view) |
Author: Alexander Belopolsky (belopolsky) * |
Date: 2008-02-29 21:33 |
Here are my comments on the py3k patch:
1. Sign of a PyLong object o is the same as the sign of Py_SIZE(o). I
would think it is safe to use this fact within python core. (User code
that may need to work across multiple versions of python may need to be
more conservative.)
2. If you choose to use zero in the code, there is no need to create it
twice per call.
3. Py_INCREF(zero); is unnecessary and creates a reference leak.
4. range_contains should return -1 when error is detected, not 0.
2.
|
msg92185 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2009-09-02 21:10 |
Closed issue 6826 as a duplicate of this issue.
|
msg92187 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2009-09-02 21:46 |
+1 for an O(1) contains-test.
|
msg92188 - (view) |
Author: Joseph Thomson (hpesoj) |
Date: 2009-09-02 21:47 |
Seeing as the range type has most likely stabalised by now, I would like
to renew this issue. The current behaviour of range/xrange could
introduce unforeseen performance issues if users are not aware of its
inner workings.
Also, as I said in my closed duplicate issue, 'if value in range(lower,
upper)' to me looks far more Pythonic than 'if value >= lower and value
< upper'. Although this may be a minor point, I think it is noteworthy.
As for the prospect of producing a patch myself, I have checked out and
taken a look at the code, but it is somewhat alien to me, and I thought
it'd probably be more sensible to leave it to the people who know what
they're doing.
|
msg92195 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2009-09-03 09:18 |
The py3k patch no longer works: it makes use of PyObject_Cmp, which no
longer exists.
|
msg92197 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2009-09-03 10:03 |
The trunk patch is also unacceptable in its current form:
1. there are no docs or tests
2. keyval, start, step and end should have type long, not type int
(as Antoine already mentioned)
3. the expression ((keyval - start) % step) can overflow, leading to
undefined behaviour (e.g., wrong results, segfaults, strange
effects from gcc optimizations that assume no overflow). For
example, with the patch, on Linux/x86-64, I get:
>>> x = xrange(-2000000000, 2000000000, 5)
>>> 1000000000 in x
False
This should be relatively easy to fix: e.g., if you already
know that step > 0 and start <= keyval and keyval < stop, then
'(unsigned long)keyval - (unsigned long)start' is safe from
overflow.
4. the containment check only works for ints: with the patch, I get:
>>> x = xrange(10)
>>> 4 in x
True
>>> 4L in x
False
>>> 4.0 in x
False
but without the patch applied, all these return True. It's possible
that it's worth special-casing integer inputs for the sake of
speed, but I don't think the behaviour should change like this for
other types.
|
msg92198 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2009-09-03 10:06 |
[Joseph Thomson]
> Also, as I said in my closed duplicate issue, 'if value in range(lower,
> upper)' to me looks far more Pythonic than 'if value >= lower and value
> < upper'.
Note that the Pythonic spelling would be: 'if lower <= value < upper'.
(Though that's not quite the same thing if value is not an integer.)
|
msg92899 - (view) |
Author: Robert Lehmann (lehmannro) * |
Date: 2009-09-20 16:37 |
I revised the patch for Python 3.1 and added notices to Misc/NEWS and
the range documentation.
(Changing Type to resource usage.)
|
msg92926 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2009-09-21 09:39 |
Robert,
The patch looks good: thank you.
Please use C89-style comments (/* ... */).
I'd like to see a few more tests covering the various combinations of
start less-than/equal-to/greater-than stop, step positive/negative, tested
value within/without/at endpoints of the range interval, etc.
|
msg92927 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2009-09-21 09:49 |
Also, it would be good to add a test or two for non-integers, e.g. to make
explicit that the following behaviour hasn't changed:
>>> class C:
... def __int__(self): return 3
... def __index__(self): return 5
... def __eq__(self, other): return other == 4
...
>>> C() in range(0, 10, 2)
True
|
msg92934 - (view) |
Author: Benjamin Peterson (benjamin.peterson) * |
Date: 2009-09-21 11:42 |
As a small style note, I would prefer if the patch assigned in
conditionals less and split them out to the line before. I see that
rangeobject.c has a mixed style with regards to this, so the clearer one
should win! :)
|
msg92967 - (view) |
Author: Robert Lehmann (lehmannro) * |
Date: 2009-09-21 22:27 |
Thanks for your feedback. I added a few tests and changed the bits you
criticized.
|
msg93014 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2009-09-22 19:49 |
Great---thanks!
Would it be worth using PyObject_IsTrue instead of comparing with zero
using PyObject_RichCompareBool? (Sorry; should have spotted this last
time around.)
|
msg93019 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2009-09-22 21:48 |
Answering myself: probably not worth it: I failed to notice that you
already need zero for the 'step > 0' comparison.
Applied in r75028.
Leaving open for consideration of 2.x xrange.
|
msg93091 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2009-09-24 20:05 |
Just to be on the safe side, I changed the PyLong_Check(ob) check to PyLong_CheckExact(ob) || PyBool_Check(ob), in r75051. Without this, one
can get different results for 'x in range(10)' and 'x in list(range(10))'
if x is an instance of a subclass of int:
Python 3.2a0 (py3k:75050, Sep 24 2009, 20:56:13)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> class C(int):
... def __eq__(self, other): return True
...
>>> C(11) in range(10)
False
>>> C(11) in list(range(10))
True
|
msg93092 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2009-09-24 20:15 |
Unassigning myself since I don't intend to do anything myself about
xrange.__contains__ in 2.x. (But I'll happily review patches.)
|
msg103301 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2010-04-16 08:00 |
I suggest closing this: it's been implemented (for range) in Python 3.x, and I think it's too late for 2.x now.
|
msg103336 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2010-04-16 16:20 |
As an implementation detail, it isn't too late to put this into 2.x if it is something we care about.
|
msg106884 - (view) |
Author: Tal Einat (taleinat) * |
Date: 2010-06-02 14:33 |
I'd like to implement this for 2.x, if there's any chance of this being accepted. Is there still a chance of getting this into 2.7?
This will be my first patch for the Python core, so please tell me if I'm missing something. Currently I'm thinking of:
1. starting with the original patch
2. applying the required changes as mentioned Mark Dickinson, looking at the the accepted 3.x patch for guidance
3. back-porting the tests
4. checking for later additional changes in the source history, and back-porting any such relevant changes
|
msg106890 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2010-06-02 15:02 |
A rapidly diminishing chance for 2.7, IMO. I wouldn't want to try to add this after the first release candidate, which is scheduled for June 5 (and I'm not sure I'll have time to review a patch between now and then).
On the other hand, since this is pure optimization it might be acceptable for 2.7.1.
|
msg106891 - (view) |
Author: Tal Einat (taleinat) * |
Date: 2010-06-02 15:17 |
I'd really like to have this in 2.7. Is there no chance of getting it into 2.7 after rc1 is released?
If so, I can have the patch ready by Friday 14:00 GMT, but if nobody will have time to review (and possibly commit) in time for RC1 I guess I'll take my time with this.
|
msg106893 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2010-06-02 15:34 |
Benjamin?
> I'd really like to have this in 2.7. Is there no chance of getting it
> into 2.7 after rc1 is released?
|
msg106899 - (view) |
Author: Alexander Belopolsky (belopolsky) * |
Date: 2010-06-02 18:02 |
On Wed, Jun 2, 2010 at 11:17 AM, Tal Einat <report@bugs.python.org> wrote:
..
> If so, I can have the patch ready by Friday 14:00 GMT, but if nobody will have time to review
> (and possibly commit) in time for RC1 I guess I'll take my time with this.
I'll review your patch, but why are you so eager to get this into 2.7?
You realize that for integer x, x in xrange(a, b) is just another way
to spell a <= x < b. What is your use case?
|
msg106901 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2010-06-02 18:07 |
> > If so, I can have the patch ready by Friday 14:00 GMT, but if nobody will have time to review
> > (and possibly commit) in time for RC1 I guess I'll take my time with this.
>
> I'll review your patch, but why are you so eager to get this into 2.7?
> You realize that for integer x, x in xrange(a, b) is just another way
> to spell a <= x < b. What is your use case?
I really think this shouldn't go into 2.7. We certainly have better
things to do than potentially break something for an almost useless
non-feature.
|
msg106902 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2010-06-02 18:07 |
Rescheduling to 3.2, if applicable (otherwise I suggest closing).
|
msg106903 - (view) |
Author: Benjamin Peterson (benjamin.peterson) * |
Date: 2010-06-02 18:09 |
I don't really see the point. I would be more inclined towards it if there was a patch already, but patching this doesn't strike me as a key feature.
|
msg106904 - (view) |
Author: Alexander Belopolsky (belopolsky) * |
Date: 2010-06-02 18:10 |
This is already in 3.x. The only reason I can think of to get this in
2.7 is to have fewer performance surprises between 2.x and 3.x.
|
msg106907 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2010-06-02 18:18 |
Le mercredi 02 juin 2010 à 18:10 +0000, Alexander Belopolsky a écrit :
> This is already in 3.x. The only reason I can think of to get this in
> 2.7 is to have fewer performance surprises between 2.x and 3.x.
Since you are supposed to forward-port from 2.x and 3.x, the surprise
will be a good one anyway.
|
msg106932 - (view) |
Author: Tal Einat (taleinat) * |
Date: 2010-06-03 07:30 |
In my mind, the reason for this patch is that xrange/range can be thought of as a lazy list of integers. However without this patch, membership checking was done trivially instead of in a "smart/lazy" manner, which is unexpected for users. Finally, conditions such as "num in xrange(3, 1000, 5)" are not trivial to express correctly otherwise, and even more so for negative steps.
This patch is already implemented and accepted for 3.2, I just wish to back-port it to 2.7 which should be fairly straightforward.
I'll just have a patch ready by tomorrow, and hope that someone finds the time to review it and possibly commit it in time for rc1. I realize that this is a minor change at the last minute. I will certainly understand if the people responsible for preparing rc1 are too busy for this.
|
msg106966 - (view) |
Author: Benjamin Peterson (benjamin.peterson) * |
Date: 2010-06-03 17:28 |
2010/6/3 Tal Einat <report@bugs.python.org>:
>
> Tal Einat <taleinat@users.sourceforge.net> added the comment:
>
> In my mind, the reason for this patch is that xrange/range can be thought of as a lazy list of integers. However without this patch, membership checking was done trivially instead of in a "smart/lazy" manner, which is unexpected for users. Finally, conditions such as "num in xrange(3, 1000, 5)" are not trivial to express correctly otherwise, and even more so for negative steps.
>
> This patch is already implemented and accepted for 3.2, I just wish to back-port it to 2.7 which should be fairly straightforward.
>
> I'll just have a patch ready by tomorrow, and hope that someone finds the time to review it and possibly commit it in time for rc1. I realize that this is a minor change at the last minute. I will certainly understand if the people responsible for preparing rc1 are too busy for this.
xrange has behaved like this for such a long time that I don't see
what it buys us to commit the patch this late.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:56:25 | admin | set | github: 45269 |
2010-06-03 17:32:05 | benjamin.peterson | set | status: open -> closed resolution: accepted |
2010-06-03 17:28:45 | benjamin.peterson | set | messages:
+ msg106966 |
2010-06-03 07:30:14 | taleinat | set | messages:
+ msg106932 versions:
+ Python 2.7, - Python 3.2 |
2010-06-02 18:18:40 | pitrou | set | messages:
+ msg106907 |
2010-06-02 18:10:16 | belopolsky | set | messages:
+ msg106904 |
2010-06-02 18:09:09 | benjamin.peterson | set | messages:
+ msg106903 |
2010-06-02 18:07:29 | pitrou | set | messages:
+ msg106902 versions:
+ Python 3.2, - Python 2.7 |
2010-06-02 18:07:01 | pitrou | set | messages:
+ msg106901 |
2010-06-02 18:03:23 | belopolsky | set | assignee: belopolsky |
2010-06-02 18:02:59 | belopolsky | set | messages:
+ msg106899 |
2010-06-02 15:34:33 | mark.dickinson | set | messages:
+ msg106893 |
2010-06-02 15:17:31 | taleinat | set | messages:
+ msg106891 |
2010-06-02 15:02:42 | mark.dickinson | set | messages:
+ msg106890 |
2010-06-02 14:33:46 | taleinat | set | nosy:
+ taleinat messages:
+ msg106884
|
2010-04-16 16:20:07 | rhettinger | set | priority: normal -> low
messages:
+ msg103336 |
2010-04-16 08:00:37 | mark.dickinson | set | versions:
- Python 3.2 |
2010-04-16 08:00:13 | mark.dickinson | set | messages:
+ msg103301 |
2010-04-16 07:54:11 | abacabadabacaba | set | nosy:
+ abacabadabacaba
|
2009-09-24 20:16:02 | mark.dickinson | set | assignee: mark.dickinson -> (no value) |
2009-09-24 20:15:44 | mark.dickinson | set | messages:
+ msg93092 |
2009-09-24 20:05:44 | mark.dickinson | set | messages:
+ msg93091 |
2009-09-22 21:48:00 | mark.dickinson | set | messages:
+ msg93019 |
2009-09-22 19:49:49 | mark.dickinson | set | messages:
+ msg93014 |
2009-09-21 22:27:11 | lehmannro | set | files:
+ range.patch
messages:
+ msg92967 |
2009-09-21 11:42:35 | benjamin.peterson | set | nosy:
+ benjamin.peterson messages:
+ msg92934
|
2009-09-21 09:49:33 | mark.dickinson | set | assignee: mark.dickinson messages:
+ msg92927 |
2009-09-21 09:39:11 | mark.dickinson | set | messages:
+ msg92926 |
2009-09-20 16:37:41 | lehmannro | set | files:
+ range.patch
nosy:
+ lehmannro messages:
+ msg92899
type: enhancement -> resource usage |
2009-09-03 10:06:55 | mark.dickinson | set | messages:
+ msg92198 |
2009-09-03 10:03:16 | mark.dickinson | set | messages:
+ msg92197 |
2009-09-03 09:18:18 | mark.dickinson | set | messages:
+ msg92195 |
2009-09-02 21:47:11 | hpesoj | set | messages:
+ msg92188 |
2009-09-02 21:46:11 | rhettinger | set | nosy:
+ rhettinger messages:
+ msg92187
|
2009-09-02 21:18:42 | mark.dickinson | set | nosy:
+ hpesoj
|
2009-09-02 21:12:12 | mark.dickinson | set | stage: needs patch type: enhancement versions:
+ Python 2.7, Python 3.2, - Python 2.6, Python 3.0 |
2009-09-02 21:10:45 | mark.dickinson | set | nosy:
+ mark.dickinson messages:
+ msg92185
|
2009-09-02 21:09:58 | mark.dickinson | link | issue6826 superseder |
2008-02-29 21:33:16 | belopolsky | set | messages:
+ msg63146 |
2008-02-29 19:53:15 | stargaming | set | messages:
+ msg63143 |
2008-02-25 00:57:26 | belopolsky | set | nosy:
+ belopolsky messages:
+ msg62955 |
2008-01-07 22:14:34 | pitrou | set | nosy:
+ pitrou messages:
+ msg59501 |
2008-01-06 22:29:45 | admin | set | keywords:
- py3k versions:
Python 2.6, Python 3.0 |
2007-08-30 00:24:23 | gvanrossum | set | versions:
+ Python 3.0 |
2007-08-23 22:00:55 | georg.brandl | set | keywords:
+ py3k |
2007-08-02 16:14:47 | stargaming | create | |