
Author larry
Recipients larry, pitrou, rhettinger, serhiy.storchaka
Date 2015-05-06.16:18:54
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1430929136.04.0.0158927713933.issue24138@psf.upfronthosting.co.za>
In-reply-to
Content
This probably shouldn't be checked in.  But it was an interesting experiment, and I did get it to work.

My brother forwarded me this question from Stack Overflow:
    http://stackoverflow.com/questions/23453133/is-there-a-reason-python-3-enumerates-slower-than-python-2

The debate brought up a good point: a lot of the overhead of range() is in creating and destroying the long objects.  I wondered, could I get rid of that?  Long story short, yes.

rangeiterobject is a special-case iterator used when you're iterating over a range whose values all fit inside C longs.  (Otherwise range has to fall back to the much slower general-purpose longrangeiterobject.)  Its rangeiter_next is simple: it computes the new value, then returns PyLong_FromLong of that value.
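
Condensed from Objects/rangeobject.c, that fast path is roughly:

    static PyObject *
    rangeiter_next(rangeiterobject *r)
    {
        if (r->index < r->len)
            /* A fresh int object is allocated on every iteration,
               then (usually) destroyed one iteration later. */
            return PyLong_FromLong(r->start + (r->index++) * r->step);
        return NULL;
    }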

First thought: cache a reference to the previously yielded value.  If its reference count is 1, we hold the only reference, so we can overwrite its value and return it again.  But that doesn't help in the general case, because in "for x in range(1000)", x is still holding a reference to that value at the time __next__ is called on the iterator.
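
Concretely, the rejected single-slot check would be something along these lines (prev being a hypothetical cache field on the iterator):

    /* Single-slot reuse: only safe when the cache holds the sole
       reference.  In "for x in range(N)" this never fires, because x
       still references the previously yielded object when __next__
       runs, so Py_REFCNT(r->prev) is at least 2. */
    if (r->prev != NULL && Py_REFCNT(r->prev) == 1 && Py_SIZE(r->prev) == 1) {
        ((PyLongObject *)r->prev)->ob_digit[0] = (digit)value;
        Py_INCREF(r->prev);    /* the reference returned to the caller */
        return r->prev;
    }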

The trick: cache *two* previously yielded objects.  In the general case, by the time an object is two iterations old, everyone else has dropped their references to it and we can modify it safely.  (A sketch combining this with the second trick follows the next paragraph.)

The second trick: if the value you're yielding is one of the interned "small ints", you *have* to return the interned small int.  (Otherwise you break 0 == 0, I kid you not.)
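
Here's a minimal sketch of how the two tricks combine.  The prev1/prev2 fields are hypothetical additions to rangeiterobject (they'd need to be initialized to NULL and handled in dealloc/traverse), and the small-int bounds are assumptions mirroring longobject.c's private NSMALLNEGINTS/NSMALLPOSINTS constants (which is exactly the copy-and-paste problem discussed below):

    #include "Python.h"
    #include "longintrepr.h"   /* PyLongObject internals, digit, PyLong_BASE */

    #define SMALL_INT_MIN (-5)  /* assumed: mirrors -NSMALLNEGINTS */
    #define SMALL_INT_MAX 256   /* assumed: mirrors NSMALLPOSINTS - 1 */

    typedef struct {
        PyObject_HEAD
        long index, start, step, len;
        PyObject *prev1;   /* yielded last iteration (hypothetical field) */
        PyObject *prev2;   /* yielded two iterations ago (hypothetical field) */
    } rangeiterobject;

    static PyObject *
    rangeiter_next(rangeiterobject *r)
    {
        long value;
        PyObject *obj;

        if (r->index >= r->len)
            return NULL;
        value = r->start + (r->index++) * r->step;

        /* Second trick: interned small ints must be returned as-is. */
        if (SMALL_INT_MIN <= value && value <= SMALL_INT_MAX)
            return PyLong_FromLong(value);

        /* First trick: try to recycle the object yielded two iterations
           ago.  By now the loop body has normally dropped its reference,
           leaving ours as the only one, so mutating it in place is safe. */
        obj = r->prev2;
        r->prev2 = NULL;
        if (obj != NULL
            && Py_REFCNT(obj) == 1         /* we hold the sole reference */
            && Py_SIZE(obj) == 1           /* positive, single digit */
            && 0 < value && value < (long)PyLong_BASE)
        {
            ((PyLongObject *)obj)->ob_digit[0] = (digit)value;
        }
        else {
            Py_XDECREF(obj);
            obj = PyLong_FromLong(value);
            if (obj == NULL)
                return NULL;
        }

        /* Rotate the cache: one reference stays in the slot, one goes
           to the caller. */
        r->prev2 = r->prev1;
        r->prev1 = obj;
        Py_INCREF(obj);
        return obj;
    }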

With this patch applied, all regression tests pass.  And on my desktop machine, the benchmark from the Stack Overflow question above:

./python -mtimeit -n 5 -r 2 -s"cnt = 0" "for i in range(10000000): cnt += 1"

drops from 410ms to 318ms ("5 loops, best of 2: 318 msec per loop").


This implementation requires rangeiterobject to have intimate knowledge of PyLongObject's implementation, including copying-and-pasting some information that isn't otherwise exposed (essentially the maximum interned small int).  At the very least, that information would need to be exposed properly, so rangeiterobject could use it correctly, before this could be checked in.

It might be cleaner for longobject.c to expose a private "set this long object to this value" function.  It would fail if it couldn't do so safely: if the value is an interned (small) int, or if the long is the wrong size (Py_SIZE(o) != 1).
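
For instance (a hypothetical function, not a real CPython API; small_ints, NSMALLNEGINTS, and NSMALLPOSINTS are longobject.c's private interned-int machinery, and the caller would still be responsible for the Py_REFCNT == 1 check):

    /* Hypothetical private helper for longobject.c: overwrite op's value
       in place.  Returns 0 on success, -1 if it can't be done safely. */
    static int
    _PyLong_SetValue(PyLongObject *op, long value)
    {
        /* Never mutate an interned small int. */
        if (op >= small_ints && op < small_ints + NSMALLNEGINTS + NSMALLPOSINTS)
            return -1;
        /* Only handle positive single-digit longs. */
        if (Py_SIZE(op) != 1 || value <= 0 || value >= (long)PyLong_BASE)
            return -1;
        op->ob_digit[0] = (digit)value;
        return 0;
    }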


Is this interesting enough to pursue?  I'm happy to drop it.
History
Date User Action Args
2015-05-06 16:18:56    larry    set    recipients: + larry, rhettinger, pitrou, serhiy.storchaka
2015-05-06 16:18:56    larry    set    messageid: <1430929136.04.0.0158927713933.issue24138@psf.upfronthosting.co.za>
2015-05-06 16:18:55    larry    link    issue24138 messages
2015-05-06 16:18:55    larry    create