classification
Title: deque.rotate() could be much faster
Type: performance Stage: needs patch
Components: Extension Modules Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: jcea, jcon, python-dev, rhettinger, serhiy.storchaka, sfllaw
Priority: low Keywords: patch

Created on 2012-11-03 19:31 by sfllaw, last changed 2013-02-02 20:27 by python-dev. This issue is now closed.

Files
File name Uploaded Description Edit
rotate.diff rhettinger, 2013-01-12 03:06 Faster rotate review
Messages (17)
msg174679 - (view) Author: Simon Law (sfllaw) * Date: 2012-11-03 19:31
If you look at the implementation of deque.rotate(), it does the equivalent of deque.append(deque.popleft()) or deque.appendleft(deque.pop()).

Unfortunately, for larger rotations, the pop() and append() calls just do too much work. Since the documentation recommends using rotate() to do slicing-style operations, this could get seriously slow.

deque.rotate() could just touch up the internal pointers and use memmove() to realign the data.

Benchmarks, of course, would have to be written to make sure this is a win.
msg179739 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-01-11 22:43
The easiest change would essentially involve inlining the code for append/appendleft and then removing unneeded steps (the state update, the INCREF/DECREF of item, and the TRIM() test).  

Of these, the INCREF/DECREF is likely to be a nice win because we can rotate without visiting all the underlying objects (each of which would likely be a cache-line miss).
msg179746 - (view) Author: John O'Connor (jcon) Date: 2013-01-11 23:41
Looking at the implementation (rather quickly)[1]. I'm wondering if there is a reason why the appendleft(pop()) loop is required. It would seem the best way would be to determine the new head link, make the previous link the new end link and concatenate the two chains (by just swapping some pointers).
        

[1]. http://hg.python.org/cpython/file/a4292889e942/Modules/_collectionsmodule.c#l429
msg179773 - (view) Author: Roundup Robot (python-dev) Date: 2013-01-12 06:30
New changeset 9fb793b60e79 by Raymond Hettinger in branch 'default':
Issue #16398: Optimize deque.rotate()
http://hg.python.org/cpython/rev/9fb793b60e79
msg179774 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-01-12 06:32
John, that wouldn't work because the blocks are fixed length and the endmost blocks are typically only partially filled, so all the data elements have to shift positions within their blocks.
msg179776 - (view) Author: Roundup Robot (python-dev) Date: 2013-01-12 08:05
New changeset 0d81333bde78 by Raymond Hettinger in branch '2.7':
Issue #16398: Optimize deque.rotate()
http://hg.python.org/cpython/rev/0d81333bde78
msg179802 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2013-01-12 14:39
I am OK with this patch being applied to 2.7, but I wonder why. This is not a bugfix... :-)
msg179847 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-13 08:12
Why only 2.7 and 3.4?
msg179896 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-01-13 21:35
Jesús, I backported this to 2.7 because it was affecting intended usability of multiple parts of the API.  The current code had the egregious and unintended side-effect of touching every data element during a rotate -- this resulted in a huge number of unnecessary cache line misses and was blowing useful data out of cache.

IMO, a performance bug is different from a micro-optimization, especially if it is affecting the intended uses for part of an API.

Serhiy, you're welcome to backport to 3.3 if you desire.
msg181189 - (view) Author: Roundup Robot (python-dev) Date: 2013-02-02 17:56
New changeset f7b01daffe01 by Raymond Hettinger in branch 'default':
Issue 16398: Use memcpy() in deque.rotate().
http://hg.python.org/cpython/rev/f7b01daffe01
msg181191 - (view) Author: Roundup Robot (python-dev) Date: 2013-02-02 18:23
New changeset cb5cde9e5ac5 by Raymond Hettinger in branch '2.7':
Issue 16398: Use memcpy() in deque.rotate().
http://hg.python.org/cpython/rev/cb5cde9e5ac5
msg181194 - (view) Author: Simon Law (sfllaw) * Date: 2013-02-02 18:40
Raymond, looking at your patch, can we assert that deque->leftblock is never equal to deque->rightblock? If not, you need to use memmove() instead of memcpy(), which is unsafe for overlapping arrays.

It is not clear to me that this invariant is true. At the very least, an assertion should be there to protect future refactorings.
msg181196 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-02-02 18:44
Even when leftblock == rightblock, a memcpy() will work fine.  When the block extends off the end, it *always* creates a newblock and never wraps around to the current block.  Data doesn't get overwritten in any scenario.
msg181197 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-02-02 18:46
Also note, len>=1 and |n| < len//2, so even within a single block, memcpy() doesn't overwrite data.
msg181198 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-02 18:51
It is possible to use only one new block (or even none). It should a little speed up rotate(). I will open a new issue when prepare a patch.
msg181204 - (view) Author: Roundup Robot (python-dev) Date: 2013-02-02 19:24
New changeset efb8d80af320 by Raymond Hettinger in branch 'default':
Issue 16398:  Add assertions to show why memcmp is safe.
http://hg.python.org/cpython/rev/efb8d80af320
msg181214 - (view) Author: Roundup Robot (python-dev) Date: 2013-02-02 20:27
New changeset 12ef5a4bba63 by Raymond Hettinger in branch 'default':
Issue 16398: One more assertion for good measure.
http://hg.python.org/cpython/rev/12ef5a4bba63
History
Date User Action Args
2013-02-02 20:27:02python-devsetmessages: + msg181214
2013-02-02 20:19:16rhettingersetmessages: - msg181206
2013-02-02 19:57:13rhettingersetassignee: rhettinger
2013-02-02 19:27:46rhettingersetmessages: + msg181206
2013-02-02 19:24:53python-devsetmessages: + msg181204
2013-02-02 18:51:11serhiy.storchakasetmessages: + msg181198
2013-02-02 18:46:30rhettingersetmessages: + msg181197
2013-02-02 18:44:42rhettingersetmessages: + msg181196
2013-02-02 18:40:27sfllawsetmessages: + msg181194
2013-02-02 18:23:45python-devsetmessages: + msg181191
2013-02-02 17:56:20python-devsetmessages: + msg181189
2013-01-13 21:35:24rhettingersetmessages: + msg179896
2013-01-13 08:12:33serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg179847
2013-01-12 14:39:55jceasetmessages: + msg179802
2013-01-12 08:05:16python-devsetmessages: + msg179776
2013-01-12 06:32:28rhettingersetstatus: open -> closed
resolution: fixed
messages: + msg179774
2013-01-12 06:30:13python-devsetnosy: + python-dev
messages: + msg179773
2013-01-12 03:06:29rhettingersetfiles: + rotate.diff
keywords: + patch
2013-01-11 23:41:52jconsetmessages: + msg179746
2013-01-11 22:44:03rhettingersetmessages: - msg179666
2013-01-11 22:43:46rhettingersetmessages: + msg179739
2013-01-11 11:02:19rhettingersetpriority: normal -> low

messages: + msg179666
2013-01-09 23:06:02jconsetnosy: + jcon
2012-11-03 19:43:04jceasetnosy: + jcea
2012-11-03 19:42:26serhiy.storchakasetnosy: + rhettinger
2012-11-03 19:42:03serhiy.storchakasetstage: needs patch
components: + Extension Modules, - Library (Lib)
versions: + Python 3.4, - Python 2.7, Python 3.2, Python 3.3
2012-11-03 19:31:06sfllawcreate