This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: itertools.islice() goes over all the pre-initial elements even if the iterable can seek
Type: enhancement Stage:
Components: Extension Modules Versions: Python 3.5
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: jcea, josh.r, r.david.murray, rhettinger
Priority: normal Keywords:

Created on 2014-06-13 04:02 by jcea, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (5)
msg220417 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2014-06-13 04:02
If L is a big sequence and I do "itertool.islice(L, 1000000, None)", islice will go over a million elements before returning the first that I actually cared.

Since L is a sequence, islice could go directly to the element 1000000.

My program performance improved hugely replacing a "for i in L[p:]" for "for i in (L[p] for p in range(p, len(L)))".

Using itertools and doing "for i in itertools.islice(L, p, None)" is massively inefficient.
msg220435 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-06-13 12:56
Then don't use itertools for that case.  itertools is designed for working with *arbitrary* iterables, and arbitrary iterables are not seekable.

I'll leave it to Raymond to decide if it is worth making a special-case optimization :)
msg220520 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2014-06-14 01:47
I would think that allowing the special case optimization would be a good idea. As is, if you want to take slices of buffers without making copies, you can use memoryview and get O(1) views of a slice of the memory. But there's nothing built-in that's even remotely equivalent for large lists/tuples. If I have list (we'll call it mylist) with one million items, and I'd like to iterate/operate on the last 100,000, my choices are:

1. mylist[-100000:] # Copies 100,000 pointers; on x64, that's 800KB of memory copied, and 100,000 increfs and decrefs performed en masse, before I even iterate
2. itertools.islice(mylist, 900000, None) # Uses no additional memory, but performs a solid million increfs/decrefs as it iterates, 90% of which are unnecessary
3. Write a custom sequence view class that explicitly indexes to simulate a "listview" or a "tupleview" in a style similar to a memoryview. Minimal memory overhead, and it doesn't process stuff you don't care about, but you actually have to write the thing, and any pure Python implementation is going to add a lot of mathematical overhead to maintain a slice or range in parallel so you can index the wrapped sequence appropriately.

#3 is the only fully generalizable way to do this, but I see no reason not to make it possible to handle simple cases with islice. Either you write a custom iterator that uses higher level Python constructs to index on behalf of the user (slower, but generalizes to anything with a __len__ and __getitem__), and/or a high performance custom iterator that is basically just iterating a C array from PySequence_FAST_ITEMS; only works with lists/tuples, but would be blazing fast (alternatively, just let itertools.islice directly muck with the tuple/list iterator internals to fast forward them to the correct index, which reduces the need for extra code, at the expense of a tighter dependency between itertools and tuple/list internals).

If people are opposed to making islice do this sort of work, I may be forced to start considering a dedicated sequenceview class. Either that, or propose an extension to the iterator protocol to request fast-forwarding, where non-specialized iterators act like islice does now, and specialized iterators skip ahead directly. :-)
msg220522 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2014-06-14 02:06
I will admit, on further consideration, a dedicated sequenceview might work more nicely. islice's interface is limited by its generality; since it doesn't assume a length, you can't use a negative value for end.

Does anyone think a general sequenceview class would make sense (or a sequenceview function that returns either a general sequenceview or a memoryview, depending on what the underlying type is)? Might make an interesting project, whether or not islice is improved.
msg220524 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-06-14 02:18
> I'll leave it to Raymond to decide if it is worth 
> making a special-case optimization :)

Special cases aren't special enough ...  :-)

The itertools are specifically designed to work with general iterators, not just sequences.

> Does anyone think a general sequenceview class would make sense

You could post one on PyPi but I don't think there would be any uptake.  It is often easier to write a simple generator customized to your particular needs than it is to learn and remember a new class to handle an infrequent 
use case.
History
Date User Action Args
2022-04-11 14:58:04adminsetgithub: 65943
2014-06-14 02:18:16rhettingersetstatus: open -> closed
assignee: rhettinger
resolution: not a bug
messages: + msg220524
2014-06-14 02:06:23josh.rsetmessages: + msg220522
2014-06-14 01:47:43josh.rsetmessages: + msg220520
2014-06-13 12:56:55r.david.murraysetnosy: + rhettinger, r.david.murray
messages: + msg220435
2014-06-13 10:02:59josh.rsetnosy: + josh.r
2014-06-13 04:02:27jceacreate