Message 220520 - Python tracker



Author	josh.r
Recipients	jcea, josh.r, r.david.murray, rhettinger
Date	2014-06-14.01:47:41
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1402710463.42.0.0241573595974.issue21744@psf.upfronthosting.co.za>
In-reply-to

Content

I would think that allowing the special case optimization would be a good idea. As is, if you want to take slices of buffers without making copies, you can use memoryview and get O(1) views of a slice of the memory. But there's nothing built-in that's even remotely equivalent for large lists/tuples. If I have a list (we'll call it mylist) with one million items, and I'd like to iterate/operate on the last 100,000, my choices are:

1. mylist[-100000:] # Copies 100,000 pointers; on x64, that's 800KB of memory copied, and 100,000 increfs and decrefs performed en masse, before I even iterate
2. itertools.islice(mylist, 900000, None) # Uses no additional memory, but performs a solid million increfs/decrefs as it iterates, 90% of which are unnecessary
3. Write a custom sequence view class that explicitly indexes to simulate a "listview" or a "tupleview" in a style similar to a memoryview. Minimal memory overhead, and it doesn't process stuff you don't care about, but you actually have to write the thing, and any pure Python implementation is going to add a lot of mathematical overhead to maintain a slice or range in parallel so you can index the wrapped sequence appropriately.
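
To make option 3 concrete, here is a minimal pure Python sketch of such a view. The name SeqView and its constructor signature are hypothetical (nothing like it exists in the standard library); it leans on range to do the slice arithmetic, which is exactly the per-index overhead described above.

```python
from collections.abc import Sequence


class SeqView(Sequence):
    """Hypothetical read-only view over part of a sequence; copies nothing."""

    def __init__(self, seq, start=None, stop=None, step=None):
        self._seq = seq
        # A range object does the slice math; bounds are resolved once,
        # against the length of seq at construction time.
        self._range = range(len(seq))[start:stop:step]

    def __len__(self):
        return len(self._range)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slicing a view yields another view onto the same sequence.
            view = SeqView.__new__(SeqView)
            view._seq = self._seq
            view._range = self._range[index]
            return view
        # range raises IndexError for out-of-bounds ints, which also lets
        # the Sequence mixin's __iter__ terminate correctly.
        return self._seq[self._range[index]]


mylist = list(range(1000))
tail = SeqView(mylist, -100)  # view of the last 100 items
```

Iterating over tail touches only the 100 items in the view, but every access pays for a Python-level __getitem__ call plus the range lookup. If the underlying list shrinks after the view is created, indexing can raise IndexError partway through.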

#3 is the only fully generalizable way to do this, but I see no reason not to make it possible to handle simple cases with islice. Either you write a custom iterator that uses higher-level Python constructs to index on behalf of the user (slower, but generalizes to anything with a __len__ and __getitem__), or you write a high-performance custom iterator that basically just iterates a C array from PySequence_Fast_ITEMS; that only works with lists/tuples, but would be blazing fast. Alternatively, just let itertools.islice directly muck with the tuple/list iterator internals to fast-forward them to the correct index, which reduces the need for extra code, at the expense of a tighter dependency between itertools and tuple/list internals.
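
A rough pure Python sketch of the first approach, assuming a hypothetical helper named islice_fast (not part of itertools): it indexes directly when handed a list or tuple, and otherwise falls back to the regular islice behavior. Unlike islice, the indexed path fixes its bounds up front, so it would misbehave if the list shrank during iteration.

```python
from itertools import islice


def islice_fast(iterable, start, stop=None, step=1):
    """Hypothetical islice variant that jumps straight to `start` for lists/tuples."""
    if start < 0 or (stop is not None and stop < 0) or step < 1:
        # Mirror islice's restriction to non-negative bounds and positive steps.
        raise ValueError("start/stop must be >= 0 and step must be >= 1")
    if isinstance(iterable, (list, tuple)):
        def indexed():
            # Only the requested indices are ever touched.
            for i in range(len(iterable))[start:stop:step]:
                yield iterable[i]
        return indexed()
    # Generic iterables: consume and discard the skipped items, as islice does.
    return islice(iterable, start, stop, step)


last_three = list(islice_fast(list(range(10)), 7))
```

A C implementation would avoid the generator and index overhead entirely; this sketch only shows the dispatch.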

If people are opposed to making islice do this sort of work, I may be forced to start considering a dedicated sequenceview class. Either that, or propose an extension to the iterator protocol to request fast-forwarding, where non-specialized iterators act like islice does now, and specialized iterators skip ahead directly. :-)
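
For the protocol-extension idea, one possible shape is sketched below. Everything here is an assumption for illustration: the advance method name, the ListIter toy class, and the skip helper are all hypothetical and not part of any existing or proposed API.

```python
from itertools import islice


class ListIter:
    """Toy list iterator exposing a hypothetical advance(n) hook."""

    def __init__(self, seq):
        self._seq = seq
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._seq):
            raise StopIteration
        item = self._seq[self._pos]
        self._pos += 1
        return item

    def advance(self, n):
        # Specialized iterator: skip n items in O(1) without touching them.
        self._pos = min(self._pos + n, len(self._seq))


def skip(iterator, n):
    """Fast-forward iterator by n items, using advance() when it is offered."""
    advance = getattr(iterator, "advance", None)
    if advance is not None:
        advance(n)
    else:
        # Non-specialized iterators: consume n items one by one,
        # which is effectively what islice does now.
        next(islice(iterator, n, n), None)
    return iterator


fast = list(skip(ListIter(list(range(10))), 7))
slow = list(skip(iter(range(10)), 7))
```

Both paths produce the same items; only the cost of getting to the starting position differs.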

History
Date	User	Action	Args
2014-06-14 01:47:43	josh.r	set	recipients: + josh.r, rhettinger, jcea, r.david.murray
2014-06-14 01:47:43	josh.r	set	messageid: <1402710463.42.0.0241573595974.issue21744@psf.upfronthosting.co.za>
2014-06-14 01:47:43	josh.r	link	issue21744 messages
2014-06-14 01:47:41	josh.r	create