Title: islice doesn't accept large stop values
Type: enhancement Stage: needs patch
Components: Extension Modules Versions: Python 3.5
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: AlokSinghal, loewis, rhettinger, terry.reedy, thomasguest
Priority: normal Keywords: patch

Created on 2009-06-18 07:14 by thomasguest, last changed 2014-06-23 05:39 by loewis. This issue is now closed.

File name Uploaded Description Edit
islice_large_values.patch AlokSinghal, 2014-04-19 00:45 patch to support large values in islice review
islice_large_values-2.patch AlokSinghal, 2014-04-21 23:11 patch to support large values in islice review
islice_large_values-3.patch AlokSinghal, 2014-04-22 13:54 review
islice_large_values-4.patch AlokSinghal, 2014-05-10 07:29 review
Messages (20)
msg89494 - (view) Author: Thomas Guest (thomasguest) Date: 2009-06-18 07:14
Python 3.0 (r30:67503, Jan  7 2009, 16:22:01) 
[GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin

>>> from itertools import islice, count
>>> islice(count(), (1<<31) - 1)
<itertools.islice object at 0x63a0c0>
>>> islice(count(), (1<<31))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Stop argument for islice() must be a non-negative integer or
msg89646 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-06-23 21:28
Clarified the error message in r73535.

Leaving open as a feature request to support arbitrarily large indices.
msg117000 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-09-20 23:11
Unassigning.  If someone is interested creating a patch, I think it is a reasonable request.
msg120456 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-11-04 22:11
In 3.1.2, range handles large numbers.
>>> list(range(1000000000, 50000000000, 10000000000))
[1000000000, 11000000000, 21000000000, 31000000000, 41000000000]
# those are all billions

This means that the 'equivalent' code in the doc will work.
msg216558 - (view) Author: Alok Singhal (AlokSinghal) * Date: 2014-04-16 19:14
I started working on this bug as a part of PyCon US 2014 sprints.  Should the bugfix include a fast path (basically the current implementation) for when the values involved can fit in an int, and a slow path for larger values?  Or should the bugfix just have one path which works for all the cases (using PyObject * for "next", "stop" etc.)?
msg216564 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-04-16 19:45
Yes, you should add a fastpath and a slow path.  Take a look at itertools.count() or builtins.enumerate() to see how to do it.

This is a bit tricky to switch between modes, so you will need a thorough set of test cases.
msg216568 - (view) Author: Alok Singhal (AlokSinghal) * Date: 2014-04-16 19:49
OK.  I have written the "slow path" version and tested it a bit.  I will add the code to switch between the paths and also add test cases as well.  Thanks!
msg216822 - (view) Author: Alok Singhal (AlokSinghal) * Date: 2014-04-19 00:45
Here's a proposed patch.  I need more tests for large values, but all the tests I could think of take a long time to get to a long value.  I added some tests that don't take much time but work correctly for long values.  If anyone has any ideas for some other tests, please let me know.
msg216974 - (view) Author: Alok Singhal (AlokSinghal) * Date: 2014-04-21 23:11
This updated patch has support for starting in fast mode until the next count would result in overflow in Py_ssize_t.  The first patch started in slow mode as soon as any of 'start', 'stop', or 'step' was outside of the range.  With this patch, we start in fast mode if possible and then transition to slow mode when needed.

I also tested this patch for correctness for the following cases:

- starting in slow mode,
- transition from fast -> slow,
- pickle/unpickle

I did this by temporarily changing the code twice:

- to always use fast mode, and
- pretending that overflow occurs at value 5 instead of PY_SSIZE_T_MAX.
msg216991 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-04-22 06:47
The first patch was cleaner.  I don't think it is necessary to start-fast and switch-to-slow in the case where not all of the arguments are in the normal range.
msg217003 - (view) Author: Alok Singhal (AlokSinghal) * Date: 2014-04-22 13:54
OK.  Here is the first patch with a couple of bug fixes for "slow mode".
msg218211 - (view) Author: Alok Singhal (AlokSinghal) * Date: 2014-05-10 07:29
Uploading another patch which is the same as the last patch but this one applies cleanly after the latest islice changes for #21321.
msg218234 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-05-10 19:44
Alok, have you signed a contributor agreement?
msg218268 - (view) Author: Alok Singhal (AlokSinghal) * Date: 2014-05-11 14:12
Yes, I signed it a few days ago.
msg221227 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-06-22 07:45
Alok, overall the patch looks pretty good and you've done great work on it.

However, in working through its details, I find myself having major misgivings about doubling the size and complexity of the code for something that may not be ever benefit any real code.

Terry noted that range() supports values bigger than the word size but the needs there are much different.  Programs can reasonably use ranges with large start points, but an islice() call would have to iterate over *start* values before it begins returning any usable values:

  list(range(sys.maxsize+10, sys.maxsize+20))  # maybe a good idea
  list(islice(count(), sys.maxsize + 10, sys.maxsize + 20))  # probably not a good idea

When we finally get 64-bit everywhere (not there yet), I think the code in this patch would never get exercised.  Even in the 32-bit world, islicing over 2**32 inputs doesn't seem like a great idea.
msg221230 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2014-06-22 08:17
I'd find it sad if we would, after 5 years of asking for contributions, and after somebody actually contributing, now declare that we really don't want a contribution.
msg221235 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-06-22 08:50
Martin, "finding it sad" doesn't really help much here.

We *can* put the patch in.  Alok devoted a good chunk of time to creating the patch and I've already devoted a good chunk of time to reviewing it.  

However, in so doing I became concerned that it wasn't the right thing to do.  The code size and complexity is much bigger than I expected (as compared to what I had to do for itertools.count for example).  The use case is much weaker (because unlike range() and count() we don't have an arbitrary start point).  This thought surfaced when reading Alok's notes on the difficulty of testing this patch in a reasonable amount of time.

Besides feeling sad, what is your recommendation?  Would you like me to finish the review and check it in to make everyone feel good?
msg221262 - (view) Author: Alok Singhal (AlokSinghal) * Date: 2014-06-22 15:48
Hi Raymond, Martin,

I won't feel bad about this patch not making it into the final Python distribution.  I worked on this bug because it was the only "core C" bug at PyCon US sprints for CPython.  I learned a bit more about CPython while working on it and I don't consider the time I spent on this bug as wasted.

Other than symmetry arguments, I can't think of any other argument for islice to accept large values.

  a = []
  a[sys.maxsize+2:] # gives: []
  islice(a, None, sys.maxsize+2) # raises exception

But because islice has to go through the elements of the iterable from the current value to "start", while the first example doesn't, I don't think the symmetry argument is that strong here.

So, I think it's fine if we close this bug without accepting this patch.

Thanks for your time in reviewing it!
msg221288 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-06-22 18:04
All contributions are subject to final commit review. I looked at the patch and it is a *lot* of code for little benefit. I think the better solution would be an informative error message: "Currently, islice arguments must be less than {} on {}-bit systems".format(n, k).

Since I posted nearly 4 years ago, I have become more aware of the important differences between 3.x range as a sequence class whose instances are non-iterator *(re)iterables* and count as an iterator class whose instances are one-time *iterators*. To me, arbitrarily large indices now seem more appropriate for virtual sequences that can do O[1] indexing rather than iterators where indexing is O[n].

A recent proposal on python-ideas, which as I remember amounted to enhancing count to be more like range, tripped over this difference. I suggested that a new infinite sequence class Count (like range but without necessarily having a stop value) was a more sensible idea for what the OP wanted to do.
msg221344 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2014-06-23 05:39
Well, as Alok has indicated that he can understand this not getting accepted, I'm now fine with that, too. 

Terry: Raymond had already adjusted the error message five years ago.

So closing this as "won't fix".
Date User Action Args
2014-06-23 05:39:16loewissetstatus: open -> closed
resolution: wont fix
messages: + msg221344
2014-06-22 18:04:06terry.reedysetmessages: + msg221288
2014-06-22 15:48:03AlokSinghalsetmessages: + msg221262
2014-06-22 08:50:58rhettingersetmessages: + msg221235
2014-06-22 08:17:24loewissetnosy: + loewis
messages: + msg221230
2014-06-22 07:45:51rhettingersetmessages: + msg221227
2014-05-11 14:12:59AlokSinghalsetmessages: + msg218268
2014-05-10 19:44:47rhettingersetmessages: + msg218234
2014-05-10 07:29:43AlokSinghalsetfiles: + islice_large_values-4.patch

messages: + msg218211
2014-04-23 06:24:00rhettingersetassignee: rhettinger
2014-04-22 13:54:16AlokSinghalsetfiles: + islice_large_values-3.patch

messages: + msg217003
2014-04-22 06:47:14rhettingersetmessages: + msg216991
2014-04-21 23:11:25AlokSinghalsetfiles: + islice_large_values-2.patch

messages: + msg216974
2014-04-19 00:45:55AlokSinghalsetfiles: + islice_large_values.patch
keywords: + patch
messages: + msg216822
2014-04-16 19:49:09AlokSinghalsetmessages: + msg216568
2014-04-16 19:45:38rhettingersetmessages: + msg216564
versions: + Python 3.5, - Python 3.2
2014-04-16 19:14:04AlokSinghalsetnosy: + AlokSinghal
messages: + msg216558
2010-11-04 22:11:25terry.reedysetnosy: + terry.reedy
messages: + msg120456
2010-09-20 23:11:00rhettingersetpriority: low -> normal
versions: - Python 2.7
messages: + msg117000

assignee: rhettinger -> (no value)
stage: needs patch
2009-06-23 21:28:31rhettingersettype: behavior -> enhancement
messages: + msg89646
components: + Extension Modules
versions: + Python 2.7, Python 3.2, - Python 3.0
2009-06-18 08:01:42rhettingersetpriority: low
assignee: rhettinger

nosy: + rhettinger
2009-06-18 07:14:22thomasguestcreate