This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Using PyNumber_AsSsize_t in itertools.islice
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: MSeifert, Will Roberts, erik.bray, jdemeyer, mark.dickinson, rhettinger, serhiy.storchaka, smarnach
Priority: low Keywords:

Created on 2017-06-01 11:26 by MSeifert, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 1918 merged python-dev, 2017-06-02 11:31
Messages (12)
msg294934 - (view) Author: Michael Seifert (MSeifert) * Date: 2017-06-01 11:26
In a question on StackOverflow (https://stackoverflow.com/questions/44302946/itertools-does-not-recognize-numpy-ints-as-valid-inputs-on-python-3-6) it was mentioned that numpys scalars cannot be used as index for `itertools.islice`. 

The reason for this is because the numbers are converted with `PyLong_AsSsize_t` (which requires a PyLongObject [or subclass]) instead of `PyNumber_AsSsize_t` (which only requires an `__index__` method).

I'm not sure if there are many use-cases where numpy scalars make sense as inputs for `islice` but it's definetly unexpected that `itertools.islice([1, 2, 3, 4, 5, 6], np.int32(3))` currently throws a `ValueError: Stop argument for islice() must be None or an integer: 0 <= x <= sys.maxsize.`
msg295000 - (view) Author: Will Roberts (Will Roberts) * Date: 2017-06-02 09:47
Note that this issue also seems to affect other methods in the itertools package, such as permutations.
msg295088 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-06-03 16:57
Adding support of more general int-like objects in islice() looks reasonable to me. I'm not sure about permutations(), but if make this change, use PyIndex_Check() instead of PyNumber_Check(), or don't use the special check at all, allowing PyNumber_AsSsize_t() to fail.

The __setstate__() methods shouldn't be changed. We know that the __reduce__() methods return exact ints, not general int-like objects.

When replace PyLong_AsSsize_t with PyNumber_AsSsize_t take into account that PyLong_AsSsize_t is atomic and thread-safe, while PyNumber_AsSsize_t can call Python code and release the GIL.
msg295173 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-06-05 08:10
[Michael Seifert]
> I'm not sure if there are many use-cases where numpy scalars
> make sense as inputs for `islice`

Likewise, I'm not sure there is any evidence of a use-case.  The incompatible relationship between numpy ints and PyLong_AsSsize_t() has been around for a long time and no one seems to have even noticed until now.

Also, you're right that the error message isn't very helpful unless you already understand that np.int32 isn't a subclass of int.

[Serhiy]
> Adding support of more general int-like objects in islice() 
>looks reasonable to me.

Conceptually, anything with an __index__ method might make sense in the context of slicing, so the proposal doesn't sound unreasonable on the face of it.

Practically though, there doesn't seem to be any evidence of a real use case here.  It is unclear whether allowing np.int32 would ever benefit anyone and whether dropping the current requirement for an explicit int() cast would encourage weird and inefficient code.

Either way, this should be categorized as a feature request.  CPython itself is passing tests and isn't breaking any promises.

[Will Roberts]
> Note that this issue also seems to affect other methods 
> in the itertools package, such as permutations.

If we do make some sort of change, it should be limited it to just islice().  To me, this only makes sense in the context of index arguments unless someone makes a broad and consequential executive decision that everything in Python that accepts an integer needs to also check for int-like objects as well.
msg295185 - (view) Author: Will Roberts (Will Roberts) * Date: 2017-06-05 12:40
Thanks for feedback, Serhiy and Raymond!  Github PR now has reverted changes except to the calls in islice_new; I am happy to squash if you would like.

Serhiy, this is my first time poking around in CPython code.  What are the potential consequences of making `itertools.islice` less atomic/thread-safe, and/or possibly releasing the GIL?  Are there any gotchas to watch out for in this patch specifically?  I've modelled my changes on the code in listobject.c [list_subscript], but I would love to hear if there's a better way to do things.

Raymond, I'd also be curious to learn about any code weirdness or inefficiency you have in mind.  I agree with you that there might not be a compelling use-case here.  The SO question looks to be a bit contrived; however, the interesting bits to me here are that the behaviour of numpy interacting with itertools has changed since py27, and also that the proposed workarounds/solutions seem ... inelegant?  Need a numpy integer be explicitly coerced to int in this context when the type implements an __index__ method?
msg295192 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-06-05 14:36
My note was not directly related to your patch. It was a warning about general replacement of PyLong_AsSsize_t with PyNumber_AsSsize_t. There is a code for which this is dangerous.
msg295327 - (view) Author: Sven Marnach (smarnach) Date: 2017-06-07 10:00
The current behaviour of islice() seems inconsistent with the rest of Python.  All other functions taking start, stop and step arguments like slice(), range() and itertools.count() do accept integer-like objects.  The code given as "roughly equivalent" in the documentation of islice() accepts integer-like objects, and so does regular list slicing.  In fact, the __index__() method was introduced in PEP 357 specifically for slicing.  In Python 2, islice() supported it as well.  I think the expectation that islice() in Python 3 also supports it is entirely reasonable, and I can't see any strong arguments for breaking that assumption.
msg295330 - (view) Author: Will Roberts (Will Roberts) * Date: 2017-06-07 11:43
Github PR adds simple test, as well as an entry in Misc/NEWS.
msg295379 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-06-08 06:03
New changeset 0ecdc525146ecec9d1549ebf59404c769637a512 by Raymond Hettinger (Will Roberts) in branch 'master':
bpo-30537: use PyNumber in itertools.islice instead of PyLong (#1918)
https://github.com/python/cpython/commit/0ecdc525146ecec9d1549ebf59404c769637a512
msg295383 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-06-08 06:41
Applied patch.  In theory, this was reasonable.  Since islice() deals with indices, checking __index__ seems reasonable.

On the other hand, stable high-performance code was changed without any known reasonable use cases, and no one will ever likely see a benefit. A cheap atomic operation was replaced with a more complex and less thread-safe chain of calls (I'm no longer even sure that the test cases cover all the possible code paths).

Users might be better off by not seeing an unhelpful error message when passing in a numpy.int32, or they might be worse-off by having lost a cue that they were writing inefficient code (which invisibly creates temporary integer objects on every call when presumably the whole reason for using numpy was a concern for efficiency).
msg309854 - (view) Author: Jeroen Demeyer (jdemeyer) * (Python triager) Date: 2018-01-12 13:37
Just to note that this bug affects SageMath too: https://trac.sagemath.org/ticket/24528

Thanks for fixing!
msg309859 - (view) Author: Erik Bray (erik.bray) * (Python triager) Date: 2018-01-12 15:09
> Users might be better off by not seeing an unhelpful error message when passing in a numpy.int32, or they might be worse-off by having lost a cue that they were writing inefficient code (which invisibly creates temporary integer objects on every call when presumably the whole reason for using numpy was a concern for efficiency).

While it's true there are many Numpy users who don't understand what they're doing, any time they go into the Python interpreter they're losing the benefits of Numpy anyways (which are fast vectorized operations on arrays).
History
Date User Action Args
2022-04-11 14:58:47adminsetgithub: 74722
2018-01-12 15:09:24erik.braysetnosy: + erik.bray
messages: + msg309859
2018-01-12 13:37:04jdemeyersetnosy: + jdemeyer
messages: + msg309854
2017-08-05 03:02:30terry.reedylinkissue31097 superseder
2017-06-08 06:41:03rhettingersetstatus: open -> closed
resolution: fixed
messages: + msg295383

stage: resolved
2017-06-08 06:03:06rhettingersetmessages: + msg295379
2017-06-07 11:43:40Will Robertssetmessages: + msg295330
2017-06-07 10:00:49smarnachsetnosy: + smarnach
messages: + msg295327
2017-06-05 14:36:41serhiy.storchakasetmessages: + msg295192
2017-06-05 12:40:26Will Robertssetmessages: + msg295185
2017-06-05 08:10:42rhettingersetpriority: normal -> low
type: behavior -> enhancement
messages: + msg295173

versions: - Python 3.5, Python 3.6
2017-06-03 16:57:23serhiy.storchakasetassignee: rhettinger

messages: + msg295088
nosy: + rhettinger, serhiy.storchaka
2017-06-02 11:31:07python-devsetpull_requests: + pull_request1997
2017-06-02 09:47:24Will Robertssetnosy: + Will Roberts
messages: + msg295000
2017-06-01 19:40:13mark.dickinsonsetnosy: + mark.dickinson
2017-06-01 11:26:09MSeifertcreate