New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add operator.subscript as a convenience for creating slices #68567
Comments
I often find that when working with pandas and numpy I want to store slice objects in variables to pass around and re-use; however, the syntax for constructing a slice literal outside of an indexer is very different from the syntax used inside of a subscript. This patch proposes the following change:
This would be a singleton instance of a class that looks like: class sliceliteral(object):
def __getitem__(self, key):
return key The basic idea is to provide an alternative constructor to 'slice' that uses the subscript syntax. This allows people to write more understandable code. Consider the following examples: reverse = slice(None, None, -1)
reverse = slice.literal[::-1]
all_rows_first_col = slice(None), slice(0)
all_rows_first_col = slice.literal[:, 0]
first_row_all_cols_but_last = slice(0), slice(None, -1)
first_row_all_cols_but_last = slice.literal[0, :-1] Again, this is not intended to make the code shorter, instead, it is designed to make it more clear what the slice object your are constructing looks like. Another feature of the new >>> slice.literal[0]
0
>>> slice.literal[0, 1]
(0, 1)
>>> slice.literal[0, 1:]
(0, slice(1, None, None)
>>> slice.literal[:, ..., ::-1]
(slice(None, None, None), Ellipsis, slice(None, None, -1) These examples show that sometimes the subscript notation is much more clear that the non-subscript notation. There is a second change proposed here and that is to 'slice.__repr__'. This change makes the repr of a slice object match the new literal syntax to make it easier to read. >>> slice.literal[:]
slice.literal[:]
>>> slice.literal[1:]
slice.literal[1:]
>>> slice.literal[1:-1]
slice.literal[1:-1]
>>> slice.literal[:-1]
slice.literal[:-1]
>>> slice.literal[::-1]
slice.literal[::-1] This change actually affects old behaviour so I am going to upload it as a seperate patch. I understand that the change to repr much be less desirable than the addition of 'slice.literal' |
Here is the patch that includes the updates to 'slice.__repr__' |
FWIW, I like this idea. |
Why not index the slice type itself? slice[1:2]
This looks as disadvantage. |
I'm with Serhiy, I don't think we need a "literal", just make slice itself indexable: reverse = slice(None, None, -1)
reverse = slice[::-1] The only question in my mind is what slice should do when given just a single index: slice[0] I suppose that should be a ValueError? |
For prior art, it's worth taking a look at NumPy, and in particular its >>> import numpy as np
>>> np.s_[1:2]
slice(1, 2, None)
>>> np.s_[0]
0
>>> np.s_[1:2, 3]
(slice(1, 2, None), 3) |
(Correction: they're not functions, or even callables.) |
I originally considered this and I personally really like this syntax, but I was concerned with ambiguity with the typing module
I think some of the power of this concept comes from the fact that I can express a complicated indexer without worrying about how it desugars. I would personally prefer being able to have this return tuples and scalars so that the syntax is easier to explain. |
(This should probably be discussed on the Python Ideas mailing list...) I definitely like the idea of being able to construct slices etc. using "[]" syntax. I think this should be considered even if it is decided not to change the repr() of slices. An important note here is that this is about more than slices: $ python3
Python 3.4.2 (default, Feb 23 2015, 21:16:28)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> class A:
... def __getitem__(self, *args):
... print(repr(args))
...
>>> a = A()
>>> a[0]
(0,)
>>> a[0:1]
(slice(0, 1, None),)
>>> a[0:1, ..., 1:2]
((slice(0, 1, None), Ellipsis, slice(1, 2, None)),)
>>> a[0:1, 2]
((slice(0, 1, None), 2),) Indeed, Joe's suggested slice.literal supports all of this, but we can't just modify slice to handle all of these cases. What I'm missing is a way to use such an object to actually index/slice something. The only way I can currently think of is using a.__getitem__(), but that's quite ugly IMO. |
Sorry, I am not sure I understand what you mean by this? You can pass a slice object, or a tuple of slices in subscript notation. >>> [1, 2, 3, 4][slice(2)]
[1, 2] Also, I am not currently subscribed to python-ideas; what is the common practice for posting new threads? |
(see https://mail.python.org/mailman/listinfo/python-ideas) But for x = [1,2,3,4], how will x[y] work for all of the following values of y? y = slice.literal[0]
y = slice.literal[1:2]
y = slice.literal[0:1, ..., 3] NumPy's s_ "magic object" is a factory, returning objects of different types depending on the given input. If you want an actual slice.literal class, then you'll have to supply a way to convert it into an object usable for indexing/slicing, e.g.: y = slice.literal[0:1]
[1,2,3,4][y.as_indexer] |
>>> slice.literal[0]
0
>>> y = slice.literal[1:2]
slice(1, 2, None)
>>> slice.literal[0:1, ..., 3]
(slice(0, 1, None), Ellipsis, 3) The way this object works right now does not create instances of some inner class of slice, instead, indexing it returns the key without modification. |
So, is this in any ways different than NumPy's s_? |
It is a singleton, does not accept the |
Based on some discussion on python-ideas, this is being renamed to operator.subscript. Here is the patch that makes the correct the changes to move this object into the operator module. |
Is C implementation needed? |
I just moved it over since I implemented it for slice originally, I can drop the C implementation. |
I removed the C implementation. |
ping: is there anything blocking this? |
Joe: Have you seen the comments on the code review? See the Review link at the top of the bug thread, or maybe check your spam. |
Ah, I hadn't seen that, I will address those now, sorry about that. |
updating with the slots change This also adds a ton of test cases |
The implementation itself LGTM (nice use of decorator). New feature should be documented. Needed changes in Doc/library/operator.rst and Doc/whatsnew/3.6.rst and the docstring for the subscript class. 'subscript' should be added to __all__. |
updating to address the docs and order of assertions |
I'm with Martin about tests. Only one assertion per special case is needed, no need to write exponential number of combinations. |
On the mailing list, Antoine also suggested removing __slots__ completely as another workaround. I agree that it doesn’t seem important for a singleton, though in the review comments a couple people seemed to want __slots__. |
Martin: I think the primary reason I suggested an empty __slots__ was to avoid people doing dumb stuff like attaching attributes to the singleton, and then getting annoyed if support for them was removed in some future release (e.g. if this ever moved to C for whatever reason), or expecting them to survive pickling, what have you. The memory savings (if any) would be negligible, it was just to avoid future compatibility concerns. Serhiy: If we add a stub slot, presumably it should be __stub, not __stub__, the former getting the private member name mangling protection while the latter would not (and the latter might give the impression it was some sort of special method/attribute when it was not). |
Wouldn't adding this be special because it is specifically reserved by CPython as an implementation detail? Also, would adding this name (stub or __stub) be sufficient to get this patch merged again. I would not mind investigating the gc_head issue after but I view that as a separate issue which is only exercised by this patch. |
Raymond, adding a stub element to __slots__ fixes a leak. Is this enough to push the patch again? We should open a separate issue for leaks in objects with empty __slots__. |
It looks like namedtuple suffers the same issue with empty __slots__: test_collections leaked [0, 0, 2, 0] references, sum=2 |
The leak should be fixed first rather than introducing stub fields hack. |
bpo-28651 opened for the general problem with empty __slots__ and gc failures. |
When I first looked at this a year ago, I thought it seemed like a reasonable idea and might be useful. Since that time, I've haven't encountered any situations whether this would have improved my or someone else's code. And now, I can't even dream up any scenarios where |
I think it would be more useful for multidimensional slicing: >>> subscript[::-1, ..., ::-1]
(slice(None, None, -1), Ellipsis, slice(None, None, -1)) |
On Sat, Nov 12, 2016 at 08:23:45AM +0000, Raymond Hettinger wrote:
For that specific example, I completely agree. But I think that in And there's no comparison once you get to multi-dimensional slices: s = slice[::-1, 1:, 1::2]
spam[s] + eggs[s] versus: s = slice(slice(None, None, -1), slice(1, None), slice(1, None, 2))
spam[s] + eggs[s] Even simple examples look nice in slice syntax:
I had completely forgotten about this feature request, but https://mail.python.org/pipermail/python-ideas/2016-November/043630.html The downside is that beginners who are still learning Python's slice(::-1) by mistake. Nevertheless, it seems to me that the advantage to more |
Here is a new patch that does not cause refleaks, I just replaced empty __slots__ with a __setattr__: it looks like __slots__ are not needed here to save memory (there is only a singleton instance), they were used just to create an immutable object. |
So we now have the refleak fixed. Can we apply the patch? Or is it too late for 3.6? |
The patch is small and it was initially applied before 3.6b1, I think we should ask Ned (added to nosy) if this is OK to apply the fixed version? |
Serhiy, thank you for review! Here is a corrected patch. |
operator_subscript_norefleak_v2.patch LGTM. Waiting for Ned's decision about 3.6. |
I don't think this is appropriate for 3.6 now. We're a little more than 4 weeks from the final release - way past the feature code cut-off - and, from the comments here, there is no longer a total consensus that this feature should go back in at all after being pulled and largely forgotten about for a year. |
Maybe then it makes sense to apply this to 3.7? It looks like most people are in favour of this (also on python-ideas), only Raymond is not sure/mildly against. |
Actually I'm with Raymond -- I'm also not sure or mildly against (-0). |
So 3.8 then, or should this be closed? FWIW I'm -1. IMO this should be a code recipe somewhere, no need for it to be in the stdlib. |
Let's close it. Just because someone spent a lot of effort on a patch we don't have to accept it. |
Raymond, Tal and Guido -- do any of you work routinely with multi-dimensional arrays? In my experience as someone who uses Python everyday for numerical computing (and has been doing so for many years), the need for an operator like this comes up with some regularity. With multi-dimensional indexing, this allows lets you cut down quite a bit on boilerplate, e.g., compare subscript[:, 0, ::-1] vs (slice(None), 0, slice(None, None, -1)) It's absolutely true that subscript is an easy four line recipe, and indeed a form of this recipe is already included in both NumPy and pandas. But I think this is a case where the lack of a common idiom is actively harmful. I do multi-dimensional indexing in using at least four different libraries (pandas, xarray, numpy and tensorflow), and it feels wrong to use a utility from numpy/pandas for other projects. I could write my own version of operator.subscript in each project (and yes, I've done so before), but for any individual use case it's less hassle to write things out the long way. In practice, the preferred way to write such long expressions seems to be to redundantly repeat indexing operations, e.g., x[:, 0, ::-1] rather than index = subscript[:, 0, ::-1]
x[index]
y[index] This is definitely non-ideal from a readability perspective. It's no longer immediately clear that these arrays are being indexed in the same way, and any changes would need to be applied twice. |
Hi Stephan Hoyer, IMHO a *closed* issue is not the most appropriate place to request Guido to change his mind. You may get more traction on python-ideas where you may find more supporters who need the operator. The question is not only if the operator makes sense (obviously, it does, for you), but if it "belongs" to the standard library. Is it popular enough? Is it consistent with other functions of the operator module? etc. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: