Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add operator.subscript as a convenience for creating slices #68567

Closed
llllllllll mannequin opened this issue Jun 4, 2015 · 60 comments
Closed

Add operator.subscript as a convenience for creating slices #68567

llllllllll mannequin opened this issue Jun 4, 2015 · 60 comments
Labels
3.7 (EOL) end of life extension-modules C modules in the Modules dir type-feature A feature request or enhancement

Comments

@llllllllll
Copy link
Mannequin

llllllllll mannequin commented Jun 4, 2015

BPO 24379
Nosy @gvanrossum, @rhettinger, @mdickinson, @vstinner, @taleinat, @ned-deily, @stevendaprano, @bitdancer, @vadmium, @serhiy-storchaka, @1st1, @MojoVampire, @llllllllll, @ilevkivskyi, @Vgr255, @shoyer
Files
  • slice.patch
  • slicerepr.patch
  • operator_subscript.patch
  • operator_subscript_pyonly.patch
  • operator_subscript_pyonly.patch
  • operator_subscript_pyonly.patch
  • operator_subscript_pyonly.patch
  • operator_subscript_pyonly.patch
  • operator_subscript_pyonly.patch
  • operator_subscript_norefleak.patch
  • operator_subscript_norefleak_v2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2018-06-16.02:02:00.835>
    created_at = <Date 2015-06-04.06:07:04.227>
    labels = ['extension-modules', 'type-feature', '3.7']
    title = 'Add operator.subscript as a convenience for creating slices'
    updated_at = <Date 2018-07-20.00:08:42.343>
    user = 'https://github.com/llllllllll'

    bugs.python.org fields:

    activity = <Date 2018-07-20.00:08:42.343>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-06-16.02:02:00.835>
    closer = 'rhettinger'
    components = ['Extension Modules']
    creation = <Date 2015-06-04.06:07:04.227>
    creator = 'llllllllll'
    dependencies = []
    files = ['39614', '39615', '39775', '39792', '39902', '39904', '39906', '39907', '39913', '45471', '45473']
    hgrepos = []
    issue_num = 24379
    keywords = ['patch']
    message_count = 60.0
    messages = ['244801', '244802', '244804', '244805', '244811', '244819', '244820', '244821', '245070', '245072', '245082', '245131', '245133', '245134', '245657', '245665', '245666', '245702', '246618', '246624', '246625', '246631', '246634', '246635', '246637', '246638', '246752', '247286', '247528', '247530', '248703', '248705', '248706', '248708', '248710', '253291', '253894', '253899', '253902', '254031', '256895', '272773', '272774', '272863', '280456', '280639', '280648', '280661', '280692', '280699', '280700', '280703', '280706', '280707', '280716', '280721', '319507', '319695', '321966', '321972']
    nosy_count = 18.0
    nosy_names = ['gvanrossum', 'rhettinger', 'mark.dickinson', 'vstinner', 'taleinat', 'ned.deily', 'Arfrever', 'steven.daprano', 'r.david.murray', 'python-dev', 'martin.panter', 'serhiy.storchaka', 'yselivanov', 'josh.r', 'llllllllll', 'levkivskyi', 'abarry', 'Stephan Hoyer']
    pr_nums = []
    priority = 'normal'
    resolution = 'rejected'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue24379'
    versions = ['Python 3.7']

    @llllllllll
    Copy link
    Mannequin Author

    llllllllll mannequin commented Jun 4, 2015

    I often find that when working with pandas and numpy I want to store slice objects in variables to pass around and re-use; however, the syntax for constructing a slice literal outside of an indexer is very different from the syntax used inside of a subscript. This patch proposes the following change:

    slice.literal
    

    This would be a singleton instance of a class that looks like:

    class sliceliteral(object):
        def __getitem__(self, key):
            return key

    The basic idea is to provide an alternative constructor to 'slice' that uses the subscript syntax. This allows people to write more understandable code.

    Consider the following examples:

    reverse = slice(None, None, -1)
    reverse = slice.literal[::-1]
    
    all_rows_first_col = slice(None), slice(0)
    all_rows_first_col = slice.literal[:, 0]
    
    first_row_all_cols_but_last = slice(0), slice(None, -1)
    first_row_all_cols_but_last = slice.literal[0, :-1]

    Again, this is not intended to make the code shorter, instead, it is designed to make it more clear what the slice object your are constructing looks like.

    Another feature of the new literal object is that it is not limited to just the creation of slice instances; instead, it is designed to mix slices and other types together. For example:

    >>> slice.literal[0]
    0
    >>> slice.literal[0, 1]
    (0, 1)
    >>> slice.literal[0, 1:]
    (0, slice(1, None, None)
    >>> slice.literal[:, ..., ::-1]
    (slice(None, None, None), Ellipsis, slice(None, None, -1)

    These examples show that sometimes the subscript notation is much more clear that the non-subscript notation.
    I believe that while this is trivial, it is very convinient to have on the slice type itself so that it is quickly available. This also prevents everyone from rolling their own version that is accesible in different ways (think Py_RETURN_NONE).
    Another reason that chose this aproach is that it requires no change to the syntax to support.

    There is a second change proposed here and that is to 'slice.__repr__'. This change makes the repr of a slice object match the new literal syntax to make it easier to read.

    >>> slice.literal[:]
    slice.literal[:]
    >>> slice.literal[1:]
    slice.literal[1:]
    >>> slice.literal[1:-1]
    slice.literal[1:-1]
    >>> slice.literal[:-1]
    slice.literal[:-1]
    >>> slice.literal[::-1]
    slice.literal[::-1]

    This change actually affects old behaviour so I am going to upload it as a seperate patch. I understand that the change to repr much be less desirable than the addition of 'slice.literal'

    @llllllllll llllllllll mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Jun 4, 2015
    @llllllllll
    Copy link
    Mannequin Author

    llllllllll mannequin commented Jun 4, 2015

    Here is the patch that includes the updates to 'slice.__repr__'

    @rhettinger
    Copy link
    Contributor

    FWIW, I like this idea.

    @serhiy-storchaka
    Copy link
    Member

    Why not index the slice type itself? slice[1:2]

    Another feature of the new literal object is that it is not limited to just the creation of slice instances; instead, it is designed to mix slices and other types together.

    This looks as disadvantage.

    @stevendaprano
    Copy link
    Member

    I'm with Serhiy, I don't think we need a "literal", just make slice itself indexable:

    reverse = slice(None, None, -1)
    reverse = slice[::-1]

    The only question in my mind is what slice should do when given just a single index:

    slice[0]

    I suppose that should be a ValueError?

    @mdickinson
    Copy link
    Member

    For prior art, it's worth taking a look at NumPy, and in particular its s_ and index_exp functions:

    >>> import numpy as np
    >>> np.s_[1:2]
    slice(1, 2, None)
    >>> np.s_[0]
    0
    >>> np.s_[1:2, 3]
    (slice(1, 2, None), 3)

    @mdickinson
    Copy link
    Member

    (Correction: they're not functions, or even callables.)

    @llllllllll
    Copy link
    Mannequin Author

    llllllllll mannequin commented Jun 4, 2015

    Why not index the slice type itself? slice[1:2]

    I originally considered this and I personally really like this syntax, but I was concerned with ambiguity with the typing module

    The only question in my mind is what slice should do when given just a single index

    I think some of the power of this concept comes from the fact that I can express a complicated indexer without worrying about how it desugars. I would personally prefer being able to have this return tuples and scalars so that the syntax is easier to explain.

    @taleinat
    Copy link
    Contributor

    taleinat commented Jun 9, 2015

    (This should probably be discussed on the Python Ideas mailing list...)

    I definitely like the idea of being able to construct slices etc. using "[]" syntax. I think this should be considered even if it is decided not to change the repr() of slices.

    An important note here is that this is about more than slices:

    $ python3
    Python 3.4.2 (default, Feb 23 2015, 21:16:28)
    [GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> class A:
    ...     def __getitem__(self, *args):
    ...         print(repr(args))
    ...
    >>> a = A()
    >>> a[0]
    (0,)
    >>> a[0:1]
    (slice(0, 1, None),)
    >>> a[0:1, ..., 1:2]
    ((slice(0, 1, None), Ellipsis, slice(1, 2, None)),)
    >>> a[0:1, 2]
    ((slice(0, 1, None), 2),)

    Indeed, Joe's suggested slice.literal supports all of this, but we can't just modify slice to handle all of these cases.

    What I'm missing is a way to use such an object to actually index/slice something. The only way I can currently think of is using a.__getitem__(), but that's quite ugly IMO.

    @llllllllll
    Copy link
    Mannequin Author

    llllllllll mannequin commented Jun 9, 2015

    What I'm missing is a way to use such an object to actually index/slice something

    Sorry, I am not sure I understand what you mean by this? You can pass a slice object, or a tuple of slices in subscript notation.

    >>> [1, 2, 3, 4][slice(2)]
    [1, 2]

    Also, I am not currently subscribed to python-ideas; what is the common practice for posting new threads?

    @taleinat
    Copy link
    Contributor

    taleinat commented Jun 9, 2015

    (see https://mail.python.org/mailman/listinfo/python-ideas)

    But for x = [1,2,3,4], how will x[y] work for all of the following values of y?

    y = slice.literal[0]
    y = slice.literal[1:2]
    y = slice.literal[0:1, ..., 3]

    NumPy's s_ "magic object" is a factory, returning objects of different types depending on the given input. If you want an actual slice.literal class, then you'll have to supply a way to convert it into an object usable for indexing/slicing, e.g.:

    y = slice.literal[0:1]
    [1,2,3,4][y.as_indexer]

    @llllllllll
    Copy link
    Mannequin Author

    llllllllll mannequin commented Jun 10, 2015

    >>> slice.literal[0]
    0
    >>> y = slice.literal[1:2]
    slice(1, 2, None)
    >>> slice.literal[0:1, ..., 3]
    (slice(0, 1, None), Ellipsis, 3)

    The way this object works right now does not create instances of some inner class of slice, instead, indexing it returns the key without modification.

    @taleinat
    Copy link
    Contributor

    So, is this in any ways different than NumPy's s_?

    @llllllllll
    Copy link
    Mannequin Author

    llllllllll mannequin commented Jun 10, 2015

    It is a singleton, does not accept the maketuple flag, and is written in C. I did not know about the s_ attribute of numpy before writing this; however, I still think that moving this object to slice improves code clarity (s_ is not a super clear name). I also think that this behaviour belongs on the slice object.

    @llllllllll
    Copy link
    Mannequin Author

    llllllllll mannequin commented Jun 22, 2015

    Based on some discussion on python-ideas, this is being renamed to operator.subscript. Here is the patch that makes the correct the changes to move this object into the operator module.

    @llllllllll llllllllll mannequin added extension-modules C modules in the Modules dir and removed interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Jun 22, 2015
    @llllllllll llllllllll mannequin changed the title slice.literal notation operator.subscript Jun 22, 2015
    @serhiy-storchaka
    Copy link
    Member

    Is C implementation needed?

    @llllllllll
    Copy link
    Mannequin Author

    llllllllll mannequin commented Jun 23, 2015

    I just moved it over since I implemented it for slice originally, I can drop the C implementation.

    @llllllllll
    Copy link
    Mannequin Author

    llllllllll mannequin commented Jun 23, 2015

    I removed the C implementation.

    @llllllllll
    Copy link
    Mannequin Author

    llllllllll mannequin commented Jul 11, 2015

    ping: is there anything blocking this?

    @vadmium
    Copy link
    Member

    vadmium commented Jul 12, 2015

    Joe: Have you seen the comments on the code review? See the Review link at the top of the bug thread, or maybe check your spam.

    @llllllllll
    Copy link
    Mannequin Author

    llllllllll mannequin commented Jul 12, 2015

    Ah, I hadn't seen that, I will address those now, sorry about that.

    @llllllllll
    Copy link
    Mannequin Author

    llllllllll mannequin commented Jul 12, 2015

    updating with the slots change

    This also adds a ton of test cases

    @serhiy-storchaka
    Copy link
    Member

    The implementation itself LGTM (nice use of decorator). New feature should be documented. Needed changes in Doc/library/operator.rst and Doc/whatsnew/3.6.rst and the docstring for the subscript class. 'subscript' should be added to __all__.

    @llllllllll
    Copy link
    Mannequin Author

    llllllllll mannequin commented Jul 12, 2015

    updating to address the docs and order of assertions

    @serhiy-storchaka
    Copy link
    Member

    I'm with Martin about tests. Only one assertion per special case is needed, no need to write exponential number of combinations.

    @vadmium
    Copy link
    Member

    vadmium commented Nov 2, 2015

    On the mailing list, Antoine also suggested removing __slots__ completely as another workaround. I agree that it doesn’t seem important for a singleton, though in the review comments a couple people seemed to want __slots__.

    @MojoVampire
    Copy link
    Mannequin

    MojoVampire mannequin commented Nov 3, 2015

    Martin: I think the primary reason I suggested an empty __slots__ was to avoid people doing dumb stuff like attaching attributes to the singleton, and then getting annoyed if support for them was removed in some future release (e.g. if this ever moved to C for whatever reason), or expecting them to survive pickling, what have you. The memory savings (if any) would be negligible, it was just to avoid future compatibility concerns.

    Serhiy: If we add a stub slot, presumably it should be __stub, not __stub__, the former getting the private member name mangling protection while the latter would not (and the latter might give the impression it was some sort of special method/attribute when it was not).

    @llllllllll
    Copy link
    Mannequin Author

    llllllllll mannequin commented Dec 23, 2015

    and the latter might give the impression it was some sort of special method/attribute when it was not.

    Wouldn't adding this be special because it is specifically reserved by CPython as an implementation detail?

    Also, would adding this name (stub or __stub) be sufficient to get this patch merged again. I would not mind investigating the gc_head issue after but I view that as a separate issue which is only exercised by this patch.

    @Vgr255 Vgr255 mannequin changed the title operator.subscript Add operator.subscript as a convenience for creating slices Dec 23, 2015
    @serhiy-storchaka
    Copy link
    Member

    Raymond, adding a stub element to __slots__ fixes a leak. Is this enough to push the patch again?

    We should open a separate issue for leaks in objects with empty __slots__.

    @ilevkivskyi
    Copy link
    Member

    It looks like namedtuple suffers the same issue with empty __slots__:

    test_collections leaked [0, 0, 2, 0] references, sum=2

    @rhettinger
    Copy link
    Contributor

    We should open a separate issue for leaks in objects with empty __slots__

    The leak should be fixed first rather than introducing stub fields hack.

    @MojoVampire
    Copy link
    Mannequin

    MojoVampire mannequin commented Nov 9, 2016

    bpo-28651 opened for the general problem with empty __slots__ and gc failures.

    @rhettinger
    Copy link
    Contributor

    When I first looked at this a year ago, I thought it seemed like a reasonable idea and might be useful. Since that time, I've haven't encountered any situations whether this would have improved my or someone else's code. And now, I can't even dream up any scenarios where operator.subscript[3:7:2] would be better than slice(3, 7, 2). Accordingly, I think we should re-evaluate whether there is any real benefit to be had by adding this to the standard library. Perhaps, we're better off without it.

    @serhiy-storchaka serhiy-storchaka added the 3.7 (EOL) end of life label Nov 12, 2016
    @ilevkivskyi
    Copy link
    Member

    ilevkivskyi commented Nov 12, 2016

    I think it would be more useful for multidimensional slicing:

    >>> subscript[::-1, ..., ::-1]
    (slice(None, None, -1), Ellipsis, slice(None, None, -1))

    @stevendaprano
    Copy link
    Member

    On Sat, Nov 12, 2016 at 08:23:45AM +0000, Raymond Hettinger wrote:

    I can't even dream up any scenarios where
    operator.subscript[3:7:2] would be better than
    slice(3, 7, 2).

    For that specific example, I completely agree. But I think that in
    general slice[::-1] is better than either the current alternative
    slice(None, None, -1) or the proposed operator.subscript[::-1].

    And there's no comparison once you get to multi-dimensional slices:

        s = slice[::-1, 1:, 1::2]
        spam[s] + eggs[s]

    versus:

        s = slice(slice(None, None, -1), slice(1, None), slice(1, None, 2))
        spam[s] + eggs[s]

    Even simple examples look nice in slice syntax:

    slice[3:7:2]
    

    I had completely forgotten about this feature request, but
    coincidentally re-suggested it on the Python-Ideas mailing
    list earlier today:

    https://mail.python.org/pipermail/python-ideas/2016-November/043630.html

    The downside is that beginners who are still learning Python's
    syntax may try writing:

        slice(::-1)

    by mistake. Nevertheless, it seems to me that the advantage to more
    experienced developers to be able to read and write slice objects using
    slice format outweighs the temporary confusion to beginners. (I would
    expect that outside of the Numpy community, few beginners use the
    slice() object directly, so this would not often affect them.)

    @ilevkivskyi
    Copy link
    Member

    Here is a new patch that does not cause refleaks, I just replaced empty __slots__ with a __setattr__: it looks like __slots__ are not needed here to save memory (there is only a singleton instance), they were used just to create an immutable object.

    @gvanrossum
    Copy link
    Member

    So we now have the refleak fixed. Can we apply the patch? Or is it too late for 3.6?

    @ilevkivskyi
    Copy link
    Member

    The patch is small and it was initially applied before 3.6b1, I think we should ask Ned (added to nosy) if this is OK to apply the fixed version?

    @ilevkivskyi
    Copy link
    Member

    Serhiy, thank you for review! Here is a corrected patch.

    @serhiy-storchaka
    Copy link
    Member

    operator_subscript_norefleak_v2.patch LGTM. Waiting for Ned's decision about 3.6.

    @ned-deily
    Copy link
    Member

    I don't think this is appropriate for 3.6 now. We're a little more than 4 weeks from the final release - way past the feature code cut-off - and, from the comments here, there is no longer a total consensus that this feature should go back in at all after being pulled and largely forgotten about for a year.

    @ned-deily ned-deily removed their assignment Nov 13, 2016
    @ilevkivskyi
    Copy link
    Member

    Maybe then it makes sense to apply this to 3.7? It looks like most people are in favour of this (also on python-ideas), only Raymond is not sure/mildly against.

    @gvanrossum
    Copy link
    Member

    Actually I'm with Raymond -- I'm also not sure or mildly against (-0).
    Given how easy it is to do without and how cumbersome using the operator
    module is I don't see a great benefit. (IMO the operator module should be
    the measure of *last* resort -- it is not to become part of anybody's
    toolkit for common operators. But I seem to be a minority opinion here.)

    @taleinat
    Copy link
    Contributor

    So 3.8 then, or should this be closed?

    FWIW I'm -1. IMO this should be a code recipe somewhere, no need for it to be in the stdlib.

    @gvanrossum
    Copy link
    Member

    Let's close it. Just because someone spent a lot of effort on a patch we don't have to accept it.

    @shoyer
    Copy link
    Mannequin

    shoyer mannequin commented Jul 19, 2018

    Raymond, Tal and Guido -- do any of you work routinely with multi-dimensional arrays?

    In my experience as someone who uses Python everyday for numerical computing (and has been doing so for many years), the need for an operator like this comes up with some regularity. With multi-dimensional indexing, this allows lets you cut down quite a bit on boilerplate, e.g., compare

    subscript[:, 0, ::-1] vs (slice(None), 0, slice(None, None, -1))

    It's absolutely true that subscript is an easy four line recipe, and indeed a form of this recipe is already included in both NumPy and pandas. But I think this is a case where the lack of a common idiom is actively harmful. I do multi-dimensional indexing in using at least four different libraries (pandas, xarray, numpy and tensorflow), and it feels wrong to use a utility from numpy/pandas for other projects. I could write my own version of operator.subscript in each project (and yes, I've done so before), but for any individual use case it's less hassle to write things out the long way.

    In practice, the preferred way to write such long expressions seems to be to redundantly repeat indexing operations, e.g.,

    x[:, 0, ::-1]
    y[:, 0, ::-1]

    rather than

      index = subscript[:, 0, ::-1]
      x[index]
      y[index]

    This is definitely non-ideal from a readability perspective. It's no longer immediately clear that these arrays are being indexed in the same way, and any changes would need to be applied twice.

    @vstinner
    Copy link
    Member

    Raymond, Tal and Guido -- do any of you work routinely with multi-dimensional arrays?

    Hi Stephan Hoyer,

    IMHO a *closed* issue is not the most appropriate place to request Guido to change his mind. You may get more traction on python-ideas where you may find more supporters who need the operator.

    The question is not only if the operator makes sense (obviously, it does, for you), but if it "belongs" to the standard library. Is it popular enough? Is it consistent with other functions of the operator module? etc.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life extension-modules C modules in the Modules dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests