Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add start and stop parameters to the Sequence.index() ABC mixin method #67275

Closed
rhettinger opened this issue Dec 18, 2014 · 17 comments
Closed
Assignees
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@rhettinger
Copy link
Contributor

BPO 23086
Nosy @rhettinger, @ssbr, @serhiy-storchaka, @MojoVampire
Files
  • issue23086.patch: Patch with tests.
  • issue23086.2.patch: Updated patch with test.
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2015-05-23.02:31:23.191>
    created_at = <Date 2014-12-18.20:41:46.585>
    labels = ['type-feature', 'library']
    title = 'Add start and stop parameters to the Sequence.index() ABC mixin method'
    updated_at = <Date 2015-05-23.02:31:23.190>
    user = 'https://github.com/rhettinger'

    bugs.python.org fields:

    activity = <Date 2015-05-23.02:31:23.190>
    actor = 'rhettinger'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2015-05-23.02:31:23.191>
    closer = 'rhettinger'
    components = ['Library (Lib)']
    creation = <Date 2014-12-18.20:41:46.585>
    creator = 'rhettinger'
    dependencies = []
    files = ['37607', '37631']
    hgrepos = []
    issue_num = 23086
    keywords = ['patch']
    message_count = 17.0
    messages = ['232904', '233485', '233564', '233589', '233590', '233635', '233684', '233693', '233695', '233698', '233722', '233724', '233736', '233761', '233791', '243880', '243881']
    nosy_count = 5.0
    nosy_names = ['rhettinger', 'Devin Jeanpierre', 'python-dev', 'serhiy.storchaka', 'josh.r']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue23086'
    versions = ['Python 3.5']

    @rhettinger
    Copy link
    Contributor Author

    Currently, the Sequence ABC doesn't support start and stop arguments for the index() method which limits its usefulness in doing repeated searches (iterating over a target value) and which limits it substitutablity for various concrete sequences such as tuples, lists, strings, bytes, bytearrays, etc.

    >>> help(Sequence.index)
    Help on method index in module _abcoll:

    index(self, value) unbound _abcoll.Sequence method
    S.index(value) -> integer -- return first index of value.
    Raises ValueError if the value is not present.

    >>> help(list.index)
    Help on method_descriptor:
    index(...)
        L.index(value, [start, [stop]]) -> integer -- return first index of value.
        Raises ValueError if the value is not present.
    >>> help(str.index)
    Help on method_descriptor:
    index(...)
        S.index(sub [,start [,end]]) -> int
        
        Like S.find() but raise ValueError when the substring is not found.
    >>> help(tuple.index)
    Help on method_descriptor:
    index(...)
        T.index(value, [start, [stop]]) -> integer -- return first index of value.
        Raises ValueError if the value is not present.
    >>> help(bytes.index)
    Help on method_descriptor:
    index(...)
        S.index(sub [,start [,end]]) -> int
        
        Like S.find() but raise ValueError when the substring is not found.
    >>> help(bytearray.index)
    Help on method_descriptor:
    index(...)
        B.index(sub [,start [,end]]) -> int
        
        Like B.find() but raise ValueError when the subsection is not found.

    @rhettinger rhettinger self-assigned this Dec 18, 2014
    @rhettinger rhettinger added the stdlib Python modules in the Lib dir label Dec 18, 2014
    @ssbr
    Copy link
    Mannequin

    ssbr mannequin commented Jan 5, 2015

    A wild patch appears!

    Test is included, I'm unhappy with it, because it uses one test method to test all of Sequence, but that's what the test suite does for MutableSequence.

    @rhettinger
    Copy link
    Contributor Author

    This test looks like it may have been a typo:

    self.assertEqual(seq.index('a'), 0, 1)

    Also, it would be nice to investigate the differences with list.index() and str.index() for the corner cases. Something along these lines:

        # Compare Sequence.index() behavior to list.index() behavior                        
        listseq = list('abracadabra')
        seqseq = SequenceSubclass(listseq)
        for start in range(-3, len(listseq) + 3):
            for stop in range(-3, len(listseq) + 3):
                for letter in set(listseq + ['z']):
                    try:
                        expected = listseq.index(letter, start, stop)
                    except ValueError:
                        with self.assertRaises(ValueError):
                            seqseq.index(letter, start, stop)
                    else:
                        actual = seqseq.index(letter, start, stop)
                        self.assertEqual(actual, expected, (letter, start, stop))
    
        # Compare Sequence.index() behavior to str.index() behavior                         
        strseq = 'abracadabra'
        seqseq = SequenceSubclass(strseq)
        for start in range(-3, len(strseq) + 3):
            for stop in range(-3, len(strseq) + 3):
                for letter in set(strseq + 'z'):
                    try:
                        expected = strseq.index(letter, start, stop)
                    except ValueError:
                        with self.assertRaises(ValueError):
                            seqseq.index(letter, start, stop)
                    else:
                        actual = seqseq.index(letter, start, stop)
                        self.assertEqual(actual, expected, (letter, start, stop)

    @ssbr
    Copy link
    Mannequin

    ssbr mannequin commented Jan 7, 2015

    I modified your test case somewhat. Also, your tests uncovered an issue with negative indexes -- oops, hadn't thought of those. Fixed. Let me know what you think.

    @ssbr
    Copy link
    Mannequin

    ssbr mannequin commented Jan 7, 2015

    Why is there no "review" link next to my second patch?

    @rhettinger
    Copy link
    Contributor Author

    Try something like this:

            if start < 0:
                start += len(self)
            if stop is None:
                stop = len(self)
            elif stop < 0:
                stop += len(self)
            for i in range(max(start, 0), min(stop, len(self))):
                if self[i] == value:
                    return i
            raise ValueError

    @ssbr
    Copy link
    Mannequin

    ssbr mannequin commented Jan 8, 2015

    Are you sure? I noticed that __iter__ went out of its way to avoid calling len().

    @MojoVampire
    Copy link
    Mannequin

    MojoVampire mannequin commented Jan 8, 2015

    I think it avoids len because the length might change during iteration due to side-effects of other code. Since a shrinking sequence would raise an IndexError anyway when you overran the end, it may as well not assume the length is static and just keep indexing forward until it hits an IndexError. It's less of an issue (though not a non-issue) with index, because index actually performs all the indexing without returning to user code; __iter__ pauses to allow user code to execute between each yield, so the odds of a length mutation are much higher.

    You might be able to use len (and just say that if a side-effect of an equality comparison causes the sequence to change length, or another thread messes with it, that's your own fault), but you'd probably want to catch and convert IndexError to ValueError to consistently respond to "we didn't find it" with the same exception.

    @MojoVampire
    Copy link
    Mannequin

    MojoVampire mannequin commented Jan 8, 2015

    Note: index returns without the caller having a chance to execute code that would change the sequence length directly. But other threads could change it, as could a custom __eq__ on an object stored in the sequence (or a poorly implemented __getitem__ or __len__ on the sequence itself, but that's getting even more pathological). Thread consistency is the code's responsibility though (we just need to make sure we behave the best we can, and hope they use locks correctly), and the odds of equality of __getitem__ altering the sequence are much lower than the odds of someone iterating the sequence and changing it as they go (which is what __iter__'s implementation allows, responding with potentially incomplete results since items might be skipped due to the mutation), but keeping the sequence in a consistent state.

    @ssbr
    Copy link
    Mannequin

    ssbr mannequin commented Jan 8, 2015

    I'm going to add a test case that changes the sequence length during .index(), and just do whatever list does in that case.

    @ssbr
    Copy link
    Mannequin

    ssbr mannequin commented Jan 9, 2015

    I take it back, I don't want to copy what the list type does, because it's wrong: http://bugs.python.org/issue23204

    @rhettinger
    Copy link
    Contributor Author

    I'm afraid you're getting lost in details that don't matter. We're trying to make the index() method more useful so that searches and be restarted where they left off.

    @serhiy-storchaka
    Copy link
    Member

    I afraid that the patch can change computational complexity. The iteration usually has linear complexity, but indexing can has non-constant complexity. E.g. for linked list it will cause quadratic complexity of index(). May be we should have special case for start=0 and stop=None. And document this.

    @serhiy-storchaka serhiy-storchaka added the type-feature A feature request or enhancement label Jan 9, 2015
    @rhettinger
    Copy link
    Contributor Author

    The iteration usually has linear complexity

    The iteration abstract method depends on indexing as well:

        def __iter__(self):
            i = 0
            try:
                while True:
                    v = self[i]
                    yield v
                    i += 1
            except IndexError:
                return

    @ssbr
    Copy link
    Mannequin

    ssbr mannequin commented Jan 10, 2015

    I inferred from Serhiy's comment that if you override __iter__ to be efficient and not use __getitem__, this overridden behavior used to pass on to index(), but wouldn't after this patch.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 23, 2015

    New changeset cabd7261ae80 by Raymond Hettinger in branch 'default':
    Issue bpo-23086: Add start and stop arguments to the Sequence.index() mixin method.
    https://hg.python.org/cpython/rev/cabd7261ae80

    @rhettinger
    Copy link
    Contributor Author

    Devin, thanks for the patch.

    Serhiy, I added a performance note discussing the computational complexity.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants