Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Undefined behavior for deque.insert() when len(d) == maxlen #70382

Closed
rhettinger opened this issue Jan 24, 2016 · 28 comments
Closed

Undefined behavior for deque.insert() when len(d) == maxlen #70382

rhettinger opened this issue Jan 24, 2016 · 28 comments
Assignees
Labels
extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error

Comments

@rhettinger
Copy link
Contributor

BPO 26194
Nosy @tim-one, @rhettinger, @mdickinson, @vstinner, @vadmium, @serhiy-storchaka, @MojoVampire
Files
  • full_deque.diff: Option to raise an IndexError
  • full_deque_alt.diff: Plain insert followed by pop from right
  • full_deque2.diff: Index error for insert into a full deque
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2016-02-07.09:06:11.915>
    created_at = <Date 2016-01-24.17:47:04.579>
    labels = ['extension-modules', 'type-bug']
    title = 'Undefined behavior for deque.insert() when len(d) == maxlen'
    updated_at = <Date 2016-02-07.09:06:11.915>
    user = 'https://github.com/rhettinger'

    bugs.python.org fields:

    activity = <Date 2016-02-07.09:06:11.915>
    actor = 'rhettinger'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2016-02-07.09:06:11.915>
    closer = 'rhettinger'
    components = ['Extension Modules']
    creation = <Date 2016-01-24.17:47:04.579>
    creator = 'rhettinger'
    dependencies = []
    files = ['41717', '41718', '41732']
    hgrepos = []
    issue_num = 26194
    keywords = ['patch']
    message_count = 28.0
    messages = ['258893', '258943', '258947', '258948', '258995', '258996', '259004', '259005', '259007', '259010', '259043', '259046', '259050', '259055', '259063', '259072', '259115', '259156', '259216', '259359', '259369', '259372', '259405', '259418', '259501', '259507', '259519', '259527']
    nosy_count = 8.0
    nosy_names = ['tim.peters', 'rhettinger', 'mark.dickinson', 'vstinner', 'python-dev', 'martin.panter', 'serhiy.storchaka', 'josh.r']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue26194'
    versions = ['Python 3.5', 'Python 3.6']

    @rhettinger
    Copy link
    Contributor Author

    The behavior of deque.insert() in a bounded deque is a bit odd:

    >>> from collections import deque
    >>> d = deque(range(20), maxlen=10)
    >>> d
    deque([10, 11, 12, 13, 14, 15, 16, 17, 18, 19], maxlen=10)
    >>> d.insert(3, 'New')
    >>> d
    deque([19, 10, 11, 'New', 13, 14, 15, 16, 17, 18], maxlen=10)
    #       ^   ^        ^--- The 12 got replaced
    #       |   \--- The elements before the insertion point got rotated
    #       \--- The final element got rotated to first position

    With append/appendleft/extend/extendleft, the intended and useful behavior for maxlen is to pop from the opposite end. But for deque.insert(), the best and most useful behavior invariants are less obvious.

    Ideally, the effect of d.insert(0, item) would be the same as d.appendleft(item), and the effect of d.insert(len(d), item) would be the same as d.append(item).

    Also, d.insert(i, newitem) should have the post-condition:

    assert d[i] == newitem if i < len(d) else d[-1] == newitem

    Having decided where to put the newitem, the question turns to deciding which element should be popped-off to maintain the size invariant.

    I'm thinking that it should always be the rightmost element that gets popped, so that items before the inserted element never get moved. This matches what insert() does for unbounded deques, making it the least surprising choice.

    @rhettinger rhettinger self-assigned this Jan 24, 2016
    @rhettinger rhettinger added extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error labels Jan 24, 2016
    @rhettinger
    Copy link
    Contributor Author

    Mark, Serhiy and Timmy, do you have any thoughts on what is the right behavior?

    One option is to always pop the rightmost element to make room, but that results in a weird asymmetry between d.insert(len(d), item) and what d.append(item) would do. I can't seem to come with a coherent set of invariants that doesn't have a surprising discontinuity.

    Another option is to "refuse the temptation to guess" at what the user intends for the popped-off element to be and to raise an exception. But that isn't very user friendly either.

    @rhettinger
    Copy link
    Contributor Author

    Another take: Do a regular insertion with no special cases, followed by a pop from the right:

    >>> for i in range(len(d) + 1):
    	s = list(d)
    	s.insert(i, None)      
    	_ = s.pop()
    	print(i, s)
    	
    0 [None, 'a', 'b']
    1 ['a', None, 'b']
    2 ['a', 'b', None]
    3 ['a', 'b', 'c']

    Nice parts:

    • Doesn't change pop direction depending on the inputs
    • Guarantee that entries to the left of i will keep their position.
    • Post condition of d[i]==newelem applies whenever i exists.

    Not nice part:

    • Initially surprising that d.insert(len(d), newelem) does not actually insert newelem.
    • d.insert(len(d), newelem) not the same as d.append(newelem).

    Another alternative is to raise an exception for the one case where index==maxlen, with the theory being that d[index] won't be a valid index so you can't insert anything there.

    @rhettinger
    Copy link
    Contributor Author

    The plain-insert-followed-by-a-pop-on-the-right is giving reasonable results with nice properties:

    d.insert(0, i)

    0 --> deque([0], maxlen=4)
    1 --> deque([1, 0], maxlen=4)
    2 --> deque([2, 1, 0], maxlen=4)
    3 --> deque([3, 2, 1, 0], maxlen=4)
    4 --> deque([4, 3, 2, 1], maxlen=4)
    5 --> deque([5, 4, 3, 2], maxlen=4)
    6 --> deque([6, 5, 4, 3], maxlen=4)
    7 --> deque([7, 6, 5, 4], maxlen=4)
    8 --> deque([8, 7, 6, 5], maxlen=4)
    9 --> deque([9, 8, 7, 6], maxlen=4)

    d.insert(len(d), i)
    -----------------------------------
    0 --> deque([0], maxlen=4)
    1 --> deque([0, 1], maxlen=4)
    2 --> deque([0, 1, 2], maxlen=4)
    3 --> deque([0, 1, 2, 3], maxlen=4)
    4 --> deque([0, 1, 2, 3], maxlen=4)
    5 --> deque([0, 1, 2, 3], maxlen=4)
    6 --> deque([0, 1, 2, 3], maxlen=4)
    7 --> deque([0, 1, 2, 3], maxlen=4)
    8 --> deque([0, 1, 2, 3], maxlen=4)
    9 --> deque([0, 1, 2, 3], maxlen=4)

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jan 27, 2016

    New changeset ce3d47eaeb21 by Raymond Hettinger in branch '3.5':
    Issue bpo-26194: Fix undefined behavior for deque.insert() when len(d) == maxlen
    https://hg.python.org/cpython/rev/ce3d47eaeb21

    @serhiy-storchaka
    Copy link
    Member

    New behavior looks less surprising to me.

    Old behavior is even more weird for negative indices:

    >>> from collections import deque
    >>> d = deque(range(20), maxlen=10)
    >>> d
    deque([10, 11, 12, 13, 14, 15, 16, 17, 18, 19], maxlen=10)
    >>> d.insert(-3, 'New')
    >>> d
    deque([11, 12, 13, 14, 15, 16, 'New', 18, 19, 10], maxlen=10)

    Note that new element not just replaced the old one in the middle of the deque, but all context was rotated one position left.

    Patched code behave less surprising.

    >>> d.insert(-3, 'New')
    >>> d                                                                   
    deque([10, 11, 12, 13, 14, 15, 'New', 16, 17, 18], maxlen=10)

    @serhiy-storchaka
    Copy link
    Member

    But the postcondition d[i] == newitem is broken if i is negative.

    >>> d = deque('ABC', maxlen=3)
    >>> d.insert(-1, None)
    >>> d
    deque(['A', None, 'B'], maxlen=3)

    I would expected ['A', 'B', None].

    @vstinner
    Copy link
    Member

    I would expected ['A', 'B', None].

    Hum, it seems consistent with list behaviour, no?

    >>> x=list("abc")
    >>> x.insert(-1, "X")
    >>> x
    ['a', 'b', 'X', 'c']

    -1 is the same than len(x)-1 (2 in this example).

    deque(['A', None, 'B'], maxlen=3) seems correct to me.

    @serhiy-storchaka
    Copy link
    Member

    -1 is the position before the last element ('C'). After popping-off extra element the result can be ['A', 'B', None] or ['B', None, 'C'].

    @vstinner
    Copy link
    Member

    -1 is the position before the last element ('C'). After popping-off extra element the result can be ['A', 'B', None] or ['B', None, 'C'].

    Oh ok :-) Now I'm confused, I don't know what is the expected behaviour :-)

    @rhettinger
    Copy link
    Contributor Author

    The expected behavior should be "insert normally as if the deque were unbounded and then pop-off the rightmost element to restore the maxlen invariant".

    The applied fix does that for non-negative indices but gives the wrong result for negative indicies:

      >>> from collections import deque
      >>> d = deque('abcde', maxlen=5)
      >>> d.insert(-1, 'f')
      >>> list(d)
      ['a', 'b', 'c', 'f', 'd']
      >>> s = list('abcde')
      >>> s.insert(-1, 'f'); del s[-1]
      >>> s
      ['a', 'b', 'c', 'd', 'f']

    I think the behavior can be made explainable and also be useful for common cases, but there is likely no getting around odd looking results with negative index insertions into bounded deques already at their maximum size.

    @tim-one
    Copy link
    Member

    tim-one commented Jan 27, 2016

    I'd raise an exception when trying to insert into a bounded deque that's already full. There's simply no way to guess what was _intended_; it's dead easy for the user to implement what they _do_ intend (first make room by deleting the specific item they no longer want); and I can't think of a use case compelling enough to justify whatever arbitrary non-exceptional behavior may be implemented instead.

    WRT the behavior you settled on, sure, it's explainable. That doesn't imply it's useful, though ;-) I'd rather have an exception. It's plain bizarre that after

        d.insert(i, x)

    one can't even rely on

    assert any(y is x for y in d)
    

    succeeding. Implementing behavior that allows that invariant to fail is _really_ user-unfriendly ;-)

    In contrast, what .append() and .appendleft() do for a full bounded deque are compelling (and don't violate the weak invariant above).

    @serhiy-storchaka
    Copy link
    Member

    What if implement the bahavior described by Raymond for index -len(d) <= i < len(d), but raise an exception if the index is out of this range?

    The result of deque('abc', maxlen=3).insert(i, 'X') would be:

    -4: error
    -3: ['X', 'a', 'b']
    -2: ['a', 'X', 'b']
    -1: ['a', 'b', 'X']
    0: ['X', 'a', 'b']
    1: ['a', 'X', 'b']
    2: ['a', 'b', 'X']
    3: error

    @tim-one
    Copy link
    Member

    tim-one commented Jan 27, 2016

    My opinion doesn't change: I'd rather see an exception. I see no use case for inserting "into the middle" of a full bounded queue. If I had one, it would remain trivial to force the specific behavior I intended.

    @rhettinger
    Copy link
    Contributor Author

    My only aversion to raising an exception is that it goes against the original intent of maxlen as giving an automatic-pop-to-make-room behavior.

    Rather that introduce a discontinuity, I like the "smoothness" of letting d.insert(len(d), obj) follow the insert-normally-then-pop-from-the-right rule as opposed to having a special case.

    @MojoVampire
    Copy link
    Mannequin

    MojoVampire mannequin commented Jan 27, 2016

    I agree with Tim that arbitrarily deciding that insert should behave more like appendleft is surprising. This really feels like guessing at what the user wants, silently doing something that really isn't predictable.

    Any of the following seem nicer (not exactly arguing for any but #1):

    1. Raising an exception for full deque
    2. Making insert pop left when full (and possibly make insertleft that will pop right; the name isn't perfect, but it would make the symmetry between "no suffix pops left and 'left' suffix pops right" line up with other deque methods)
    3. Adding an optional argument to say which end should be popped (defaulting to "raise an exception", probably no good though, since it would break Sequence rules.

    @serhiy-storchaka
    Copy link
    Member

    full_deque2.diff LGTM. But I have added minor suggestions on Rietveld.

    @rhettinger
    Copy link
    Contributor Author

    Since slicing is going to be added to deques, it is worth thinking about what the behavior should be there as well:

       d = deque('abcde', maxlen=7)
       d[3:4] = 'efghijklm'

    Side note: repetition behaves like extend() using maxlen to decide how much to truncate from the left:

        >>> from collections import deque
        >>> d = deque('abcde', maxlen=8)
        >>> d *= 2
        >>> d
        deque(['c', 'd', 'e', 'a', 'b', 'c', 'd', 'e'], maxlen=8)

    Ideally, the deque API should be coherent and simple taken as a whole so that the behaviors are consistent and predictable as possible even if the general rules lead to some oddities in some uncommon cases.

    One such general rule could be: "When maxlen is defined, operations proceed as if the deque were unbounded and then truncation is applied to the left" (this is like the rule in decimal where inputs to operations are treated as precise and context rounding is applied after the operation). The general rule would fit a mental models of "inserted new data is prioritized over old data", that "maxlen is all about automatically evicting data to make room for the newer data", and that "methods without 'left' in their name correspond to newest data on the right and next to be evicted on the left".

    @serhiy-storchaka
    Copy link
    Member

    *All* reasons look reasonable. May be discuss this on Python-Dev or ask BDFL?

    From my point it looks that correct implementation of the insertion is too complex and this feature should go only in development release (if any).

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Feb 2, 2016

    New changeset 7dabb94bdf63 by Raymond Hettinger in branch '3.5':
    Issue bpo-26194: Inserting into a full deque to raise an IndexError
    https://hg.python.org/cpython/rev/7dabb94bdf63

    @vadmium
    Copy link
    Member

    vadmium commented Feb 2, 2016

    FYI Raymond, your revision 0731f097157b didn’t actually merge the 3.5 branch (It only had one parent). Also, the default branch never got any NEWS entry in 58266f5101cc, so maybe it needs to be added?

    @vstinner
    Copy link
    Member

    vstinner commented Feb 2, 2016

    FYI Raymond, your revision 0731f097157b didn’t actually merge the 3.5 branch (It only had one parent). Also, the default branch never got any NEWS entry in 58266f5101cc, so maybe it needs to be added?

    Maybe it's also worth to document the change in Python 3.5 and 3.6 documentation with a ".. versionchanged::"?
    https://docs.python.org/dev/library/collections.html#deque-objects

    It can be suprising to get a different behaviour between two bugfix versions (Python 3.5.1 and 3.5.2). I'm not saying that we must not fix bugs, just that it helps users to document them :-)

    It will also avoid bug reports about this behaviour change. Example of user report: issue bpo-26260, "it works on Python 2 but no more on Python 3" ;-)

    @serhiy-storchaka
    Copy link
    Member

    Reopened. Otherwise Martin's comments can be lost in spam.

    Maybe it's also worth to document the change in Python 3.5 and 3.6 documentation with a ".. versionchanged::"?

    I don't think this is needed. This isn't new feature, this is just a bugfix, old behavior obviously was incorrect. We don't add the versionchanged directive for every bugfix.

    @vstinner
    Copy link
    Member

    vstinner commented Feb 2, 2016

    Ok no problem, it was just a suggestion.

    In asyncio, we try to document behaviour change but asyncio is different.
    Major functions are still modified, asyncio API is still provisional, but
    people rely on tehe exact API.

    @rhettinger
    Copy link
    Contributor Author

    [Martin]

    the default branch never got any NEWS entry in
    58266f5101cc, so maybe it needs to be added?

    No need for a news entry in default. We haven't released 3.6 yet, so the bugfix from 3.5 is just a carry forward of the corrected behavior. That isn't news ;-)

    [Martin]

    revision 0731f097157b didn’t actually merge the 3.5
    branch (It only had one parent)

    I'm not sure I see what was missed here (I am not a mercurial jedi). My 3.5 checkout shows the change and so does the 3.6. What needs to be done at this point ?

    [Serhiy]

    This isn't new feature, this is just a bugfix, old behavior
    obviously was incorrect. We don't add the versionchanged
    directive for every bugfix.

    I concur with Serhiy.

    @serhiy-storchaka
    Copy link
    Member

    No need for a news entry in default. We haven't released 3.6 yet, so the bugfix from 3.5 is just a carry forward of the corrected behavior. That isn't news ;-)

    NEWS entries for 3.5.2 are not included in the NEWS file for 3.6. Usually we add bugfix NEWS entries in all branches that take the fix.

    What needs to be done at this point ?

    It is too late to do anything except add a NEWS entry. Branches were merged by Martin when his applied next patch.

    @vadmium
    Copy link
    Member

    vadmium commented Feb 3, 2016

    I thought the 3.6.0 NEWS file generally listed bugs fixed since (at least) the 3.5.0 release. But the NEWS in default has no mention of bpo-26194.

    Serhiy is right that there is nothing more to do for the merging problem. I was just pointing out that something in your “merge” commit must have not worked properly. Normally a merge looks like revision 58266f5101cc (there are two parents listed). But in this case I pulled from default expecting to get all the 3.5 changes as well, but never got your latest 3.5 change until I pulled it explicitly.

    @rhettinger
    Copy link
    Contributor Author

    I can't say that I agree with putting an additional entry the 3.6 NEWS. In the past, I've not made news entries in unreleased versions regarding bug fixes in prior versions. That seems pointless to me and it would tend to clutter a document whose size already gets in the way of its usability.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants