classification
Title: Add programming FAQ entry: remove multiple entries from list
Type: enhancement Stage: resolved
Components: Documentation Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: terry.reedy Nosy List: eric.smith, miss-islington, paul.moore, rhettinger, sreedevi.ha, steven.daprano, terry.reedy
Priority: normal Keywords: patch

Created on 2020-09-12 16:20 by sreedevi.ha, last changed 2020-10-05 15:16 by terry.reedy. This issue is now closed.

Files
File name Uploaded Description Edit
listrepxxx.py sreedevi.ha, 2020-09-12 16:20 Program to remove 20 from list
Pull Requests
URL Status Linked Edit
PR 22402 merged terry.reedy, 2020-09-24 20:11
PR 22447 merged miss-islington, 2020-09-29 05:03
PR 22448 merged miss-islington, 2020-09-29 05:03
PR 22562 merged terry.reedy, 2020-10-05 14:11
PR 22563 merged miss-islington, 2020-10-05 14:32
PR 22564 merged miss-islington, 2020-10-05 14:32
Messages (23)
msg376803 - (view) Author: Sreedevi (sreedevi.ha) Date: 2020-09-12 16:20
Define a list which has all same elements.
While removing the same element from list using for loop and by using remove() method, output should be empty list, but the output we get is one element is unremoved
msg376804 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-09-12 16:27
You should not mutate a list while iterating over it.

See, for example https://stackoverflow.com/questions/6260089/strange-result-when-removing-item-from-a-list

I couldn't find a place where this is mentioned in the python list docs. If it's not there, we should add something to the programming FAQ.
msg376805 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-09-12 16:28
Removing Windows and IDLE devs from nosy list, since this isn't related to either of those areas.
msg376806 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2020-09-12 16:53
You say:

"output should be empty list"

but that's not actually correct. You intend the output to be the empty list, but if you think carefully about what happens during iteration, you should see that the behaviour is correct.

To make it easier to see what is happening, let's use different values:

L = [20, 21, 22]


On the first iteration, the interpreter looks at position 0, which is 20. You remove 20 from the list, which shrinks the list down to:

L = [21, 22]

Now it is 21 in position 0. But that iteration of the loop is finished, so the interpreter looks at position 1, which is 22. You remove 22 from the list, giving:

L = [21]

and the interpreter looks at position 2, which does not exist, so the loop finishes.


>>> L = [20, 21, 22]
>>> for i in L:
...     L.remove(i)
...     print('i', i, 'L is now:', L)
... 
i 20 L is now: [21, 22]
i 22 L is now: [21]
>>> L
[21]


So the interpreter did exactly what you told it to do. The problem is that what you told it to do is not what you wanted it to do.
msg376807 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2020-09-12 17:17
There is a bug in your program. You should not mutate the list while looping over it.
msg376808 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2020-09-12 17:34
I think Eric left the issue open because he was hoping to update the FAQs and/or docs with information about this issue.

I will leave it to someone else to decide whether or not to reopen it.
msg376810 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2020-09-12 18:42
Yeah, apologies - I saw the email notification, but Eric removed me from the nosy list (quite reasonably) and I didn't notice there had been extra discussion :-(

If someone wants to re-open the issue, feel free.
msg376822 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-09-13 05:00
Sreedevi: the Windows tag is meant for issues that involve Windows behavior that is different from other OSes; the IDLE tag is meant for issues that involve the behavior of IDLE, as distinct from other ways of running Python code.  They should not be used just because you ran code from IDLE on Windows.

There are two aspects to mutating while iteration: correctness and speed.  For beginners, the general advice "Don't do it" is not bad.  However:
  
Removing items from a list *works* when iterating in reverse if removal is done by index (del mylist[i] rather than value (mylist.remove(value)).  But it turns an inherently O(n:=len(list)) operation into a slow O(n*n) operation.  

The usually recommended alternative is to make a new list of things not deleted.  But one can do this in-place by writing the new list on top of the old by using a explicit second index to move items just once.

I reopened to add to the programming FAQ Sequences (Tuples/Lists) section
How do you remove multiple items from a list?
after the current
How do you remove duplicates from a list?
msg377456 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-09-24 20:02
I modeled the new entry on the previous one.  The code and its test.

def fr(n, remove):
    mylist = list(range(n))
    for i in range(len(mylist)-1, -1, -1):
        if remove(mylist[i]):
            del mylist[i]
    return mylist

def ff(n, keep):
   mylist = list(range(n))
   j = 0
   for i, item in enumerate(mylist): 
       if keep(item):
           mylist[j] = item
           j += 1
   del mylist[j:]
   return mylist

for i in range(9):

    expect = list(range(0, i, 2))
    def remove(n): return n % 2
    def keep(n): return n % 2 == 0
    print(fr(i, remove) == ff(i, keep) == expect)

    expect = list(range(i//2))
    def remove(n): return n >= i//2
    def keep(n): return n < i//2
    print(fr(i, remove) == ff(i, keep) == expect)

    expect = list(range(i//2, i))
    def remove(n): return n < i//2
    def keep(n): return n >= i//2
    print(fr(i, remove) == ff(i, keep) == expect)

# all True
msg377464 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-09-24 22:33
> The usually recommended alternative is to make a new list of 
> things not deleted.  But one can do this in-place by writing
> the new list on top of the old by using a explicit second 
> index to move items just once.

I don't think we should send users down this path.  Better to stick with high level, easy-to-implement and performant suggestions:

"""
To edit a list in-place it is often simplest and fastest to replace the entire list:

   colors[:] = [color for color in colors if websafe(color)]

This makes a single pass through the list and efficiently builds a new list.  The colors[:] then replaces the entire contents of the original list with the new list
"""

Besides being easy to get right, this is likely to be *much* faster than tracking two indicies to manipulate the array in-place.  Slice operations run at the speed of a C memcpy (plus the ref count changes) and don't involve creating and destroying integer objects for indexing.
msg377471 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-09-25 01:20
You are right; the replacement index I called 'j' is better buried as a C index or pointer within a slice replacement. In fact, a generator expression, if one has a keep expression, or a filter call, if one has filter function, work, without the intermediate list.   Both also incorporate the keep scan index/pointer in C.  I verified that this works by defining 3 functions.

def fc(n, keep):
    mylist = list(range(n))
    mylist[:] = [x for x in mylist if keep(x)]
    return mylist

def fg(n, keep):
    mylist = list(range(n))
    mylist[:] = (x for x in mylist if keep(x))
    return mylist

def fl(n, keep):
    mylist = list(range(n))
    mylist[:] = filter(keep, mylist)
    return mylist

I added a second test expression.

    print(fc(i, keep) == fg(i, keep) == fl(i, keep) == expect)

at the 3 obvious places in the test loop above.
---

In the existing question about removing duplicates, the existing all-hashable answer
   mylist = list(set(mylist))
could be replaced by
   mylist[:] = set(mylist)
msg377477 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-09-25 03:48
Try running some timings.  I suspect the genexp/iterator versions run more slowly.  The speed of the slice replacement relies on knowing the size of the input.
msg377478 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-09-25 04:05
Timings depend on multiple factors, including implementation (cpython versus pypy versus cython, etc) and complexity of the keep/discard decision. I said in the proposed text that a listcomp may be faster.  I think that sufficient; anyone who cares can test their specific case.

I like the simple ad easy 'slice replacement = iterator' form because it illustrates to me that we have done something right with Python's design.
msg377484 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-09-25 08:31
> I like the simple ad easy 'slice replacement = iterator' form
>  because it illustrates to me that we have done something 
> right with Python's design.

I understand that you like it and that it reflects the way you think the world should work, , but that doesn't warrant putting it in the FAQ.  We should steer users down a path of feeding unsizable inputs into tooling that needs a size to work well (the receiving code either has to implicitly build a list first before it can start or it will have to have periodic resizes). A straight list comprehension will suffice to answer the question cleanly.

FWIW, the same issue occurs with str.join().  It works better with a list comprehension than an iterator.  Given an iterator, it has to build an internal list first before it can start.  That is slower than starting with a list in the first place and makes the memory consumption implicit when it should be explicit (a generator would create the illusion that a list isn't being formed which is misleading).
msg377486 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-09-25 11:08
Raymond, please take what I have written and rewrite it to your satisfaction.  I have lots else to do, including investigating the IDLE bug you just reported.
msg377497 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-09-25 18:37
Sorry for the wording of the last message.  Go ahead with whatever you would like do :-) 

Was only trying to point-out that the generator semantics don't interact nicely with slice assignments:

    $ python3.9 -m timeit -s 'a=list(range(1000))' 'b=a[:]' 'b[:]=(x for x in a)'
    5000 loops, best of 5: 46.6 usec per loop

    $ python3.9 -m timeit -s 'a=list(range(1000))' 'b=a[:]' 'b[:]=[x for x in a]'
    10000 loops, best of 5: 26.3 usec per loop
msg377654 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-09-29 05:02
New changeset 5b0181d1f6474c2cb9b80bdaf3bc56a78bf5fbe7 by Terry Jan Reedy in branch 'master':
bpo-41774: Add programming FAQ entry (GH-22402)
https://github.com/python/cpython/commit/5b0181d1f6474c2cb9b80bdaf3bc56a78bf5fbe7
msg377655 - (view) Author: miss-islington (miss-islington) Date: 2020-09-29 05:11
New changeset d50a0700265536a20bcce3fb108c954746d97625 by Miss Islington (bot) in branch '3.8':
bpo-41774: Add programming FAQ entry (GH-22402)
https://github.com/python/cpython/commit/d50a0700265536a20bcce3fb108c954746d97625
msg377656 - (view) Author: miss-islington (miss-islington) Date: 2020-09-29 05:27
New changeset 868c8e41eb1d7dc032679ae06e62c0d60edd7725 by Miss Islington (bot) in branch '3.9':
bpo-41774: Add programming FAQ entry (GH-22402)
https://github.com/python/cpython/commit/868c8e41eb1d7dc032679ae06e62c0d60edd7725
msg377673 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-09-29 16:18
> If space is not an issue, the list comprehension may be fastest.

I think there is still a misunderstanding here.  The generator variant does not save space.  It uses slightly *more* space than the list variant.

The list_ass_slice()¹ function runs PySequence_Fast² on its input.  If the input is already a list or tuple, it is returned immediately.  If not, the iterator is run to exhaustion and a new list is built.  Either way, when the actual update occurs, the source will be a list or tuple.

¹ https://github.com/rhettinger/cpython/blob/master/Objects/listobject.c#L597
² https://github.com/rhettinger/cpython/blob/master/Objects/abstract.c#L2014
msg378034 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-10-05 14:31
New changeset 060937da988347a887a5f165b023d972fcb97802 by Terry Jan Reedy in branch 'master':
bpo-41774: Tweak new programming FAQ entry (GH-22562)
https://github.com/python/cpython/commit/060937da988347a887a5f165b023d972fcb97802
msg378037 - (view) Author: miss-islington (miss-islington) Date: 2020-10-05 14:42
New changeset 7e941fa8e0454c7814ce3ec646136758c0db5a25 by Miss Skeleton (bot) in branch '3.8':
bpo-41774: Tweak new programming FAQ entry (GH-22562)
https://github.com/python/cpython/commit/7e941fa8e0454c7814ce3ec646136758c0db5a25
msg378038 - (view) Author: miss-islington (miss-islington) Date: 2020-10-05 14:53
New changeset 75dd70e1ce0b5ce50c572802c17b7fa427d9ce23 by Miss Skeleton (bot) in branch '3.9':
bpo-41774: Tweak new programming FAQ entry (GH-22562)
https://github.com/python/cpython/commit/75dd70e1ce0b5ce50c572802c17b7fa427d9ce23
History
Date User Action Args
2020-10-05 15:16:04terry.reedysetstatus: open -> closed
versions: + Python 3.8, Python 3.9
type: behavior -> enhancement
title: While removing element from list using for and remove(), which has same items output is not right -> Add programming FAQ entry: remove multiple entries from list
resolution: fixed
stage: patch review -> resolved
2020-10-05 14:53:34miss-islingtonsetmessages: + msg378038
2020-10-05 14:42:17miss-islingtonsetmessages: + msg378037
2020-10-05 14:32:09miss-islingtonsetpull_requests: + pull_request21559
2020-10-05 14:32:00miss-islingtonsetpull_requests: + pull_request21558
2020-10-05 14:31:53terry.reedysetmessages: + msg378034
2020-10-05 14:11:20terry.reedysetpull_requests: + pull_request21557
2020-10-05 05:34:21rhettingersetassignee: terry.reedy
2020-09-29 16:18:21rhettingersetmessages: + msg377673
2020-09-29 05:27:13miss-islingtonsetmessages: + msg377656
2020-09-29 05:11:14miss-islingtonsetmessages: + msg377655
2020-09-29 05:03:07miss-islingtonsetpull_requests: + pull_request21478
2020-09-29 05:03:00miss-islingtonsetnosy: + miss-islington

pull_requests: + pull_request21477
stage: needs patch -> patch review
2020-09-29 05:02:54terry.reedysetmessages: + msg377654
2020-09-25 18:37:27rhettingersetmessages: + msg377497
2020-09-25 11:08:19terry.reedysetassignee: terry.reedy -> (no value)
messages: + msg377486
2020-09-25 08:31:08rhettingersetmessages: + msg377484
2020-09-25 04:05:35terry.reedysetmessages: + msg377478
2020-09-25 03:48:46rhettingersetmessages: + msg377477
2020-09-25 01:20:19terry.reedysetmessages: + msg377471
stage: patch review -> needs patch
2020-09-24 22:33:31rhettingersetnosy: + rhettinger
messages: + msg377464
2020-09-24 20:11:39terry.reedysetkeywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request21442
2020-09-24 20:02:12terry.reedysetmessages: + msg377456
2020-09-13 05:00:32terry.reedysetstatus: closed -> open

versions: + Python 3.10, - Python 3.8
nosy: + terry.reedy

messages: + msg376822
resolution: not a bug -> (no value)
stage: resolved -> needs patch
2020-09-12 18:42:25paul.mooresetmessages: + msg376810
2020-09-12 17:34:13steven.dapranosetmessages: + msg376808
2020-09-12 17:18:12paul.mooresetstatus: open -> closed
resolution: not a bug
stage: resolved
2020-09-12 17:17:55paul.mooresetnosy: + paul.moore
messages: + msg376807
2020-09-12 16:53:36steven.dapranosetnosy: + steven.daprano
messages: + msg376806
2020-09-12 16:28:51eric.smithsetnosy: - terry.reedy, paul.moore, tim.golden, zach.ware, steve.dower
messages: + msg376805
2020-09-12 16:27:42eric.smithsetcomponents: + Documentation, - IDLE, Windows
2020-09-12 16:27:23eric.smithsetnosy: + eric.smith
messages: + msg376804
2020-09-12 16:20:35sreedevi.hacreate