classification
Title: use both "for in" and "ElementTree.remove" has a index bug
Type: behavior Stage: resolved
Components: Documentation, XML Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: docs@python Nosy List: WoodyWoo, docs@python, eli.bendersky, eric.smith, miss-islington, ned.deily, scoder, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2020-09-30 14:20 by WoodyWoo, last changed 2020-10-05 04:22 by scoder. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 22464 merged scoder, 2020-09-30 15:27
PR 22546 closed scoder, 2020-10-04 17:18
PR 22553 merged miss-islington, 2020-10-04 23:14
PR 22554 merged miss-islington, 2020-10-04 23:22
Messages (18)
msg377698 - (view) Author: (WoodyWoo) Date: 2020-09-30 14:20
#Just run it in Python v3.8.6 of win7 32bit 
import xml.etree.ElementTree as ET 
xmlstr='''<root>
<a no="1"/>
<b/>
<c/>
</root>'''
etroot = ET.fromstring(xmlstr)
for ELEMchild in etroot:
    if ELEMchild.get("no") == "1" :
        etroot.remove(ELEMchild)       #so far so good
    print (ELEMchild.tag)
    #It should be :  "b /n c" or "a /n b /n c",I can live with it both.
    #But it is :  "a /n c".
    #The index of ELEMchild should not +1 when there was a remove method worked on one of the before ELEMs.
    #I was forced to use while and if to control the index +1/+0.
    #BTW,ELEM has no method returning index in siblings, which is buging me too.
msg377699 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-09-30 14:31
I assume that ElementTree doesn't support mutation while iterating.

However, the docs at https://docs.python.org/3/library/xml.etree.elementtree.html#modifying-an-xml-file show removing an item while iterating. It probably only works because the child being removed is the last one.
msg377700 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-09-30 14:43
You will get the same behavior for lists:

>>> a = [1, 2, 3]
>>> for x in a:
...     if x == 1:
...         a.remove(x)
...     print(x)
... 
1
3

Lists are iterated by index. First you get an item at index 0, then at index 1, etc, to the end of the list. Initially the list is [1, 2, 3]. After removing 1 at the first iteration it becomes [2, 3], and at the next iteration you get an item at index 1 which is 3.
msg377701 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-09-30 14:50
I think the only action here is to improve the documentation. That example is especially problematic.
msg377702 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2020-09-30 15:09
> That example is especially problematic.

No, it's not. It uses .findall(), which returns a list. It's like when you make a copy of a list to iterate over, when you want to modify the original list in the loop.

That could be made explicit in the text that introduces the example, but I think it's a very good example.
msg377703 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-09-30 15:13
Ah, good point. I agree the example should make that clear. And I think a note in .remove() about using it while iterating would be a good idea, too.
msg377704 - (view) Author: (WoodyWoo) Date: 2020-09-30 15:19
I'm green hand in Coding.
But I think the problem caused by "for each in list".
Anything changed in "list" should not act at once,but should act when "for end" or break.
msg377705 - (view) Author: (WoodyWoo) Date: 2020-09-30 15:22
Only that makes "for each in list" literally.
msg377706 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-09-30 15:24
The example is iterating over the list returned by root.findall(), but removing from a different data structure in root, so it won't have a problem.
msg377707 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2020-09-30 15:31
@WoodyWoo, you can (and should) do something like this:

    for ELEMchild in list(etroot):

Modifying a container while iterating over it is usually not a good idea. The same applies to Python lists and dicts, for example.
msg377724 - (view) Author: (WoodyWoo) Date: 2020-09-30 23:27
I think "for each in list" should not work by index,but should work by pointer like thing,could be a list of pointers.
When the "for each in list" was acting,the pointers list would be fixed,and you need not to worry about the "list" changing.
msg377732 - (view) Author: (WoodyWoo) Date: 2020-10-01 01:59
@eric.smith @scoder @serhiy.storchaka Thank U all.
I get what to do,and still think the "for in" structure should rebuilding.
All three methods:

import xml.etree.ElementTree as ET 
xmlstr=\
r'''<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama1">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
    <country name="Panama2">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>'''
print(xmlstr)

#orginal code
root = ET.fromstring(xmlstr)
for country in root.findall('country'):
    rank = int(country.find('rank').text)
    if rank > 50:
        root.remove(country)
print("___orginal___")
for country in root.findall('country'):
        print (country.get("name"))
print("^^^orginal^^^^\n")

#wrong code in my mind
root = ET.fromstring(xmlstr)
for country in root:
    rank = int(country.find('rank').text)
    if rank > 50:
        root.remove(country)
print("___bad___")
for country in root.findall('country'):
        print (country.get("name"))
print("^^^bad^^^^\n")

#my code
root = ET.fromstring(xmlstr)
index=0
count=len(root.findall("./*"))
while index <count:
    rank = int (root[index].find("rank").text)
    if rank>50:
        root.remove(root[index])
        index=index+0
        count=count-1  # avoid index err
        continue
    index=index+1
print("___new___")
for country in root.findall('country'):
        print (country.get("name"))
print("^^^new^^^^\n")
msg377756 - (view) Author: (WoodyWoo) Date: 2020-10-01 14:56
Could I say the mutable sequence containing not the object but the pointer like C++.
So they can changed in def functions.
msg377764 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-10-01 15:34
I can't say how ElementTree works without more checking, but this solution cannot work in general. Given a pointer to an object that's in a list, how would you get to the next item? Say the parent list-like object has a C array of pointers to the objects it contains, and removing one of the objects re-shuffles that list. How would keeping a pointer to the current object help you?

In any event, I don't think this behavior is going to change.
msg377982 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2020-10-04 23:13
New changeset 40db798692ca783fc2163656f196ac77e8b9e792 by scoder in branch 'master':
bpo-41892: Clarify that an example in the ElementTree docs explicitly avoids modifying an XML tree while iterating over it. (GH-22464)
https://github.com/python/cpython/commit/40db798692ca783fc2163656f196ac77e8b9e792
msg377984 - (view) Author: miss-islington (miss-islington) Date: 2020-10-04 23:23
New changeset 6bd058e0ff5d4a63fb35f6d45161cdf51cb68c58 by Miss Skeleton (bot) in branch '3.8':
bpo-41892: Clarify that an example in the ElementTree docs explicitly avoids modifying an XML tree while iterating over it. (GH-22464)
https://github.com/python/cpython/commit/6bd058e0ff5d4a63fb35f6d45161cdf51cb68c58
msg377989 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2020-10-04 23:57
New changeset d5719247ba32b3ac700454bad9a9e2cc7edeac6a by Miss Skeleton (bot) in branch '3.9':
bpo-41892: Clarify that an example in the ElementTree docs explicitly avoids modifying an XML tree while iterating over it. (GH-22464) (GH-22554)
https://github.com/python/cpython/commit/d5719247ba32b3ac700454bad9a9e2cc7edeac6a
msg377992 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2020-10-05 04:22
Closing since this works as designed.

Users are responsible for avoiding concurrent tree modifications during iteration.
History
Date User Action Args
2020-10-05 04:22:13scodersetstatus: open -> closed
resolution: not a bug
messages: + msg377992

stage: patch review -> resolved
2020-10-04 23:57:05ned.deilysetmessages: + msg377989
2020-10-04 23:23:51miss-islingtonsetmessages: + msg377984
2020-10-04 23:22:22miss-islingtonsetpull_requests: + pull_request21552
2020-10-04 23:14:03miss-islingtonsetnosy: + miss-islington
pull_requests: + pull_request21551
2020-10-04 23:13:53ned.deilysetnosy: + ned.deily
messages: + msg377982
2020-10-04 17:18:40scodersetpull_requests: + pull_request21547
2020-10-03 05:59:01scodersetversions: + Python 3.9, Python 3.10
2020-10-01 15:34:14eric.smithsetmessages: + msg377764
2020-10-01 14:56:12WoodyWoosetmessages: + msg377756
2020-10-01 01:59:03WoodyWoosetmessages: + msg377732
2020-09-30 23:27:22WoodyWoosetmessages: + msg377724
2020-09-30 15:31:35scodersetstage: patch review
2020-09-30 15:31:18scodersetmessages: + msg377707
stage: patch review -> (no value)
2020-09-30 15:27:46scodersetkeywords: + patch
stage: patch review
pull_requests: + pull_request21489
2020-09-30 15:24:21eric.smithsetmessages: + msg377706
2020-09-30 15:22:21WoodyWoosetmessages: + msg377705
2020-09-30 15:19:15WoodyWoosetmessages: + msg377704
2020-09-30 15:13:08eric.smithsetmessages: + msg377703
2020-09-30 15:09:42scodersetmessages: + msg377702
2020-09-30 15:01:05serhiy.storchakasetnosy: + eli.bendersky, scoder, docs@python

components: + Documentation
assignee: docs@python
2020-09-30 14:50:06eric.smithsetmessages: + msg377701
2020-09-30 14:43:11serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg377700
2020-09-30 14:31:44eric.smithsetnosy: + eric.smith
messages: + msg377699
2020-09-30 14:23:14WoodyWoosettitle: use both "for" and "ElementTree.remove" has a index bug -> use both "for in" and "ElementTree.remove" has a index bug
2020-09-30 14:20:25WoodyWoocreate