msg377698 - (view) |
Author: (WoodyWoo) |
Date: 2020-09-30 14:20 |
#Just run it in Python v3.8.6 of win7 32bit
import xml.etree.ElementTree as ET
xmlstr='''<root>
<a no="1"/>
<b/>
<c/>
</root>'''
etroot = ET.fromstring(xmlstr)
for ELEMchild in etroot:
if ELEMchild.get("no") == "1" :
etroot.remove(ELEMchild) #so far so good
print (ELEMchild.tag)
#It should be : "b /n c" or "a /n b /n c",I can live with it both.
#But it is : "a /n c".
#The index of ELEMchild should not +1 when there was a remove method worked on one of the before ELEMs.
#I was forced to use while and if to control the index +1/+0.
#BTW,ELEM has no method returning index in siblings, which is buging me too.
|
msg377699 - (view) |
Author: Eric V. Smith (eric.smith) *  |
Date: 2020-09-30 14:31 |
I assume that ElementTree doesn't support mutation while iterating.
However, the docs at https://docs.python.org/3/library/xml.etree.elementtree.html#modifying-an-xml-file show removing an item while iterating. It probably only works because the child being removed is the last one.
|
msg377700 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2020-09-30 14:43 |
You will get the same behavior for lists:
>>> a = [1, 2, 3]
>>> for x in a:
... if x == 1:
... a.remove(x)
... print(x)
...
1
3
Lists are iterated by index. First you get an item at index 0, then at index 1, etc, to the end of the list. Initially the list is [1, 2, 3]. After removing 1 at the first iteration it becomes [2, 3], and at the next iteration you get an item at index 1 which is 3.
|
msg377701 - (view) |
Author: Eric V. Smith (eric.smith) *  |
Date: 2020-09-30 14:50 |
I think the only action here is to improve the documentation. That example is especially problematic.
|
msg377702 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2020-09-30 15:09 |
> That example is especially problematic.
No, it's not. It uses .findall(), which returns a list. It's like when you make a copy of a list to iterate over, when you want to modify the original list in the loop.
That could be made explicit in the text that introduces the example, but I think it's a very good example.
|
msg377703 - (view) |
Author: Eric V. Smith (eric.smith) *  |
Date: 2020-09-30 15:13 |
Ah, good point. I agree the example should make that clear. And I think a note in .remove() about using it while iterating would be a good idea, too.
|
msg377704 - (view) |
Author: (WoodyWoo) |
Date: 2020-09-30 15:19 |
I'm green hand in Coding.
But I think the problem caused by "for each in list".
Anything changed in "list" should not act at once,but should act when "for end" or break.
|
msg377705 - (view) |
Author: (WoodyWoo) |
Date: 2020-09-30 15:22 |
Only that makes "for each in list" literally.
|
msg377706 - (view) |
Author: Eric V. Smith (eric.smith) *  |
Date: 2020-09-30 15:24 |
The example is iterating over the list returned by root.findall(), but removing from a different data structure in root, so it won't have a problem.
|
msg377707 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2020-09-30 15:31 |
@WoodyWoo, you can (and should) do something like this:
for ELEMchild in list(etroot):
Modifying a container while iterating over it is usually not a good idea. The same applies to Python lists and dicts, for example.
|
msg377724 - (view) |
Author: (WoodyWoo) |
Date: 2020-09-30 23:27 |
I think "for each in list" should not work by index,but should work by pointer like thing,could be a list of pointers.
When the "for each in list" was acting,the pointers list would be fixed,and you need not to worry about the "list" changing.
|
msg377732 - (view) |
Author: (WoodyWoo) |
Date: 2020-10-01 01:59 |
@eric.smith @scoder @serhiy.storchaka Thank U all.
I get what to do,and still think the "for in" structure should rebuilding.
All three methods:
import xml.etree.ElementTree as ET
xmlstr=\
r'''<?xml version="1.0"?>
<data>
<country name="Liechtenstein">
<rank updated="yes">2</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
</country>
<country name="Singapore">
<rank updated="yes">5</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N"/>
</country>
<country name="Panama1">
<rank updated="yes">69</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
</country>
<country name="Panama2">
<rank updated="yes">69</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
</country>
</data>'''
print(xmlstr)
#orginal code
root = ET.fromstring(xmlstr)
for country in root.findall('country'):
rank = int(country.find('rank').text)
if rank > 50:
root.remove(country)
print("___orginal___")
for country in root.findall('country'):
print (country.get("name"))
print("^^^orginal^^^^\n")
#wrong code in my mind
root = ET.fromstring(xmlstr)
for country in root:
rank = int(country.find('rank').text)
if rank > 50:
root.remove(country)
print("___bad___")
for country in root.findall('country'):
print (country.get("name"))
print("^^^bad^^^^\n")
#my code
root = ET.fromstring(xmlstr)
index=0
count=len(root.findall("./*"))
while index <count:
rank = int (root[index].find("rank").text)
if rank>50:
root.remove(root[index])
index=index+0
count=count-1 # avoid index err
continue
index=index+1
print("___new___")
for country in root.findall('country'):
print (country.get("name"))
print("^^^new^^^^\n")
|
msg377756 - (view) |
Author: (WoodyWoo) |
Date: 2020-10-01 14:56 |
Could I say the mutable sequence containing not the object but the pointer like C++.
So they can changed in def functions.
|
msg377764 - (view) |
Author: Eric V. Smith (eric.smith) *  |
Date: 2020-10-01 15:34 |
I can't say how ElementTree works without more checking, but this solution cannot work in general. Given a pointer to an object that's in a list, how would you get to the next item? Say the parent list-like object has a C array of pointers to the objects it contains, and removing one of the objects re-shuffles that list. How would keeping a pointer to the current object help you?
In any event, I don't think this behavior is going to change.
|
msg377982 - (view) |
Author: Ned Deily (ned.deily) *  |
Date: 2020-10-04 23:13 |
New changeset 40db798692ca783fc2163656f196ac77e8b9e792 by scoder in branch 'master':
bpo-41892: Clarify that an example in the ElementTree docs explicitly avoids modifying an XML tree while iterating over it. (GH-22464)
https://github.com/python/cpython/commit/40db798692ca783fc2163656f196ac77e8b9e792
|
msg377984 - (view) |
Author: miss-islington (miss-islington) |
Date: 2020-10-04 23:23 |
New changeset 6bd058e0ff5d4a63fb35f6d45161cdf51cb68c58 by Miss Skeleton (bot) in branch '3.8':
bpo-41892: Clarify that an example in the ElementTree docs explicitly avoids modifying an XML tree while iterating over it. (GH-22464)
https://github.com/python/cpython/commit/6bd058e0ff5d4a63fb35f6d45161cdf51cb68c58
|
msg377989 - (view) |
Author: Ned Deily (ned.deily) *  |
Date: 2020-10-04 23:57 |
New changeset d5719247ba32b3ac700454bad9a9e2cc7edeac6a by Miss Skeleton (bot) in branch '3.9':
bpo-41892: Clarify that an example in the ElementTree docs explicitly avoids modifying an XML tree while iterating over it. (GH-22464) (GH-22554)
https://github.com/python/cpython/commit/d5719247ba32b3ac700454bad9a9e2cc7edeac6a
|
msg377992 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2020-10-05 04:22 |
Closing since this works as designed.
Users are responsible for avoiding concurrent tree modifications during iteration.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:59:36 | admin | set | github: 86058 |
2020-10-05 04:22:13 | scoder | set | status: open -> closed resolution: not a bug messages:
+ msg377992
stage: patch review -> resolved |
2020-10-04 23:57:05 | ned.deily | set | messages:
+ msg377989 |
2020-10-04 23:23:51 | miss-islington | set | messages:
+ msg377984 |
2020-10-04 23:22:22 | miss-islington | set | pull_requests:
+ pull_request21552 |
2020-10-04 23:14:03 | miss-islington | set | nosy:
+ miss-islington pull_requests:
+ pull_request21551
|
2020-10-04 23:13:53 | ned.deily | set | nosy:
+ ned.deily messages:
+ msg377982
|
2020-10-04 17:18:40 | scoder | set | pull_requests:
+ pull_request21547 |
2020-10-03 05:59:01 | scoder | set | versions:
+ Python 3.9, Python 3.10 |
2020-10-01 15:34:14 | eric.smith | set | messages:
+ msg377764 |
2020-10-01 14:56:12 | WoodyWoo | set | messages:
+ msg377756 |
2020-10-01 01:59:03 | WoodyWoo | set | messages:
+ msg377732 |
2020-09-30 23:27:22 | WoodyWoo | set | messages:
+ msg377724 |
2020-09-30 15:31:35 | scoder | set | stage: patch review |
2020-09-30 15:31:18 | scoder | set | messages:
+ msg377707 stage: patch review -> (no value) |
2020-09-30 15:27:46 | scoder | set | keywords:
+ patch stage: patch review pull_requests:
+ pull_request21489 |
2020-09-30 15:24:21 | eric.smith | set | messages:
+ msg377706 |
2020-09-30 15:22:21 | WoodyWoo | set | messages:
+ msg377705 |
2020-09-30 15:19:15 | WoodyWoo | set | messages:
+ msg377704 |
2020-09-30 15:13:08 | eric.smith | set | messages:
+ msg377703 |
2020-09-30 15:09:42 | scoder | set | messages:
+ msg377702 |
2020-09-30 15:01:05 | serhiy.storchaka | set | nosy:
+ eli.bendersky, scoder, docs@python
components:
+ Documentation assignee: docs@python |
2020-09-30 14:50:06 | eric.smith | set | messages:
+ msg377701 |
2020-09-30 14:43:11 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages:
+ msg377700
|
2020-09-30 14:31:44 | eric.smith | set | nosy:
+ eric.smith messages:
+ msg377699
|
2020-09-30 14:23:14 | WoodyWoo | set | title: use both "for" and "ElementTree.remove" has a index bug -> use both "for in" and "ElementTree.remove" has a index bug |
2020-09-30 14:20:25 | WoodyWoo | create | |