New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
deepcopying an xml.dom.minidom.Document generates an invalid XML document #54340
Comments
>>> doc.toxml()
u'<?xml version="1.0" ?><root/>'
>>> doc2.toxml()
u'<?xml version="1.0" ?><root/><root/>'
>>> minidom.parseString(doc2.toxml())
Traceback (most recent call last):
...
ExpatError: junk after document element: line 1, column 2 The workaround is to use doc.cloneNode(True). |
It works fine with 2.5 and 2.6. |
xml.dom.minicompat.NodeList provides __reduce__/reduce_ex methods, they return "state" as well as "list iterator". However one of those seems to be absolutely enough for reconstruction of instance (reduce/reduce_ex are inherited, methods __setstate/getstate aka state provider/consumer are defined in NodeList). deepcopy (actually its helper function _reconstruct) uses both, state and list iterator for copying, first it calls __setstate__(state) on new copy, then it goes through iterator and calls append(item) (as it is probably common for lists). This is it! (state and list iterator contain same information, list of items) Prior to some minor 2.7 update deepcopy used list iterator and state in opposite order, first it run append through iterator, then __setstate__ (this overwrote new content at all, no unwanted copies appeared). So the issue is there all the time, just with no impact in the past. Issue also occurs with simple copy.copy() on NodeList: >>> import copy
>>> from xml.dom import minidom
>>> doc = minidom.parseString('<root/>')
>>> print copy.copy(doc.childNodes)
[<DOM Element: root at 0xb73fbbcc>, <DOM Element: root at 0xb73fbbcc>] I am strongly convinced NodeList doesn't require __setstate__/getstate methods (even more I believe they are not desired in subclass of list). Therefore I am proposing patch with removal of these methods. If I am wrong in my final decision, somebody smarter has to find a solution. This is the reason why I described issue in deep. :) |
You can create object copy using both copy and pickle modules. pickle handles minidom.Document correctly, however copy does not. From this view point copy module needs to be adjusted. Here is the test code: import copy, pickle
from xml.dom.minidom import NodeList
obj = NodeList()
obj.append('a')
obj2 = copy.deepcopy(obj)
print(obj2)
obj2 = pickle.loads(pickle.dumps(obj))
print(obj2) Output (python 2.7-3.2): |
Code from msg146175 with attached patch applied: >>> import sys
>>> print(sys.version_info)
sys.version_info(major=2, minor=7, micro=2, releaselevel='final', serial=0)
>>> import copy, pickle
>>> from xml.dom.minidom import NodeList
>>>
>>> obj = NodeList()
>>> obj.append('a')
>>>
>>> obj2 = copy.deepcopy(obj)
>>> print(obj2)
['a']
>>>
>>> obj2 = pickle.loads(pickle.dumps(obj))
>>> print(obj2)
['a'] Result is same in both cases. |
Marian, I have no doubts that the patch cures THIS issue. |
New changeset 0eea57ddb75f by Serhiy Storchaka in branch '2.7': New changeset aa304ad32292 by Serhiy Storchaka in branch '3.4': New changeset 5d6b2dc7e3d0 by Serhiy Storchaka in branch '3.5': New changeset 9fcfdb53e8af by Serhiy Storchaka in branch 'default': |
Thank you Marian for your patch. Committed modified patch with tests. For compatibility with data pickled by unpatched Python we have to keep __setstate__. The discrepancy between pickle and copy module is different not easy issue. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: