classification
Title: deepcopying an xml.dom.minidom.Document generates an invalid XML document
Type: behavior Stage: resolved
Components: XML Versions: Python 3.6, Python 3.5, Python 3.4, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Marian.Ganisin, alexandre.vassalotti, bkabrda, flox, genij.math, pitrou, python-dev, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2010-10-17 13:42 by flox, last changed 2015-11-26 21:58 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
mganisin-Issue10131-Python-2.7.2.patch Marian.Ganisin, 2011-07-29 09:08 patch based on python-2.7.2
copying_NodeList.patch serhiy.storchaka, 2015-11-26 20:42 review
Messages (8)
msg118942 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-10-17 13:42
>>> import copy
>>> from xml.dom import minidom

>>> doc = minidom.parseString('<root/>')
>>> doc2 = copy.deepcopy(doc)

>>> doc.toxml()
u'<?xml version="1.0" ?><root/>'
>>> doc2.toxml()
u'<?xml version="1.0" ?><root/><root/>'
>>> minidom.parseString(doc2.toxml())
Traceback (most recent call last):
...
ExpatError: junk after document element: line 1, column 2


The workaround is to use doc.cloneNode(True).
msg118943 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-10-17 13:47
It works fine with 2.5 and 2.6.
msg141338 - (view) Author: Marian Ganisin (Marian.Ganisin) Date: 2011-07-29 09:08
xml.dom.minicompat.NodeList provides __reduce__/__reduce_ex__ methods, they return "state" as well as "list iterator". However one of those seems to be absolutely enough for reconstruction of instance (__reduce__/__reduce_ex__ are inherited, methods __setstate/__getstate__ aka state provider/consumer are defined in NodeList).

deepcopy (actually its helper function _reconstruct) uses both, state and list iterator for copying, first it calls __setstate__(state) on new copy, then it goes through iterator and calls append(item) (as it is probably common for lists). This is it! (state and list iterator contain same information, list of items)

Prior to some minor 2.7 update deepcopy used list iterator and state in opposite order, first it run append through iterator, then __setstate__ (this overwrote new content at all, no unwanted copies appeared). So the issue is there all the time, just with no impact in the past.

Issue also occurs with simple copy.copy() on NodeList:

>>> import copy
>>> from xml.dom import minidom
>>> doc = minidom.parseString('<root/>')
>>> print copy.copy(doc.childNodes)
[<DOM Element: root at 0xb73fbbcc>, <DOM Element: root at 0xb73fbbcc>]

I am strongly convinced NodeList doesn't require __setstate__/__getstate__ methods (even more I believe they are not desired in subclass of list). Therefore I am proposing patch with removal of these methods.

If I am wrong in my final decision, somebody smarter has to find a solution. This is the reason why I described issue in deep. :)
msg146175 - (view) Author: Yevgen Yampolskiy (genij.math) Date: 2011-10-22 17:29
You can create object copy using both copy and pickle modules. 
I assume that identical results should be produced.

pickle handles minidom.Document correctly, however copy does not. 
Even if patch to NodeList will be applied, copy or pickle modules need to be adjusted to produce identical results.

From this view point copy module needs to be adjusted.

Here is the test code:

import copy, pickle
from xml.dom.minidom import NodeList

obj = NodeList()
obj.append('a')

obj2 = copy.deepcopy(obj)
print(obj2)

obj2 = pickle.loads(pickle.dumps(obj))
print(obj2)

Output (python 2.7-3.2):
['a', 'a']
['a']
msg148291 - (view) Author: Marian Ganisin (Marian.Ganisin) Date: 2011-11-24 22:55
Code from msg146175 with attached patch applied:

>>> import sys
>>> print(sys.version_info)
sys.version_info(major=2, minor=7, micro=2, releaselevel='final', serial=0)
>>> import copy, pickle
>>> from xml.dom.minidom import NodeList
>>> 
>>> obj = NodeList()
>>> obj.append('a')
>>> 
>>> obj2 = copy.deepcopy(obj)
>>> print(obj2)
['a']
>>> 
>>> obj2 = pickle.loads(pickle.dumps(obj))
>>> print(obj2)
['a']

Result is same in both cases.
msg148473 - (view) Author: Yevgen Yampolskiy (genij.math) Date: 2011-11-28 10:56
Marian, I have no doubts that the patch cures THIS issue. 
The problem is that it cures the wrong thing.
It is _reconstruct function that needs to be fixed (somebody changed order of 'if' statements between 2.6 and 2.7 releases)
msg255436 - (view) Author: Roundup Robot (python-dev) Date: 2015-11-26 21:53
New changeset 0eea57ddb75f by Serhiy Storchaka in branch '2.7':
Issue #10131: Fixed deep copying of minidom documents.  Based on patch
https://hg.python.org/cpython/rev/0eea57ddb75f

New changeset aa304ad32292 by Serhiy Storchaka in branch '3.4':
Issue #10131: Fixed deep copying of minidom documents.  Based on patch
https://hg.python.org/cpython/rev/aa304ad32292

New changeset 5d6b2dc7e3d0 by Serhiy Storchaka in branch '3.5':
Issue #10131: Fixed deep copying of minidom documents.  Based on patch
https://hg.python.org/cpython/rev/5d6b2dc7e3d0

New changeset 9fcfdb53e8af by Serhiy Storchaka in branch 'default':
Issue #10131: Fixed deep copying of minidom documents.  Based on patch
https://hg.python.org/cpython/rev/9fcfdb53e8af
msg255437 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-26 21:56
Thank you Marian for your patch. Committed modified patch with tests. For compatibility with data pickled by unpatched Python we have to keep __setstate__.

The discrepancy between pickle and copy module is different not easy issue.
History
Date User Action Args
2015-11-26 21:58:42serhiy.storchakasetstatus: open -> closed
resolution: fixed
stage: test needed -> resolved
2015-11-26 21:56:25serhiy.storchakasetmessages: + msg255437
2015-11-26 21:53:04python-devsetnosy: + python-dev
messages: + msg255436
2015-11-26 20:42:22serhiy.storchakasetfiles: + copying_NodeList.patch
assignee: serhiy.storchaka

nosy: + pitrou, alexandre.vassalotti, serhiy.storchaka
versions: + Python 3.4, Python 3.5, Python 3.6, - Python 3.2, Python 3.3
2013-07-24 11:56:48bkabrdasetnosy: + bkabrda
2012-03-23 12:42:21eli.benderskysetnosy: - eli.bendersky
2012-02-24 10:52:07ezio.melottisetnosy: + eli.bendersky
stage: needs patch -> test needed

versions: + Python 3.3, - Python 3.1
2011-11-28 10:56:46genij.mathsetmessages: + msg148473
2011-11-24 22:55:51Marian.Ganisinsetmessages: + msg148291
2011-10-22 17:29:14genij.mathsetnosy: + genij.math
messages: + msg146175
2011-07-29 09:08:21Marian.Ganisinsetfiles: + mganisin-Issue10131-Python-2.7.2.patch

nosy: + Marian.Ganisin
messages: + msg141338

keywords: + patch
2010-10-17 13:47:10floxsetmessages: + msg118943
2010-10-17 13:42:19floxcreate