Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deepcopying an xml.dom.minidom.Document generates an invalid XML document #54340

Closed
florentx mannequin opened this issue Oct 17, 2010 · 8 comments
Closed

deepcopying an xml.dom.minidom.Document generates an invalid XML document #54340

florentx mannequin opened this issue Oct 17, 2010 · 8 comments
Assignees
Labels
topic-XML type-bug An unexpected behavior, bug, or error

Comments

@florentx
Copy link
Mannequin

florentx mannequin commented Oct 17, 2010

BPO 10131
Nosy @pitrou, @avassalotti, @florentx, @serhiy-storchaka
Files
  • mganisin-Issue10131-Python-2.7.2.patch: patch based on python-2.7.2
  • copying_NodeList.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2015-11-26.21:58:42.803>
    created_at = <Date 2010-10-17.13:42:19.514>
    labels = ['expert-XML', 'type-bug']
    title = 'deepcopying an xml.dom.minidom.Document generates an invalid XML document'
    updated_at = <Date 2015-11-26.21:58:42.802>
    user = 'https://github.com/florentx'

    bugs.python.org fields:

    activity = <Date 2015-11-26.21:58:42.802>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2015-11-26.21:58:42.803>
    closer = 'serhiy.storchaka'
    components = ['XML']
    creation = <Date 2010-10-17.13:42:19.514>
    creator = 'flox'
    dependencies = []
    files = ['22791', '41170']
    hgrepos = []
    issue_num = 10131
    keywords = ['patch']
    message_count = 8.0
    messages = ['118942', '118943', '141338', '146175', '148291', '148473', '255436', '255437']
    nosy_count = 8.0
    nosy_names = ['pitrou', 'alexandre.vassalotti', 'flox', 'python-dev', 'Marian.Ganisin', 'genij.math', 'serhiy.storchaka', 'bkabrda']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue10131'
    versions = ['Python 2.7', 'Python 3.4', 'Python 3.5', 'Python 3.6']

    @florentx
    Copy link
    Mannequin Author

    florentx mannequin commented Oct 17, 2010

    >> import copy
    >> from xml.dom import minidom

    >> doc = minidom.parseString('<root/>')
    >> doc2 = copy.deepcopy(doc)

    >>> doc.toxml()
    u'<?xml version="1.0" ?><root/>'
    >>> doc2.toxml()
    u'<?xml version="1.0" ?><root/><root/>'
    >>> minidom.parseString(doc2.toxml())
    Traceback (most recent call last):
    ...
    ExpatError: junk after document element: line 1, column 2

    The workaround is to use doc.cloneNode(True).

    @florentx florentx mannequin added topic-XML type-bug An unexpected behavior, bug, or error labels Oct 17, 2010
    @florentx
    Copy link
    Mannequin Author

    florentx mannequin commented Oct 17, 2010

    It works fine with 2.5 and 2.6.

    @MarianGanisin
    Copy link
    Mannequin

    MarianGanisin mannequin commented Jul 29, 2011

    xml.dom.minicompat.NodeList provides __reduce__/reduce_ex methods, they return "state" as well as "list iterator". However one of those seems to be absolutely enough for reconstruction of instance (reduce/reduce_ex are inherited, methods __setstate/getstate aka state provider/consumer are defined in NodeList).

    deepcopy (actually its helper function _reconstruct) uses both, state and list iterator for copying, first it calls __setstate__(state) on new copy, then it goes through iterator and calls append(item) (as it is probably common for lists). This is it! (state and list iterator contain same information, list of items)

    Prior to some minor 2.7 update deepcopy used list iterator and state in opposite order, first it run append through iterator, then __setstate__ (this overwrote new content at all, no unwanted copies appeared). So the issue is there all the time, just with no impact in the past.

    Issue also occurs with simple copy.copy() on NodeList:

    >>> import copy
    >>> from xml.dom import minidom
    >>> doc = minidom.parseString('<root/>')
    >>> print copy.copy(doc.childNodes)
    [<DOM Element: root at 0xb73fbbcc>, <DOM Element: root at 0xb73fbbcc>]

    I am strongly convinced NodeList doesn't require __setstate__/getstate methods (even more I believe they are not desired in subclass of list). Therefore I am proposing patch with removal of these methods.

    If I am wrong in my final decision, somebody smarter has to find a solution. This is the reason why I described issue in deep. :)

    @genijmath
    Copy link
    Mannequin

    genijmath mannequin commented Oct 22, 2011

    You can create object copy using both copy and pickle modules.
    I assume that identical results should be produced.

    pickle handles minidom.Document correctly, however copy does not.
    Even if patch to NodeList will be applied, copy or pickle modules need to be adjusted to produce identical results.

    From this view point copy module needs to be adjusted.

    Here is the test code:

    import copy, pickle
    from xml.dom.minidom import NodeList
    
    obj = NodeList()
    obj.append('a')
    
    obj2 = copy.deepcopy(obj)
    print(obj2)
    
    obj2 = pickle.loads(pickle.dumps(obj))
    print(obj2)

    Output (python 2.7-3.2):
    ['a', 'a']
    ['a']

    @MarianGanisin
    Copy link
    Mannequin

    MarianGanisin mannequin commented Nov 24, 2011

    Code from msg146175 with attached patch applied:

    >>> import sys
    >>> print(sys.version_info)
    sys.version_info(major=2, minor=7, micro=2, releaselevel='final', serial=0)
    >>> import copy, pickle
    >>> from xml.dom.minidom import NodeList
    >>> 
    >>> obj = NodeList()
    >>> obj.append('a')
    >>> 
    >>> obj2 = copy.deepcopy(obj)
    >>> print(obj2)
    ['a']
    >>> 
    >>> obj2 = pickle.loads(pickle.dumps(obj))
    >>> print(obj2)
    ['a']

    Result is same in both cases.

    @genijmath
    Copy link
    Mannequin

    genijmath mannequin commented Nov 28, 2011

    Marian, I have no doubts that the patch cures THIS issue.
    The problem is that it cures the wrong thing.
    It is _reconstruct function that needs to be fixed (somebody changed order of 'if' statements between 2.6 and 2.7 releases)

    @serhiy-storchaka serhiy-storchaka self-assigned this Nov 26, 2015
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Nov 26, 2015

    New changeset 0eea57ddb75f by Serhiy Storchaka in branch '2.7':
    Issue bpo-10131: Fixed deep copying of minidom documents. Based on patch
    https://hg.python.org/cpython/rev/0eea57ddb75f

    New changeset aa304ad32292 by Serhiy Storchaka in branch '3.4':
    Issue bpo-10131: Fixed deep copying of minidom documents. Based on patch
    https://hg.python.org/cpython/rev/aa304ad32292

    New changeset 5d6b2dc7e3d0 by Serhiy Storchaka in branch '3.5':
    Issue bpo-10131: Fixed deep copying of minidom documents. Based on patch
    https://hg.python.org/cpython/rev/5d6b2dc7e3d0

    New changeset 9fcfdb53e8af by Serhiy Storchaka in branch 'default':
    Issue bpo-10131: Fixed deep copying of minidom documents. Based on patch
    https://hg.python.org/cpython/rev/9fcfdb53e8af

    @serhiy-storchaka
    Copy link
    Member

    Thank you Marian for your patch. Committed modified patch with tests. For compatibility with data pickled by unpatched Python we have to keep __setstate__.

    The discrepancy between pickle and copy module is different not easy issue.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    topic-XML type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant