Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Synchronize copy methods between Python and C implementations of xml.etree.ElementTree.Element #76605

Closed
GPHemsley mannequin opened this issue Dec 24, 2017 · 31 comments
Closed
Labels
3.8 only security fixes 3.9 only security fixes stdlib Python modules in the Lib dir topic-XML type-bug An unexpected behavior, bug, or error

Comments

@GPHemsley
Copy link
Mannequin

GPHemsley mannequin commented Dec 24, 2017

BPO 32424
Nosy @scoder, @bitdancer, @asvetlov, @serhiy-storchaka, @srinivasreddy, @JulienPalard, @pganssle, @GPHemsley
PRs
  • bpo-32424: Rename copy() to __copy__() in Python version of xml.etree.ElementTree. #5022
  • bpo-32424: Synchronize xml.etree.ElementTree.Element copy methods. #5046
  • bpo-32424: Improve test coverage for xml.etree.ElementTree #12891
  • bpo-32424: Deprecate xml.etree.ElementTree.Element.copy() in favor of copy.copy() #12995
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-09-10.15:22:43.606>
    created_at = <Date 2017-12-24.18:51:40.307>
    labels = ['expert-XML', '3.8', 'type-bug', 'library', '3.9']
    title = 'Synchronize copy methods between Python and C implementations of xml.etree.ElementTree.Element'
    updated_at = <Date 2019-09-10.15:22:43.606>
    user = 'https://github.com/GPHemsley'

    bugs.python.org fields:

    activity = <Date 2019-09-10.15:22:43.606>
    actor = 'scoder'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-09-10.15:22:43.606>
    closer = 'scoder'
    components = ['Library (Lib)', 'XML']
    creation = <Date 2017-12-24.18:51:40.307>
    creator = 'gphemsley'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 32424
    keywords = ['patch']
    message_count = 30.0
    messages = ['309010', '309074', '309089', '309116', '309121', '309122', '309124', '309131', '309140', '309142', '309143', '309144', '309145', '309146', '309147', '309149', '309179', '309184', '309263', '309953', '310362', '340381', '340589', '340592', '340600', '340602', '341021', '341447', '351698', '351699']
    nosy_count = 9.0
    nosy_names = ['scoder', 'r.david.murray', 'eli.bendersky', 'asvetlov', 'serhiy.storchaka', 'thatiparthy', 'mdk', 'p-ganssle', 'gphemsley']
    pr_nums = ['5022', '5046', '12891', '12995']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue32424'
    versions = ['Python 3.8', 'Python 3.9']

    @GPHemsley
    Copy link
    Mannequin Author

    GPHemsley mannequin commented Dec 24, 2017

    Currently, the Python implementation of the Element class in xml.etree.ElementTree defines a method called copy() which the C implementation does not define, whereas the C implementation defines a __copy__() method (and a __deepcopy__() method) which the Python implementation does not define.

    Given that the documentation makes no mention of a copy() method and that its definition would be masked by a standard import of xml.etree.ElementTree, I propose that it be renamed to __copy__() so that copy.copy() can make use of it.

    @GPHemsley GPHemsley mannequin added 3.7 (EOL) end of life stdlib Python modules in the Lib dir topic-XML type-bug An unexpected behavior, bug, or error labels Dec 24, 2017
    @JulienPalard
    Copy link
    Member

    As the C implementation is already shadowing the Python implementation when available (last lines of Lib/xml/etree/ElementTree.py), using .copy() is already a bug magnet (works when the Python implem is used, does not work when shadowed by the C implem).

    So renaming the copy method to __copy__ in the python implementation looks good to me.

    Ultimately the Python implementation would require a __deepcopy__ too?

    @GPHemsley
    Copy link
    Mannequin Author

    GPHemsley mannequin commented Dec 27, 2017

    Ultimately, yeah, the Python version should probably define __deepcopy__ as well. But since this is just a rename of an existing method, I figure we can defer that to another time.

    @srinivasreddy
    Copy link
    Mannequin

    srinivasreddy mannequin commented Dec 28, 2017

    Renaming the method copy() to __copy__ breaks the API. I would rather have an alias for __copy__.

    I agree that it is not documented, but some users tend to assume public methods are documented and use them.

    So i think it is better to not to break their code.

    @asvetlov
    Copy link
    Contributor

    Agree with Srinivas

    @JulienPalard
    Copy link
    Member

    I don't agree with the API breakage, the C implementation of the module does not expose copy(), so code using copy() is already broken (since 2005) in cases the C implementation is used.

    I also expect the C implementation to be used almost anywhere, but I may be wrong on this point: in which case the Python implementation may still be used?

    @asvetlov
    Copy link
    Contributor

    pypy for example has Element.copy()

    @GPHemsley
    Copy link
    Mannequin Author

    GPHemsley mannequin commented Dec 28, 2017

    Two notes:

    That said, I'm neutral on the subject. I'm happy to implement whichever option you decide on.

    @asvetlov
    Copy link
    Contributor

    I mean dropping .copy() breaks not only compatibility between pure python versions but also between pypy releases and pypy/CPython code.
    That's why I suggest to keep .copy() as an alias for __copy__().

    @pganssle
    Copy link
    Member

    I mean dropping .copy() breaks not only compatibility between pure python versions but also between pypy releases and pypy/CPython code.
    That's why I suggest to keep .copy() as an alias for __copy__().

    If the main issue is that pypy users have been using this, can't pypy just keep an alias for copy around, possibly with a DeprecationWarning?

    @asvetlov
    Copy link
    Contributor

    For derivative Python implementation there is a standard strategy: they uses a copy of CPython standard library.

    Tthat's why any new module should have a pure Python implementation and that's why https://www.python.org/dev/peps/pep-0399/ exists.

    Jython has copy() method as well. IronPython has it. I doubt if an implementation without xml.etree.ElementTree.Element.copy exists at all.

    The method removal breaks backward compatibility without a reason, __copy__ alias is valuable and backward compatible change.

    @pganssle
    Copy link
    Member

    @andrew Are you suggesting that a copy method be added to the C implementation, as an alias to copy? Because it makes no sense to keep the pure Python and C implementations out of sync just so that it's easier for projects forking the pure Python implementation to maintain backwards compatibility with their earlier implementations.

    @asvetlov
    Copy link
    Contributor

    Yes.
    For designing from scratch copy() is not necessary but if we have it for decade -- better to unify pure Python and C implementation by adding missing methods.

    @bitdancer
    Copy link
    Member

    Paul Ganssle: Even if Andrew were not suggesting adding copy to the C implementation (I have no opinion on that currently), it would still be correct to maintain backward compatibility in the python version in the standard library. We do not consider the standard library to be CPython specific, even though parts of it are. There is no "fork" involved in other projects using it, from our point of view. Quite the opposite, where possible. We have given commit rights to developers from other python implementations specifically so they can contribute things that improve compatibility in the standard library and standard library test suite for those other python implementations.

    @pganssle
    Copy link
    Member

    There is no "fork" involved in other projects using it, from our point of view.

    I did not mean "fork" in some judgmental sense, I'm mostly just familiar with pypy, but I was under the impression that other projects were *literally* forking the standard library as a practical matter. I get the impression that other interpreters maintain a patched interpreter often for the purposes of adding fixes like the one pointed out in msg309131 which bring the behavior of the Python implementation of the standard library into line with the C implementations, since the C version is the de facto standard (since most code is written and tested only for CPython).

    I'd be fine with adding a C alias for copy, though as a purely practical matter, I strongly suspect that dropping copy from the pure Python implementation would not be a huge deal, but if that breaks the contract it breaks the contract.

    @bitdancer
    Copy link
    Member

    Yes, that's why I said "from our point of view" :) I know there is usually a fork in the practical sense, but we want to make it as easy as practical to sync that fork, which includes not breaking things in the python versions without good reason.

    @GPHemsley GPHemsley mannequin changed the title Rename copy() to __copy__() in xml.etree.ElementTree.Element Python implementation Synchronize copy methods between Python and C implementations of xml.etree.ElementTree.Element Dec 29, 2017
    @GPHemsley
    Copy link
    Mannequin Author

    GPHemsley mannequin commented Dec 29, 2017

    Given the discussion, I've gone ahead and created a new PR that synchronizes the three copy methods between implementations and deprecates Element.copy().

    @serhiy-storchaka
    Copy link
    Member

    There is an existing test for copying. No new tests are needed. Just add a case for the copy() method in test_copy.

    Special __deepcopy__() method for Python implementation is not needed. copy.deepcopy() already works well.

    @GPHemsley
    Copy link
    Mannequin Author

    GPHemsley mannequin commented Dec 31, 2017

    As discussed above, starting with msg309074, __deepcopy__() is being added to the Python implementation because it already exists in the C implementation.

    And additional tests have in fact uncovered further discrepancies between the Python and C implementations with regard to copying behavior.

    PR 5046 has changes that I believe are ready for review.

    @serhiy-storchaka
    Copy link
    Member

    As discussed above, starting with msg309074, __deepcopy__() is being added to the Python implementation because it already exists in the C implementation.

    Python implementation shouldn't copy all implementation details of the C implementation. This is cumbersome and often impossible. Both implementation should have the same behavior, and they do have.

    And additional tests have in fact uncovered further discrepancies between the Python and C implementations with regard to copying behavior.

    What are these discrepancies?

    In any case it is better to extend existing tests by adding missed checks than duplicate tests.

    @GPHemsley
    Copy link
    Mannequin Author

    GPHemsley mannequin commented Jan 20, 2018

    > As discussed above, starting with msg309074, __deepcopy__() is being added to the Python implementation because it already exists in the C implementation.

    Python implementation shouldn't copy all implementation details of the C implementation. This is cumbersome and often impossible. Both implementation should have the same behavior, and they do have.

    As it stands, calling Element.__deepcopy__() will succeed with the C
    implementation and fail with the Python implementation. That seems
    problematic to me, and making that *not* be the case is trivial, so I
    don't understand the generalization to "all implementation details".

    Additionally, if copy.deepcopy() already works well with the Python
    implementation, why would it not work on the C implementation? What is
    different about the C implementation that requires the definition of
    __deepcopy__() in the first place?

    > And additional tests have in fact uncovered further discrepancies between the Python and C implementations with regard to copying behavior.

    What are these discrepancies?

    As I mentioned in the PR, the C implementation does not make use of
    Element.__init__() when it creates new instances of the class, opting
    instead to use create_new_element() (which is not used in the C
    implementation of Element.__init__()). However, create_new_element()
    does not make a copy of the attrib dict that is passed in, meaning that
    the internal attrib dict of an Element instance is mutable by changing
    the dict that was passed in.

    I have modified the C implementation to fix this problem by adding a
    copy line in create_new_element(). Without this change, some of the new
    tests in the PR will fail in the C implementation.

    The Python implementation does not have this problem because it
    inherently always goes through Element.__init__().

    In any case it is better to extend existing tests by adding missed checks than duplicate tests.

    As I mentioned in the PR:

    I have added the unit tests in the way that I have because the existing
    checks are generally integration tests, and most of the test coverage
    for this module comes as a side effect of pickle tests. It is my goal to
    expand the unit tests later to test each method individually, so
    starting that work here makes it easier.

    @serhiy-storchaka
    Copy link
    Member

    As it stands, calling Element.__deepcopy__() will succeed with the C
    implementation and fail with the Python implementation.

    You should not call it directly. Use copy.deepcopy().

    What is
    different about the C implementation that requires the definition of
    __deepcopy__() in the first place?

    __deepcopy__() in the C implementation is merely optimization. It is less efficient to convert from the efficient internal representation of Element object to tuples and dicts and back.

    However, create_new_element()
    does not make a copy of the attrib dict that is passed in, meaning that
    the internal attrib dict of an Element instance is mutable by changing
    the dict that was passed in.

    create_new_element() usually is called with a new copy of the attrib dict. Making yet one copy inside it is just a waste of time. If in some cases it is called with an externally referred dict, fix these cases by making a copy before calling create_new_element().

    @GPHemsley
    Copy link
    Mannequin Author

    GPHemsley mannequin commented Apr 21, 2019

    Taking a step back, I want to make sure I understand the action items for resolving this:

    • Add copy() as an alias to __copy__() in the C implementation to match the Python implementation.
    • Deprecate copy() in favor of copy.copy() in both the Python and C implementations, bypassing a PendingDeprecation step because the implementations have been out of sync for so long as it is.

    Any other issues discussed here are orthogonal to the goal of this ticket.

    Are we in agreement on this?

    @GPHemsley GPHemsley mannequin added 3.8 only security fixes 3.9 only security fixes labels Apr 21, 2019
    @GPHemsley GPHemsley mannequin removed the 3.7 (EOL) end of life label Apr 21, 2019
    @GPHemsley
    Copy link
    Mannequin Author

    GPHemsley mannequin commented Apr 21, 2019

    Opened bpo-36685 for discussion of the attrib copy issue.

    @scoder
    Copy link
    Contributor

    scoder commented Apr 21, 2019

    We should not add anything to the implementation that we consider legacy elsewhere. Py3 has "always" used the C accelerator module instead of the Python implementation, which suggests that its interface is probably the righter one.

    So: make sure that the Obvious Ways, copy.copy() and copy.deepcopy(), work nicely for both implementations, and deprecate Element.copy() in Py3.8.

    @serhiy-storchaka
    Copy link
    Member

    Concur with Stefan. Adding the method deprecated from the born looks silly. We should either deprecate the copy() method in the Python implementation, or add an undeprecated method in the C implementation. The latter should be considered as a new feature.

    @scoder
    Copy link
    Contributor

    scoder commented Apr 28, 2019

    New changeset 50fed0b by Stefan Behnel (Gordon P. Hemsley) in branch 'master':
    bpo-32424: Improve test coverage for xml.etree.ElementTree (GH-12891)
    50fed0b

    @GPHemsley
    Copy link
    Mannequin Author

    GPHemsley mannequin commented May 5, 2019

    It seems the final open question on this is whether the mismatch between the Python and C implementations is enough to bypass PendingDeprecationWarning for copy() in favor of jumping straight to DeprecationWarning.

    @scoder
    Copy link
    Contributor

    scoder commented Sep 10, 2019

    Let's just deprecate it directly (in Py3.9, I guess), given that it was never documented.

    @scoder
    Copy link
    Contributor

    scoder commented Sep 10, 2019

    New changeset 7d952de by Stefan Behnel (Gordon P. Hemsley) in branch 'master':
    bpo-32424: Deprecate xml.etree.ElementTree.Element.copy() in favor of copy.copy() (GH-12995)
    7d952de

    @vstinner
    Copy link
    Member

    Follow-up: issue #94383: Remove xml.etree.ElementTree.Element.copy() (Python implementation): use copy.copy() instead.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes 3.9 only security fixes stdlib Python modules in the Lib dir topic-XML type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    7 participants