classification
Title: Synchronize copy methods between Python and C implementations of xml.etree.ElementTree.Element
Type: behavior Stage: resolved
Components: Library (Lib), XML Versions: Python 3.9, Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: asvetlov, eli.bendersky, gphemsley, mdk, p-ganssle, r.david.murray, scoder, serhiy.storchaka, thatiparthy
Priority: normal Keywords: patch

Created on 2017-12-24 18:51 by gphemsley, last changed 2019-09-10 15:22 by scoder. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 5022 closed gphemsley, 2017-12-27 15:20
PR 5046 closed gphemsley, 2017-12-29 16:25
PR 12891 merged gphemsley, 2019-04-21 16:44
PR 12995 merged gphemsley, 2019-04-28 14:38
Messages (30)
msg309010 - (view) Author: Gordon P. Hemsley (gphemsley) * Date: 2017-12-24 18:51
Currently, the Python implementation of the Element class in xml.etree.ElementTree defines a method called copy() which the C implementation does not define, whereas the C implementation defines a __copy__() method (and a __deepcopy__() method) which the Python implementation does not define.

Given that the documentation makes no mention of a copy() method and that its definition would be masked by a standard import of xml.etree.ElementTree, I propose that it be renamed to __copy__() so that copy.copy() can make use of it.
msg309074 - (view) Author: Julien Palard (mdk) * (Python committer) Date: 2017-12-26 23:54
As the C implementation is already shadowing the Python implementation when available (last lines of Lib/xml/etree/ElementTree.py), using `.copy()` is already a bug magnet (works when the Python implem is used, does not work when shadowed by the C implem).

So renaming the copy method to __copy__ in the python implementation looks good to me.

Ultimately the Python implementation would require a __deepcopy__ too?
msg309089 - (view) Author: Gordon P. Hemsley (gphemsley) * Date: 2017-12-27 15:23
Ultimately, yeah, the Python version should probably define __deepcopy__ as well. But since this is just a rename of an existing method, I figure we can defer that to another time.
msg309116 - (view) Author: Srinivas Reddy Thatiparthy(శ్రీనివాస్ రెడ్డి తాటిపర్తి) (thatiparthy) * Date: 2017-12-28 07:13
Renaming the method `copy()` to `__copy__` breaks the API. I would rather have an alias for `__copy__`. 

I agree that it is not documented, but some users tend to assume public methods are documented and use them. 

So i think it is better to not to break their code.
msg309121 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2017-12-28 11:12
Agree with Srinivas
msg309122 - (view) Author: Julien Palard (mdk) * (Python committer) Date: 2017-12-28 11:19
I don't agree with the API breakage, the C implementation of the module does not expose copy(), so code using copy() is already broken (since 2005) in cases the C implementation is used.

I also expect the C implementation to be used almost anywhere, but I may be wrong on this point: in which case the Python implementation may still be used?
msg309124 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2017-12-28 11:38
pypy for example has Element.copy()
msg309131 - (view) Author: Gordon P. Hemsley (gphemsley) * Date: 2017-12-28 15:07
Two notes:

* It appears that pypy is based on no more recent than Python 3.5, so this wouldn't immediately break them. (3.6 support is maybe in development?)
* pypy appears to have already made other adjustments due to the differences between the Python and C implementations: https://bitbucket.org/pypy/pypy/diff/lib-python/3/xml/etree/ElementTree.py?diff2=0939e3a8a08d&at=py3.5

That said, I'm neutral on the subject. I'm happy to implement whichever option you decide on.
msg309140 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2017-12-28 19:42
I mean dropping `.copy()` breaks not only compatibility between pure python versions but also between pypy releases and pypy/CPython code.
That's why I suggest to keep `.copy()` as an alias for `__copy__()`.
msg309142 - (view) Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2017-12-28 21:48
> I mean dropping `.copy()` breaks not only compatibility between pure python versions but also between pypy releases and pypy/CPython code.
That's why I suggest to keep `.copy()` as an alias for `__copy__()`.

If the main issue is that pypy users have been using this, can't pypy just keep an alias for `copy` around, possibly with a DeprecationWarning?
msg309143 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2017-12-28 22:16
For derivative Python implementation there is a standard strategy: they uses a copy of CPython standard library.

Tthat's why any new module should have a pure Python implementation and that's why https://www.python.org/dev/peps/pep-0399/ exists.

Jython has copy() method as well. IronPython has it. I doubt if an implementation without xml.etree.ElementTree.Element.copy exists at all.

The method removal breaks backward compatibility without a reason, __copy__ alias is valuable and backward compatible change.
msg309144 - (view) Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2017-12-28 22:22
@Andrew Are you suggesting that a `copy` method be *added* to the C implementation, as an alias to __copy__? Because it makes no sense to keep the pure Python and C implementations out of sync just so that it's easier for projects forking the pure Python implementation to maintain backwards compatibility with their earlier implementations.
msg309145 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2017-12-28 22:25
Yes.
For designing from scratch copy() is not necessary but if we have it for decade -- better to unify pure Python and C implementation by adding missing methods.
msg309146 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-12-28 22:48
Paul Ganssle: Even if Andrew were not suggesting adding copy to the C implementation (I have no opinion on that currently), it would still be correct to maintain backward compatibility in the python version in the standard library.  We do not consider the standard library to be CPython specific, even though parts of it are.  There is no "fork" involved in other projects using it, from our point of view.  Quite the opposite, where possible.  We have given commit rights to developers from other python implementations specifically so they can contribute things that improve compatibility in the standard library and standard library test suite for those other python implementations.
msg309147 - (view) Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2017-12-28 23:01
> There is no "fork" involved in other projects using it, from our point of view.

I did not mean "fork" in some judgmental sense, I'm mostly just familiar with pypy, but I was under the impression that other projects were *literally* forking the standard library as a practical matter. I get the impression that other interpreters maintain a patched interpreter often for the purposes of adding fixes like the one pointed out in msg309131 which bring the behavior of the Python implementation of the standard library into line with the C implementations, since the C version is the de facto standard (since most code is written and tested only for CPython).

I'd be fine with adding a C alias for `copy`, though as a purely practical matter, I strongly suspect that dropping `copy` from the pure Python implementation would not be a huge deal, but if that breaks the contract it breaks the contract.
msg309149 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-12-28 23:29
Yes, that's why I said "from our point of view" :)  I know there is usually a fork in the practical sense, but we want to make it as easy as practical to sync that fork, which includes not breaking things in the python versions without good reason.
msg309179 - (view) Author: Gordon P. Hemsley (gphemsley) * Date: 2017-12-29 16:29
Given the discussion, I've gone ahead and created a new PR that synchronizes the three copy methods between implementations and deprecates Element.copy().
msg309184 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-12-29 18:03
There is an existing test for copying. No new tests are needed. Just add a case for the copy() method in test_copy.

Special __deepcopy__() method for Python implementation is not needed. copy.deepcopy() already works well.
msg309263 - (view) Author: Gordon P. Hemsley (gphemsley) * Date: 2017-12-31 04:21
As discussed above, starting with msg309074, __deepcopy__() is being added to the Python implementation because it already exists in the C implementation.

And additional tests have in fact uncovered further discrepancies between the Python and C implementations with regard to copying behavior.

PR 5046 has changes that I believe are ready for review.
msg309953 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-01-15 08:12
> As discussed above, starting with msg309074, __deepcopy__() is being added to the Python implementation because it already exists in the C implementation.

Python implementation shouldn't copy all implementation details of the C implementation. This is cumbersome and often impossible. Both implementation should have the same behavior, and they do have.

> And additional tests have in fact uncovered further discrepancies between the Python and C implementations with regard to copying behavior.

What are these discrepancies?

In any case it is better to extend existing tests by adding missed checks than duplicate tests.
msg310362 - (view) Author: Gordon P. Hemsley (gphemsley) * Date: 2018-01-20 22:20
>> As discussed above, starting with msg309074, __deepcopy__() is being added to the Python implementation because it already exists in the C implementation.
> 
> Python implementation shouldn't copy all implementation details of the C implementation. This is cumbersome and often impossible. Both implementation should have the same behavior, and they do have.

As it stands, calling Element.__deepcopy__() will succeed with the C
implementation and fail with the Python implementation. That seems
problematic to me, and making that *not* be the case is trivial, so I
don't understand the generalization to "all implementation details".

Additionally, if copy.deepcopy() already works well with the Python
implementation, why would it not work on the C implementation? What is
different about the C implementation that requires the definition of
__deepcopy__() in the first place?

>> And additional tests have in fact uncovered further discrepancies between the Python and C implementations with regard to copying behavior.
> 
> What are these discrepancies?

As I mentioned in the PR, the C implementation does not make use of
Element.__init__() when it creates new instances of the class, opting
instead to use create_new_element() (which is not used in the C
implementation of Element.__init__()). However, create_new_element()
does not make a copy of the attrib dict that is passed in, meaning that
the internal attrib dict of an Element instance is mutable by changing
the dict that was passed in.

I have modified the C implementation to fix this problem by adding a
copy line in create_new_element(). Without this change, some of the new
tests in the PR will fail in the C implementation.

The Python implementation does not have this problem because it
inherently always goes through Element.__init__().

> In any case it is better to extend existing tests by adding missed checks than duplicate tests.

As I mentioned in the PR:

I have added the unit tests in the way that I have because the existing
checks are generally integration tests, and most of the test coverage
for this module comes as a side effect of pickle tests. It is my goal to
expand the unit tests later to test each method individually, so
starting that work here makes it easier.
msg340381 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-04-17 06:50
> As it stands, calling Element.__deepcopy__() will succeed with the C
> implementation and fail with the Python implementation.

You should not call it directly. Use copy.deepcopy().

> What is
> different about the C implementation that requires the definition of
> __deepcopy__() in the first place?

__deepcopy__() in the C implementation is merely optimization. It is less efficient to convert from the efficient internal representation of Element object to tuples and dicts and back.

> However, create_new_element()
> does not make a copy of the attrib dict that is passed in, meaning that
> the internal attrib dict of an Element instance is mutable by changing
> the dict that was passed in.

create_new_element() usually is called with a new copy of the attrib dict. Making yet one copy inside it is just a waste of time. If in some cases it is called with an externally referred dict, fix these cases by making a copy before calling create_new_element().
msg340589 - (view) Author: Gordon P. Hemsley (gphemsley) * Date: 2019-04-21 00:00
Taking a step back, I want to make sure I understand the action items for resolving this:
* Add copy() as an alias to __copy__() in the C implementation to match the Python implementation.
* Deprecate copy() in favor of copy.copy() in both the Python and C implementations, bypassing a PendingDeprecation step because the implementations have been out of sync for so long as it is.

Any other issues discussed here are orthogonal to the goal of this ticket.

Are we in agreement on this?
msg340592 - (view) Author: Gordon P. Hemsley (gphemsley) * Date: 2019-04-21 00:09
Opened issue36685 for discussion of the attrib copy issue.
msg340600 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-04-21 06:36
We should not add anything to the implementation that we consider legacy elsewhere. Py3 has "always" used the C accelerator module instead of the Python implementation, which suggests that its interface is probably the righter one.

So: make sure that the Obvious Ways, copy.copy() and copy.deepcopy(), work nicely for both implementations, and deprecate Element.copy() in Py3.8.
msg340602 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-04-21 07:29
Concur with Stefan. Adding the method deprecated from the born looks silly. We should either deprecate the copy() method in the Python implementation, or add an undeprecated method in the C implementation. The latter should be considered as a new feature.
msg341021 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-04-28 04:41
New changeset 50fed0b64faa3053300008ef5607b570fe209de6 by Stefan Behnel (Gordon P. Hemsley) in branch 'master':
bpo-32424: Improve test coverage for xml.etree.ElementTree (GH-12891)
https://github.com/python/cpython/commit/50fed0b64faa3053300008ef5607b570fe209de6
msg341447 - (view) Author: Gordon P. Hemsley (gphemsley) * Date: 2019-05-05 13:55
It seems the final open question on this is whether the mismatch between the Python and C implementations is enough to bypass PendingDeprecationWarning for copy() in favor of jumping straight to DeprecationWarning.
msg351698 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-09-10 15:21
Let's just deprecate it directly (in Py3.9, I guess), given that it was never documented.
msg351699 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-09-10 15:22
New changeset 7d952ded6813c896ea3f4234bb8db5247dcb5484 by Stefan Behnel (Gordon P. Hemsley) in branch 'master':
bpo-32424: Deprecate xml.etree.ElementTree.Element.copy() in favor of copy.copy() (GH-12995)
https://github.com/python/cpython/commit/7d952ded6813c896ea3f4234bb8db5247dcb5484
History
Date User Action Args
2019-09-10 15:22:43scodersetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2019-09-10 15:22:05scodersetmessages: + msg351699
2019-09-10 15:21:52scodersetmessages: + msg351698
2019-05-05 13:55:56gphemsleysetmessages: + msg341447
2019-04-28 14:38:55gphemsleysetpull_requests: + pull_request12918
2019-04-28 04:41:48scodersetmessages: + msg341021
2019-04-21 16:44:06gphemsleysetpull_requests: + pull_request12822
2019-04-21 07:29:50serhiy.storchakasetmessages: + msg340602
2019-04-21 06:36:20scodersetmessages: + msg340600
2019-04-21 00:09:57gphemsleysetmessages: + msg340592
2019-04-21 00:00:12gphemsleysetmessages: + msg340589
versions: + Python 3.8, Python 3.9, - Python 3.7
2019-04-17 06:50:17serhiy.storchakasetmessages: + msg340381
2018-01-20 22:20:29gphemsleysetmessages: + msg310362
2018-01-15 08:12:30serhiy.storchakasetmessages: + msg309953
2017-12-31 04:21:07gphemsleysetmessages: + msg309263
2017-12-29 18:03:11serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg309184
2017-12-29 16:29:24gphemsleysetmessages: + msg309179
2017-12-29 16:26:26gphemsleysettitle: Rename copy() to __copy__() in xml.etree.ElementTree.Element Python implementation -> Synchronize copy methods between Python and C implementations of xml.etree.ElementTree.Element
2017-12-29 16:25:32gphemsleysetpull_requests: + pull_request4927
2017-12-28 23:29:43r.david.murraysetmessages: + msg309149
2017-12-28 23:01:32p-gansslesetmessages: + msg309147
2017-12-28 22:48:38r.david.murraysetnosy: + r.david.murray
messages: + msg309146
2017-12-28 22:25:52asvetlovsetmessages: + msg309145
2017-12-28 22:22:10p-gansslesetmessages: + msg309144
2017-12-28 22:16:35asvetlovsetmessages: + msg309143
2017-12-28 21:48:48p-gansslesetnosy: + p-ganssle
messages: + msg309142
2017-12-28 19:42:41asvetlovsetmessages: + msg309140
2017-12-28 15:07:42gphemsleysetmessages: + msg309131
2017-12-28 11:38:40asvetlovsetmessages: + msg309124
2017-12-28 11:19:20mdksetmessages: + msg309122
2017-12-28 11:12:01asvetlovsetnosy: + asvetlov
messages: + msg309121
2017-12-28 07:13:26thatiparthysetnosy: + thatiparthy
messages: + msg309116
2017-12-27 15:23:11gphemsleysetmessages: + msg309089
2017-12-27 15:20:22gphemsleysetkeywords: + patch
stage: patch review
pull_requests: + pull_request4909
2017-12-26 23:54:54mdksetnosy: + mdk
messages: + msg309074
2017-12-24 18:51:40gphemsleycreate