classification
Title: Integrate ElementC14N module into xml.etree package
Type: enhancement Stage: resolved
Components: Library (Lib), XML Versions: Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: scoder Nosy List: ZackerySpytz, cbz, christian.heimes, effbot, eli.bendersky, flox, loewis, scoder, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2011-12-16 08:17 by scoder, last changed 2019-05-02 20:12 by scoder. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 12966 merged scoder, 2019-04-26 09:03
PR 13053 merged scoder, 2019-05-02 07:51
PR 13055 merged scoder, 2019-05-02 14:43
Messages (17)
msg149598 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2011-12-16 08:17
The ElementC14N.py module by Fredrik Lundh implements XML canonicalisation for the ElementTree serialiser. Given that the API hooks required to use it are already in xml.etree.ElementTree, this would make a nice, simple and straightforward addition to the existing xml.etree package.

The source can be found here (unchanged since at least 2009):

https://bitbucket.org/effbot/et-2009-provolone/src/tip/elementtree/elementtree/ElementC14N.py

Note that the source needs some minor modifications to use relative imports at the top. Also, the "2.3 compatibility" code section can be dropped.
msg149633 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-12-16 17:52
Code added to the standard library should be contributed by its author, with an explicit statement of plans to support it in an ongoing manner, and preferably also with plans to stop providing standalone releases over time.
msg227606 - (view) Author: Chris E (cbz) Date: 2014-09-26 11:09
Whilst in most cases this would be correct, in this case it looks like the original contributor took a subset of what the original author wrote and put it into the Python libraries.

Until relatively recently, the ElementTree.py file included a stanza that attempted to import the ElementC14N module and conditionally set up the 'c14n' key in _serialize.
msg330002 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-11-16 14:31
"c14n" is documented as an accepted serialization method of write() and there is some (non-working) code for support of C14N. e6a951b83e30b3b9c809a441041fb0f21f72b168 removed optional import of ElementC14N.
msg338796 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-03-25 12:14
References:

Canonical XML Version 2.0 -- https://www.w3.org/TR/xml-c14n2/
Test cases for Canonical XML 2.0 -- https://www.w3.org/TR/xml-c14n2-testcases/
msg340895 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-04-26 09:00
Turns out, it was not that easy. :-/

ElementTree lacks prefixes in its tree model, so they would have to be either registered globally (via register_namespace()) or come from the parser. I tried the latter since that is the most generic way when the input is serialised already. See issue 36673 and issue 36676 for extensions to the parser target interface that this implementation relies on. Note that this is a new implementation, only marginally based off the original ElementC14N implementation.
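The two sources of prefixes mentioned here can be sketched roughly as follows, assuming Python 3.8 or later, where a parser target may define a start_ns() callback. The NamespaceCollector class and the urn: URIs are made up for illustration and are not part of the patch.

```python
import xml.etree.ElementTree as ET


class NamespaceCollector:
    """Hypothetical parser target that only records namespace declarations."""

    def __init__(self):
        self.declarations = []

    def start_ns(self, prefix, uri):
        # Called for each namespace declaration the parser encounters;
        # the default namespace is reported with an empty prefix.
        self.declarations.append((prefix, uri))

    def close(self):
        # The return value of close() is what parser.close() hands back.
        return self.declarations


# Option 1: the prefixes come from the parser.
parser = ET.XMLParser(target=NamespaceCollector())
parser.feed('<root xmlns="urn:default" xmlns:a="urn:a"><a:child/></root>')
declarations = parser.close()

# Option 2: a prefix is registered globally for the serialiser.
ET.register_namespace("ex", "urn:example")
serialized = ET.tostring(ET.Element("{urn:example}item"), encoding="unicode")
```

The first option keeps the prefix information local to one parse run, while register_namespace() changes module-wide state that affects every later serialisation.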

I only implemented C14N 2.0 (which lxml also does not have, but I'll add it there). I got most of the official test cases working, including prefix rewriting and prefix resolution in tag and attribute content.

https://www.w3.org/TR/xml-c14n2-testcases/

What's not supported?

The original namespace prefixes may not be preserved when namespaces are declared with multiple prefixes. In that case, one of them is picked. That's difficult to implement in ET because the parser resolves and discards prefixes. I think that's acceptable, as long as the prefix selection is deterministic.
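A minimal sketch of that limitation, assuming the canonicalize() function from the released Python 3.8 and an invented input document. Which of the two prefixes ends up in the output is deliberately not asserted here.

```python
import xml.etree.ElementTree as ET

# Two prefixes, "a" and "b", are bound to the same namespace URI.
xml_data = '<a:root xmlns:a="urn:x" xmlns:b="urn:x"><b:child/></a:root>'

canonical = ET.canonicalize(xml_data)

# Whichever prefix was picked, the qualified names still resolve to the
# same namespace when the canonical output is parsed again.
root = ET.fromstring(canonical)
tags = [element.tag for element in root.iter()]
```

Comparing the parsed tags rather than the raw text is one way to check such output without depending on the prefix choice.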

Also, qname rewriting in XPath expressions that appear in XML text is not currently supported. I guess that's a bit of an esoteric feature which can still be added later if it's needed.

While testing, I noticed that ET and cET behave differently when it comes to resolving default attributes from an internal DTD subset. The parser in cET does it, ET does not. That should probably get aligned. For now, the tests hack around that difference.

Comments and reviews welcome.
msg340896 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-26 09:00
> Comments and reviews welcome.

Review of what? There is no PR attached to this issue.
msg340969 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-04-27 07:15
It took me a couple of minutes longer to submit it, but it's there now. :)

I'm aware that there is a lot of new code involved, covering really three new features, which makes reviewing it a non-trivial task. I personally think it's ready to go into the last alpha release on Monday to receive some initial visibility, but I would like to have at least a little feedback before that. Even just a general opinion whether you support pushing this into 3.8 or not. Postponing it to the first beta would be ok, if you need more time to form an opinion, but having it in an alpha would improve the chance of getting user feedback.
msg341045 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-04-29 06:56
Playing around with it a bit more, I ended up changing the interface of the canonicalize() function to return its output as a string by default. It's really nice to be able to say

    c14n_xml = canonicalize(plain_xml)

To write to a file, you now do this:

    with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file:
        canonicalize(xml_data, out=out_file)

and to read from a file:

    canonicalize(from_file=fileobj)

I think that nicely handles all use cases.
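The three call styles can be tried without touching the file system. This is a sketch that assumes canonicalize() is imported from xml.etree.ElementTree as in the released Python 3.8; the sample document and the in-memory buffers are stand-ins for real files.

```python
import io
import xml.etree.ElementTree as ET

plain_xml = '<root b="2" a="1"/>'

# Default: the canonical form is returned as a string. Attributes are
# sorted and the empty element becomes a start/end tag pair.
c14n_xml = ET.canonicalize(plain_xml)

# With out=..., the result is written to the file-like object instead,
# and the function returns None.
buffer = io.StringIO()
result = ET.canonicalize(plain_xml, out=buffer)

# With from_file=..., the input is read from a file object rather than
# from a string.
from_file_xml = ET.canonicalize(from_file=io.BytesIO(plain_xml.encode("utf-8")))
```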
msg341205 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-05-01 15:00
> I personally think it's ready to go into the last alpha release

Since I didn't get any negative comments or requests for deferral, I'll merge this today to get the feature into the last (still unreleased) alpha. We still have the beta phase to resolve issues with it.
msg341218 - (view) Author: Zackery Spytz (ZackerySpytz) * (Python triager) Date: 2019-05-01 17:37
The PR has reference leaks.
msg341221 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-05-01 18:15
Thanks for testing, Zackery. I resolved the reference leaks. They were already in the PR for issue 36676. Both PRs updated.
msg341233 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-05-01 20:34
New changeset e1d5dd645d5f59867cb0ad63179110f310cbca89 by Stefan Behnel in branch 'master':
bpo-13611: C14N 2.0 implementation for ElementTree (GH-12966)
https://github.com/python/cpython/commit/e1d5dd645d5f59867cb0ad63179110f310cbca89
msg341259 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-05-02 08:35
New changeset 0d5864fa07ab4f03188c690a5eb07bdd1fd1cb9c by Stefan Behnel in branch 'master':
bpo-13611: Include C14N 2.0 test data in installation (GH-13053)
https://github.com/python/cpython/commit/0d5864fa07ab4f03188c690a5eb07bdd1fd1cb9c
msg341260 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-05-02 08:41
A buildbot failure made me notice that the test files were not part of the CPython installation yet, so I added them. I also took the opportunity to add a README file that describes where they come from and under which conditions they were originally provided by the W3C (IANAL, but basically free use with copyright notice).

https://www.w3.org/Consortium/Legal/2015/doc-license

Is there anything else I have to take care of when adding externally provided/licensed files to the source tree?
msg341264 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-02 13:05
> Is there anything else I have to take care of when adding externally provided/licensed files to the source tree?

Maybe complete Doc/license.rst?
https://docs.python.org/dev/license.html
msg341315 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-05-02 20:12
> Maybe complete Doc/license.rst?

Thanks, done.
History
Date User Action Args
2019-05-02 20:12:44 scoder set messages: + msg341315
2019-05-02 14:43:07 scoder set pull_requests: + pull_request12972
2019-05-02 13:05:13 vstinner set messages: + msg341264
2019-05-02 08:41:11 scoder set messages: + msg341260
2019-05-02 08:35:15 scoder set messages: + msg341259
2019-05-02 07:51:20 scoder set pull_requests: + pull_request12971
2019-05-01 20:37:18 scoder set status: open -> closed
    resolution: fixed
    stage: patch review -> resolved
2019-05-01 20:34:20 scoder set messages: + msg341233
2019-05-01 18:15:01 scoder set messages: + msg341221
2019-05-01 17:37:37 ZackerySpytz set nosy: + ZackerySpytz
    messages: + msg341218
2019-05-01 15:00:16 scoder set messages: + msg341205
2019-04-29 06:56:16 scoder set messages: + msg341045
2019-04-27 07:15:05 scoder set messages: + msg340969
2019-04-26 09:03:23 scoder set keywords: + patch
    stage: needs patch -> patch review
    pull_requests: + pull_request12893
2019-04-26 09:00:59 vstinner set nosy: + vstinner
    messages: + msg340896
2019-04-26 09:00:04 scoder set assignee: serhiy.storchaka -> scoder
    messages: + msg340895
2019-03-25 12:14:08 serhiy.storchaka set messages: + msg338796
2018-11-16 18:44:25 serhiy.storchaka set assignee: serhiy.storchaka
2018-11-16 14:31:17 serhiy.storchaka set messages: + msg330002
2018-11-16 14:19:44 serhiy.storchaka set nosy: + serhiy.storchaka
    stage: needs patch
    versions: + Python 3.8, - Python 3.4
2014-09-26 11:38:51 pitrou set nosy: + eli.bendersky
2014-09-26 11:09:41 cbz set nosy: + cbz
    messages: + msg227606
2013-07-08 17:14:43 christian.heimes set nosy: + effbot, christian.heimes
    versions: + Python 3.4, - Python 3.3
2011-12-16 17:52:51 loewis set nosy: + loewis
    messages: + msg149633
2011-12-16 08:24:54 pitrou set nosy: + flox
2011-12-16 08:17:40 scoder create