msg149598
Author: Stefan Behnel (scoder) *
Date: 2011-12-16 08:17
The ElementC14N.py module by Fredrik Lundh implements XML canonicalisation for the ElementTree serialiser. Given that the required API hooks to use it are already in xml.etree.ElementTree, this would make a nice, simple and straightforward addition to the existing xml.etree package.
The source can be found here (unchanged since at least 2009):
https://bitbucket.org/effbot/et-2009-provolone/src/tip/elementtree/elementtree/ElementC14N.py
Note that the source needs some minor modifications to use relative imports at the top. Also, the "2.3 compatibility" code section can be dropped.

msg149633
Author: Martin v. Löwis (loewis) *
Date: 2011-12-16 17:52
Code added to the standard library should be contributed by its author, with an explicit statement of plans to support it in an ongoing manner, and preferably also with plans to stop providing standalone releases over time.

msg227606
Author: Chris E (cbz)
Date: 2014-09-26 11:09
Whilst in most cases this would be correct, here it looks like the original contributor took a subset of what the original author wrote and put it into the Python standard library.
Until relatively recently, the ElementTree.py file included a stanza that attempted to import the ElementC14N module and conditionally added a 'c14n' key to _serialize.

msg330002
Author: Serhiy Storchaka (serhiy.storchaka) *
Date: 2018-11-16 14:31
"c14n" is documented as an accepted serialization method of write(), and there is some (non-working) code for C14N support. Commit e6a951b83e30b3b9c809a441041fb0f21f72b168 removed the optional import of ElementC14N.

msg338796
Author: Serhiy Storchaka (serhiy.storchaka) *
Date: 2019-03-25 12:14
References:
Canonical XML Version 2.0 -- https://www.w3.org/TR/xml-c14n2/
Test cases for Canonical XML 2.0 -- https://www.w3.org/TR/xml-c14n2-testcases/

msg340895
Author: Stefan Behnel (scoder) *
Date: 2019-04-26 09:00
Turns out, it was not that easy. :-/
ElementTree does not keep namespace prefixes in its tree model, so they would either have to be registered globally (via register_namespace()) or come from the parser. I chose the latter, since that is the most generic approach when the input is already serialised. See issue 36673 and issue 36676 for the extensions to the parser target interface that this implementation relies on. Note that this is a new implementation, only marginally based on the original ElementC14N implementation.
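As a rough illustration of getting prefixes "from the parser": in Python 3.8 and later, an XMLParser target may define a start_ns(prefix, uri) method, which receives each namespace declaration (prefix included) before the start() call of the element that declares it. The PrefixRecorder class and the input document below are hypothetical, made up for this sketch; they are not part of ElementTree.

```python
import xml.etree.ElementTree as ET


class PrefixRecorder:
    """Hypothetical parser target that records namespace declarations."""

    def __init__(self):
        self.events = []

    def start_ns(self, prefix, uri):
        # Called once per xmlns declaration, before the declaring element's start().
        self.events.append(("start-ns", prefix, uri))

    def start(self, tag, attrib):
        # Tags arrive already resolved to '{uri}local' form; the prefix is gone here.
        self.events.append(("start", tag))

    def end(self, tag):
        pass

    def close(self):
        # XMLParser.close() returns whatever the target's close() returns.
        return self.events


parser = ET.XMLParser(target=PrefixRecorder())
parser.feed('<root xmlns:a="urn:a"><a:child/></root>')
events = parser.close()
for event in events:
    print(event)
```

The point of the design is visible in the output: start() only ever sees '{urn:a}child', so the prefix "a" is only recoverable through the separate start_ns() callback.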
I only implemented C14N 2.0 (which lxml does not have either, but I'll add it there). I got most of the official test cases working, including prefix rewriting and prefix resolution in tag and attribute content.
https://www.w3.org/TR/xml-c14n2-testcases/
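A small sketch of the prefix handling mentioned above, using the keyword options that xml.etree.ElementTree.canonicalize() accepts as released in Python 3.8 (rewrite_prefixes, qname_aware_tags, with_comments). The input document is made up for illustration, and the exact generated prefix names ("n0", ...) are an implementation detail rather than something to rely on.

```python
from xml.etree.ElementTree import canonicalize

# Made-up input: a namespaced document with a comment and a QName in text content.
xml = '<p:root xmlns:p="urn:p"><!-- note --><p:q>p:value</p:q></p:root>'

# Default: original prefixes are kept and comments are dropped.
plain = canonicalize(xml)

# Prefixes are replaced by generated ones; text content is left untouched
# unless its tag is declared QName-aware.
rewritten = canonicalize(xml, rewrite_prefixes=True)

# Declaring {urn:p}q as QName-aware also rewrites the prefix inside its text.
qname_aware = canonicalize(xml, rewrite_prefixes=True,
                           qname_aware_tags={'{urn:p}q'})

# Keep comments in the canonical output.
with_comments = canonicalize(xml, with_comments=True)

for result in (plain, rewritten, qname_aware, with_comments):
    print(result)
```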
What's not supported?
The original namespace prefixes may not be preserved when namespaces are declared with multiple prefixes. In that case, one of them is picked. That's difficult to implement in ET because the parser resolves and discards prefixes. I think that's acceptable, as long as the prefix selection is deterministic.
Also, qname rewriting in XPath expressions that appear in XML text is not currently supported. I guess that's a bit of an esoteric feature which can still be added later if it's needed.
While testing, I noticed that ET and cET (the pure Python and the C accelerated parser) behave differently when it comes to resolving default attributes from an internal DTD subset: the parser in cET applies them, ET does not. That should probably be aligned. For now, the tests work around that difference.
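One way such a difference can arise is visible at the pyexpat level: expat parser objects have a specified_attributes flag that, when true, reports only attributes written in the document and omits defaults declared in the DTD. That this flag is the actual cause of the ET/cET difference is an assumption here, not something stated above; the sketch only shows the flag's effect, using a made-up document and a hypothetical helper attrs_seen().

```python
import xml.parsers.expat

# Made-up document: the internal DTD subset declares a default for attribute "a".
DOC = '<!DOCTYPE root [<!ATTLIST root a CDATA "dflt">]><root b="x"/>'


def attrs_seen(specified_only):
    """Hypothetical helper: return the attributes expat reports for <root>."""
    seen = {}
    parser = xml.parsers.expat.ParserCreate()
    # True => report only attributes that appear literally in the document.
    parser.specified_attributes = specified_only

    def start(name, attrs):
        seen.update(attrs)

    parser.StartElementHandler = start
    parser.Parse(DOC, True)
    return seen


print(attrs_seen(False))  # DTD default "a" is included
print(attrs_seen(True))   # DTD default "a" is omitted
```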
Comments and reviews welcome.

msg340896
Author: STINNER Victor (vstinner) *
Date: 2019-04-26 09:00
> Comments and reviews welcome.
Review of what? There is no PR attached to this issue.

msg340969
Author: Stefan Behnel (scoder) *
Date: 2019-04-27 07:15
It took me a couple of minutes longer to submit it, but it's there now. :)
I'm aware that there is a lot of new code involved, really covering three new features, which makes reviewing it a non-trivial task. I personally think it's ready to go into the last alpha release on Monday to get some initial visibility, but I would like to have at least a little feedback before that, even just a general opinion on whether you support pushing this into 3.8 or not. Postponing it to the first beta would be OK if you need more time to form an opinion, but having it in an alpha would improve the chance of getting user feedback.

msg341045
Author: Stefan Behnel (scoder) *
Date: 2019-04-29 06:56
Playing around with it a bit more, I ended up changing the interface of the canonicalize() function to return its output as a string by default. It's really nice to be able to say:

    c14n_xml = canonicalize(plain_xml)

To write to a file, you now do this:

    with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file:
        canonicalize(xml_data, out=out_file)

and to read from a file:

    canonicalize(from_file=fileobj)

I think that nicely handles all use cases.
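A self-contained version of the three call styles above, using in-memory files (io.StringIO, io.BytesIO) instead of real files so it runs anywhere. It assumes the canonicalize(xml_data=None, *, out=None, from_file=None, **options) signature as released in Python 3.8; the input document is made up.

```python
import io
from xml.etree.ElementTree import canonicalize

# Made-up input: an XML declaration, unordered attributes and an empty element.
plain_xml = '<?xml version="1.0"?><root b="2" a="1"><empty/></root>'

# 1. Default: return the canonical form as a string.
c14n_xml = canonicalize(plain_xml)

# 2. Write to a text file object instead; canonicalize() then returns None.
out_file = io.StringIO()
result = canonicalize(plain_xml, out=out_file)

# 3. Parse the input from a (binary) file object.
from_file_xml = canonicalize(from_file=io.BytesIO(plain_xml.encode('utf-8')))

print(c14n_xml)  # <root a="1" b="2"><empty></empty></root>
```

The output shows typical C14N normalisation: the XML declaration is dropped, attributes are sorted, and empty elements are written as start/end tag pairs.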

msg341205
Author: Stefan Behnel (scoder) *
Date: 2019-05-01 15:00
> I personally think it's ready to go into the last alpha release
Since I didn't get any negative comments or requests for deferral, I'll merge this today to get the feature into the last (still unreleased) alpha. We still have the beta phase to resolve issues with it.

msg341218
Author: Zackery Spytz (ZackerySpytz) *
Date: 2019-05-01 17:37
The PR has reference leaks.

msg341221
Author: Stefan Behnel (scoder) *
Date: 2019-05-01 18:15
Thanks for testing, Zackery. I resolved the reference leaks; they were already present in the PR for issue 36676. Both PRs are updated.

msg341233
Author: Stefan Behnel (scoder) *
Date: 2019-05-01 20:34
New changeset e1d5dd645d5f59867cb0ad63179110f310cbca89 by Stefan Behnel in branch 'master':
bpo-13611: C14N 2.0 implementation for ElementTree (GH-12966)
https://github.com/python/cpython/commit/e1d5dd645d5f59867cb0ad63179110f310cbca89

msg341259
Author: Stefan Behnel (scoder) *
Date: 2019-05-02 08:35
New changeset 0d5864fa07ab4f03188c690a5eb07bdd1fd1cb9c by Stefan Behnel in branch 'master':
bpo-13611: Include C14N 2.0 test data in installation (GH-13053)
https://github.com/python/cpython/commit/0d5864fa07ab4f03188c690a5eb07bdd1fd1cb9c

msg341260
Author: Stefan Behnel (scoder) *
Date: 2019-05-02 08:41
A buildbot failure made me notice that the test files were not part of the CPython installation yet, so I added them. I also took the opportunity to add a README file that describes where they come from and under which conditions they were originally provided by the W3C (IANAL, but basically free use with copyright notice).
https://www.w3.org/Consortium/Legal/2015/doc-license
Is there anything else I have to take care of when adding externally provided/licensed files to the source tree?

msg341264
Author: STINNER Victor (vstinner) *
Date: 2019-05-02 13:05
> Is there anything else I have to take care of when adding externally provided/licensed files to the source tree?
Maybe complete Doc/license.rst?
https://docs.python.org/dev/license.html

msg341315
Author: Stefan Behnel (scoder) *
Date: 2019-05-02 20:12
> Maybe complete Doc/license.rst?
Thanks, done.

Date | User | Action | Args
2022-04-11 14:57:24 | admin | set | github: 57820
2019-05-02 20:12:44 | scoder | set | messages: + msg341315
2019-05-02 14:43:07 | scoder | set | pull_requests: + pull_request12972
2019-05-02 13:05:13 | vstinner | set | messages: + msg341264
2019-05-02 08:41:11 | scoder | set | messages: + msg341260
2019-05-02 08:35:15 | scoder | set | messages: + msg341259
2019-05-02 07:51:20 | scoder | set | pull_requests: + pull_request12971
2019-05-01 20:37:18 | scoder | set | status: open -> closed; resolution: fixed; stage: patch review -> resolved
2019-05-01 20:34:20 | scoder | set | messages: + msg341233
2019-05-01 18:15:01 | scoder | set | messages: + msg341221
2019-05-01 17:37:37 | ZackerySpytz | set | nosy: + ZackerySpytz; messages: + msg341218
2019-05-01 15:00:16 | scoder | set | messages: + msg341205
2019-04-29 06:56:16 | scoder | set | messages: + msg341045
2019-04-27 07:15:05 | scoder | set | messages: + msg340969
2019-04-26 09:03:23 | scoder | set | keywords: + patch; stage: needs patch -> patch review; pull_requests: + pull_request12893
2019-04-26 09:00:59 | vstinner | set | nosy: + vstinner; messages: + msg340896
2019-04-26 09:00:04 | scoder | set | assignee: serhiy.storchaka -> scoder; messages: + msg340895
2019-03-25 12:14:08 | serhiy.storchaka | set | messages: + msg338796
2018-11-16 18:44:25 | serhiy.storchaka | set | assignee: serhiy.storchaka
2018-11-16 14:31:17 | serhiy.storchaka | set | messages: + msg330002
2018-11-16 14:19:44 | serhiy.storchaka | set | nosy: + serhiy.storchaka; stage: needs patch; versions: + Python 3.8, - Python 3.4
2014-09-26 11:38:51 | pitrou | set | nosy: + eli.bendersky
2014-09-26 11:09:41 | cbz | set | nosy: + cbz; messages: + msg227606
2013-07-08 17:14:43 | christian.heimes | set | nosy: + effbot, christian.heimes; versions: + Python 3.4, - Python 3.3
2011-12-16 17:52:51 | loewis | set | nosy: + loewis; messages: + msg149633
2011-12-16 08:24:54 | pitrou | set | nosy: + flox
2011-12-16 08:17:40 | scoder | create |