Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate ElementC14N module into xml.etree package #57820

Closed
scoder opened this issue Dec 16, 2011 · 17 comments
Closed

Integrate ElementC14N module into xml.etree package #57820

scoder opened this issue Dec 16, 2011 · 17 comments
Assignees
Labels
3.8 only security fixes stdlib Python modules in the Lib dir topic-XML type-feature A feature request or enhancement

Comments

@scoder
Copy link
Contributor

scoder commented Dec 16, 2011

BPO 13611
Nosy @loewis, @scoder, @vstinner, @tiran, @florentx, @serhiy-storchaka, @ZackerySpytz
PRs
  • bpo-13611: C14N 2.0 implementation for ElementTree #12966
  • bpo-13611: Include C14N 2.0 test data in installation #13053
  • bpo-13611: Add correct license for C14N test suite to license docs. #13055
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/scoder'
    closed_at = <Date 2019-05-01.20:37:18.196>
    created_at = <Date 2011-12-16.08:17:40.460>
    labels = ['expert-XML', '3.8', 'type-feature', 'library']
    title = 'Integrate ElementC14N module into xml.etree package'
    updated_at = <Date 2019-05-02.20:12:44.941>
    user = 'https://github.com/scoder'

    bugs.python.org fields:

    activity = <Date 2019-05-02.20:12:44.941>
    actor = 'scoder'
    assignee = 'scoder'
    closed = True
    closed_date = <Date 2019-05-01.20:37:18.196>
    closer = 'scoder'
    components = ['Library (Lib)', 'XML']
    creation = <Date 2011-12-16.08:17:40.460>
    creator = 'scoder'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 13611
    keywords = ['patch']
    message_count = 17.0
    messages = ['149598', '149633', '227606', '330002', '338796', '340895', '340896', '340969', '341045', '341205', '341218', '341221', '341233', '341259', '341260', '341264', '341315']
    nosy_count = 10.0
    nosy_names = ['loewis', 'effbot', 'scoder', 'vstinner', 'christian.heimes', 'eli.bendersky', 'flox', 'serhiy.storchaka', 'cbz', 'ZackerySpytz']
    pr_nums = ['12966', '13053', '13055']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue13611'
    versions = ['Python 3.8']

    @scoder
    Copy link
    Contributor Author

    scoder commented Dec 16, 2011

    The ElementC14N.py module by Fredrik Lundh implements XML canonicalisation for the ElementTree serialiser. Given that the required API hooks to use it are already in xml.etree.ElementTree, this would make a nice, simple and straight forward addition to the existing xml.etree package.

    The source can be found here (unchanged since at least 2009):

    https://bitbucket.org/effbot/et-2009-provolone/src/tip/elementtree/elementtree/ElementC14N.py

    Note that the source needs some minor modifications to use relative imports at the top. Also, the "2.3 compatibility" code section can be dropped.

    @scoder scoder added stdlib Python modules in the Lib dir topic-XML type-feature A feature request or enhancement labels Dec 16, 2011
    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Dec 16, 2011

    Code added to the standard library should be contributed by its author, with an explicit statement of plans to support it in an ongoing manner, and preferably also with plans to stop providing standalone releases over time.

    @cbz
    Copy link
    Mannequin

    cbz mannequin commented Sep 26, 2014

    Whilst in most cases this would be correct, in this case it looks like the original contributor took a subset of what the original author wrote and put it into the python libraries.

    Until relatively recently the ElementTree.py file included a stanza that attempted to import the ElementC14N module and conditionally set up the 'c14n' key value in _serialize

    @serhiy-storchaka serhiy-storchaka added the 3.8 only security fixes label Nov 16, 2018
    @serhiy-storchaka
    Copy link
    Member

    "c14n" is documented as an accepted serialization method of write() and there is some (non-working) code for support of C14N. e6a951b removed optional import of ElementC14N.

    @serhiy-storchaka serhiy-storchaka self-assigned this Nov 16, 2018
    @serhiy-storchaka
    Copy link
    Member

    References:

    Canonical XML Version 2.0 -- https://www.w3.org/TR/xml-c14n2/
    Test cases for Canonical XML 2.0 -- https://www.w3.org/TR/xml-c14n2-testcases/

    @scoder
    Copy link
    Contributor Author

    scoder commented Apr 26, 2019

    Turns out, it was not that easy. :-/

    ElementTree lacks prefixes in its tree model, so they would have to be either registered globally (via register_namespace()) or come from the parser. I tried the latter since that is the most generic way when the input is serialised already. See bpo-36673 and bpo-36676 for extensions to the parser target interface that this implementation relies on. Note that this is a new implementation, only marginally based off the original ElementC14N implementation.

    I only implemented C14N 2.0 (which lxml also does not have, but I'll add it there). I got most of the official test cases working, including prefix rewriting and prefix resolution in tag and attribute content.

    https://www.w3.org/TR/xml-c14n2-testcases/

    What's not supported?

    The original namespace prefixes may not be preserved when namespaces are declared with multiple prefixes. In that case, one of them is picked. That's difficult to implement in ET because the parser resolves and discards prefixes. I think that's acceptable, as long as the prefix selection is deterministic.

    Also, qname rewriting in XPath expressions that appear in XML text is not currently supported. I guess that's a bit of an esoteric feature which can still be added later if it's needed.

    While testing, I noticed that ET and cET behave differently when it comes to resolving default attributes from an internal DTD subset. The parser in cET does it, ET does not. That should probably get aligned. For now, the tests hack around that difference.

    Comments and reviews welcome.

    @scoder scoder assigned scoder and unassigned serhiy-storchaka Apr 26, 2019
    @vstinner
    Copy link
    Member

    Comments and reviews welcome.

    Review of what? There is no PR attached to this issue.

    @scoder
    Copy link
    Contributor Author

    scoder commented Apr 27, 2019

    It took me a couple of minutes longer to submit it, but it's there now. :)

    I'm aware that there is a lot of new code involved, covering really three new features, which makes reviewing it a non-trivial task. I personally think it's ready to go into the last alpha release on Monday to receive some initial visibility, but I would like to have at least a little feedback before that. Even just a general opinion whether you support pushing this into 3.8 or not. Postponing it to the first beta would be ok, if you need more time to form an opinion, but having it in an alpha would improve the chance of getting user feedback.

    @scoder
    Copy link
    Contributor Author

    scoder commented Apr 29, 2019

    Playing around with it a bit more, I ended up changing the interface of the canonicalize() function to return its output as a string by default. It's really nice to be able to say

        c14n_xml = canonicalize(plain_xml)

    To write to a file, you now do this:

          with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file:
              canonicalize(xml_data, out=out_file)

    and to read from a file:

          canonicalize(from_file=fileobj)

    I think that nicely handles all use cases.

    @scoder
    Copy link
    Contributor Author

    scoder commented May 1, 2019

    I personally think it's ready to go into the last alpha release

    Since I didn't get any negative comments or requests for deferral, I'll merge this today to get the feature into the last (still unreleased) alpha. We still have the beta phase to resolve issues with it.

    @ZackerySpytz
    Copy link
    Mannequin

    ZackerySpytz mannequin commented May 1, 2019

    The PR has reference leaks.

    @scoder
    Copy link
    Contributor Author

    scoder commented May 1, 2019

    Thanks for testing, Zackery. I resolved the reference leaks. They were already in the PR for bpo-36676. Both PRs updated.

    @scoder
    Copy link
    Contributor Author

    scoder commented May 1, 2019

    New changeset e1d5dd6 by Stefan Behnel in branch 'master':
    bpo-13611: C14N 2.0 implementation for ElementTree (GH-12966)
    e1d5dd6

    @scoder scoder closed this as completed May 1, 2019
    @scoder
    Copy link
    Contributor Author

    scoder commented May 2, 2019

    New changeset 0d5864f by Stefan Behnel in branch 'master':
    bpo-13611: Include C14N 2.0 test data in installation (GH-13053)
    0d5864f

    @scoder
    Copy link
    Contributor Author

    scoder commented May 2, 2019

    A buildbot failure made me notice that the test files were not part of the CPython installation yet, so I added them. I also took the opportunity to add a README file that describes where they come from and under which conditions they were originally provided by the W3C (IANAL, but basically free use with copyright notice).

    https://www.w3.org/Consortium/Legal/2015/doc-license

    Is there anything else I have to take care of when adding externally provided/licensed files to the source tree?

    @vstinner
    Copy link
    Member

    vstinner commented May 2, 2019

    Is there anything else I have to take care of when adding externally provided/licensed files to the source tree?

    Maybe complete Doc/license.rst?
    https://docs.python.org/dev/license.html

    @scoder
    Copy link
    Contributor Author

    scoder commented May 2, 2019

    Maybe complete Doc/license.rst?

    Thanks, done.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes stdlib Python modules in the Lib dir topic-XML type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants