Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xml.etree.ElementInclude does not include nested xincludes #65127

Closed
JamesBailey mannequin opened this issue Mar 14, 2014 · 23 comments
Closed

xml.etree.ElementInclude does not include nested xincludes #65127

JamesBailey mannequin opened this issue Mar 14, 2014 · 23 comments
Labels
3.9 only security fixes stdlib Python modules in the Lib dir topic-XML type-feature A feature request or enhancement

Comments

@JamesBailey
Copy link
Mannequin

JamesBailey mannequin commented Mar 14, 2014

BPO 20928
Nosy @rhettinger, @scoder, @bitdancer, @miss-islington, @ruffsl, @hauntsaninja
PRs
  • bpo-20928: support base-URL and recursive includes in etree.ElementInclude #5723
  • bpo-33187: Document 3.9 changes to xml.etree.ElementInclude.include #20438
  • [3.9] bpo-33187: Document 3.9 changes to xml.etree.ElementInclude.include (GH-20438) #20722
  • Files
  • issue20928.patch: Adds and tests recursive xinclude checks
  • issue20928_fixed.patch
  • issue20928_fixed2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-11-25.15:41:12.439>
    created_at = <Date 2014-03-14.20:40:06.508>
    labels = ['expert-XML', 'type-feature', 'library', '3.9']
    title = 'xml.etree.ElementInclude does not include nested xincludes'
    updated_at = <Date 2020-06-08.14:47:55.152>
    user = 'https://bugs.python.org/JamesBailey'

    bugs.python.org fields:

    activity = <Date 2020-06-08.14:47:55.152>
    actor = 'miss-islington'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-11-25.15:41:12.439>
    closer = 'scoder'
    components = ['Library (Lib)', 'XML']
    creation = <Date 2014-03-14.20:40:06.508>
    creator = 'James.Bailey'
    dependencies = []
    files = ['34927', '35728', '35783']
    hgrepos = []
    issue_num = 20928
    keywords = ['patch']
    message_count = 23.0
    messages = ['213589', '213648', '216578', '220686', '220689', '220717', '221290', '221555', '221564', '221591', '312092', '312286', '312325', '312326', '312333', '312612', '312651', '314693', '314694', '357441', '357443', '370985', '370988']
    nosy_count = 10.0
    nosy_names = ['rhettinger', 'scoder', 'r.david.murray', 'eli.bendersky', 'James.Bailey', 'math_foo', 'urule99', 'miss-islington', 'ruffsl', 'hauntsaninja']
    pr_nums = ['5723', '20438', '20722']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue20928'
    versions = ['Python 3.9']

    @JamesBailey
    Copy link
    Mannequin Author

    JamesBailey mannequin commented Mar 14, 2014

    After xml.etree.ElementInclude.include inserts an Xinclude'd href it does not walk the just-inserted subtree to see if it contains any Xincludes itself.

    I think the behaviour should be modified to walk the included subtree and perform any Xincludes contained.

    @JamesBailey JamesBailey mannequin added type-bug An unexpected behavior, bug, or error stdlib Python modules in the Lib dir topic-XML labels Mar 14, 2014
    @scoder
    Copy link
    Contributor

    scoder commented Mar 15, 2014

    Agreed that this should be done. The XInclude spec suggests that the processing is transitive. Also, lxml does it that way, in both the libxml2 based implementation and the ElementInclude compatibility module.

    https://github.com/lxml/lxml/blob/master/src/lxml/ElementInclude.py

    (note that this was adapted for lxml, so it's not a drop-in replacement for ET anymore)

    Changing target version to Py3.5 as this is essentially a new (and somewhat backwards incompatible) feature.

    @mathfoo
    Copy link
    Mannequin

    mathfoo mannequin commented Apr 16, 2014

    xml.etree.ElementInclude.include now checks the inserted subtree to see if contains any Xincludes itself. And adds them.

    Also added a unit test to check for the new functionality (which will fail without the change to ElementInclude.py).

    @bitdancer
    Copy link
    Member

    Thanks Caelyn. This patch also needs a doc patch and a whatsnew entry in order to be complete. It's not obvious to me where the relevant documentation is, though, so perhaps we instead have missing documentation that should be addressed in a separate issue. The whatsnew entry would still be needed, though.

    However, I think there is something important missing here in the patch itself. It is specified (section 4.2.7 Inclusion Loops) that it is a fatal error for an already included document to be specified in a recursively processed xinclude directive, and indeed failing to detect such a loop has DOS implications. So, the patch needs to address that before it can be committed.

    @rhettinger
    Copy link
    Contributor

    FWIW, we often do whatsnew entries later in the development cycle. I think this can go forward without whatsnew.

    The doc entry could be as simple as a ..versionchanged 2.5 include() now supports recursive Xincludes or somesuch.

    @bitdancer
    Copy link
    Member

    Since there is no guarantee anyone is going to tackle the job of reviewing all the changes for whatsnew entries, I prefer that we get in the habit of creating the entries as we go along.

    As for the versionchanged...if you can find where include is documented, that would work :)

    The recursion still needs to be addressed (it is addressed in the lxml version Stefan linked to).

    @urule99
    Copy link
    Mannequin

    urule99 mannequin commented Jun 22, 2014

    This patch incorporates nested includes and a testcase that shows how it handles infinite include by throwing an exception.

    @scoder
    Copy link
    Contributor

    scoder commented Jun 25, 2014

    Your code adds a lot of new code. Why is that necessary? Can't the new feature be integrated into the existing code?

    @bitdancer
    Copy link
    Member

    Yeah, include should now be a very simple wrapper around a call to _include, include shouldn't have the original include code (which is moved to _include) in it.

    @urule99
    Copy link
    Mannequin

    urule99 mannequin commented Jun 26, 2014

    That would make sense. Please see if the updated patch file works.

    @ruffsl
    Copy link
    Mannequin

    ruffsl mannequin commented Feb 13, 2018

    I see it's been a few years, but I would still like to see xml.etree.ElementInclude support nested xincludes, as lxml does. Additionally, I'd like to have the default_loader attempt to open the href with respect to the original xml file's dirname, as again with lxml, as opposed to just assuming its relative to the working directory for the session.

    Let me know if there are any changes you'd like to see in the latest patch here, and I could incorporate those into a new PR?

    @scoder
    Copy link
    Contributor

    scoder commented Feb 17, 2018

    The latest "fixed2" patch looks good to me, but the author didn't sign a contributors agreement. However, I did, and I already wrote the same thing for lxml, so I put together an initial PR.

    @scoder scoder added 3.8 only security fixes type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Feb 17, 2018
    @ruffsl
    Copy link
    Mannequin

    ruffsl mannequin commented Feb 18, 2018

    Hello @scoder,

    Thanks for looking into this.

    In addition to what I mentioned with lxml's use of the parent xml file's respect dirname when attempting to open the include element's href, do you think it would be possible to (optionally or by default) have the include processor provide the necessary xml:base attributes for compliance with the specification [1][2][3].
    I know lxml partially does this, but not for recursive includes beyond a depth of one. Without this, it makes it impossible to track down the subsequent source for deep include paths, i.e. when you'd like to trace from what sequence of includes contributed the element in question.

    From looking at the "fixed2", although the added lines 137-143 in ElementInclude.py appear to prevent infinite recursive includes, but I think it also prevents valid merging of include trajectories when expanding the tree of include paths. E.g. what happens if I state the inclusion of the same source multiple times elsewhere in the same file, or elsewhere in the include tree that is no way a sub-child of the current parent file.
    Isn't already_included unnecessarily global in scope, when it just need to be relative to the trajectory in the sequence of current branches; only checking a node is not simultaneously a parent and child along a single include trajectory?

    [1] https://stackoverflow.com/a/22791471/2577586
    [2] https://www.w3.org/TR/xmlbase/
    [3] http://xerces.apache.org/xerces2-j/faq-xinclude.html#faq-3

    @JamesBailey
    Copy link
    Mannequin Author

    JamesBailey mannequin commented Feb 18, 2018

    Agreed with ruffsl's concerns about the overly aggressive detection of infinite recursion.

    I also wonder if the hrefs should be normalized or canonized for the check? The check may miss infinite recursions if the hrefs happen to be written in non-matching but equivalent forms. Ex: relative versus absolute paths.

    @scoder
    Copy link
    Contributor

    scoder commented Feb 19, 2018

    The check may miss infinite recursions if the hrefs happen to be written in non-matching but equivalent forms. Ex: relative versus absolute paths.

    I thought about that, too, but it's not a real problem. There are only a few different ways to spell the same file path, and once they are through, the recursion would still be detected and never become infinite.

    Admittedly, the current implementation might lower the overhead for attacks a little, but then, if an attacker can control the input anyway, then there is not really much to win by including the same file multiple times rather than including different files.

    Maybe we should add a "max_depth" parameter to limit the maximum recursion depth, defaulting to e.g. 5, that users would have to pass in order to say "I know what I'm doing".

    I agree with the comment about the overly restrictive global set, though. Included file paths should be collected only along an inclusion path and not across independent subtrees.

    @ruffsl
    Copy link
    Mannequin

    ruffsl mannequin commented Feb 23, 2018

    Included file paths should be collected only along an inclusion path and not across independent subtrees.

    Yes, well put.

    Maybe we should add a "max_depth" parameter to limit the maximum recursion depth, defaulting to e.g. 5, that users would have to pass in order to say "I know what I'm doing".

    Could that be set to false by the user, just in case we don't know beforehand how deep the rabbit hole goes, but we're feeling overly committed to see it through?

    Not to detract from the ticket, but I'd just like to share to a question related to this topic about the expected behavior of Xinclude [1]. You could also see it as a use case example for the recursive import feature we are currently deciding, of which would help avoid one more non system library to workaround [2].

    [1] https://stackoverflow.com/q/48857647/2577586
    [2] https://github.com/ComArmor/comarmor/blob/856dd339b090c28e86206d4d6af0fac050618e74/comarmor/__init__.py#L116

    @scoder
    Copy link
    Contributor

    scoder commented Feb 23, 2018

    I've added a 'max_depth' argument to limit the maximum recursive include depth to 6 by default (which is a small, arbitrarily chosen value). Users can set it to None to disable the limit.

    From my side, I don't see any more features that I would add.
    @ruffsl: feel free to open a new ticket if you feel like anything is missing that is not covered by the scope of this one.

    @scoder
    Copy link
    Contributor

    scoder commented Mar 30, 2018

    There seems to be no documentation currently for ElementInclude. Would be nice if someone who's interested in this feature could take the time to write something up. I created a documentation ticket as issue bpo-33187.

    @scoder
    Copy link
    Contributor

    scoder commented Mar 30, 2018

    PR is complete now, ready for merging.

    @scoder
    Copy link
    Contributor

    scoder commented Nov 25, 2019

    New changeset c6a7bdb by Stefan Behnel in branch 'master':
    bpo-20928: support base-URL and recursive includes in etree.ElementInclude (bpo-5723)
    c6a7bdb

    @scoder
    Copy link
    Contributor

    scoder commented Nov 25, 2019

    I think setting "xml:base" from ElementInclude is worth another ticket.

    @scoder scoder added 3.9 only security fixes and removed 3.8 only security fixes labels Nov 25, 2019
    @scoder scoder closed this as completed Nov 25, 2019
    @scoder
    Copy link
    Contributor

    scoder commented Jun 8, 2020

    New changeset 301f0d4 by Shantanu in branch 'master':
    bpo-33187: Document 3.9 changes to xml.etree.ElementInclude.include (GH-20438)
    301f0d4

    @miss-islington
    Copy link
    Contributor

    New changeset 1220a47 by Miss Islington (bot) in branch '3.9':
    bpo-33187: Document 3.9 changes to xml.etree.ElementInclude.include (GH-20438)
    1220a47

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes stdlib Python modules in the Lib dir topic-XML type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants