Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

various refleaks in _elementtree, and crashes when using an uninitialized XMLParser object #75939

Closed
orenmn mannequin opened this issue Oct 11, 2017 · 9 comments
Closed
Labels
3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes topic-XML type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@orenmn
Copy link
Mannequin

orenmn mannequin commented Oct 11, 2017

BPO 31758
Nosy @scoder, @serhiy-storchaka, @orenmn, @miss-islington
PRs
  • bpo-31758: Prevent reference leaks in __setstate__() and __init__() of _elementtree.Element #3956
  • bpo-31758: Prevent crashes when using an uninitialized _elementtree.XMLParser object #3997
  • [3.8] bpo-31758: Prevent crashes when using an uninitialized _elementtree.XMLParser object (GH-3997) #19485
  • [3.7] bpo-31758: Prevent crashes when using an uninitialized _elementtree.XMLParser object (GH-3997) #19486
  • [3.7] bpo-31758: Prevent crashes when using an uninitialized _elementtree.XMLParser object (GH-3997) #19487
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-04-12.17:16:54.394>
    created_at = <Date 2017-10-11.14:53:39.948>
    labels = ['expert-XML', '3.7', '3.8', '3.9', 'type-crash']
    title = 'various refleaks in _elementtree, and crashes when using an uninitialized XMLParser object'
    updated_at = <Date 2020-04-12.17:16:54.393>
    user = 'https://github.com/orenmn'

    bugs.python.org fields:

    activity = <Date 2020-04-12.17:16:54.393>
    actor = 'scoder'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-04-12.17:16:54.394>
    closer = 'scoder'
    components = ['XML']
    creation = <Date 2017-10-11.14:53:39.948>
    creator = 'Oren Milman'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 31758
    keywords = ['patch']
    message_count = 9.0
    messages = ['304145', '304322', '304396', '366248', '366252', '366253', '366254', '366255', '366256']
    nosy_count = 5.0
    nosy_names = ['scoder', 'eli.bendersky', 'serhiy.storchaka', 'Oren Milman', 'miss-islington']
    pr_nums = ['3956', '3997', '19485', '19486', '19487']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue31758'
    versions = ['Python 3.7', 'Python 3.8', 'Python 3.9']

    @orenmn
    Copy link
    Mannequin Author

    orenmn mannequin commented Oct 11, 2017

    The following code results in refleaks:
    import sys
    import _elementtree
    builder = _elementtree.TreeBuilder()
    parser = _elementtree.XMLParser(target=builder)

    refcount_before = sys.gettotalrefcount()
    parser.__init__(target=builder)
    print(sys.gettotalrefcount() - refcount_before)  # should be close to 0

    This is because _elementtree_XMLParser___init___impl()
    (in Modules/_elementtree.c) doesn't decref before assigning to fields of
    self.

    The following code also results in refleaks:
    import sys
    import _elementtree
    elem = _elementtree.Element(42)
    elem.__setstate__({'tag': 42, '_children': list(range(1000))})

    refcount_before = sys.gettotalrefcount()
    elem.__setstate__({'tag': 42, '_children': []})
    print(sys.gettotalrefcount() - refcount_before)  # should be close to -1000

    This is because element_setstate_from_attributes() doesn't decref the old
    children before storing the new children.

    I would open a PR to fix this soon.

    @orenmn orenmn mannequin added performance Performance or resource usage 3.7 (EOL) end of life topic-XML labels Oct 11, 2017
    @orenmn
    Copy link
    Mannequin Author

    orenmn mannequin commented Oct 13, 2017

    Shame on me. I only now found out that Serhiy already mentioned most of the refleaks
    in https://bugs.python.org/issue31455#msg302103.

    @orenmn
    Copy link
    Mannequin Author

    orenmn mannequin commented Oct 14, 2017

    According to Serhiy's advice (https://bugs.python.org/issue31455#msg304338),
    this issue now also includes some crashes in _elementtree:

    The following code crashes:
    import _elementtree
    parser = _elementtree.XMLParser.__new__(_elementtree.XMLParser)
    parser.close()

    This is because _elementtree_XMLParser_close_impl() assumes that the XMLParser
    object is initialized, and so it passes self to expat_parse(), which assumes
    that self->parser is valid, and crashes.
    Similarly, calling feed(), _parse_whole() or _setevents(), or reading the
    entity or target attribute of an uninitialized XMLParser object would
    result in a crash.

    ISTM that PR 3956 is more complex, and already not so small, so i would soon
    open another PR to fix these crashes.

    @orenmn orenmn mannequin changed the title various refleaks in _elementtree various refleaks in _elementtree, and crashes when using an uninitialized XMLParser object Oct 14, 2017
    @orenmn orenmn mannequin added type-crash A hard crash of the interpreter, possibly with a core dump and removed performance Performance or resource usage labels Oct 14, 2017
    @scoder
    Copy link
    Contributor

    scoder commented Apr 12, 2020

    This has been pending for a while too long, but the fixes look good to me. They should still go at least into Py3.8 and later.

    @scoder scoder added 3.8 only security fixes 3.9 only security fixes and removed 3.7 (EOL) end of life labels Apr 12, 2020
    @scoder
    Copy link
    Contributor

    scoder commented Apr 12, 2020

    New changeset 402e1cd by Oren Milman in branch 'master':
    bpo-31758: Prevent crashes when using an uninitialized _elementtree.XMLParser object (GH-3997)
    402e1cd

    @scoder
    Copy link
    Contributor

    scoder commented Apr 12, 2020

    New changeset 6151148 by Miss Islington (bot) in branch '3.8':
    bpo-31758: Prevent crashes when using an uninitialized _elementtree.XMLParser object (GH-3997) (GH-19485)
    6151148

    @scoder
    Copy link
    Contributor

    scoder commented Apr 12, 2020

    Let's add it to the last bug fix release of 3.7 as well. It fixes a crash bug, after all.

    @scoder scoder added the 3.7 (EOL) end of life label Apr 12, 2020
    @scoder
    Copy link
    Contributor

    scoder commented Apr 12, 2020

    New changeset 096e41a by Miss Islington (bot) in branch '3.7':
    [3.7] bpo-31758: Prevent crashes when using an uninitialized _elementtree.XMLParser object (GH-3997) (GH-19487)
    096e41a

    @scoder
    Copy link
    Contributor

    scoder commented Apr 12, 2020

    Thanks for the fix, Oren!

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes topic-XML type-crash A hard crash of the interpreter, possibly with a core dump
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant