XML vulnerabilities in Python #61441

Open
tiran opened this issue Feb 19, 2013 · 23 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes extension-modules C modules in the Modules dir stdlib Python modules in the Lib dir topic-XML type-security A security issue

Comments

@tiran
Member

tiran commented Feb 19, 2013

BPO 17239
Nosy @warsaw, @birkenfeld, @rhettinger, @pitrou, @scoder, @larryhastings, @tiran, @benjaminp, @jwilk, @ned-deily, @mcepl, @ezio-melotti, @mitar, @vadmium, @serhiy-storchaka, @zooba
PRs
  • bpo-17239: Disable external entities in SAX parser #9217
  • gh-61441: XML entity expansion limitation #9265
  • [3.7] bpo-17239: Disable external entities in SAX parser (GH-9217) #9511
  • [3.6] bpo-17239: Disable external entities in SAX parser (GH-9217) #9512
Dependencies
  • bpo-17318: xml.sax and xml.dom fetch DTDs by default
  • bpo-24238: Avoid entity expansion attacks in Element Tree
Files
  • xmlbomb_20130219.patch
  • xmlbomb_20150518.patch: Merged to 3.5
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2013-02-19.15:35:41.914>
    labels = ['type-security', 'expert-XML', '3.8', '3.9', 'extension-modules', '3.7', 'library']
    title = 'XML vulnerabilities in Python'
    updated_at = <Date 2021-11-08.16:56:41.595>
    user = 'https://github.com/tiran'

    bugs.python.org fields:

    activity = <Date 2021-11-08.16:56:41.595>
    actor = 'vstinner'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Extension Modules', 'Library (Lib)', 'XML']
    creation = <Date 2013-02-19.15:35:41.914>
    creator = 'christian.heimes'
    dependencies = ['17318', '24238']
    files = ['29122', '39415']
    hgrepos = []
    issue_num = 17239
    keywords = ['patch']
    message_count = 23.0
    messages = ['182393', '184285', '184289', '184387', '185053', '243450', '243469', '243581', '324416', '324685', '325562', '325573', '325586', '325590', '325595', '325610', '325642', '325648', '325702', '325738', '326144', '326228', '326229']
    nosy_count = 20.0
    nosy_names = ['barry', 'georg.brandl', 'rhettinger', 'pitrou', 'scoder', 'larry', 'christian.heimes', 'benjamin.peterson', 'jwilk', 'ned.deily', 'mcepl', 'ezio.melotti', 'Arfrever', 'eli.bendersky', 'mitar', 'martin.panter', 'serhiy.storchaka', 'franck', 'steve.dower', 'rsandwick3']
    pr_nums = ['9217', '9265', '9511', '9512']
    priority = 'critical'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'security'
    url = 'https://bugs.python.org/issue17239'
    versions = ['Python 3.7', 'Python 3.8', 'Python 3.9']

    @tiran
    Member Author

    tiran commented Feb 19, 2013

    Experimental fix for XML vulnerabilities against default. It's NOT ready and needs lots of polishing.

    https://pypi.python.org/pypi/defusedxml contains explanations of all issues
    https://pypi.python.org/pypi/defusedexpat is a standalone version of part of the patches for Python 2.6 to 3.3

    @tiran tiran added release-blocker extension-modules C modules in the Modules dir stdlib Python modules in the Lib dir topic-XML type-security A security issue labels Feb 19, 2013
    @benjaminp
    Contributor

    Since this has dragged on for quite a while, I'm probably just going to release 2.7.4 with a pointer to defusedxml in the release notes. (docs, though, perhaps)

    @rhettinger
    Contributor

    Since this has dragged on for quite a while, I'm probably
    just going to release 2.7.4 with a pointer to defusedxml
    in the release notes. (docs, though, perhaps)

    +1

    @pitrou
    Member

    pitrou commented Mar 17, 2013

    Since this has dragged on for quite a while, I'm probably just going to
    release 2.7.4 with a pointer to defusedxml in the release notes. (docs,
    though, perhaps)

    +1 too.

    @benjaminp
    Contributor

    Not blocking 2.7.4 as discussed on mailing list.

    @vadmium
    Member

    vadmium commented May 18, 2015

    I did a rough merge with the current “default” (3.5 pre-release) branch so that I can have a closer look at this issue; see xmlbomb_20150518.patch for the result. There are some bits with Argument Clinic that need perfecting:

    • Unsure how to convert the ElementTree.XMLParser.__init__() signature (varied depending on XML_BOMB_PROTECTION compile-time flag) to Argument Clinic. So I just hard-coded it as if XML_BOMB_PROTECTION is always enabled. Why do we have to have a variable signature in the first place?

    • New pyexpat functions need porting to Argument Clinic.

    @vadmium
    Member

    vadmium commented May 18, 2015

    I started looking at the lower Expat-level changes. Here are some thoughts, in the order that I thought them. :) But the end result is to investigate a different approach to disable entities in existing versions of Expat.

    Currently, it looks like max_entity_indirections = 0 is a special value meaning no limit. I think it would be better to use some other value such as None for this, and then 0 could disable all entity expansion (other than pre-defined entities like &amp; &#xNNNN; etc).

    What is the benefit of having the indirection limit? I would have thought the entity expansion (character) limit on its own would already be effective at preventing nested expansion attacks like “billion laughs”. Even if the entity expanded to an empty string, all of the intermediate entity references are still included in the character count.

    I wonder if it would make more sense to have a total character limit instead, which would include the characters from custom entity expansions as already counted by the patch, but also count characters directly from the XML body. Why would you want to avoid 8 million characters from entity expansion, but allow 8 million characters of plain XML (or gzipped XML)? (I am not an XML expert, so I could be missing something obvious here.)

    Now I have discovered that it seems you can build Python to use an external Expat library, which won’t be affected by Christian’s fix (correct me if I am wrong). I think we should find a different solution that will also work with existing external Expat versions. Maybe setting EntityDeclHandler to raise an error would be good enough:

    >>> from xml.parsers import expat
    >>> bomb = '<!DOCTYPE bomb [\n<!ENTITY a "" >\n<!ENTITY b "' + '&a;' * 1000 + '" >\n<!ENTITY c "' + '&b;' * 1000 + '" >\n]>\n<bomb a="' + '&c;' * 10 + '" />\n'
    >>> p = expat.ParserCreate()
    >>> p.Parse(bomb, True)  # Noticeable delay (DOS) while parsing
    1
    >>> p = expat.ParserCreate()
    >>> def handler(*so_much_argh):
    ...     raise ValueError("Entity handling disabled")
    ... 
    >>> p.EntityDeclHandler = handler
    >>> p.Parse(bomb, True)  # Instant failure (no DOS)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/build/python/src/Python-3.4.3/Modules/pyexpat.c", line 494, in EntityDecl
      File "<stdin>", line 2, in handler
    ValueError: Entity handling disabled

    This solution has been suggested and implemented elsewhere:

    @vadmium
    Member

    vadmium commented May 19, 2015

    I have opened bpo-24238 with a patch for Element Tree that uses my EntityDeclHandler technique, instead of patching Expat. I would be interested in other people’s thoughts on the approach.
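The EntityDeclHandler technique described above can be packaged as a small helper. This is only a minimal sketch built directly on `xml.parsers.expat`, not the patch attached to bpo-24238; the names `EntitiesForbidden`, `parse_forbidding_entities` and `SMALL_BOMB` are made up for illustration.

```python
from xml.parsers import expat


class EntitiesForbidden(ValueError):
    """Hypothetical exception raised when a document declares an entity."""


def parse_forbidding_entities(data):
    """Parse `data` with Expat, failing on the first entity declaration.

    Rejecting the declaration means no custom entity can ever be expanded,
    so nested-expansion documents fail before any expensive work is done.
    """
    parser = expat.ParserCreate()

    def reject(*args):
        # Expat calls this handler for every <!ENTITY ...> declaration.
        raise EntitiesForbidden("entity declarations are disabled")

    parser.EntityDeclHandler = reject
    return parser.Parse(data, True)


# A deliberately tiny stand-in for the large bomb shown earlier in the thread.
SMALL_BOMB = (
    '<!DOCTYPE bomb [\n'
    '<!ENTITY a "x">\n'
    '<!ENTITY b "&a;&a;&a;">\n'
    ']>\n'
    '<bomb>&b;</bomb>\n'
)
```

Documents that use only the predefined entities (such as `&amp;`) are unaffected, because those never go through an entity declaration.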

    @vstinner
    Member

    This issue didn't get much attention in 5 years. The XML documentation starts with a big red warning:
    https://docs.python.org/dev/library/xml.html

    The warning is present in 2.7 and 3.4 as well:
    https://docs.python.org/2.7/library/xml.html
    https://docs.python.org/3.4/library/xml.html

    It seems like XML is getting less popular because of JSON becoming more popular (JSON obviously comes with its own set of security issues). It seems like fewer core developers care about XML.

    I suggest to:

    • close bpo-17318 as a duplicate of this issue (bpo-17239)
    • close bpo-24238
    • close this issue

    We just have to accept that core developers have limited availability and that documenting security issues is an acceptable tradeoff. I don't see any value of keeping these 3 issues open.

    @mcepl
    Mannequin

    mcepl mannequin commented Sep 6, 2018

    I suggest to:

    • close bpo-17318 as a duplicate of this issue (bpo-17239)
    • close bpo-24238
    • close this issue

    +1 from me.

    @zooba
    Member

    zooba commented Sep 17, 2018

    Ned - I don't think this is necessarily a release blocker, as we've been shipping it for a long time, but it would be nice if we can hold 3.7.1rc1 just long enough to get it in (provided Christian jumps in and says he'll get the last minor concerns on the PRs wrapped up very soon)

    @zooba zooba added 3.7 (EOL) end of life 3.8 only security fixes labels Sep 17, 2018
    @ned-deily
    Member

    We discussed this last week at the sprint. Christian, it would be great if you could get this merged for 3.7 and possibly 3.6 in the next 24 hours.

    @tiran
    Member Author

    tiran commented Sep 17, 2018

    The external entity patch is ready, but the billion laughs fix needs more time. I'm working with an upstream developer on a proper fix.

    @zooba
    Member

    zooba commented Sep 17, 2018

    Any reason to not take the current patch for our vendored copy and give it some exposure at least on platforms that rely on it (maybe just Windows)? I don't see any reason to wait on another group to "release" it when we need to manually apply the update to our own repo anyway.

    Platforms using system libexpat that hasn't been patched have obviously decided not to patch it themselves :)

    @vstinner
    Member

    Any reason to not take the current patch for our vendored copy and give it some exposure at least on platforms that rely on it (maybe just Windows)? I don't see any reason to wait on another group to "release" it when we need to manually apply the update to our own repo anyway.

    My policy is upstream fix: first, get a change merged upstream.

    If we start with a downstream patch:

    • only Windows and macOS will get the fix
    • upstream may require changes making the change incompatible, for example change the default limits
    • I would prefer to keep Modules/expat/ as close as possible to the upstream

    Python has been vulnerable for years; it's not like there is any urgency to fix it.

    @zooba
    Member

    zooba commented Sep 18, 2018

    There's also the view that it'll be easier to justify upstreaming a patch if it's been released and tested in a separate app. We require that all the time for Python patches, so why should we expect other projects to be different?

    We're totally entitled to only release it for those platforms, because we are responsible for libexpat on those (we could vendor it for all of them? Or switch to platform-supported libraries for macOS and Windows?)

    Who normally updates the vendored libexpat? I'd rather let them make the call on how far to diverge from upstream, since it'll be up to them to roll the changes forward or revert them in favour of upstream. I doubt different defaults will be an issue, especially since they aren't configurable anyway.

    @vstinner
    Member

    Who normally updates the vendored libexpat?

    I made the 3 latest libexpat updates, and each of them was painful :-)

    My notes on vendored libraries:
    https://pythondev.readthedocs.io/cpython.html#vendored-external-libraries

    I wrote a tool to get the version of all vendored libraries, and a script to update libexpat.

    @tiran
    Member Author

    tiran commented Sep 18, 2018

    • only Windows and macOS will get the fix

    Modules/expat can be used on all platforms. A downstream patch is only a problem for platforms that compile Python with "./configure --with-system-expat".

    The security fixes for entity expansion blowup and external entity loading are backwards incompatible fixes. Technically they also violate XML standards. In practice the vast majority of users will never run into the issue, because external entities are scarcely used. The expat parser is a non-validating XML parser, so DTDs aren't useful at all. I'd rather break a handful of users than to keep the majority of users vulnerable.
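For code that cannot rely on the parser's default, the external entity behaviour can also be switched off explicitly through the standard SAX feature flag. The following is a rough sketch, assuming the stdlib Expat-based SAX reader; `TextCollector`, `parse_without_external_entities` and `DOC` are illustrative names, and the file URL is a placeholder that is never opened.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Collect character data so the effect of skipped entities is visible."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_without_external_entities(data):
    """Parse bytes with SAX while refusing to load external general entities."""
    parser = xml.sax.make_parser()
    # Explicitly disable external general entities instead of trusting the default.
    parser.setFeature(feature_external_ges, False)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return "".join(handler.chunks)


# The external entity points at a placeholder path; it is skipped, not fetched.
DOC = (
    b'<!DOCTYPE d [<!ENTITY ext SYSTEM "file:///nonexistent/secret.txt">]>'
    b'<d>a&ext;b</d>'
)
```

With the feature disabled, the reference to `&ext;` contributes no text, which is the backwards-incompatible behaviour discussed in this comment.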

    To fix billion laughs and quadratic blowup once and for all, we also have to break backwards compatibility and require expat >= 2.3.0. For now the modules still work with old versions of expat. IMO it's fine. Vendors either have to update their libraries or use our copy of expat.
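Whether an interpreter uses the bundled copy or a system Expat, the version it was actually built against can be inspected at runtime. A small sketch, assuming only the documented `version_info` and `EXPAT_VERSION` attributes of `xml.parsers.expat`; the helper name `expat_at_least` is hypothetical, and the threshold used in the demo is simply the one mentioned in this comment.

```python
from xml.parsers import expat


def expat_at_least(minimum):
    """Return True if the Expat used by pyexpat is at least `minimum`.

    `minimum` is a (major, minor, micro) tuple of integers.
    """
    return tuple(expat.version_info) >= tuple(minimum)


if __name__ == "__main__":
    # EXPAT_VERSION is a human-readable string; version_info is a tuple.
    print(expat.EXPAT_VERSION, expat_at_least((2, 3, 0)))
```

A check like this could gate stricter behaviour or emit a warning on platforms built with "./configure --with-system-expat" against an older library.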

    Ultimately it's Benjamin's, Larry's, and Ned's decision. They are release managers.

    @benjaminp
    Contributor

    On Tue, Sep 18, 2018, at 06:39, STINNER Victor wrote:

    STINNER Victor <vstinner@redhat.com> added the comment:

    > Who normally updates the vendored libexpat?

    I made the 3 latest libexpat updates, and each of them was painful :-)

    Oh? I've updated it twice (4e21100 and 5033aa7), and it didn't seem so bad. I just copied the upstream files in. Did I do it wrong?

    @vstinner
    Member

    Oh? I've updated it twice (4e21100 and 5033aa7), and it didn't seem so bad. I just copied the upstream files in. Did I do it wrong?

    Let me remind what I did...

    bpo-30694 (expat 2.2.1):

    • I wrote a script to rebuild Modules/expat/ from the upstream code
    • I had to manually keep our old pyexpatns.h file since it's a downstream change
    • Then you have to add again #include "pyexpatns.h" in Modules/expat/expat_external.h
    • It broke buildbots: bpo-29591
    • The change introduced a compilation warning: bpo-30797

    bpo-30947 (expat 2.2.3):

    There are different issues:

    • We have some small downstream changes
    • We still support VS 2008 for Python 2.7 whereas upstream doesn't care about this old legacy compiler
    • Each release introduces its own set of bugs :-D
    • Each release comes with its own set of new warnings...

    At least for me, each update was painful. It's also painful to have to make the same change in all supported branches (2.7, 3.4, 3.5, 3.6, 3.7, master).

    @miss-islington
    Contributor

    New changeset 17b1d5d by Miss Islington (bot) (Christian Heimes) in branch 'master':
    bpo-17239: Disable external entities in SAX parser (GH-9217)
    17b1d5d

    @miss-islington
    Contributor

    New changeset 582d188 by Miss Islington (bot) (Christian Heimes) in branch '3.6':
    [3.6] bpo-17239: Disable external entities in SAX parser (GH-9217) (GH-9512)
    582d188

    @miss-islington
    Contributor

    New changeset 394e55a by Miss Islington (bot) (Christian Heimes) in branch '3.7':
    [3.7] bpo-17239: Disable external entities in SAX parser (GH-9217) (GH-9511)
    394e55a

    @csabella csabella added the 3.9 only security fixes label Feb 4, 2020
    @ahmedsayeed1982 ahmedsayeed1982 mannequin added extension-modules C modules in the Modules dir 3.8 only security fixes and removed extension-modules C modules in the Modules dir stdlib Python modules in the Lib dir 3.7 (EOL) end of life 3.8 only security fixes topic-XML 3.9 only security fixes labels Nov 4, 2021
    @eryksun eryksun added stdlib Python modules in the Lib dir topic-XML 3.7 (EOL) end of life 3.9 only security fixes labels Nov 4, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022