This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: XML vulnerabilities in Python
Type: security Stage: patch review
Components: Extension Modules, Library (Lib), XML Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: open Resolution:
Dependencies: 17318 24238 Superseder:
Assigned To: Nosy List: Arfrever, barry, benjamin.peterson, christian.heimes, eli.bendersky, ezio.melotti, franck, georg.brandl, jwilk, larry, martin.panter, mcepl, mitar, ned.deily, pitrou, rhettinger, rsandwick3, scoder, serhiy.storchaka, steve.dower
Priority: critical Keywords: patch

Created on 2013-02-19 15:35 by christian.heimes, last changed 2022-04-11 14:57 by admin.

Files
File name Uploaded Description Edit
xmlbomb_20130219.patch christian.heimes, 2013-02-19 15:35 review
xmlbomb_20150518.patch martin.panter, 2015-05-18 02:59 Merged to 3.5 review
Pull Requests
URL Status Linked Edit
PR 9217 merged christian.heimes, 2018-09-12 16:26
PR 9265 open christian.heimes, 2018-09-13 17:19
PR 9511 merged christian.heimes, 2018-09-23 07:58
PR 9512 merged christian.heimes, 2018-09-23 08:02
Messages (23)
msg182393 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-02-19 15:35
Experimental fix for XML vulnerabilities against default. It's NOT ready and needs lots of polishing.

https://pypi.python.org/pypi/defusedxml contains explanations of all issues
https://pypi.python.org/pypi/defusedexpat is a standalone version of part of the patches for Python 2.6 to 3.3
msg184285 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2013-03-16 02:17
Since this has dragged on for quite a while, I'm probably just going to release 2.7.4 with a pointer to defusedxml in the release notes. (docs, though, perhaps)
msg184289 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-03-16 04:19
> Since this has dragged on for quite a while, I'm probably 
> just going to release 2.7.4 with a pointer to defusedxml
> in the release notes. (docs, though, perhaps)

+1
msg184387 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-03-17 16:46
> Since this has dragged on for quite a while, I'm probably just going to 
> release 2.7.4 with a pointer to defusedxml in the release notes. (docs, 
> though, perhaps)

+1 too.
msg185053 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2013-03-23 14:45
Not blocking 2.7.4 as discussed on mailing list.
msg243450 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-05-18 02:59
I did a rough merge with current “default” (3.5 pre-release) branch so that I can have a closer look at this issue; see xmlbomb_20150518.patch for the result. There are some bits with Argument Clinit that need perfecting:

* Unsure how to convert the ElementTree.XMLParser.__init__() signature (varied depending on XML_BOMB_PROTECTION compile-time flag) to Argument Clinic. So I just hard-coded it as if XML_BOMB_PROTECTION is always enabled. Why do we have to have a variable signature in the first place?

* New pyexpat functions need porting to Argument Clinic.
msg243469 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-05-18 10:44
I started looking at the lower Expat-level changes. Here are some thoughts, in the order that I thought them. :) But the end result is to investigate a different approach to disable entities in existing versions of Expat.

Currently, it looks like max_entity_indirections = 0 is a special value meaning no limit. I think it would be better to use some other value such as None for this, and then 0 could disable all entity expansion (other than pre-defined entities like & &#xNNNN; etc).

What is the benefit of having the indirection limit? I would have thought the entity expansion (character) limit on its own would already be effective at preventing nested expansion attacks like “billion laughs”. Even if the entity expanded to an empty string, all of the intermediate entity references are still included in the character count.

I wonder if it would make more sense to have a total character limit instead, which would include the characters from custom entity expansions as already counted by the patch, but also count characters directly from the XML body. Why would you want to avoid 8 million characters from entity expansion, but allow 8 million characters of plain XML (or gzipped XML)? (I am not an XML expert, so I could be missing something obvious here.)

Now I have discovered that it seems you can build Python to use an external Expat library, which won’t be affected by Christian’s fix (correct me if I am wrong). I think we should find a different solution that will also work with existing external Expat versions. Maybe setting EntityDeclHandler to raise an error would be good enough:

>>> from xml.parsers import expat
>>> bomb = '<!DOCTYPE bomb [\n<!ENTITY a "" >\n<!ENTITY b "' + '&a;' * 1000 + '" >\n<!ENTITY c "' + '&b;' * 1000 + '" >\n]>\n<bomb a="' + '&c;' * 10 + '" />\n'
>>> p = expat.ParserCreate()
>>> p.Parse(bomb, True)  # Noticeable delay (DOS) while parsing
1
>>> p = expat.ParserCreate()
>>> def handler(*so_much_argh):
...     raise ValueError("Entity handling disabled")
... 
>>> p.EntityDeclHandler = handler
>>> p.Parse(bomb, True)  # Instant failure (no DOS)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/build/python/src/Python-3.4.3/Modules/pyexpat.c", line 494, in EntityDecl
  File "<stdin>", line 2, in handler
ValueError: Entity handling disabled

This solution has been suggested and implemented elsewhere:
* https://bugzilla.redhat.com/show_bug.cgi?id=1000109#c1
* http://mail-archives.apache.org/mod_mbox/apr-dev/200906.mbox/%3C20090602162934.GA28360@redhat.com%3E (though I suspect the SetDefaultHandler option there is not sufficient)
msg243581 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-05-19 11:26
I have opened Issue 24238 with a patch for Element Tree that uses my EntityDeclHandler technique, instead of patching Expat. I would be interested in other people’s thoughts on the approach.
msg324416 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-08-31 12:24
This issue didn't get much attention in 5 years. The XML documentation starts with a big red warning:
https://docs.python.org/dev/library/xml.html

The warning is present in 2.7 and 3.4 as well:
https://docs.python.org/2.7/library/xml.html
https://docs.python.org/3.4/library/xml.html

It seems like XML is getting less popular because of JSON becoming more popular (JSON obviously comes with its own set of security issues). It seems like less core developers care about XML.

I suggest to:

* close bpo-17318 as a duplicate of this issue (bpo-17239)
* close bpo-24238
* close this issue

We just have to accept that core developers have limited availability and that documenting security issues is an acceptable tradeoff. I don't see any value of keeping these 3 issues open.
msg324685 - (view) Author: Matej Cepl (mcepl) * Date: 2018-09-06 11:46
> I suggest to:
> 
> * close bpo-17318 as a duplicate of this issue (bpo-17239)
> * close bpo-24238
> * close this issue

+1 from me.
msg325562 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-09-17 18:31
Ned - I don't think this is necessarily a release blocker, as we've been shipping it for a long time, but it would be nice if we can hold 3.7.1rc1 just long enough to get it in (provided Christian jumps in and says he'll get the last minor concerns on the PRs wrapped up very soon)
msg325573 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-09-17 20:33
We discussed this last week at the sprint.  Christian, it would be great if you could get this merged for 3.7 and possibly 3.6 in the next 24 hours.
msg325586 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2018-09-17 21:55
The external entity patch is ready, but the billion laughs fix need more time. I'm working with an upstream developer on a proper fix.
msg325590 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-09-17 22:27
Any reason to not take the current patch for our vendored copy and give it some exposure at least on platforms that rely on it (maybe just Windows)? I don't see any reason to wait on another group to "release" it when we need to manually apply the update to our own repo anyway.

Platforms using system libexpat that hasn't been patched have obviously decided not to patch it themselves :)
msg325595 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-17 23:11
> Any reason to not take the current patch for our vendored copy and give it some exposure at least on platforms that rely on it (maybe just Windows)? I don't see any reason to wait on another group to "release" it when we need to manually apply the update to our own repo anyway.

My policy is upstream fix: first, get a change merged upstream.

If we start with a downstream patch:

* only Windows and macOS will get the fix
* upstream may require changes making the change incompatible, for example change the default limits
* I would prefer to keep Modules/expat/ as close as possible to the upstream

Python is vulnerable for years, it's not like there is an urgency to fix it.
msg325610 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-09-18 04:08
There's also the view that it'll be easier to justify upstreaming a patch if it's been released and tested in a separate app. We require that all the time for Python patches, so why should we expect other projects to be different?

We're totally entitled to only release it for those platforms, because we are responsible for libexpat on those (we could vendor it for all of them? Or switch to platform-supported libraries for macOS and Windows?)

Who normally updates the vendored libexpat? I'd rather let them make the call on how far to diverge from upstream, since it'll be up to them to roll the changes forward or revert them in favour of upstream. I doubt different defaults will be an issue, especially since they aren't configurable anyway.
msg325642 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-18 13:39
> Who normally updates the vendored libexpat?

I made the 3 latest libexpat updates, and each of them was painful :-)

My notes on vendored libraries:
https://pythondev.readthedocs.io/cpython.html#vendored-external-libraries

I wrote a tool to get the version of all vendored libraries, and a script to updated libexpat.
msg325648 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2018-09-18 14:14
> * only Windows and macOS will get the fix

Modules/expat can be used on all platforms. A downstream patch is only a problem for platforms that compile Python with "./configure --with-system-expat".

The security fixes for entity expansion blowup and external entity loading are backwards incompatible fixes. Technically they also violate XML standards. In practice the vast majority of users will never run into the issue, because external entities are scarcely used. The expat parser is a non-validating XML parser, so DTDs aren't useful at all. I'd rather break a handful of users than to keep the majority of users vulnerable.

To fix billion laughs and quadratic blowup once and for all, we also have to break backwards compatibility and require expat >= 2.3.0. For now the modules still work with old versions of expat. IMO it's fine. Vendors either have to update their libraries or use our copy of expat.

Ultimately it's Benjamin's, Larry's, and Ned's decision. They are release managers.
msg325702 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2018-09-19 06:09
On Tue, Sep 18, 2018, at 06:39, STINNER Victor wrote:
> 
> STINNER Victor <vstinner@redhat.com> added the comment:
> 
> > Who normally updates the vendored libexpat?
> 
> I made the 3 latest libexpat updates, and each of them was painful :-)

Oh? I've updated it twice (4e21100fa7bf66e0b32146d3f46ae16afc73fee1 and 5033aa77aacaa5505636f150e8d54baac5bdca9c), and it didn't seem so bad. I just copied the upstream files in. Did I do it wrong?
msg325738 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-19 09:35
> Oh? I've updated it twice (4e21100fa7bf66e0b32146d3f46ae16afc73fee1 and 5033aa77aacaa5505636f150e8d54baac5bdca9c), and it didn't seem so bad. I just copied the upstream files in. Did I do it wrong?

Let me remind what I did...

bpo-30694 (expat 2.2.1):

* I wrote a script to rebuild Modules/expat/ from the upstream code
* I had to manually keep our old pyexpatns.h file since it's a downstream change
* Then you have to add againt #include "pyexpatns.h" in Modules/expat/expat_external.h
* It broke buildbots: bpo-29591
* The change introduced a compilation warning: bpo-30797

bpo-30947 (expat 2.2.3):

* "If libexpat is upgraded in Python 2.7, the new Modules/expat/loadlibrary.c should also be added to PC/VS9.0/ project files, as I did for PCbuild."
* "Expat 2.2.3 has a bug: see bpo-31170 :-("
* etc.

There are different issues:

* We have some small downstream changes
* We still support VS 2008 for Python 2.7 whereas upstream doesn't care of this old legacy compiler
* Each release introduces its own set of bugs :-D
* Each release comes with its own set of new warnings...

At least for me, each update was painful. It's also painful to have to make the same change in all supported branches (2.7, 3.4, 3.5, 3.6, 3.7, master).
msg326144 - (view) Author: miss-islington (miss-islington) Date: 2018-09-23 07:50
New changeset 17b1d5d4e36aa57a9b25a0e694affbd1ee637e45 by Miss Islington (bot) (Christian Heimes) in branch 'master':
bpo-17239: Disable external entities in SAX parser (GH-9217)
https://github.com/python/cpython/commit/17b1d5d4e36aa57a9b25a0e694affbd1ee637e45
msg326228 - (view) Author: miss-islington (miss-islington) Date: 2018-09-24 12:38
New changeset 582d188e6e3487180891f1fc457a80dec8be26a8 by Miss Islington (bot) (Christian Heimes) in branch '3.6':
[3.6] bpo-17239: Disable external entities in SAX parser (GH-9217) (GH-9512)
https://github.com/python/cpython/commit/582d188e6e3487180891f1fc457a80dec8be26a8
msg326229 - (view) Author: miss-islington (miss-islington) Date: 2018-09-24 12:38
New changeset 394e55a9279d17240ef6fe85d3b4ea3fe7b6dff5 by Miss Islington (bot) (Christian Heimes) in branch '3.7':
[3.7] bpo-17239: Disable external entities in SAX parser (GH-9217) (GH-9511)
https://github.com/python/cpython/commit/394e55a9279d17240ef6fe85d3b4ea3fe7b6dff5
History
Date User Action Args
2022-04-11 14:57:42adminsetgithub: 61441
2021-11-08 16:56:41vstinnersetnosy: - vstinner
2021-11-04 14:08:01eryksunsetcomponents: + Library (Lib), XML
versions: + Python 3.7, Python 3.9
2021-11-04 14:02:21eryksunsetmessages: - msg405686
2021-11-04 14:01:59eryksunsetmessages: - msg405689
2021-11-04 14:01:03eryksunsetnosy: + barry, georg.brandl, rhettinger, pitrou, scoder, vstinner, larry, christian.heimes, benjamin.peterson, jwilk, ned.deily, mcepl, ezio.melotti, Arfrever, eli.bendersky, mitar, martin.panter, serhiy.storchaka, franck, steve.dower, rsandwick3, - ahmedsayeed1982
2021-11-04 12:08:57ahmedsayeed1982setmessages: + msg405689
components: + Extension Modules, - XML
versions: + Python 3.8, - Python 3.9
2021-11-04 12:08:07ahmedsayeed1982setversions: - Python 3.7, Python 3.8
nosy: + ahmedsayeed1982, - barry, georg.brandl, rhettinger, pitrou, scoder, vstinner, larry, christian.heimes, benjamin.peterson, jwilk, ned.deily, mcepl, ezio.melotti, Arfrever, eli.bendersky, mitar, martin.panter, serhiy.storchaka, franck, steve.dower, rsandwick3, miss-islington

messages: + msg405686

components: - Extension Modules, Library (Lib)
2021-10-20 12:52:47christian.heimeslinkissue17318 superseder
2020-02-04 12:15:53cheryl.sabellasetversions: + Python 3.9, - Python 2.7, Python 3.6
2019-09-03 08:52:40djcsetnosy: - djc
2019-06-28 17:57:20mitarsetnosy: + mitar
2018-09-24 12:38:40miss-islingtonsetmessages: + msg326229
2018-09-24 12:38:36miss-islingtonsetmessages: + msg326228
2018-09-23 08:02:49christian.heimessetpull_requests: + pull_request8918
2018-09-23 07:58:46christian.heimessetpull_requests: + pull_request8917
2018-09-23 07:50:37miss-islingtonsetnosy: + miss-islington
messages: + msg326144
2018-09-19 09:35:53vstinnersetmessages: + msg325738
2018-09-19 06:10:00benjamin.petersonsetmessages: + msg325702
2018-09-18 14:14:29christian.heimessetmessages: + msg325648
2018-09-18 13:39:19vstinnersetmessages: + msg325642
2018-09-18 04:08:25steve.dowersetmessages: + msg325610
2018-09-17 23:11:11vstinnersetmessages: + msg325595
2018-09-17 22:27:15steve.dowersetmessages: + msg325590
2018-09-17 21:55:12christian.heimessetnosy: + christian.heimes
messages: + msg325586
2018-09-17 20:33:45ned.deilysetmessages: + msg325573
2018-09-17 18:31:11steve.dowersetnosy: + steve.dower, ned.deily

messages: + msg325562
versions: + Python 3.6, Python 3.7, Python 3.8, - Python 2.6, Python 3.1, Python 3.2, Python 3.3, Python 3.4, Python 3.5
2018-09-13 17:19:07christian.heimessetpull_requests: + pull_request8697
2018-09-12 16:26:27christian.heimessetstage: needs patch -> patch review
pull_requests: + pull_request8649
2018-09-06 11:46:56mceplsetmessages: + msg324685
2018-08-31 12:24:35vstinnersetnosy: + vstinner
messages: + msg324416
2018-03-05 00:46:11mceplsetnosy: + mcepl
2016-06-12 12:14:14christian.heimessetnosy: - christian.heimes
2016-06-12 12:11:24martin.pantersetdependencies: + xml.sax and xml.dom fetch DTDs by default, Avoid entity expansion attacks in Element Tree
2015-05-24 08:36:04scodersetnosy: + scoder
2015-05-19 11:26:52martin.pantersetmessages: + msg243581
2015-05-18 10:44:13martin.pantersetmessages: + msg243469
2015-05-18 03:00:00martin.pantersetfiles: + xmlbomb_20150518.patch

messages: + msg243450
versions: + Python 3.5
2015-01-14 23:49:33jwilksetnosy: + jwilk
2015-01-12 00:11:24martin.pantersetnosy: + martin.panter
2013-03-25 16:59:30rsandwick3setnosy: + rsandwick3
2013-03-23 14:45:10benjamin.petersonsetpriority: release blocker -> critical

messages: + msg185053
2013-03-17 16:46:08pitrousetnosy: + pitrou
messages: + msg184387
2013-03-16 04:19:02rhettingersetnosy: + rhettinger
messages: + msg184289
2013-03-16 02:17:49benjamin.petersonsetmessages: + msg184285
2013-02-22 23:40:35Arfreversetnosy: + Arfrever
2013-02-20 10:21:07djcsetnosy: + djc
2013-02-19 19:49:24serhiy.storchakasetnosy: + serhiy.storchaka
2013-02-19 17:26:29francksetnosy: + franck
2013-02-19 15:37:13ezio.melottisetnosy: + ezio.melotti, eli.bendersky
2013-02-19 15:35:41christian.heimescreate