Author martin.panter
Recipients Arfrever, barry, benjamin.peterson, christian.heimes, djc, eli.bendersky, ezio.melotti, franck, georg.brandl, jwilk, larry, martin.panter, pitrou, rhettinger, rsandwick3, serhiy.storchaka
Date 2015-05-18.10:44:12
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1431945853.17.0.558584725865.issue17239@psf.upfronthosting.co.za>
In-reply-to
Content
I started looking at the lower Expat-level changes. Here are some thoughts, in the order that I thought them. :) But the end result is to investigate a different approach to disable entities in existing versions of Expat.

Currently, it looks like max_entity_indirections = 0 is a special value meaning no limit. I think it would be better to use some other value such as None for this, and then 0 could disable all entity expansion (other than pre-defined entities like &amp; &#xNNNN; etc).

What is the benefit of having the indirection limit? I would have thought the entity expansion (character) limit on its own would already be effective at preventing nested expansion attacks like “billion laughs”. Even if the entity expanded to an empty string, all of the intermediate entity references are still included in the character count.

I wonder if it would make more sense to have a total character limit instead, which would include the characters from custom entity expansions as already counted by the patch, but also count characters directly from the XML body. Why would you want to avoid 8 million characters from entity expansion, but allow 8 million characters of plain XML (or gzipped XML)? (I am not an XML expert, so I could be missing something obvious here.)

Now I have discovered that it seems you can build Python to use an external Expat library, which won’t be affected by Christian’s fix (correct me if I am wrong). I think we should find a different solution that will also work with existing external Expat versions. Maybe setting EntityDeclHandler to raise an error would be good enough:

>>> from xml.parsers import expat
>>> bomb = '<!DOCTYPE bomb [\n<!ENTITY a "" >\n<!ENTITY b "' + '&a;' * 1000 + '" >\n<!ENTITY c "' + '&b;' * 1000 + '" >\n]>\n<bomb a="' + '&c;' * 10 + '" />\n'
>>> p = expat.ParserCreate()
>>> p.Parse(bomb, True)  # Noticeable delay (DOS) while parsing
1
>>> p = expat.ParserCreate()
>>> def handler(*so_much_argh):
...     raise ValueError("Entity handling disabled")
... 
>>> p.EntityDeclHandler = handler
>>> p.Parse(bomb, True)  # Instant failure (no DOS)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/build/python/src/Python-3.4.3/Modules/pyexpat.c", line 494, in EntityDecl
  File "<stdin>", line 2, in handler
ValueError: Entity handling disabled

This solution has been suggested and implemented elsewhere:
* https://bugzilla.redhat.com/show_bug.cgi?id=1000109#c1
* http://mail-archives.apache.org/mod_mbox/apr-dev/200906.mbox/%3C20090602162934.GA28360@redhat.com%3E (though I suspect the SetDefaultHandler option there is not sufficient)
History
Date User Action Args
2015-05-18 10:44:13martin.pantersetrecipients: + martin.panter, barry, georg.brandl, rhettinger, pitrou, larry, christian.heimes, benjamin.peterson, jwilk, djc, ezio.melotti, Arfrever, eli.bendersky, serhiy.storchaka, franck, rsandwick3
2015-05-18 10:44:13martin.pantersetmessageid: <1431945853.17.0.558584725865.issue17239@psf.upfronthosting.co.za>
2015-05-18 10:44:13martin.panterlinkissue17239 messages
2015-05-18 10:44:12martin.pantercreate