diff --git a/Doc/library/markup.rst b/Doc/library/markup.rst
--- a/Doc/library/markup.rst
+++ b/Doc/library/markup.rst
@@ -25,6 +25,7 @@
    htmlparser.rst
    sgmllib.rst
    htmllib.rst
+   xml.rst
    xml.etree.elementtree.rst
    xml.dom.rst
    xml.dom.minidom.rst
diff --git a/Doc/library/xml.dom.minidom.rst b/Doc/library/xml.dom.minidom.rst
--- a/Doc/library/xml.dom.minidom.rst
+++ b/Doc/library/xml.dom.minidom.rst
@@ -20,6 +20,14 @@
 not already proficient with the DOM should consider using the :mod:`xml.etree.ElementTree` module for their XML processing instead
+
+.. warning::
+
+   The :mod:`xml.dom.minidom` module is not secure against erroneous or
+   maliciously constructed data. If you need to parse untrusted or
+   unauthenticated data see :ref:`xml-vulnerabilities`.
+
+
 DOM applications typically start by parsing some XML into a DOM. With :mod:`xml.dom.minidom`, this is done through the parse functions::
diff --git a/Doc/library/xml.dom.pulldom.rst b/Doc/library/xml.dom.pulldom.rst
--- a/Doc/library/xml.dom.pulldom.rst
+++ b/Doc/library/xml.dom.pulldom.rst
@@ -16,6 +16,13 @@
 Object Model representation of a document from SAX events.
+.. warning::
+
+   The :mod:`xml.dom.pulldom` module is not secure against erroneous or
+   maliciously constructed data. If you need to parse untrusted or
+   unauthenticated data see :ref:`xml-vulnerabilities`.
+
+
 .. class:: PullDOM([documentFactory])
    :class:`xml.sax.handler.ContentHandler` implementation that ...
diff --git a/Doc/library/xml.etree.elementtree.rst b/Doc/library/xml.etree.elementtree.rst
--- a/Doc/library/xml.etree.elementtree.rst
+++ b/Doc/library/xml.etree.elementtree.rst
@@ -16,6 +16,14 @@
 hierarchical data structures in memory. The type can be described as a cross between a list and a dictionary.
+
+.. warning::
+
+   The :mod:`xml.etree.ElementTree` module is not secure against erroneous or
+   maliciously constructed data. If you need to parse untrusted or
+   unauthenticated data see :ref:`xml-vulnerabilities`.
+
+
 Each element has a number of properties associated with it:
 * a tag which is a string identifying what kind of data this element represents
diff --git a/Doc/library/xml.rst b/Doc/library/xml.rst
new file mode 100644
--- /dev/null
+++ b/Doc/library/xml.rst
@@ -0,0 +1,72 @@
+.. _xml:
+
+XML Processing Modules
+======================
+
+Python's interfaces for processing XML are grouped in the ``xml`` package.
+
+.. warning::
+
+   The XML modules are not secure against erroneous or maliciously
+   constructed data. If you need to parse untrusted or unauthenticated data see
+   :ref:`xml-vulnerabilities`.
+
+It is important to note that modules in the :mod:`xml` package require that
+there be at least one SAX-compliant XML parser available. The Expat parser is
+included with Python, so the :mod:`xml.parsers.expat` module will always be
+available.
+
+The documentation for the :mod:`xml.dom` and :mod:`xml.sax` packages is the
+definition of the Python bindings for the DOM and SAX interfaces.
+
+The XML handling submodules are:
+
+* :mod:`xml.etree.ElementTree`: the ElementTree API, a simple and lightweight XML processor
+
+..
+
+* :mod:`xml.dom`: the DOM API definition
+* :mod:`xml.dom.minidom`: a lightweight DOM implementation
+* :mod:`xml.dom.pulldom`: support for building partial DOM trees
+
+..
+
+* :mod:`xml.sax`: SAX2 base classes and convenience functions
+* :mod:`xml.parsers.expat`: the Expat parser binding
+
+
+.. _xml-vulnerabilities:
+
+XML vulnerabilities
+===================
+
+The XML processing modules are not secure against erroneous or maliciously
+constructed data. An attacker can abuse vulnerabilities for e.g. denial of
+service attacks, to access local files or circumvent firewalls. The
+documentation of `defusedxml`_ on PyPI explains all known attack vectors and
+contains workarounds. `defusedexpat`_ provides a modified libexpat and
+:mod:`pyexpat` module with countermeasures against entity expansion DoS
+attacks.
+
+.. csv-table::
+   :header: "kind", "sax", "etree", "minidom", "pulldom", "xmlrpc"
+   :widths: 30, 12, 12, 12, 12, 12
+   :stub-columns: 0
+
+   "billion laughs", "**True**", "**True**", "**True**", "**True**", "**True**"
+   "quadratic blowup", "**True**", "**True**", "**True**", "**True**", "**True**"
+   "external entity expansion (remote)", "**True**", "False (1)", "False (2)", "**True**", "False (3)"
+   "external entity expansion (local file)", "**True**", "False (1)", "False (2)", "**True**", "False (3)"
+   "DTD retrieval", "**True**", "False", "False", "**True**", "False"
+   "gzip bomb", "False", "False", "False", "False", "**True**"
+
+1. :mod:`xml.etree.ElementTree` doesn't expand external entities and raises a ParseError
+   when an entity occurs.
+2. :mod:`xml.dom.minidom` doesn't expand external entities and simply returns
+   the unexpanded entity verbatim.
+3. :mod:`xmlrpclib` doesn't expand external entities and omits them.
+
+
+.. _defusedxml:
+
+.. _defusedexpat:
diff --git a/Doc/library/xml.sax.handler.rst b/Doc/library/xml.sax.handler.rst
--- a/Doc/library/xml.sax.handler.rst
+++ b/Doc/library/xml.sax.handler.rst
@@ -18,6 +18,13 @@
 :mod:`xml.sax.handler`, so that all methods get default implementations.
+.. warning::
+
+   The :mod:`xml.sax.handler` module is not secure against erroneous or
+   maliciously constructed data. If you need to parse untrusted or
+   unauthenticated data see :ref:`xml-vulnerabilities`.
+
+
 .. class:: ContentHandler
    This is the main callback interface in SAX, and the one most important to
diff --git a/Doc/library/xml.sax.reader.rst b/Doc/library/xml.sax.reader.rst
--- a/Doc/library/xml.sax.reader.rst
+++ b/Doc/library/xml.sax.reader.rst
@@ -16,6 +16,13 @@
 a new parser object.
+.. warning::
+
+   The :mod:`xml.sax.xmlreader` module is not secure against erroneous or
+   maliciously constructed data. If you need to parse untrusted or
+   unauthenticated data see :ref:`xml-vulnerabilities`.
+
+
 .. class:: XMLReader()
    Base class which can be inherited by SAX parsers.
diff --git a/Doc/library/xml.sax.rst b/Doc/library/xml.sax.rst
--- a/Doc/library/xml.sax.rst
+++ b/Doc/library/xml.sax.rst
@@ -16,6 +16,14 @@
 SAX exceptions and the convenience functions which will be most used by users of the SAX API.
+
+.. warning::
+
+   The :mod:`xml.sax` module is not secure against erroneous or maliciously
+   constructed data. If you need to parse untrusted or unauthenticated data see
+   :ref:`xml-vulnerabilities`.
+
+
 The convenience functions are:
diff --git a/Doc/library/xmlrpclib.rst b/Doc/library/xmlrpclib.rst
--- a/Doc/library/xmlrpclib.rst
+++ b/Doc/library/xmlrpclib.rst
@@ -28,6 +28,13 @@
 between conformable Python objects and XML on the wire.
+.. warning::
+
+   The :mod:`xmlrpclib` module is not secure against erroneous or maliciously
+   constructed data. If you need to parse untrusted or unauthenticated data see
+   :ref:`xml-vulnerabilities`.
+
+
 .. class:: ServerProxy(uri[, transport[, encoding[, verbose[, allow_none[, use_datetime]]]]])
    A :class:`ServerProxy` instance is an object that manages communication with a