classification
Title: Python 2.7.15: xml.sax.parse() closes file objects passed to it
Type: Stage:
Components: Library (Lib), XML Versions: Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: gibfahn, serhiy.storchaka, vstinner
Priority: normal Keywords:

Created on 2018-06-01 12:59 by gibfahn, last changed 2018-06-01 14:38 by gibfahn.

Messages (2)
msg318408 - Author: Gibson Fahnestock (gibfahn) Date: 2018-06-01 12:59
Sorry if this is a duplicate; I didn't find anything.

We hit some issues with this change:
- Python Bug: https://bugs.python.org/issue30264
- GitHub PRs: https://github.com/python/cpython/pull/1451 and https://github.com/python/cpython/pull/1476

It's possible I'm misunderstanding something; let me know if that's the case.

It seems that in Python 2.7.15, xml.sax.parse() closes file objects that are passed to it.

1. Isn't this a breaking change? It certainly breaks code we're using in production.
2. Why is the SAX parser closing file objects that it didn't open? If the parser is given a path and opens the file itself, it makes sense for it to close that file, but not when the caller passes in an already-open file object.
3. What do you do if you need access to the file descriptor after parsing it (because you parse it in place)?

For file descriptors that point to files on disk we can work around it by reopening the file after each parse, but for something like a StringIO buffer (see simplified example below) I'm not aware of any way to get around the problem.

-> StringIO Example:

    import xml.sax
    import StringIO
    # Some StringIO buffer.
    fd = StringIO.StringIO(b'<_/>')
    # Do some parsing.
    xml.sax.parse(fd, xml.sax.handler.ContentHandler())
    # Try to do some other parsing (fails).
    xml.sax.parse(fd, xml.sax.handler.ContentHandler())
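
A minimal sketch of how the closing could be observed directly, rather than through the failure of a second parse. It uses io.BytesIO instead of StringIO.StringIO so that it also runs on Python 3, and closed_after_parse is a hypothetical helper name, not part of xml.sax. Based on the behaviour described above, it would be expected to report True on 2.7.15, but the result depends on the interpreter version.

```python
import io
import xml.sax
import xml.sax.handler


def closed_after_parse(stream):
    """Parse `stream` once and report whether it is closed afterwards.

    Hypothetical helper for illustration only.
    """
    xml.sax.parse(stream, xml.sax.handler.ContentHandler())
    return stream.closed


buf = io.BytesIO(b"<_/>")
print("closed after parse: %s" % closed_after_parse(buf))
```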

-> File Example:

    import xml.sax
    fd = open('/tmp/test-junit1.xml')
    # Do some parsing.
    xml.sax.parse(fd, xml.sax.handler.ContentHandler())
    # Do some other parsing.
    xml.sax.parse(fd, xml.sax.handler.ContentHandler())
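
One conceivable way to keep an in-memory buffer usable is to hand the parser a thin proxy whose close() does nothing. The sketch below is untested against 2.7.15 specifically and has not been suggested by anyone on this issue. NonClosingStream is a hypothetical name, and the approach assumes the parser only needs read() and close() on the object it is given.

```python
import io
import xml.sax
import xml.sax.handler


class NonClosingStream(object):
    """Hypothetical proxy: forwards to a stream but ignores close()."""

    def __init__(self, stream):
        self._stream = stream

    def read(self, *args):
        return self._stream.read(*args)

    def close(self):
        # Deliberately a no-op, so the caller keeps ownership of the stream.
        pass

    def __getattr__(self, name):
        # Forward any other attribute lookup to the wrapped stream.
        return getattr(self._stream, name)


buf = io.BytesIO(b"<_/>")
handler = xml.sax.handler.ContentHandler()

# First parse goes through the proxy, so buf itself is never closed.
xml.sax.parse(NonClosingStream(buf), handler)

# Rewind and parse the same buffer again.
buf.seek(0)
xml.sax.parse(NonClosingStream(buf), handler)
```

The caller remains responsible for closing the underlying stream when it is finished with it.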


Originally posted on https://github.com/python/cpython/pull/1451#issuecomment-393837538, thanks serhiy.storchaka for redirecting me here.
msg318419 - Author: Gibson Fahnestock (gibfahn) Date: 2018-06-01 14:38
As an addendum, I note that other parsers, like:

    from lxml import etree
    parser = etree.XMLParser(compact=False)
    etree.parse(some_fd, parser).find('some_text').text

do not close the fd they are given.
History
Date User Action Args
2018-06-01 14:38:38 gibfahn set messages: + msg318419
2018-06-01 12:59:58 gibfahn create