This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Python 2.7.15: xml.sax.parse() closes file objects passed to it
Type: Stage: resolved
Components: Library (Lib), XML Versions: Python 2.7
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: Nosy List: gibfahn, serhiy.storchaka, vstinner, zach.ware
Priority: normal Keywords:

Created on 2018-06-01 12:59 by gibfahn, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (3)
msg318408 - (view) Author: Gibson Fahnestock (gibfahn) Date: 2018-06-01 12:59
Sorry if this is a duplicate, I didn't find anything.

We hit some issues with this change:
- Python Bug: https://bugs.python.org/issue30264
- Github PRs: https://github.com/python/cpython/pull/1451 and https://github.com/python/cpython/pull/1476

It's possible I'm misunderstanding something, let me know if that's the case.

It seems that in Python 2.7.15, xml.sax.parse() closes file descriptors that are passed to it.

1. Isn't this a breaking change? It certainly breaks code we're using in production.
2. Why is the sax parser closing file descriptors that it didn't open? I understand if the parser is given a path and opens its own fd it makes sense to close it, but not when the fd is given directly.
3. What do you do if you need access to the file descriptor after parsing it (because you parse it in place)?

For file descriptors that point to files on disk we can work around it by reopening the file after each parse, but for something like a StringIO buffer (see simplified example below) I'm not aware of any way to get around the problem.

-> StringIO Example:

    import xml.sax
    import StringIO
    # Some StringIO buffer.
    fd = StringIO.StringIO(b'<_/>')
    # Do some parsing.
    xml.sax.parse(fd, xml.sax.handler.ContentHandler())
    # Try to do some other parsing (fails).
    xml.sax.parse(fd, xml.sax.handler.ContentHandler())

-> File Example:

    import xml.sax
    fd = open('/tmp/test-junit1.xml')
    # Do some parsing.
    xml.sax.parse(fd, xml.sax.handler.ContentHandler())
    # Do some other parsing.
    xml.sax.parse(fd, xml.sax.handler.ContentHandler())


Originally posted on https://github.com/python/cpython/pull/1451#issuecomment-393837538, thanks serhiy.storchaka for redirecting me here.
msg318419 - (view) Author: Gibson Fahnestock (gibfahn) Date: 2018-06-01 14:38
As an addendum, I note that other parsers, like:

    parser = lxml.etree.XMLParser(compact=False)
    etree.parse(some_fd, parser).find('some_text').text

do not close the fd they are given.
msg367373 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2020-04-27 03:43
Hi Gibson,

I'm sorry this issue didn't get any attention before Python 2.7 reached EOL, but as that milestone has now passed I'm closing the issue.  Thank you for the report anyway!
History
Date User Action Args
2022-04-11 14:59:01adminsetgithub: 77913
2020-04-27 03:43:10zach.waresetstatus: open -> closed

nosy: + zach.ware
messages: + msg367373

resolution: out of date
stage: resolved
2018-06-01 14:38:38gibfahnsetmessages: + msg318419
2018-06-01 12:59:58gibfahncreate