This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: [Windows] test_sax: Warning -- files was modified by test_sax
Type: behavior Stage: resolved
Components: Library (Lib), XML Versions: Python 3.7, Python 3.6, Python 3.5, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: eli.bendersky, scoder, serhiy.storchaka, vstinner
Priority: normal Keywords:

Created on 2017-05-03 23:50 by vstinner, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 1444 closed vstinner, 2017-05-03 23:55
PR 1451 merged vstinner, 2017-05-04 11:14
PR 1474 merged vstinner, 2017-05-05 07:50
PR 1475 merged vstinner, 2017-05-05 07:55
PR 1476 merged vstinner, 2017-05-05 08:03
Messages (19)
msg292946 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-03 23:50
Running test_sax of Python 2.7 on Windows emits the following warning:

Warning -- files was modified by test_sax

The problem is that os.unlink(TESTFN) ignores all OSError: os.unlink(TESTFN) fails because there is still an open file object somewhere.

The bug is in the test_parse_bytes() of test_sax, on check_parse(TESTFN) which raises an exception as expected.

xml.sax.parse() should close the parser on exception.


On master, test_sax explicitly expects a ResourceWarning, WTF?

        with support.check_warnings(('unclosed file', ResourceWarning)):                           
            # XXX Failed parser leaks an opened file.                                              
            with self.assertRaises(SAXException):                                                  
                self.check_parse(TESTFN)                                                           
            # Collect leaked file.                                                                 
            gc.collect()


See also issue #15388.
msg292947 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-03 23:57
https://github.com/python/cpython/pull/1444 is written for the master branch. IMHO the bug must be fixed in all supported branches.

The question is if all parser provide a close() method?
msg292948 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-03 23:59
> The question is if all parser provide a close() method?

It seems like xml.sax supports pluggable parsers, at least using the PY_SAX_PARSER environment variable. So it would be safer to add a cheap "if hasattr(parser, 'close'):" test;
msg292950 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-04 00:03
> The problem is that os.unlink(TESTFN) ignores all OSError: os.unlink(TESTFN) fails because there is still an open file object somewhere.

I created the issue #30265: test.support.unlink() should fail or emit a warning on WindowsError: [Error 32] The process cannot access the file ....
msg292973 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-04 11:18
I wrote a different change which doesn't use the parser anymore, since it seems like xml.sax allows to plug third-party parsers. My new PR 1451 closes directly the file or urllib object, instead of touching the parser (which can have more visible side effect).

@Serhiy: On my first PR 1444, you wrote:
"Add a Misc/NEWS entry. This change changes behavior. It can change visible exception."

Do you consider that a NEWS entry is needed for my new PR 1451? I plan to backport the change to 2.7, 3.5 and 3.6.
msg293007 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-04 19:13
What if the third-party parser don't use prepare_input_source()? It can use more efficient way if pass just a file name.

Wouldn't be better to move your code into the parser's method parse()? If a file is opened inside the parse() method and is not exposed outside, that method should close it.
msg293021 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-04 22:08
> What if the third-party parser don't use prepare_input_source()? It can use more efficient way if pass just a file name.

About "third-party parsers": I have no idea of what are these parsers. It seems like Jython uses provides a parser. But I'm not interested to test Jython, sorry!

> Wouldn't be better to move your code into the parser's method parse()? If a file is opened inside the parse() method and is not exposed outside, that method should close it.

I wrote a first patch putting the try/except into expatparser code, but then I found a second parser with a parse() method: IncrementalParser. In fact, the expat parser inherits from IncrementalParser. So I only modified IncrementalParser, and IncrementalParser calls the abstract method close(). So any parser implemented on top of IncrementalParser should get the fix for free.

My 3rd attempt (patch IncrementalParser.parse()) changes less code and IMHO is less error-prone.

Now, *in practice*, only the expat parser is used in CPython, and according to unit tests, my fix closes the file, so the bug is fixed!
msg293040 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-05 04:36
As you said, calling close() can have side effects (for example invoking self._cont_handler.endDocument()). This was the argument against PR 1444.

It seems to me that if _entity_stack is not empty (this happens in case of error in entity parsing) the close() method does nothing. And maybe there are other leaks in entity parsing.
msg293051 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-05 07:13
Serhiy: "As you said, calling close() can have side effects (for example invoking self._cont_handler.endDocument()). This was the argument against PR 1444. It seems to me that if _entity_stack is not empty (this happens in case of error in entity parsing) the close() method does nothing. And maybe there are other leaks in entity parsing."

Right, it seems that the conclusion is that there is not only a high risk of regression because of unexpected side effects, but also an issue with unknown external parsers.

So finally I moved backward and proposed my change on ExpatParser.parse():

* the change has a narrow scope: only ExpatParser.parse() is modified, external unknown code is not impact, low risk of regression for external parsers.

* the change is well defined: it does exactly one thing, it makes sure that the source is closed, especially the inner file object or urllib object. My change makes sure that the sure is always closed, even if the close() method does nothing.

According to the long list of constraints for this fix, I don't think that we can do better.

I made one design choice: the close() method of byte and character streams are called twice, I didn't call setByteStream(None) / setCharacterStream(None) to avoid creating an inconsistent source. If you really care, maybe we can reset the source to a new xmlreader.InputSource() object instead.
msg293054 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-05 07:32
> I made one design choice: the close() method of byte and character streams are called twice, I didn't call setByteStream(None) / setCharacterStream(None) to avoid creating an inconsistent source. If you really care, maybe we can reset the source to a new xmlreader.InputSource() object instead.

Hum, I just modified my PR one more time to replace try/finally: self._close_source() with "try/except: self._close_source(); raise", so file.close() is now only called once.
msg293063 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-05 07:46
New changeset ef9c0e732fc50aefbdd7c5a80e04e14b31684e66 by Victor Stinner in branch 'master':
bpo-30264: ExpatParser closes the source on error (#1451)
https://github.com/python/cpython/commit/ef9c0e732fc50aefbdd7c5a80e04e14b31684e66
msg293066 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-05 08:04
Oh, the fix for Python 2.7 is different. In fact, the file object was never explicitly closed by the parser. It is now always closed by the parser, on success or error.
msg293068 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-05 08:04
New changeset 0fe870f3f95ba883b2b06bc0d814bdab8d53df98 by Victor Stinner in branch '3.6':
bpo-30264: ExpatParser closes the source on error (#1451) (#1474)
https://github.com/python/cpython/commit/0fe870f3f95ba883b2b06bc0d814bdab8d53df98
msg293069 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-05 08:08
New changeset 0c9aa6ffd318c04ce23997b4704477d4a4d82829 by Victor Stinner in branch '3.5':
bpo-30264: ExpatParser closes the source on error (#1451) (#1475)
https://github.com/python/cpython/commit/0c9aa6ffd318c04ce23997b4704477d4a4d82829
msg293071 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-05 08:11
New changeset d81f9e24ea89c0aaded1e0d3f8d8076bbd58c19a by Victor Stinner in branch '2.7':
bpo-30264: ExpatParser now closes the source (#1476)
https://github.com/python/cpython/commit/d81f9e24ea89c0aaded1e0d3f8d8076bbd58c19a
msg293080 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-05 08:31
I applied my fix to 2.7, 3.5, 3.6 and master (3.7). Thanks for the helpful reviews Serhiy ;-)
msg293084 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-05 09:34
This changes the behavior if pass a file object to the parser. When the parser failed it was possible to use the passed file object (for example call tell()). Now it is closed.
msg293088 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-05 09:45
> This changes the behavior if pass a file object to the parser. When the parser failed it was possible to use the passed file object (for example call tell()). Now it is closed.

If the high-level *function* parse() is called with a filename, the caller doesn't have access to the file object nor the parser.

If you use directly the parser class and pass an open file object, in that case, yes, my change closes the file.

If you want to keep the old behaviour for that case, we can change the code to only close the source if source is not a string (not isinstance(source, str)) in the ExpatParser.parse() method. Since the caller owns the file object, (s)he is responsible to close it.

Note: on parser success, the source is always closed, even if the user pass an already open file object.

I'm not convince that using the file object on parser error and using directly the ExpatParser class is a common use case. I expect that the XML parser reads more data than it needs (read ahead for speed), so I don't think that file.tell() will you exactly the file position where the XML parser failed.
msg293096 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-05 10:46
> If you use directly the parser class and pass an open file object, in that case, yes, my change closes the file.

Or if the high-level *function* parse() is called with an open file object.

I don't know whether this is desirable change, but this change can break third-party code, therefore it should be documented in Misc/NEWS.

And it is worth to add a check in test_parse_bytes that the passed file object is closed after error.
History
Date User Action Args
2022-04-11 14:58:46adminsetgithub: 74450
2017-05-05 10:46:50serhiy.storchakasetmessages: + msg293096
2017-05-05 09:45:01vstinnersetmessages: + msg293088
2017-05-05 09:34:52serhiy.storchakasetmessages: + msg293084
2017-05-05 08:31:42vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg293080

stage: patch review -> resolved
2017-05-05 08:11:57vstinnersetmessages: + msg293071
2017-05-05 08:08:07vstinnersetmessages: + msg293069
2017-05-05 08:04:59vstinnersetmessages: + msg293068
2017-05-05 08:04:05vstinnersetmessages: + msg293066
2017-05-05 08:03:14vstinnersetpull_requests: + pull_request1575
2017-05-05 07:55:28vstinnersetpull_requests: + pull_request1574
2017-05-05 07:50:54vstinnersetpull_requests: + pull_request1573
2017-05-05 07:46:49vstinnersetmessages: + msg293063
2017-05-05 07:32:39vstinnersetmessages: + msg293054
2017-05-05 07:13:13vstinnersetmessages: + msg293051
2017-05-05 04:36:36serhiy.storchakasetmessages: + msg293040
2017-05-04 22:08:19vstinnersetmessages: + msg293021
2017-05-04 19:13:36serhiy.storchakasetnosy: + serhiy.storchaka, eli.bendersky, scoder
messages: + msg293007

components: + Library (Lib), - Tests
type: behavior
stage: patch review
2017-05-04 11:18:17vstinnersetmessages: + msg292973
2017-05-04 11:14:42vstinnersetpull_requests: + pull_request1549
2017-05-04 00:03:42vstinnersetmessages: + msg292950
2017-05-03 23:59:10vstinnersetmessages: + msg292948
2017-05-03 23:57:02vstinnersetmessages: + msg292947
versions: + Python 3.5, Python 3.6, Python 3.7
2017-05-03 23:55:53vstinnersetpull_requests: + pull_request1542
2017-05-03 23:50:04vstinnercreate