msg292946 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-05-03 23:50 |
Running test_sax of Python 2.7 on Windows emits the following warning:
Warning -- files was modified by test_sax
The problem is that os.unlink(TESTFN) ignores all OSError: os.unlink(TESTFN) fails because there is still an open file object somewhere.
The bug is in the test_parse_bytes() of test_sax, on check_parse(TESTFN) which raises an exception as expected.
xml.sax.parse() should close the parser on exception.
On master, test_sax explicitly expects a ResourceWarning, WTF?
with support.check_warnings(('unclosed file', ResourceWarning)):
# XXX Failed parser leaks an opened file.
with self.assertRaises(SAXException):
self.check_parse(TESTFN)
# Collect leaked file.
gc.collect()
See also issue #15388.
|
msg292947 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-05-03 23:57 |
https://github.com/python/cpython/pull/1444 is written for the master branch. IMHO the bug must be fixed in all supported branches.
The question is if all parser provide a close() method?
|
msg292948 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-05-03 23:59 |
> The question is if all parser provide a close() method?
It seems like xml.sax supports pluggable parsers, at least using the PY_SAX_PARSER environment variable. So it would be safer to add a cheap "if hasattr(parser, 'close'):" test;
|
msg292950 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-05-04 00:03 |
> The problem is that os.unlink(TESTFN) ignores all OSError: os.unlink(TESTFN) fails because there is still an open file object somewhere.
I created the issue #30265: test.support.unlink() should fail or emit a warning on WindowsError: [Error 32] The process cannot access the file ....
|
msg292973 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-05-04 11:18 |
I wrote a different change which doesn't use the parser anymore, since it seems like xml.sax allows to plug third-party parsers. My new PR 1451 closes directly the file or urllib object, instead of touching the parser (which can have more visible side effect).
@Serhiy: On my first PR 1444, you wrote:
"Add a Misc/NEWS entry. This change changes behavior. It can change visible exception."
Do you consider that a NEWS entry is needed for my new PR 1451? I plan to backport the change to 2.7, 3.5 and 3.6.
|
msg293007 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2017-05-04 19:13 |
What if the third-party parser don't use prepare_input_source()? It can use more efficient way if pass just a file name.
Wouldn't be better to move your code into the parser's method parse()? If a file is opened inside the parse() method and is not exposed outside, that method should close it.
|
msg293021 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-05-04 22:08 |
> What if the third-party parser don't use prepare_input_source()? It can use more efficient way if pass just a file name.
About "third-party parsers": I have no idea of what are these parsers. It seems like Jython uses provides a parser. But I'm not interested to test Jython, sorry!
> Wouldn't be better to move your code into the parser's method parse()? If a file is opened inside the parse() method and is not exposed outside, that method should close it.
I wrote a first patch putting the try/except into expatparser code, but then I found a second parser with a parse() method: IncrementalParser. In fact, the expat parser inherits from IncrementalParser. So I only modified IncrementalParser, and IncrementalParser calls the abstract method close(). So any parser implemented on top of IncrementalParser should get the fix for free.
My 3rd attempt (patch IncrementalParser.parse()) changes less code and IMHO is less error-prone.
Now, *in practice*, only the expat parser is used in CPython, and according to unit tests, my fix closes the file, so the bug is fixed!
|
msg293040 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2017-05-05 04:36 |
As you said, calling close() can have side effects (for example invoking self._cont_handler.endDocument()). This was the argument against PR 1444.
It seems to me that if _entity_stack is not empty (this happens in case of error in entity parsing) the close() method does nothing. And maybe there are other leaks in entity parsing.
|
msg293051 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-05-05 07:13 |
Serhiy: "As you said, calling close() can have side effects (for example invoking self._cont_handler.endDocument()). This was the argument against PR 1444. It seems to me that if _entity_stack is not empty (this happens in case of error in entity parsing) the close() method does nothing. And maybe there are other leaks in entity parsing."
Right, it seems that the conclusion is that there is not only a high risk of regression because of unexpected side effects, but also an issue with unknown external parsers.
So finally I moved backward and proposed my change on ExpatParser.parse():
* the change has a narrow scope: only ExpatParser.parse() is modified, external unknown code is not impact, low risk of regression for external parsers.
* the change is well defined: it does exactly one thing, it makes sure that the source is closed, especially the inner file object or urllib object. My change makes sure that the sure is always closed, even if the close() method does nothing.
According to the long list of constraints for this fix, I don't think that we can do better.
I made one design choice: the close() method of byte and character streams are called twice, I didn't call setByteStream(None) / setCharacterStream(None) to avoid creating an inconsistent source. If you really care, maybe we can reset the source to a new xmlreader.InputSource() object instead.
|
msg293054 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-05-05 07:32 |
> I made one design choice: the close() method of byte and character streams are called twice, I didn't call setByteStream(None) / setCharacterStream(None) to avoid creating an inconsistent source. If you really care, maybe we can reset the source to a new xmlreader.InputSource() object instead.
Hum, I just modified my PR one more time to replace try/finally: self._close_source() with "try/except: self._close_source(); raise", so file.close() is now only called once.
|
msg293063 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-05-05 07:46 |
New changeset ef9c0e732fc50aefbdd7c5a80e04e14b31684e66 by Victor Stinner in branch 'master':
bpo-30264: ExpatParser closes the source on error (#1451)
https://github.com/python/cpython/commit/ef9c0e732fc50aefbdd7c5a80e04e14b31684e66
|
msg293066 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-05-05 08:04 |
Oh, the fix for Python 2.7 is different. In fact, the file object was never explicitly closed by the parser. It is now always closed by the parser, on success or error.
|
msg293068 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-05-05 08:04 |
New changeset 0fe870f3f95ba883b2b06bc0d814bdab8d53df98 by Victor Stinner in branch '3.6':
bpo-30264: ExpatParser closes the source on error (#1451) (#1474)
https://github.com/python/cpython/commit/0fe870f3f95ba883b2b06bc0d814bdab8d53df98
|
msg293069 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-05-05 08:08 |
New changeset 0c9aa6ffd318c04ce23997b4704477d4a4d82829 by Victor Stinner in branch '3.5':
bpo-30264: ExpatParser closes the source on error (#1451) (#1475)
https://github.com/python/cpython/commit/0c9aa6ffd318c04ce23997b4704477d4a4d82829
|
msg293071 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-05-05 08:11 |
New changeset d81f9e24ea89c0aaded1e0d3f8d8076bbd58c19a by Victor Stinner in branch '2.7':
bpo-30264: ExpatParser now closes the source (#1476)
https://github.com/python/cpython/commit/d81f9e24ea89c0aaded1e0d3f8d8076bbd58c19a
|
msg293080 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-05-05 08:31 |
I applied my fix to 2.7, 3.5, 3.6 and master (3.7). Thanks for the helpful reviews Serhiy ;-)
|
msg293084 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2017-05-05 09:34 |
This changes the behavior if pass a file object to the parser. When the parser failed it was possible to use the passed file object (for example call tell()). Now it is closed.
|
msg293088 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-05-05 09:45 |
> This changes the behavior if pass a file object to the parser. When the parser failed it was possible to use the passed file object (for example call tell()). Now it is closed.
If the high-level *function* parse() is called with a filename, the caller doesn't have access to the file object nor the parser.
If you use directly the parser class and pass an open file object, in that case, yes, my change closes the file.
If you want to keep the old behaviour for that case, we can change the code to only close the source if source is not a string (not isinstance(source, str)) in the ExpatParser.parse() method. Since the caller owns the file object, (s)he is responsible to close it.
Note: on parser success, the source is always closed, even if the user pass an already open file object.
I'm not convince that using the file object on parser error and using directly the ExpatParser class is a common use case. I expect that the XML parser reads more data than it needs (read ahead for speed), so I don't think that file.tell() will you exactly the file position where the XML parser failed.
|
msg293096 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2017-05-05 10:46 |
> If you use directly the parser class and pass an open file object, in that case, yes, my change closes the file.
Or if the high-level *function* parse() is called with an open file object.
I don't know whether this is desirable change, but this change can break third-party code, therefore it should be documented in Misc/NEWS.
And it is worth to add a check in test_parse_bytes that the passed file object is closed after error.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:46 | admin | set | github: 74450 |
2017-05-05 10:46:50 | serhiy.storchaka | set | messages:
+ msg293096 |
2017-05-05 09:45:01 | vstinner | set | messages:
+ msg293088 |
2017-05-05 09:34:52 | serhiy.storchaka | set | messages:
+ msg293084 |
2017-05-05 08:31:42 | vstinner | set | status: open -> closed resolution: fixed messages:
+ msg293080
stage: patch review -> resolved |
2017-05-05 08:11:57 | vstinner | set | messages:
+ msg293071 |
2017-05-05 08:08:07 | vstinner | set | messages:
+ msg293069 |
2017-05-05 08:04:59 | vstinner | set | messages:
+ msg293068 |
2017-05-05 08:04:05 | vstinner | set | messages:
+ msg293066 |
2017-05-05 08:03:14 | vstinner | set | pull_requests:
+ pull_request1575 |
2017-05-05 07:55:28 | vstinner | set | pull_requests:
+ pull_request1574 |
2017-05-05 07:50:54 | vstinner | set | pull_requests:
+ pull_request1573 |
2017-05-05 07:46:49 | vstinner | set | messages:
+ msg293063 |
2017-05-05 07:32:39 | vstinner | set | messages:
+ msg293054 |
2017-05-05 07:13:13 | vstinner | set | messages:
+ msg293051 |
2017-05-05 04:36:36 | serhiy.storchaka | set | messages:
+ msg293040 |
2017-05-04 22:08:19 | vstinner | set | messages:
+ msg293021 |
2017-05-04 19:13:36 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka, eli.bendersky, scoder messages:
+ msg293007
components:
+ Library (Lib), - Tests type: behavior stage: patch review |
2017-05-04 11:18:17 | vstinner | set | messages:
+ msg292973 |
2017-05-04 11:14:42 | vstinner | set | pull_requests:
+ pull_request1549 |
2017-05-04 00:03:42 | vstinner | set | messages:
+ msg292950 |
2017-05-03 23:59:10 | vstinner | set | messages:
+ msg292948 |
2017-05-03 23:57:02 | vstinner | set | messages:
+ msg292947 versions:
+ Python 3.5, Python 3.6, Python 3.7 |
2017-05-03 23:55:53 | vstinner | set | pull_requests:
+ pull_request1542 |
2017-05-03 23:50:04 | vstinner | create | |