classification
Title: ElementTree.XMLParser() mishandles exceptions
Type: behavior Stage: patch review
Components: Extension Modules Versions: Python 3.7, Python 3.6, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Oren Milman, eli.bendersky, scoder, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2017-09-13 17:28 by scoder, last changed 2017-10-14 06:58 by scoder.

Pull Requests
URL Status Linked Edit
PR 3545 merged scoder, 2017-09-13 17:29
PR 3585 merged python-dev, 2017-09-14 20:00
PR 3992 open scoder, 2017-10-14 06:57
Messages (9)
msg302101 - (view) Author: Stefan Behnel (scoder) * Date: 2017-09-13 17:28
The "XMLParser.__init__()" method in "_elementtree.c" contains this code:

    self->handle_start = PyObject_GetAttrString(target, "start");
    self->handle_data = PyObject_GetAttrString(target, "data");
    self->handle_end = PyObject_GetAttrString(target, "end");
    self->handle_comment = PyObject_GetAttrString(target, "comment");
    self->handle_pi = PyObject_GetAttrString(target, "pi");
    self->handle_close = PyObject_GetAttrString(target, "close");
    self->handle_doctype = PyObject_GetAttrString(target, "doctype");
    PyErr_Clear();

This ignores all exceptions, not only AttributeError.
It also passes live exceptions into the later lookup calls, which may execute arbitrary user code.
msg302103 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-09-13 18:20
Oh, I see many other issues with C implementation of ElementTree.

1. If XMLParser.__init__ is called twice, it leaks references and the Expat parser.

Possible solution: use the Py_XSETREF() macro instead of simple assignment. The Expat parser needs special handling.

Other possible solution: drop __init__() and make all initialization in __new__(). But this is a solution only for 3.7 because can break compatibility.

2. If XMLParser.__init__ is not called or if it fails to initialize the Expat parser, self->entity and self->target are NULL. Later in xmlparser_getattro() they are increfed unconditionally.
msg302107 - (view) Author: Stefan Behnel (scoder) * Date: 2017-09-13 19:13
Feel free to provide a separate pull request. These issues seem independent of the exception handling problem that I wrote a fix for.
msg302169 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-09-14 12:11
I agree. While these issues are not totally independent (they changes the same code), my issue needs uncommon use of __new__ and __init__, while your issue can be exposed in normal cases.
msg302205 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-09-14 20:00
New changeset c8d8e15bfc24abeeaaf3d8be9073276b0c011cdf by Serhiy Storchaka (scoder) in branch 'master':
bpo-31455: Fix an assertion failure in ElementTree.XMLParser(). (#3545)
https://github.com/python/cpython/commit/c8d8e15bfc24abeeaaf3d8be9073276b0c011cdf
msg302224 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-09-14 23:13
New changeset 49caab46f687eb201898fb6c2c40d47bdcb0e58b by Serhiy Storchaka (Miss Islington (bot)) in branch '3.6':
[3.6] bpo-31455: Fix an assertion failure in ElementTree.XMLParser(). (GH-3545) (#3585)
https://github.com/python/cpython/commit/49caab46f687eb201898fb6c2c40d47bdcb0e58b
msg304324 - (view) Author: Oren Milman (Oren Milman) * Date: 2017-10-13 08:05
Serhiy, in addition to the problems you mentioned with not calling __init__(), it seems
that calling every method of an uninitialized XMLParser object would crash.

If you don't mind, i would be happy to open an issue to fix these crashes.
msg304338 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-13 13:51
It would be nice. But I see you already have opened issue31758 for reference leaks. I think that other problems can be solved in the same issue.

Do you mind to backport your patch to 2.7 Stefan? If this makes sense. Otherwise I'll just close this issue. Live exceptions don't cause crash in 2.7.
msg304383 - (view) Author: Stefan Behnel (scoder) * Date: 2017-10-14 06:58
Backport PR for 2.7 added: 3992
History
Date User Action Args
2017-10-14 06:58:54scodersetmessages: + msg304383
2017-10-14 06:57:38scodersetpull_requests: + pull_request3968
2017-10-13 13:51:06serhiy.storchakasetmessages: + msg304338
2017-10-13 08:05:04Oren Milmansetnosy: + Oren Milman
messages: + msg304324
2017-09-14 23:13:23serhiy.storchakasetmessages: + msg302224
2017-09-14 20:00:15python-devsetpull_requests: + pull_request3576
2017-09-14 20:00:05serhiy.storchakasetmessages: + msg302205
2017-09-14 12:11:56serhiy.storchakasetmessages: + msg302169
2017-09-13 19:13:53scodersetmessages: + msg302107
2017-09-13 18:20:42serhiy.storchakasetmessages: + msg302103
2017-09-13 17:33:04serhiy.storchakasetnosy: + vstinner, eli.bendersky, serhiy.storchaka

versions: + Python 2.7, Python 3.6
2017-09-13 17:29:13scodersetkeywords: + patch
stage: patch review
pull_requests: + pull_request3542
2017-09-13 17:28:39scodercreate