This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: ElementTree.XMLParser() mishandles exceptions
Type: behavior Stage: resolved
Components: Extension Modules Versions: Python 3.7, Python 3.6, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Oren Milman, eli.bendersky, scoder, serhiy.storchaka, vstinner, zach.ware
Priority: normal Keywords: patch

Created on 2017-09-13 17:28 by scoder, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 3545 merged scoder, 2017-09-13 17:29
PR 3585 merged python-dev, 2017-09-14 20:00
PR 3992 merged scoder, 2017-10-14 06:57
PR 6318 merged scoder, 2018-03-31 10:07
Messages (14)
msg302101 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2017-09-13 17:28
The "XMLParser.__init__()" method in "_elementtree.c" contains this code:

    self->handle_start = PyObject_GetAttrString(target, "start");
    self->handle_data = PyObject_GetAttrString(target, "data");
    self->handle_end = PyObject_GetAttrString(target, "end");
    self->handle_comment = PyObject_GetAttrString(target, "comment");
    self->handle_pi = PyObject_GetAttrString(target, "pi");
    self->handle_close = PyObject_GetAttrString(target, "close");
    self->handle_doctype = PyObject_GetAttrString(target, "doctype");
    PyErr_Clear();

This ignores all exceptions, not only AttributeError.
It also passes live exceptions into the later lookup calls, which may execute arbitrary user code.
msg302103 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-09-13 18:20
Oh, I see many other issues with C implementation of ElementTree.

1. If XMLParser.__init__ is called twice, it leaks references and the Expat parser.

Possible solution: use the Py_XSETREF() macro instead of simple assignment. The Expat parser needs special handling.

Other possible solution: drop __init__() and make all initialization in __new__(). But this is a solution only for 3.7 because can break compatibility.

2. If XMLParser.__init__ is not called or if it fails to initialize the Expat parser, self->entity and self->target are NULL. Later in xmlparser_getattro() they are increfed unconditionally.
msg302107 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2017-09-13 19:13
Feel free to provide a separate pull request. These issues seem independent of the exception handling problem that I wrote a fix for.
msg302169 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-09-14 12:11
I agree. While these issues are not totally independent (they changes the same code), my issue needs uncommon use of __new__ and __init__, while your issue can be exposed in normal cases.
msg302205 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-09-14 20:00
New changeset c8d8e15bfc24abeeaaf3d8be9073276b0c011cdf by Serhiy Storchaka (scoder) in branch 'master':
bpo-31455: Fix an assertion failure in ElementTree.XMLParser(). (#3545)
https://github.com/python/cpython/commit/c8d8e15bfc24abeeaaf3d8be9073276b0c011cdf
msg302224 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-09-14 23:13
New changeset 49caab46f687eb201898fb6c2c40d47bdcb0e58b by Serhiy Storchaka (Miss Islington (bot)) in branch '3.6':
[3.6] bpo-31455: Fix an assertion failure in ElementTree.XMLParser(). (GH-3545) (#3585)
https://github.com/python/cpython/commit/49caab46f687eb201898fb6c2c40d47bdcb0e58b
msg304324 - (view) Author: Oren Milman (Oren Milman) * Date: 2017-10-13 08:05
Serhiy, in addition to the problems you mentioned with not calling __init__(), it seems
that calling every method of an uninitialized XMLParser object would crash.

If you don't mind, i would be happy to open an issue to fix these crashes.
msg304338 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-13 13:51
It would be nice. But I see you already have opened issue31758 for reference leaks. I think that other problems can be solved in the same issue.

Do you mind to backport your patch to 2.7 Stefan? If this makes sense. Otherwise I'll just close this issue. Live exceptions don't cause crash in 2.7.
msg304383 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2017-10-14 06:58
Backport PR for 2.7 added: 3992
msg314358 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-03-24 06:01
New changeset 0694b6a651ba2a53f6323ffb3b23358f43885815 by Serhiy Storchaka (scoder) in branch '2.7':
bpo-31544: Avoid calling "PyObject_GetAttrString()" (and potentially executing user code) with a live exception set. (GH-3992)
https://github.com/python/cpython/commit/0694b6a651ba2a53f6323ffb3b23358f43885815
msg314722 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2018-03-30 21:38
This added a refleak on 2.7, see http://buildbot.python.org/all/#/builders/78/builds/122 and later builds.  Also, the bug ID in the blurb is incorrect (31544 vs 31455).
msg314733 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2018-03-31 08:28
Right, Zachary, thanks for noticing. Py2.7 is actually way more different than I thought, and I hadn't paid enough attention to that. Py3 does all of this in "__init__", whereas Py2 essentially implements "__new__" in C, which requires more cleanup.

BTW, the implementation in Py3 would also benefit from refactoring the error handling code and moving it all in one place. But it shouldn't suffer from the same kind of problem, at least.
msg314735 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2018-03-31 10:13
PR 6318 fixes the reference leak for Py2.7.
msg314741 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-03-31 12:28
New changeset c498cd8bf81fc47acf2f1f702e8b3bc9bd4aed65 by Serhiy Storchaka (scoder) in branch '2.7':
bpo-31544: Fix a reference leak to 'self' after the previous target error handling fixes. (GH-6318)
https://github.com/python/cpython/commit/c498cd8bf81fc47acf2f1f702e8b3bc9bd4aed65
History
Date User Action Args
2022-04-11 14:58:52adminsetgithub: 75636
2018-04-26 06:37:08serhiy.storchakasetstatus: open -> closed
stage: patch review -> resolved
2018-03-31 12:28:47serhiy.storchakasetmessages: + msg314741
2018-03-31 10:13:59scodersetmessages: + msg314735
2018-03-31 10:07:36scodersetstage: resolved -> patch review
pull_requests: + pull_request6034
2018-03-31 08:28:04scodersetmessages: + msg314733
2018-03-30 21:38:16zach.waresetstatus: closed -> open
nosy: + zach.ware
messages: + msg314722

2018-03-24 06:01:15serhiy.storchakasetmessages: + msg314358
2018-03-24 05:57:41serhiy.storchakasetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2017-10-14 06:58:54scodersetmessages: + msg304383
2017-10-14 06:57:38scodersetpull_requests: + pull_request3968
2017-10-13 13:51:06serhiy.storchakasetmessages: + msg304338
2017-10-13 08:05:04Oren Milmansetnosy: + Oren Milman
messages: + msg304324
2017-09-14 23:13:23serhiy.storchakasetmessages: + msg302224
2017-09-14 20:00:15python-devsetpull_requests: + pull_request3576
2017-09-14 20:00:05serhiy.storchakasetmessages: + msg302205
2017-09-14 12:11:56serhiy.storchakasetmessages: + msg302169
2017-09-13 19:13:53scodersetmessages: + msg302107
2017-09-13 18:20:42serhiy.storchakasetmessages: + msg302103
2017-09-13 17:33:04serhiy.storchakasetnosy: + vstinner, eli.bendersky, serhiy.storchaka

versions: + Python 2.7, Python 3.6
2017-09-13 17:29:13scodersetkeywords: + patch
stage: patch review
pull_requests: + pull_request3542
2017-09-13 17:28:39scodercreate