Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

_elementtree.c calls Python callbacks while a Python exception is set #62701

Closed
vstinner opened this issue Jul 18, 2013 · 5 comments
Closed

_elementtree.c calls Python callbacks while a Python exception is set #62701

vstinner opened this issue Jul 18, 2013 · 5 comments

Comments

@vstinner
Copy link
Member

BPO 18501
Nosy @freddrake, @vstinner, @tiran

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2013-08-13.23:59:23.192>
created_at = <Date 2013-07-18.21:01:01.524>
labels = []
title = '_elementtree.c calls Python callbacks while a Python exception is set'
updated_at = <Date 2013-08-13.23:59:23.191>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2013-08-13.23:59:23.191>
actor = 'vstinner'
assignee = 'none'
closed = True
closed_date = <Date 2013-08-13.23:59:23.192>
closer = 'vstinner'
components = []
creation = <Date 2013-07-18.21:01:01.524>
creator = 'vstinner'
dependencies = []
files = []
hgrepos = []
issue_num = 18501
keywords = []
message_count = 5.0
messages = ['193325', '193326', '193328', '193339', '195111']
nosy_count = 4.0
nosy_names = ['fdrake', 'vstinner', 'christian.heimes', 'python-dev']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue18501'
versions = ['Python 3.4']

@vstinner
Copy link
Member Author

The ElementTree module allows to write a XML parser using Python callbacks. The module relies on the expat library which is implemented in C. Expat calls these Python callbacks, but ElementTree does not check if a Python exception was raised or not.

Example 1:
-------------------

import unittest
from xml.etree import ElementTree as ET

class Target(object):
    def start(self, tag, attrib):
        print("start")
        raise ValueError("raise start")

    def end(self, tag):
        print("end")
        raise ValueError("raise end")

    def close(self):
        print("close")
        raise ValueError("raise close")

parser = ET.XMLParser(target=Target())
parser.feed("<root><test /></root>")

Output with Python 3.3:
-------------------

start
startendendTraceback (most recent call last):
  File "x.py", line 18, in <module>
    parser.feed("<root><test /></root>")
  File "x.py", line 10, in end
    print("end")
  File "x.py", line 10, in end
    print("end")
  File "x.py", line 6, in start
    print("start")
  File "x.py", line 7, in start
    raise ValueError("raise start")
ValueError: raise start

start() was called twice, as end() method, even if the first start() method raised an exception.

The traceback is strange: it looks like end() was called by start(), which is wrong.

Example 2:
-------------------

import unittest
from xml.etree import ElementTree as ET

class Target(object):
    def start(self, tag, attrib):
        raise ValueError("raise start")

    def end(self, tag):
        raise ValueError("raise end")

    def close(self):
        raise ValueError("raise close")

parser = ET.XMLParser(target=Target())
parser.feed("<root><test /></root>")

Output with Python 3.3:
-------------------

Traceback (most recent call last):
  File "x.py", line 15, in <module>
    parser.feed("<root><test /></root>")
  File "x.py", line 9, in end
    raise ValueError("raise end")
ValueError: raise end

end() was called even if start() already failed. The exception which was set by start has been replaced by end() exception.

In my opinion, it's not a good thing to call PyEval_EvalFrameEx() and similar functions when a Python exception is set, because it behaves badly (ex: print("end") in Example 1 raises an exception... which is wrong, the traceback is also corrupted) and may replaces the old exception with a new exception (ex: "end" replaces "started").

@vstinner
Copy link
Member Author

For the issue bpo-18408, I added assertions in PyEval_EvalFrameEx() and similar functions to exit with an assertion error in debug mode if these functions are called with an exception set:

New changeset 48a869a39e2d by Victor Stinner in branch 'default':
Issue bpo-18408: PyEval_EvalFrameEx() and PyEval_CallObjectWithKeywords() now fail
http://hg.python.org/cpython/rev/48a869a39e2d

New changeset 5bd9db528aed by Victor Stinner in branch 'default':
Issue bpo-18408: PyObject_Str(), PyObject_Repr() and type_call() now fail with an
http://hg.python.org/cpython/rev/5bd9db528aed

lxml test suite failed with an C assertion error because of these changes. I fixed the issue with the following change:

New changeset 6ec0e9347dd4 by Victor Stinner in branch 'default':
Issue bpo-18408: Fix _elementtree.c, don't call Python function from an expat
http://hg.python.org/cpython/rev/6ec0e9347dd4

Instead of having to check if an exception is set in each Python callback, it would be better to "stop" the XML parser. Modules/pyexpat.c calls "flag_error(self); XML_SetCharacterDataHandler(self->itself, noop_character_data_handler);" on error:

/* This handler is used when an error has been detected, in the hope
   that actual parsing can be terminated early.  This will only help
   if an external entity reference is encountered. */
static int
error_external_entity_ref_handler(XML_Parser parser,
                                  const XML_Char *context,
                                  const XML_Char *base,
                                  const XML_Char *systemId,
                                  const XML_Char *publicId)
{
    return 0;
}

/* Dummy character data handler used when an error (exception) has
   been detected, and the actual parsing can be terminated early.
   This is needed since character data handler can't be safely removed
   from within the character data handler, but can be replaced.  It is
   used only from the character data handler trampoline, and must be
   used right after `flag_error()` is called. */
static void
noop_character_data_handler(void *userData, const XML_Char *data, int len)
{
    /* Do nothing. */
}

static void
flag_error(xmlparseobject *self)
{
    clear_handlers(self, 0);
    XML_SetExternalEntityRefHandler(self->itself,
                                    error_external_entity_ref_handler);
}

@python-dev
Copy link
Mannequin

python-dev mannequin commented Jul 18, 2013

New changeset 5a6cdc0d7de1 by Victor Stinner in branch 'default':
Issue bpo-18501, bpo-18408: Fix expat handlers in pyexpat, don't call Python
http://hg.python.org/cpython/rev/5a6cdc0d7de1

@vstinner
Copy link
Member Author

See also the issue bpo-18488, similar issue in sqlite.

@vstinner
Copy link
Member Author

Instead of having to check if an exception is set in each Python
callback, it would be better to "stop" the XML parser.

I created the issue bpo-18733 to track this optimization.

The initial issue is fixed, so I'm closing it.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant