classification
Title: cElementTree calls end() on parser target even if start() fails
Type: behavior
Stage: needs patch
Components: Library (Lib), XML
Versions: Python 3.3, Python 3.4
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: eli.bendersky, ezio.melotti, scoder
Priority: normal
Keywords:

Created on 2013-01-24 15:10 by scoder, last changed 2013-05-20 12:56 by eli.bendersky.

Messages (2)
msg180526 - Author: Stefan Behnel (scoder) (Python committer) Date: 2013-01-24 15:10
The following compatibility unit test fails for me in lxml since Py3.3.

    import sys
    import unittest
    import xml.etree.ElementTree

    # The enclosing class name is illustrative; the original excerpt
    # showed only the class body.
    class ParserTargetTest(unittest.TestCase):
        etree = xml.etree.ElementTree

        def test_parser_target_error_in_start(self):
            assertEqual = self.assertEqual

            events = []
            class Target(object):
                def start(self, tag, attrib):
                    events.append("start")
                    assertEqual("TAG", tag)
                    raise ValueError("TEST")
                def end(self, tag):
                    events.append("end")
                    assertEqual("TAG", tag)
                def close(self):
                    return "DONE"

            parser = self.etree.XMLParser(target=Target())

            try:
                parser.feed("<TAG/>")
            except ValueError:
                self.assertTrue('TEST' in str(sys.exc_info()[1]))
            else:
                self.fail("ValueError was not raised")

            # ERROR HERE - gives ["start", "end"] in Py3.3
            self.assertEqual(["start"], events)

It seems that cET does not handle the exception early enough and still calls the target's end() method after start() has raised. Neither the pure-Python ElementTree nor lxml does this.

Some more tests are here:

https://github.com/lxml/lxml/blob/master/src/lxml/tests/test_elementtree.py#L3446

(all tests in that file are known to work with ET)
msg189655 - Author: Eli Bendersky (eli.bendersky) (Python committer) Date: 2013-05-20 12:56
Yes, expat does not seem to care much about propagating errors from every single handler. Its code comments say that even when XML_StopParser is called, some event handlers (such as the one for "end element") may still be called, since otherwise those events would be lost.

I don't know if this is important enough to muck with the way expat does things internally - I would expect this problem to exist in all Python XML modules that use expat.
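
If the parser side is left as it is, a target can also protect itself. The following is only a sketch of that idea, not something proposed in this issue: GuardedTarget is a hypothetical wrapper (the name and design are assumptions) that forwards events to the real target until one callback raises and then ignores every later event. It covers only start(), end(), data() and close(); other callbacks such as comment() or pi() are not forwarded.

```python
import xml.etree.ElementTree as ET


class GuardedTarget:
    """Hypothetical wrapper: forward parser events to `target` until one of
    its callbacks raises, then silently drop all later events."""

    def __init__(self, target):
        self._target = target
        self._failed = False

    def _forward(self, name, *args):
        if self._failed:
            return None  # an earlier callback raised: ignore this event
        handler = getattr(self._target, name, None)
        if handler is None:
            return None  # the wrapped target does not implement this callback
        try:
            return handler(*args)
        except BaseException:
            self._failed = True
            raise

    def start(self, tag, attrib):
        return self._forward("start", tag, attrib)

    def end(self, tag):
        return self._forward("end", tag)

    def data(self, text):
        return self._forward("data", text)

    def close(self):
        return self._forward("close")


events = []


class RecordingTarget:
    """Same shape as the Target class in the test above."""

    def start(self, tag, attrib):
        events.append("start")
        raise ValueError("TEST")

    def end(self, tag):
        events.append("end")

    def close(self):
        return "DONE"


parser = ET.XMLParser(target=GuardedTarget(RecordingTarget()))
error = None
try:
    parser.feed("<TAG/>")
    parser.close()
except ValueError as exc:
    error = exc
```

With the wrapper in place, the recorded events no longer depend on whether the underlying parser keeps delivering "end element" after a failed start(): the wrapped end() is simply never reached. The cost is one extra Python-level call per event.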
History
Date                 User           Action  Args
2013-05-20 12:56:25  eli.bendersky  set     messages: + msg189655
2013-01-24 15:12:09  ezio.melotti   set     nosy: + ezio.melotti; stage: needs patch
2013-01-24 15:10:08  scoder         create