classification
Title: _elementtree.TreeBuilder broken with a non-C-deriving element_factory
Type: crash Stage: resolved
Components: Extension Modules, XML Versions: Python 3.4, Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, christian.heimes, eli.bendersky, ezio.melotti, flox, georg.brandl, larry, mitya57, pitrou, python-dev, skrah
Priority: release blocker Keywords: 3.3regression, patch

Created on 2012-09-30 02:07 by christian.heimes, last changed 2012-11-09 13:51 by mitya57. This issue is now closed.

Files
File name Uploaded Description Edit
issue16089-pseudo-fix.diff skrah, 2012-09-30 11:17
ctree.patch pitrou, 2012-10-01 18:03 review
ctree2.patch pitrou, 2012-10-01 19:09 review
c_treebuilder.patch pitrou, 2012-10-02 19:53 review
c_treebuilder2.patch pitrou, 2012-10-02 20:13 review
Messages (25)
msg171603 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-09-30 02:07
The issue was reported by Arfrever on #python-dev. The test suite of simpletal [1] was segfaulting with Python 3.3. It doesn't segfault without the _elementtree C extension.

I'm able to reproduce the issue. It may take a couple of runs with a debug build of Python but eventually Python segfaults. So far segfaults occur inside a GC collection run. Benjamin suspects the issue is related to the new GC code in _elementtree.

    #0  visit_decref (op=<unknown at remote 0x7fea64fc8ca8>, data=0x0) at Modules/gcmodule.c:361
    #1  0x00000000004bccd5 in subtract_refs (containers=<optimized out>) at Modules/gcmodule.c:386
    #2  collect (generation=0, n_collected=0x7fffe3eedb70, n_uncollectable=0x7fffe3eedb78) at Modules/gcmodule.c:891
    #3  0x00000000004bd7e6 in collect_with_callback (generation=0) at Modules/gcmodule.c:1048
    #4  collect_generations () at Modules/gcmodule.c:1071
    #5  _PyObject_GC_Malloc (basicsize=<optimized out>) at Modules/gcmodule.c:1580
    #6  0x00000000004bddcc in _PyObject_GC_Malloc (basicsize=<optimized out>) at Modules/gcmodule.c:1567
    #7  _PyObject_GC_New (tp=0x8322a0) at Modules/gcmodule.c:1590
    #8  0x0000000000548a3a in new_dict (values=0x0, keys=0xd02e60) at Objects/dictobject.c:395
    #9  _PyDict_NewPresized (minused=<optimized out>) at Objects/dictobject.c:1044
    #10 0x000000000047a449 in PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:2245
    #11 0x0000000000480b03 in fast_function (nk=<optimized out>, na=1, n=<optimized out>, pp_stack=0x7fffe3eede40, func=
        <function at remote 0x7fea653ff9e0>) at Python/ceval.c:4150


[1] http://www.owlfish.com/software/simpleTAL/downloads/SimpleTAL-5.1.tar.gz
msg171605 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-09-30 02:56
The issue could be related to #14007.
msg171619 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-30 08:13
I can't reproduce this, so just a wild guess: Does the segfault still
happen if you replace Py_XDECREF() with Py_CLEAR() in xmlparser_gc_clear()?
msg171625 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-09-30 10:59
No, it doesn't make a difference when I replace Py_XDECREF() with Py_CLEAR(). I've also replaced Py_(X)DECREF() in the other *_gc_clear() methods without success.
msg171629 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-30 11:17
I'm now able to reproduce the issue with a non-debug build. As
Christian says, using Py_CLEAR() does not help (though I wonder
if it shouldn't be used anyway).

Reverting part of 20b8f0ee3d64 "fixes" the segfault, see the
attached diff. I don't know the code well enough to say if this
is a valid possibility.
msg171630 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-09-30 11:45
I can confirm that Stefan's fix works. I ran the SimpleTAL test suite about hundred times in a loop without a segfault. But I don't understand why the change fixes the issue. Could the alteration just lead to another code path so the erroneous code isn't triggered?

+1 for Py_CLEAR()
msg171642 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-09-30 14:39
I'm currently somewhat "offline" for a while (cross-continental move), but I'll do my best to try to recreate my setup to test this problem and the proposed solution.
msg171644 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-30 14:55
Do note that the patch is somewhat cargo-cult. I don't understand yet why
it works; it may very well just silence the real problem.
msg171738 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012-10-01 17:53
Let's make sure this gets into 3.3.1.
msg171739 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-10-01 18:03
Here is a collection of assorted small fixes for celementtree. There are probably other problems lurking (the coding style there is quite old).

I cannot say anything about the crasher until there's a simple reproducer :)
msg171742 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-10-01 19:09
More assorted celementtree cleanups.
msg171746 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-10-01 21:13
By the way, the crash involves an _ElementInterface subclass named SimpleElementTreeVar:

#0  0x0000000000524c0f in visit_decref (op=Traceback (most recent call last):
  File "/home/antoine/cpython/33/python-gdb.py", line 1298, in to_string
    pyop = PyObjectPtr.from_pyobject_ptr(self.gdbval)
  File "/home/antoine/cpython/33/python-gdb.py", line 370, in from_pyobject_ptr
    cls = cls.subclass_from_type(p.type())
  File "/home/antoine/cpython/33/python-gdb.py", line 318, in subclass_from_type
    tp_name = t.field('tp_name').string()
  File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x95 in position 1: invalid start byte
, data=0x0) at Modules/gcmodule.c:361
#1  0x00000000005bcbc5 in BaseException_traverse (self=0x7f08f390a8b0, visit=0x524b98 <visit_decref>, arg=0x0)
    at Objects/exceptions.c:104
#2  0x00000000004336d9 in subtype_traverse (self=
    <SimpleElementTreeVar(attrib={}, tag='root', ourValue=None, _children=[]) at remote 0x7f08f390a8b0>, 
    visit=0x524b98 <visit_decref>, arg=0x0) at Objects/typeobject.c:837
#3  0x0000000000524cba in subtract_refs (containers=0x8dcf40) at Modules/gcmodule.c:386
#4  0x0000000000525afa in collect (generation=0, n_collected=0x7fffed18be00, n_uncollectable=0x7fffed18bdf8)
    at Modules/gcmodule.c:891
#5  0x00000000005260b2 in collect_with_callback (generation=0) at Modules/gcmodule.c:1048
#6  0x000000000052615c in collect_generations () at Modules/gcmodule.c:1071
#7  0x000000000052707d in _PyObject_GC_Malloc (basicsize=48) at Modules/gcmodule.c:1580
[snip]
msg171748 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-10-01 21:36
Ok, the problem is that _elementtree.TreeBuilder expects to receive _elementtree.Element instances, but simpleTAL's element_factory instead gives _ElementInterface instances.

In other words, TreeBuilder is completely broken.
msg171749 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-10-01 21:39
Example of this is the following code in treebuilder_handle_start:

    if (this != Py_None) {
        if (element_add_subelement((ElementObject*) this, node) < 0)
            goto error;

(note the overly optimistic cast)

but this is really a pervasive problem, since in many places TreeBuilder is hard-wired to use a Element instance and nothing else (despite the element_factory).

Note that simpleTAL cannot use the _elementtree.Element class, since their subclass also inherits from an exception subclass, and the object layouts conflict with each other (yeah, crappy design).
msg171751 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-10-01 21:40
I'll still commit my cleanup patch in the meantime :-)
msg171752 - (view) Author: Roundup Robot (python-dev) Date: 2012-10-01 21:46
New changeset f9224f23f473 by Antoine Pitrou in branch '3.3':
Sanitize and modernize some of the _elementtree code (see issue #16089).
http://hg.python.org/cpython/rev/f9224f23f473

New changeset 9fb0a8fc5d79 by Antoine Pitrou in branch 'default':
Sanitize and modernize some of the _elementtree code (see issue #16089).
http://hg.python.org/cpython/rev/9fb0a8fc5d79
msg171753 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-10-01 21:57
Thank you, Antoine, for looking into this. I wish I could participate in a meaningful way, but alas it will be days or weeks before I can recreate a suitable setup to get back hacking on Python.
msg171778 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-10-02 09:00
Nice find. -- The Python version does this:

  _Element = _ElementInterface = Element


So (naively) I would think the same should be done for the C version
after importing Element.


But then one runs into the object layouts conflict that you mentioned.

On the other hand, in the original documentation direct use of
_ElementInterface was discouraged:

http://effbot.org/zone/pythondoc-elementtree-ElementTree.htm#elementtree.ElementTree._ElementInterface-class
msg171829 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-10-02 19:53
Here is a patch. It still needs some tests.
msg171831 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-10-02 20:13
Updated patch with tests.
msg171981 - (view) Author: Roundup Robot (python-dev) Date: 2012-10-04 17:56
New changeset 7bd9626d8b4f by Antoine Pitrou in branch '3.3':
Issue #16089: Allow ElementTree.TreeBuilder to work again with a non-Element element_factory (fixes a regression in SimpleTAL).
http://hg.python.org/cpython/rev/7bd9626d8b4f

New changeset a8c044188731 by Antoine Pitrou in branch 'default':
Issue #16089: Allow ElementTree.TreeBuilder to work again with a non-Element element_factory (fixes a regression in SimpleTAL).
http://hg.python.org/cpython/rev/a8c044188731
msg172009 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-10-04 22:37
Antoine, Thanks!

You said :

> the coding style there is quite old

It would be great if you could elaborate, however briefly, if there's anything else besides your fixes that is old and should be modernized. I will admit to writing some of that code very recently, but that's mostly because I was trying to follow the existing coding style of the module, which was written a long time ago.
msg172014 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-10-04 22:46
> > the coding style there is quite old
> 
> It would be great if you could elaborate, however briefly, if there's
> anything else besides your fixes that is old and should be modernized.
> I will admit to writing some of that code very recently, but that's
> mostly because I was trying to follow the existing coding style of the
> module, which was written a long time ago.

I don't think that's you. I was talking about some of the cruft I
removed (such as list_join() which was going through complicated hoops
instead of calling PyUnicode_Join(), or manual filling of tuples instead
of calling PyTuple_Pack()).

There are still issues with internal functions stealing references and
such.
msg172019 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-10-04 23:07
Ah, OK. I was actually putting off refactorings in that code to after refactoring the test suite. The latter is in progress, it was paused by my vacation but I do plan to resume it eventually. Once the test suite is sufficiently humane (esp. moves off doctest to enable normal testing of multiple modules with the same set of tests) the refactoring should become easier.
msg175235 - (view) Author: Dmitry Shachnev (mitya57) * Date: 2012-11-09 13:49
There are still some false-positive warnings caused by C code that affect docutils (see http://sourceforge.net/tracker/?func=detail&atid=422030&aid=3555164&group_id=38414).

Look at this traceback for exmaple:

http://paste.ubuntu.com/1345164/

There, _ElementInterfaceWrapper is a subclass of etree._ElementInterface (http://repo.or.cz/w/docutils.git/blob/HEAD:/docutils/docutils/writers/odf_odt/__init__.py#l91).
History
Date User Action Args
2012-11-09 13:51:34mitya57setnosy: georg.brandl, pitrou, larry, christian.heimes, ezio.melotti, Arfrever, eli.bendersky, skrah, flox, python-dev, mitya57
2012-11-09 13:50:42mitya57setnosy: georg.brandl, pitrou, larry, christian.heimes, ezio.melotti, Arfrever, eli.bendersky, skrah, flox, python-dev, mitya57
2012-11-09 13:49:52mitya57setnosy: + mitya57, larry
messages: + msg175235
2012-10-04 23:07:07eli.benderskysetmessages: + msg172019
2012-10-04 22:46:59pitrousetmessages: + msg172014
2012-10-04 22:37:30eli.benderskysetmessages: + msg172009
2012-10-04 17:59:30pitrousetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2012-10-04 17:56:16python-devsetmessages: + msg171981
2012-10-02 20:13:51pitrousetfiles: + c_treebuilder2.patch

messages: + msg171831
stage: needs patch -> patch review
2012-10-02 19:53:31pitrousetfiles: + c_treebuilder.patch

messages: + msg171829
title: _elementtree causes segfault in GC -> _elementtree.TreeBuilder broken with a non-C-deriving element_factory
2012-10-02 09:00:58skrahsetmessages: + msg171778
2012-10-01 21:57:58eli.benderskysetmessages: + msg171753
2012-10-01 21:46:40python-devsetnosy: + python-dev
messages: + msg171752
2012-10-01 21:40:32pitrousetmessages: + msg171751
2012-10-01 21:39:24pitrousetmessages: + msg171749
2012-10-01 21:36:24pitrousetmessages: + msg171748
2012-10-01 21:13:42pitrousetmessages: + msg171746
2012-10-01 19:09:01pitrousetfiles: + ctree2.patch

messages: + msg171742
2012-10-01 18:03:17pitrousetfiles: + ctree.patch
nosy: + pitrou
messages: + msg171739

2012-10-01 17:53:02georg.brandlsetpriority: critical -> release blocker

messages: + msg171738
2012-09-30 16:47:27ezio.melottisetnosy: + ezio.melotti
2012-09-30 14:55:51skrahsetmessages: + msg171644
2012-09-30 14:39:05eli.benderskysetmessages: + msg171642
2012-09-30 11:45:28christian.heimessetmessages: + msg171630
2012-09-30 11:35:00georg.brandlsetnosy: + georg.brandl
2012-09-30 11:17:19skrahsetfiles: + issue16089-pseudo-fix.diff
keywords: + patch
messages: + msg171629
2012-09-30 10:59:03christian.heimessetmessages: + msg171625
2012-09-30 08:13:31skrahsetnosy: + skrah
messages: + msg171619
2012-09-30 08:02:40floxsetnosy: + eli.bendersky, flox
components: + XML
2012-09-30 02:56:01christian.heimessetmessages: + msg171605
2012-09-30 02:12:49Arfreversetnosy: + Arfrever
2012-09-30 02:07:42christian.heimescreate