This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Crash on double list(lxml.etree.iterparse(f))
Type: crash
Stage: resolved
Components: Interpreter Core
Versions: Python 3.7, Python 3.6
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: sys.excepthook (PyErr_Display) does crash with SyntaxError which has a bytes filename (bpo-37467)
Assigned To:
Nosy List: kuraga, serhiy.storchaka, vstinner
Priority: normal
Keywords:

Created on 2019-09-05 20:50 by kuraga, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
bt.txt, uploaded by kuraga on 2019-09-05 20:50: GDB's backtrace
Messages (7)
msg351222 - Author: Alexander Kurakin (kuraga) Date: 2019-09-05 20:50
I get a crash in Python 3.x environments but not in 2.7.

===

Crashes in:

Python 3.7.4, Anaconda, Linux 64bit, lxml 4.3.3
OR
Python 3.7.4, Anaconda, Linux 64bit, lxml 4.4.1
OR
Python 3.6.5, Gentoo Linux 64bit, lxml 4.3.3

test.py:

import lxml
import lxml.etree

with open('test.xml', 'rb') as f:
    list(
        lxml.etree.iterparse(f)
    )

    # Traceback (most recent call last):
    # File "test.py", line 18, in <module>
    # lxml.etree.iterparse(f)
    # File "src/lxml/iterparse.pxi", line 209, in lxml.etree.iterparse.__next__
    # File "src/lxml/iterparse.pxi", line 194, in lxml.etree.iterparse.__next__
    # File "src/lxml/iterparse.pxi", line 225, in lxml.etree.iterparse._read_more_events
    # File "src/lxml/parser.pxi", line 1380, in lxml.etree._FeedParser.close
    # Segmentation fault
    list(
        lxml.etree.iterparse(f)
    )

test.xml:

<?xml version="1.0" encoding="UTF-8"?>
<root></root>

(or any other file)

See the attachment for GDB's backtrace.

===

Doesn't crash in:

Python 2.7.15, Gentoo Linux 64bit, lxml 4.3.3

# Traceback (most recent call last):
#   File "test.py", line 19, in <module>
#     lxml.etree.iterparse(f)
#   File "src/lxml/iterparse.pxi", line 209, in lxml.etree.iterparse.__next__
#    File "/home/sasha/_lxml-bug/test.xml", line 0
lxml.etree.XMLSyntaxError: no element found

===

See also: https://bugs.launchpad.net/lxml/+bug/1833050
msg351233 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-09-06 07:56
This tracker is for bugs in CPython (the C implementation of Python). lxml is not a part of the standard library. Please use a proper tracker for reporting a bug in lxml.
msg351234 - Author: Alexander Kurakin (kuraga) Date: 2019-09-06 09:03
Yes, I do. Please read the whole report.

1) According to https://bugs.launchpad.net/lxml/+bug/1833050 (in the lxml author's opinion), it's not an lxml bug.

2) I wouldn't have said so, but according to the backtrace the crash happens inside Python itself.

3) Moreover, whether it crashes depends on the Python version!

Thanks.
msg351239 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-09-06 09:23
Indeed, there is a bug in Python. It can be reproduced without lxml:

$ ./python -c "raise SyntaxError('error', (b'file', 1, 2, 'text'))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
python: Objects/unicodeobject.c:397: _PyUnicode_CheckConsistency: Assertion `PyUnicode_Check(op)' failed.
Aborted (core dumped)

It has been fixed in 3.7+ by issue37467, but the fix was not backported to 3.6.
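A minimal sketch for checking whether a given interpreter is affected, assuming a POSIX system: it runs the one-line reproducer above in a child process, so an abort (the failed assertion in a debug build) or a segfault kills only the child. The helper name check_interpreter is hypothetical; its python parameter can point at another interpreter binary, such as a 3.6 build. On POSIX a negative return code is the number of the signal that killed the child, while a fixed interpreter just prints the SyntaxError and exits with status 1, as for any uncaught exception.

```python
import subprocess
import sys

# One-line reproducer from msg351239: a SyntaxError whose filename is bytes.
REPRODUCER = "raise SyntaxError('error', (b'file', 1, 2, 'text'))"


def check_interpreter(python=sys.executable):
    """Hypothetical helper: run REPRODUCER in a child interpreter.

    Returns (status, returncode, stderr_text). Running the reproducer in a
    subprocess means an abort or segfault only kills the child process.
    """
    proc = subprocess.run(
        [python, "-c", REPRODUCER],
        stdout=subprocess.PIPE,  # capture_output=True needs 3.7+; this also runs on 3.6
        stderr=subprocess.PIPE,
    )
    stderr_text = proc.stderr.decode("utf-8", "replace")
    if proc.returncode < 0:
        # POSIX only: -N means the child was killed by signal N,
        # e.g. -6 (SIGABRT) for a failed assertion, -11 (SIGSEGV).
        return "crashed", proc.returncode, stderr_text
    # Otherwise the exception was printed normally; an uncaught
    # exception gives exit status 1.
    return "no crash", proc.returncode, stderr_text


if __name__ == "__main__":
    print(check_interpreter()[:2])
```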
msg351241 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-09-06 09:38
I understand that this bug is a duplicate of bpo-37467 that I fixed recently in 3.7, 3.8 and master branches:

commit f9b7457bd7f438263e0d2dd1f70589ad56a2585e
Author: Victor Stinner <vstinner@redhat.com>
Date:   Mon Jul 1 16:51:18 2019 +0200

    bpo-37467: Fix PyErr_Display() for bytes filename (GH-14504)


> It has been fixed in 3.7+ by issue37467, but the fix was not backported to 3.6.

Python 3.6 no longer accepts bug fixes, only security fixes:
https://devguide.python.org/#status-of-python-branches

And this bug doesn't look like a security issue.

Python 2.7 doesn't seem to be affected by this issue.

In short, nothing can be done on the Python side: please upgrade to Python 3.7.

I'm closing this issue as a duplicate of bpo-37467.
msg351242 - Author: Alexander Kurakin (kuraga) Date: 2019-09-06 09:43
Ok, thanks!

Good thing I was too lazy to open a bug in June, because our colleague did it better than me :)

It seems the fix isn't present in the current 3.7 release (3.7.4). I will try 3.7.5 and then close this.

UPD: Oh, closed? OK. But *no*, it's not fixed in the current 3.7 release (it is in master). I wrote: 3.7.4. Let's check each other instead of... Sorry. Thanks.
msg351245 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-09-06 13:07
https://github.com/python/cpython/commit/8cbffc4d96d1da0fbc38da6f34f2da30c5ffd601 was merged after 3.7.4 was released, correct. It will be part of the next release, 3.7.5.

About the workflow: we close a bug once the change is merged into the development branch, not when a release is published.

For the current 3.7.5 schedule, see:
https://www.python.org/dev/peps/pep-0537/#id4

In the meantime, avoid calling lxml.etree.iterparse(f) twice :-)
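A sketch of two ways to follow that advice, under stated assumptions. First, rewinding the file with f.seek(0) means the second parse sees the document instead of an exhausted stream. Second, catching the parse error keeps it from ever reaching sys.excepthook / PyErr_Display. This assumes, as the superseder's title suggests, that the crash only happens when the uncaught exception is displayed. If lxml is not installed, the standard library's xml.etree.ElementTree.iterparse is used as a stand-in. lxml's XMLSyntaxError and ElementTree's ParseError both subclass SyntaxError, so one except clause covers either. The helper name parse_twice is hypothetical.

```python
import os
import tempfile

try:
    import lxml.etree as etree  # the library from the report, if installed
except ImportError:
    # Stand-in (assumption): the stdlib iterparse follows the same pattern.
    import xml.etree.ElementTree as etree

XML = b'<?xml version="1.0" encoding="UTF-8"?>\n<root></root>\n'


def parse_twice(path):
    """Hypothetical helper: parse one open file twice without an uncaught error."""
    with open(path, "rb") as f:
        first = list(etree.iterparse(f))

        # Workaround 1: rewind, so the second parse reads the whole
        # document again instead of an exhausted file.
        f.seek(0)
        second = list(etree.iterparse(f))

        # Workaround 2: if the stream really is exhausted, catch the parse
        # error so it never reaches sys.excepthook / PyErr_Display.
        try:
            third = list(etree.iterparse(f))
        except SyntaxError as exc:
            third = exc
    return first, second, third


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".xml")
    with os.fdopen(fd, "wb") as out:
        out.write(XML)
    try:
        first, second, third = parse_twice(path)
        print(first[0][0], first[0][1].tag, len(second), type(third).__name__)
    finally:
        os.remove(path)
```

Rewinding is the better choice when the data is needed twice. The except clause only matters if a caller might hit an exhausted stream on an unpatched interpreter.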
History
Date User Action Args
2022-04-11 14:59:19 admin set github: 82223
2019-09-06 13:07:54 vstinner set messages: + msg351245
2019-09-06 09:43:43 kuraga set messages: + msg351242
  versions: + Python 3.7
2019-09-06 09:38:28 vstinner set status: open -> closed
  superseder: sys.excepthook (PyErr_Display) does crash with SyntaxError which has a bytes filename
  messages: + msg351241
  resolution: duplicate
  stage: resolved
2019-09-06 09:23:10 serhiy.storchaka set status: closed -> open
  components: + Interpreter Core
  versions: - Python 3.7
  nosy: + vstinner
  messages: + msg351239
  resolution: third party -> (no value)
  stage: resolved -> (no value)
2019-09-06 09:03:15 kuraga set messages: + msg351234
2019-09-06 07:56:49 serhiy.storchaka set status: open -> closed
  nosy: + serhiy.storchaka
  messages: + msg351233
  resolution: third party
  stage: resolved
2019-09-05 20:50:19 kuraga set title: Crash on double list(lxml.etree.iterparse(f)) Edit -> Crash on double list(lxml.etree.iterparse(f))
2019-09-05 20:50:01 kuraga create