
classification
Title: elementtree segfaults on invalid xml declaration
Type: crash
Stage:
Components: Library (Lib)
Versions: Python 2.4, Python 3.1, Python 2.6, Python 2.5
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: jyasskin
Nosy List: amaury.forgeotdarc, barry, chuck, doerwalter, effbot, ezio.melotti, fdrake, jyasskin, pitrou, schmir, whichlinden
Priority: deferred blocker
Keywords:

Created on 2009-10-15 05:37 by whichlinden, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (11)
msg94073 - (view) Author: Ryan Williams (whichlinden) Date: 2009-10-15 05:37
This crash is surprisingly consistent across versions and operating
systems, and it occurs whether or not the C module is used:

Python 2.6.2 (r262:71600, Apr 16 2009, 09:17:39) 
[GCC 4.0.1 (Apple Computer, Inc. build 5250)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from xml.etree.cElementTree import fromstring
>>> fromstring('<?xml \xcb\x8c ?>')
Segmentation fault

Python 2.5.4 (r254:67916, Jun  3 2009, 14:22:10) 
[GCC 4.0.1 (Apple Inc. build 5488)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from xml.etree.ElementTree import fromstring
>>> fromstring('<?xml \xcb\x8c ?>')
Segmentation fault

Python 2.4.4 (#2, Oct 22 2008, 20:20:22) 
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from elementtree.ElementTree import fromstring
>>> fromstring('<?xml \xcb\x8c ?>')
Segmentation fault

Python 2.5 (release25-maint, Jul 23 2008, 18:15:29) 
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from xml.etree.ElementTree import fromstring
>>> fromstring('<?xml \xcb\x8c ?>')
Segmentation fault

Python 2.5.2 (r252:60911, Jan  4 2009, 17:40:26) 
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from xml.etree.ElementTree import fromstring
>>> fromstring('<?xml \xcb\x8c ?>')
Segmentation fault

I'm a little fuzzy on who's responsible for elementtree, so if there's a
more appropriate venue to file this bug, please let me know.
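
For anyone re-checking this on a current interpreter, here is a minimal sketch that runs the parse in a child process so that a segfault cannot take down the calling session. It assumes Python 3.7 or later (the sessions above are Python 2, so the input is written as a bytes literal here); the names `CHILD` and `probe` are made up for this illustration and are not part of any library.

```python
import subprocess
import sys

# Child program: parse the malformed declaration and report the outcome.
# The raw string keeps the \x escapes intact so the child sees a bytes literal.
CHILD = r"""
from xml.etree.ElementTree import fromstring, ParseError
try:
    fromstring(b'<?xml \xcb\x8c ?>')
except ParseError:
    print('ParseError')
else:
    print('parsed')
"""


def probe():
    """Run the parse in a separate interpreter and describe what happened."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
    )
    if proc.returncode < 0:
        # On POSIX a negative return code means the child died from a signal
        # (for example -11 for SIGSEGV).
        return "killed by signal %d" % -proc.returncode
    if proc.returncode != 0:
        return "exited with status %d" % proc.returncode
    return proc.stdout.strip()


if __name__ == "__main__":
    print(probe())
```

On an interpreter that includes the fix, the child should report a ParseError; on an affected one, the parent should report that the child was killed by a signal instead of crashing itself.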
msg94075 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2009-10-15 06:26
Confirmed on 3.1 on Windows too.
msg94076 - (view) Author: Jan (chuck) * Date: 2009-10-15 06:36
I'm seeing this on the built-in Python on OS X 10.6, too:
Python 2.6.1 (r261:67515, Jul  7 2009, 23:51:51) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin


But I see it neither with the trunk
Python 2.7a0 (trunk:75433M, Oct 15 2009, 08:27:13) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin

nor with a ports installation
Python 2.6.3 (r263:75183, Oct  7 2009, 07:05:03) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
msg94080 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2009-10-15 09:58
Here is a stacktrace of the crash with the system Python 2.6.1 on Mac OS
X 10.6.1:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000101000000
0x00007fff810f96b8 in XML_SetEncoding ()
(gdb) bt
#0  0x00007fff810f96b8 in XML_SetEncoding ()
#1  0x00007fff810ecad0 in XML_GetCurrentLineNumber ()
#2  0x00000001005c2150 in initpyexpat ()
#3  0x00000001005c3516 in initpyexpat ()
#4  0x00000001000891df in PyEval_EvalFrameEx ()
#5  0x0000000100089330 in PyEval_EvalFrameEx ()
#6  0x0000000100089330 in PyEval_EvalFrameEx ()
#7  0x000000010008accf in PyEval_EvalCodeEx ()
#8  0x000000010008ad62 in PyEval_EvalCode ()
#9  0x00000001000a265a in Py_CompileString ()
#10 0x00000001000a44dd in PyRun_InteractiveOneFlags ()
#11 0x00000001000a4615 in PyRun_InteractiveLoopFlags ()
#12 0x00000001000a4685 in PyRun_AnyFileExFlags ()
#13 0x00000001000b0286 in Py_Main ()
#14 0x0000000100000e6c in ?? ()
msg94084 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-10-15 11:46
This has already been fixed with r74429, but no issue was filed at the time.

It should be backported to 2.6 and 3.1 at least.
And probably to 2.5 as well, because a crash on XML input can be
considered a security issue.

Raising to "deferred blocker" so that it does not block 2.6.4.
msg94086 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-10-15 12:04
Is our copy of expat in sync with upstream? How does maintenance happen?
msg94087 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-10-15 12:15
The same thing had been discovered in expat 12 months before:
http://expat.cvs.sourceforge.net/viewvc/expat/expat/lib/xmltok_impl.c?r1=1.13&r2=1.15
But expat hasn't made any release since 2.0.1, in June 2007...

Are you suggesting that we update our copy of expat to its latest CVS revision?
msg94088 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-10-15 12:17
I don't really know. I wonder how Linux distributions handle maintenance
of that library.
Perhaps Fred Drake can help us?
msg94089 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-10-15 12:41
FWIW, the latest Debian package: http://packages.debian.org/sid/libexpat1
is also vulnerable (I checked the sources in expat_2.0.1.orig.tar.gz,
and the bug is not corrected in expat_2.0.1-4.diff.tgz)
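
Since the question here is which expat a given build actually uses, a small sketch for checking that from Python may help. It relies on `EXPAT_VERSION` and `version_info` from `xml.parsers.expat`; the helper name `expat_build_info` is hypothetical. The version alone does not prove the fix is present or absent, because a distribution may patch its package without changing the version number.

```python
from xml.parsers import expat


def expat_build_info():
    """Return the expat version string and version tuple seen by this interpreter."""
    return expat.EXPAT_VERSION, expat.version_info


if __name__ == "__main__":
    name, info = expat_build_info()
    print("expat version string:", name)
    print("expat version tuple: ", info)
```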
msg94111 - (view) Author: Ryan Williams (whichlinden) Date: 2009-10-15 23:40
Adding 2.5 back, looks like it was removed accidentally.

Also, here's a list of strings for testing purposes: 

['<?xml \xee\xae\x94 ?>', '<?xml \xc4\x9d ?>', '<?xml \xc8\x84 ?>',
 '<?xml \xd9\xb5 ?>', '<?xml \xd9\xaa ?>', '<?xml \xc9\x88 ?>',
 '<?xml \xcb\x8c ?>']
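
The list above could be turned into a regression check along the following lines. This is only a sketch under stated assumptions, not the test that was committed with the fix: it targets Python 3, so the strings are written as bytes literals, and the names `CRASHERS` and `check_crashers` are invented for the example. It runs in-process, so on an unfixed interpreter it would crash rather than report a failure.

```python
from xml.etree.ElementTree import ParseError, fromstring

# The inputs from the list above, as Python 3 bytes literals.
CRASHERS = [
    b'<?xml \xee\xae\x94 ?>',
    b'<?xml \xc4\x9d ?>',
    b'<?xml \xc8\x84 ?>',
    b'<?xml \xd9\xb5 ?>',
    b'<?xml \xd9\xaa ?>',
    b'<?xml \xc9\x88 ?>',
    b'<?xml \xcb\x8c ?>',
]


def check_crashers(inputs=CRASHERS):
    """Parse each input and record whether it was rejected with ParseError."""
    outcomes = []
    for data in inputs:
        try:
            fromstring(data)
        except ParseError:
            outcomes.append("ParseError")
        else:
            outcomes.append("parsed")
    return outcomes


if __name__ == "__main__":
    for data, outcome in zip(CRASHERS, check_crashers()):
        print(repr(data), "->", outcome)
```

Each input is a malformed XML declaration with no root element, so a fixed parser is expected to reject all of them with ParseError.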
msg98137 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-01-22 08:42
This was merged into 2.6 with r74432, into 2.5 with r77666, into 3.1 with r74436.
History
Date                 User                Action  Args
2022-04-11 14:56:54  admin               set     github: 51387
2010-01-22 08:42:48  amaury.forgeotdarc  set     status: open -> closed; nosy: + jyasskin; messages: + msg98137; assignee: jyasskin; resolution: fixed
2009-10-15 23:40:48  whichlinden         set     messages: + msg94111; versions: + Python 2.5
2009-10-15 12:41:05  amaury.forgeotdarc  set     messages: + msg94089
2009-10-15 12:17:36  pitrou              set     assignee: effbot -> (no value); messages: + msg94088; nosy: + fdrake
2009-10-15 12:15:32  amaury.forgeotdarc  set     messages: + msg94087
2009-10-15 12:11:57  schmir              set     nosy: + schmir
2009-10-15 12:04:22  pitrou              set     nosy: + pitrou; messages: + msg94086
2009-10-15 11:46:23  amaury.forgeotdarc  set     priority: high -> deferred blocker; nosy: + amaury.forgeotdarc, barry; messages: + msg94084
2009-10-15 09:58:22  doerwalter          set     nosy: + doerwalter; messages: + msg94080
2009-10-15 06:36:32  chuck               set     nosy: + chuck; messages: + msg94076
2009-10-15 06:26:57  ezio.melotti        set     priority: high; nosy: + effbot, ezio.melotti; versions: + Python 3.1, - Python 2.5; messages: + msg94075; assignee: effbot
2009-10-15 05:37:30  whichlinden         create