classification
Title: pyexpat segmentation fault caused by multiple calls to Parse()
Type: crash
Stage:
Components: Library (Lib), macOS
Versions: Python 3.1, Python 3.2, Python 2.7, Python 2.6
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: expat parser throws Memory Error when parsing multiple files (issue 6676)
Assigned To: ronaldoussoren
Nosy List: barry, dhgutteridge, ned.deily, ronaldoussoren, terry.reedy
Priority: normal
Keywords:

Created on 2011-08-23 22:45 by dhgutteridge, last changed 2011-09-01 05:01 by dhgutteridge. This issue is now closed.

Files
File name Uploaded Description
pyexpat_crash_isolation_osx.py dhgutteridge, 2011-08-23 22:44
pyexpat_crash_isolation_nb.py dhgutteridge, 2011-08-30 04:58
Messages (12)
msg142868 - Author: David H. Gutteridge (dhgutteridge) Date: 2011-08-23 22:44
I stumbled across this bug because of a misunderstanding I had about how the pyexpat module works.  I'd inferred that a given instance could be reused to parse multiple files, which is apparently not supported.  (There's already a documentation bug open on this, see http://bugs.python.org/issue6676 -- a few other people made the same mistaken assumption as me.)  I found that given the right input, a segmentation fault occurs when one attempts to reuse the parser instance on more than one file.
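
For illustration, the reuse pattern described above can be sketched as follows. This is a minimal sketch and not the attached test case: the function name `reuse_parser` and the two inline documents are hypothetical, and each call passes `True` as the final-chunk flag. On a current CPython build the second call is expected to raise `xml.parsers.expat.ExpatError` rather than crash; the builds discussed in this report behaved differently.

```python
import xml.parsers.expat


def reuse_parser(documents):
    """Feed several complete documents to ONE parser (the unsupported pattern).

    Returns the ExpatError raised by the parser, or None if no error occurred.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        for doc in documents:
            # Each call marks its input as the final chunk of a document.
            parser.Parse(doc, True)
    except xml.parsers.expat.ExpatError as err:
        return err
    return None


# Hypothetical inputs: two tiny, individually well-formed documents.
error = reuse_parser(["<a/>", "<b/>"])
print(error)
```

A single document passed through the same function returns `None`, so the error comes from the second `Parse()` call on the same instance.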

The sample test case I've attached derives from what I'm using pyexpat for, which involves the parsing of Microsoft Office Open XML Excel files.  I found that the specific content in the initial file can influence whether the submission of subsequent files triggers a segmentation fault.

I'm reporting this against Python 2.7.2 on Mac OS X 10.6.8; it also occurs with Python 2.6.1 that's bundled with the OS.  I can also duplicate it on the development branch of NetBSD (my other development platform), specifically 5.99.47/amd64 with Python 2.6.7.
msg142869 - Author: David H. Gutteridge (dhgutteridge) Date: 2011-08-23 23:10
I believe this may be an OS-specific bug somehow, albeit one that affects multiple OSes.  I cannot duplicate the crash on NetBSD 5.1_STABLE/i386 with Python 2.6.7, or on openSUSE 11.3 with Python 2.6.5.  (It's interesting that it doesn't crash on the older branch of NetBSD, but it does on the newer, both with the same version of Python and underlying Expat...)
msg142870 - Author: David H. Gutteridge (dhgutteridge) Date: 2011-08-23 23:15
Here's the (non-debug) trace under OS X:

Process:         Python [4604]
Path:            /Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python
Identifier:      Python
Version:         ??? (???)
Code Type:       X86-64 (Native)
Parent Process:  bash [1461]

Date/Time:       2011-08-23 19:14:48.148 -0400
OS Version:      Mac OS X 10.6.8 (10K549)
Report Version:  6

Interval Since Last Report:          366485 sec
Crashes Since Last Report:           29
Per-App Crashes Since Last Report:   29
Anonymous UUID:                      5504B203-8C24-427A-B74C-EDBD3EF8DB51

Exception Type:  EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0000000100569000
Crashed Thread:  0  Dispatch queue: com.apple.main-thread

Thread 0 Crashed:  Dispatch queue: com.apple.main-thread
0   pyexpat.so                    	0x000000010050e439 normal_updatePosition + 57
1   pyexpat.so                    	0x00000001004f9314 PyExpat_XML_GetCurrentLineNumber + 84
2   pyexpat.so                    	0x00000001004f374e set_error + 62
3   pyexpat.so                    	0x00000001004f4588 xmlparse_Parse + 200
4   org.python.python             	0x00000001000c102d PyEval_EvalFrameEx + 22397
5   org.python.python             	0x00000001000c2d29 PyEval_EvalCodeEx + 2137
6   org.python.python             	0x00000001000c0b6a PyEval_EvalFrameEx + 21178
7   org.python.python             	0x00000001000c2d29 PyEval_EvalCodeEx + 2137
8   org.python.python             	0x00000001000c2e46 PyEval_EvalCode + 54
9   org.python.python             	0x00000001000e7b6e PyRun_FileExFlags + 174
10  org.python.python             	0x00000001000e7e29 PyRun_SimpleFileExFlags + 489
11  org.python.python             	0x00000001000fe77c Py_Main + 2940
12  org.python.python             	0x0000000100000f14 0x100000000 + 3860

Thread 0 crashed with X86 Thread State (64-bit):
  rax: 0x00000000fffffffb  rbx: 0x000000010037bbc0  rcx: 0x000000010037bec8  rdx: 0x00000001008cd39f
  rdi: 0x00000001005256c0  rsi: 0x0000000100569000  rbp: 0x00007fff5fbfedf0  rsp: 0x00007fff5fbfedf0
   r8: 0x000000010050e458   r9: 0x00000001008caa00  r10: 0x0000000000000800  r11: 0x0000000100542dda
  r12: 0x0000000000000000  r13: 0x00000001003037e0  r14: 0x0000000000000009  r15: 0x00000001004aca70
  rip: 0x000000010050e439  rfl: 0x0000000000010293  cr2: 0x0000000100569000

Binary Images:
       0x100000000 -        0x100000fff +org.python.python 2.7.2 (2.7.2) <639E72E4-F205-C034-8E34-E59DE9C46369> /Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python
       0x100003000 -        0x10016cfef +org.python.python 2.7.2, (c) 2004-2011 Python Software Foundation. (2.7.2) <49D18B1A-C92D-E32E-A7C1-086D0B14BD76> /Library/Frameworks/Python.framework/Versions/2.7/Python
       0x1002ec000 -        0x1002efff7 +strop.so ??? (???) <F7857283-F427-7CF7-9B0D-7619AA0A82F1> /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/strop.so
       0x1004f0000 -        0x100524fe7 +pyexpat.so ??? (???) <E5FD4237-8D59-8B8E-E229-19601A03F18E> /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/pyexpat.so
    0x7fff5fc00000 -     0x7fff5fc3bdef  dyld 132.1 (???) <B536F2F1-9DF1-3B6C-1C2C-9075EA219A06> /usr/lib/dyld
    0x7fff8005d000 -     0x7fff801d4fe7  com.apple.CoreFoundation 6.6.5 (550.43) <31A1C118-AD96-0A11-8BDF-BD55B9940EDC> /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation
    0x7fff822f0000 -     0x7fff824b1fef  libSystem.B.dylib 125.2.11 (compatibility 1.0.0) <9AB4F1D1-89DC-0E8A-DC8E-A4FE4D69DB69> /usr/lib/libSystem.B.dylib
    0x7fff82781000 -     0x7fff82792ff7  libz.1.dylib 1.2.3 (compatibility 1.0.0) <FB5EE53A-0534-0FFA-B2ED-486609433717> /usr/lib/libz.1.dylib
    0x7fff8376d000 -     0x7fff837eafef  libstdc++.6.dylib 7.9.0 (compatibility 7.0.0) <35ECA411-2C08-FD7D-11B1-1B7A04921A5C> /usr/lib/libstdc++.6.dylib
    0x7fff85577000 -     0x7fff8557bff7  libmathCommon.A.dylib 315.0.0 (compatibility 1.0.0) <95718673-FEEE-B6ED-B127-BCDBDB60D4E5> /usr/lib/system/libmathCommon.A.dylib
    0x7fff86259000 -     0x7fff86417fff  libicucore.A.dylib 40.0.0 (compatibility 1.0.0) <4274FC73-A257-3A56-4293-5968F3428854> /usr/lib/libicucore.A.dylib
    0x7fff86526000 -     0x7fff865dcff7  libobjc.A.dylib 227.0.0 (compatibility 1.0.0) <03140531-3B2D-1EBA-DA7F-E12CC8F63969> /usr/lib/libobjc.A.dylib
    0x7fff8739a000 -     0x7fff873e6fff  libauto.dylib ??? (???) <F7221B46-DC4F-3153-CE61-7F52C8C293CF> /usr/lib/libauto.dylib
    0x7fffffe00000 -     0x7fffffe01fff  libSystem.B.dylib ??? (???) <9AB4F1D1-89DC-0E8A-DC8E-A4FE4D69DB69> /usr/lib/libSystem.B.dylib
msg143115 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-08-28 18:27
A note for anyone else: David is actually using the xml.parsers.expat module, which uses the now undocumented pyexpat module, whose direct use is deprecated.

David: Have you tested with 3.1 or 3.2? (I am about to try on Windows ;-).
msg143116 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-08-28 18:32
Running with IDLE on Windows, I get no crash or uncaught exception but got these printed lines:

An error occurred during XML parsing.  Error ID: 9.  Error message: junk after document element
Line number: 1
An error occurred during XML parsing.  Error ID: 9.  Error message: junk after document element
Line number: 1
An error occurred during XML parsing.  Error ID: 9.  Error message: junk after document element
An error occurred during XML parsing.  Error ID: 9.  Error message: junk after document element
Line number: 1
An error occurred during XML parsing.  Error ID: 9.  Error message: junk after document element
Line number: 1
An error occurred during XML parsing.  Error ID: 9.  Error message: junk after document element

Is this the correct, expected output?
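
For reference, Expat error code 9 corresponds to "junk after document element". A rough sketch of how lines like the ones above might be produced is shown here; the helper name `parse_error_details`, the inline input, and the message format are assumptions modelled on the printed output, not taken from the attached script.

```python
import xml.parsers.expat


def parse_error_details(data):
    """Parse one string with a fresh parser and describe any Expat error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        # err.code is the numeric Expat error; ErrorString() maps it to text.
        return err.code, xml.parsers.expat.ErrorString(err.code), err.lineno
    return None


# Hypothetical input: a second root element directly after the first one.
details = parse_error_details("<a/><b/>")
if details is not None:
    print("An error occurred during XML parsing.  "
          "Error ID: %d.  Error message: %s" % details[:2])
    print("Line number: %d" % details[2])
```

The exception's `code` and `lineno` attributes carry the same values that the report prints as "Error ID" and "Line number".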
msg143197 - Author: David H. Gutteridge (dhgutteridge) Date: 2011-08-30 04:22
Terry: I wasn't aware xml.parsers.expat is deprecated, though it clearly says so in the documentation, I now see...  (I'd been using it because it features prominently in various examples in Python books, and it's lightweight.)  I haven't tested with the 3.x series, because I rely on the 2.6 branch as a dependency for a variety of software on NetBSD, but having said that, I can test it on Mac OS X.  Your test output is the correct, expected result, yes.
msg143198 - Author: David H. Gutteridge (dhgutteridge) Date: 2011-08-30 04:37
Confirming that Python 3.2.1 crashes the same way on Mac OS X 10.6.8:

Process:         Python [9594]
Path:            /Library/Frameworks/Python.framework/Versions/3.2/Resources/Python.app/Contents/MacOS/Python
Identifier:      Python
Version:         ??? (???)
Code Type:       X86-64 (Native)
Parent Process:  bash [9570]

Date/Time:       2011-08-30 00:35:53.863 -0400
OS Version:      Mac OS X 10.6.8 (10K549)
Report Version:  6

Interval Since Last Report:          292720 sec
Crashes Since Last Report:           2
Per-App Crashes Since Last Report:   2
Anonymous UUID:                      5504B203-8C24-427A-B74C-EDBD3EF8DB51

Exception Type:  EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x00000001006fb000
Crashed Thread:  0  Dispatch queue: com.apple.main-thread

Thread 0 Crashed:  Dispatch queue: com.apple.main-thread
0   pyexpat.so                    	0x00000001006a03e9 normal_updatePosition + 57
1   pyexpat.so                    	0x000000010068b2c4 PyExpat_XML_GetCurrentLineNumber + 84
2   pyexpat.so                    	0x000000010068673e set_error + 62
3   pyexpat.so                    	0x00000001006874e8 xmlparse_Parse + 200
4   org.python.python             	0x00000001000b39b2 PyEval_EvalFrameEx + 30530
5   org.python.python             	0x00000001000b2a4d PyEval_EvalFrameEx + 26589
6   org.python.python             	0x00000001000b431a PyEval_EvalCodeEx + 1770
7   org.python.python             	0x00000001000b462f PyEval_EvalCode + 63
8   org.python.python             	0x00000001000db82b PyRun_FileExFlags + 187
9   org.python.python             	0x00000001000dbaf9 PyRun_SimpleFileExFlags + 521
10  org.python.python             	0x00000001000f0a03 Py_Main + 3059
11  org.python.python             	0x0000000100000e5f 0x100000000 + 3679
12  org.python.python             	0x0000000100000d04 0x100000000 + 3332

Thread 0 crashed with X86 Thread State (64-bit):
  rax: 0x00000000fffffffb  rbx: 0x00000001003a9b40  rcx: 0x00000001003a9e48  rdx: 0x000000010093b59f
  rdi: 0x00000001006b76e0  rsi: 0x00000001006fb000  rbp: 0x00007fff5fbfed60  rsp: 0x00007fff5fbfed60
   r8: 0x00000001006a0408   r9: 0x00000001008cb400  r10: 0x0000000000000800  r11: 0x00000001006d4dda
  r12: 0x0000000000000000  r13: 0x00000001005aa5f0  r14: 0x0000000000000009  r15: 0x00000001002b6810
  rip: 0x00000001006a03e9  rfl: 0x0000000000010293  cr2: 0x00000001006fb000

Binary Images:
       0x100000000 -        0x100000ff7 +org.python.python 3.2.1 (3.2.1) <B2AFB510-C20A-61C8-C375-448C252C66A8> /Library/Frameworks/Python.framework/Versions/3.2/Resources/Python.app/Contents/MacOS/Python
       0x100003000 -        0x100182ff7 +org.python.python 3.2.1, (c) 2004-2011 Python Software Foundation. (3.2.1) <9A9D8FC9-0EA2-8B57-D918-373F60ECF77A> /Library/Frameworks/Python.framework/Versions/3.2/Python
       0x1002fc000 -        0x1002fcfff +_bisect.so ??? (???) <25A7A434-1970-9B41-5BFD-31B6F7AD6ECF> /Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/lib-dynload/_bisect.so
       0x1005b0000 -        0x1005b1ff7 +_heapq.so ??? (???) <3E54D664-5279-8504-CA26-E23A15CF152D> /Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/lib-dynload/_heapq.so
       0x100682000 -        0x1006b6fef +pyexpat.so ??? (???) <F5A9710C-3B05-3BA8-66E1-5D34290441CA> /Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/lib-dynload/pyexpat.so
    0x7fff5fc00000 -     0x7fff5fc3bdef  dyld 132.1 (???) <B536F2F1-9DF1-3B6C-1C2C-9075EA219A06> /usr/lib/dyld
    0x7fff8005d000 -     0x7fff801d4fe7  com.apple.CoreFoundation 6.6.5 (550.43) <31A1C118-AD96-0A11-8BDF-BD55B9940EDC> /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation
    0x7fff822f0000 -     0x7fff824b1fef  libSystem.B.dylib 125.2.11 (compatibility 1.0.0) <9AB4F1D1-89DC-0E8A-DC8E-A4FE4D69DB69> /usr/lib/libSystem.B.dylib
    0x7fff82781000 -     0x7fff82792ff7  libz.1.dylib 1.2.3 (compatibility 1.0.0) <FB5EE53A-0534-0FFA-B2ED-486609433717> /usr/lib/libz.1.dylib
    0x7fff8376d000 -     0x7fff837eafef  libstdc++.6.dylib 7.9.0 (compatibility 7.0.0) <35ECA411-2C08-FD7D-11B1-1B7A04921A5C> /usr/lib/libstdc++.6.dylib
    0x7fff85577000 -     0x7fff8557bff7  libmathCommon.A.dylib 315.0.0 (compatibility 1.0.0) <95718673-FEEE-B6ED-B127-BCDBDB60D4E5> /usr/lib/system/libmathCommon.A.dylib
    0x7fff86259000 -     0x7fff86417fff  libicucore.A.dylib 40.0.0 (compatibility 1.0.0) <4274FC73-A257-3A56-4293-5968F3428854> /usr/lib/libicucore.A.dylib
    0x7fff86526000 -     0x7fff865dcff7  libobjc.A.dylib 227.0.0 (compatibility 1.0.0) <03140531-3B2D-1EBA-DA7F-E12CC8F63969> /usr/lib/libobjc.A.dylib
    0x7fff8739a000 -     0x7fff873e6fff  libauto.dylib ??? (???) <F7221B46-DC4F-3153-CE61-7F52C8C293CF> /usr/lib/libauto.dylib
    0x7fffffe00000 -     0x7fffffe01fff  libSystem.B.dylib ??? (???) <9AB4F1D1-89DC-0E8A-DC8E-A4FE4D69DB69> /usr/lib/libSystem.B.dylib
msg143200 - Author: David H. Gutteridge (dhgutteridge) Date: 2011-08-30 04:58
Further details:

- The original test case I'd submitted crashed on the development branch of NetBSD as well as Mac OS X Snow Leopard, but not the most recent stable branch of NetBSD.  I've found a separate test case that crashes on both branches of NetBSD, but not OS X...  This is quite possibly a separate bug, but the means of triggering it is directly related, so I'm including it here.

- I also built Python 2.7.2 under Solaris to see if either test case resulted in a crash there, and they do not, so it seems this is BSDish somehow (or else, the Mac OS X and NetBSD crashes are two separate bugs).

- With NetBSD, I also created tests in C that use the Expat library directly, submitting the very same test data.  They do not crash and they return the expected results, so it appears there's definitely something happening in Python somewhere that's causing this.

This is the (non-debug) crash trace from the separate NetBSD test.  (I will look at building a debug version of Python when I get a chance...)  I'm running Python 2.6.7 on the NetBSD machines.

#0  0xbb93ff64 in XML_ParserCreate () from /usr/X11R7/lib/libexpat.so.1
#1  0xbb9348a3 in XML_GetCurrentLineNumber () from /usr/X11R7/lib/libexpat.so.1
#2  0xbb956743 in set_error () from /usr/pkg/lib/python2.6/site-packages/pyexpat.so
#3  0xbb956d21 in xmlparse_Parse () from /usr/pkg/lib/python2.6/site-packages/pyexpat.so
#4  0xbbb048b0 in PyCFunction_Call () from /usr/pkg/lib/libpython2.6.so.1.0
#5  0xbbb5a3d7 in PyEval_EvalFrameEx () from /usr/pkg/lib/libpython2.6.so.1.0
#6  0xbbb5add8 in PyEval_EvalCodeEx () from /usr/pkg/lib/libpython2.6.so.1.0
#7  0xbbb5914e in PyEval_EvalFrameEx () from /usr/pkg/lib/libpython2.6.so.1.0
#8  0xbbb5add8 in PyEval_EvalCodeEx () from /usr/pkg/lib/libpython2.6.so.1.0
#9  0xbbb5ae22 in PyEval_EvalCode () from /usr/pkg/lib/libpython2.6.so.1.0
#10 0xbbb72f12 in run_mod () from /usr/pkg/lib/libpython2.6.so.1.0
#11 0xbbb72fb5 in PyRun_FileExFlags () from /usr/pkg/lib/libpython2.6.so.1.0
#12 0xbbb745e4 in PyRun_SimpleFileExFlags () from /usr/pkg/lib/libpython2.6.so.1.0
#13 0xbbb74ce5 in PyRun_AnyFileExFlags () from /usr/pkg/lib/libpython2.6.so.1.0
#14 0xbbb80322 in Py_Main () from /usr/pkg/lib/libpython2.6.so.1.0
#15 0x080487e9 in main ()
msg143223 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-08-30 16:35
My understanding is that what you did:
import xml.parsers.expat
is now the proper way to use expat. After some searching, it seems the sentence about direct use of pyexpat being deprecated refers to
http://sourceforge.net/tracker/?func=detail&aid=2745230&group_id=26590&atid=387667
"The location and name of the PyExpat module have moved in Python v2.6.1 from  xml.dom.ext.reader.PyExpat to xml.parsers.expat"
This is puzzling because xml.parsers.expat dates back to 2.0, while I see no doc for xml.dom.ext...

The deprecation notice should be deleted from the 3.x docs.
msg143224 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-08-30 16:39
This seems to be a Mac-only issue.

Barry, does this seem to be a security issue to you, or should we delete 2.6 from the versions?
msg143241 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2011-08-30 23:20
This is the same issue as highlighted by Issue6676.  The root cause is attempting to reuse a parser instance, which is known not to work with the version of expat included with Python.  Whether the test program crashes with a memory access violation or just uses uninitialized memory depends on the version of malloc in use and what protections the linker and OS use.  Even on Mac OS X, the test program does not segfault on earlier versions of OS X (like 10.5).  And on 10.6 and 10.7, if you build Python with pymalloc, it usually does not segfault.  But that doesn't mean it is working properly.  At a minimum, the single-use restriction should be documented; if anyone is interested, they could look into adding any more recent fixes to expat and plugging remaining reuse holes.
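
The single-use restriction can be respected by creating a new parser for each document. The following is a minimal sketch under that assumption; the helper name `element_names` and the inline documents are hypothetical and stand in for whatever handlers and files a real program would use.

```python
import xml.parsers.expat


def element_names(data):
    """Parse one document with its own parser; return start-tag names in order."""
    names = []
    parser = xml.parsers.expat.ParserCreate()  # fresh parser for every document
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(data, True)  # True: this is the final (only) chunk
    return names


# Hypothetical inputs standing in for separate files.
documents = ["<a><b/></a>", "<c/>"]
results = [element_names(doc) for doc in documents]
print(results)
```

Because no parser object outlives the document it was created for, state from one parse cannot leak into the next.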
msg143296 - Author: David H. Gutteridge (dhgutteridge) Date: 2011-09-01 05:01
Okay.  I'd seen the earlier issue, but had submitted this separately because I wasn't sure if it was a security-related bug, whereas the older issue didn't mention anything of the sort.  (In retrospect, I could've just added to it...)
History
Date User Action Args
2011-09-01 05:01:49 dhgutteridge set messages: + msg143296
2011-08-30 23:20:26 ned.deily set status: open -> closed
    resolution: duplicate
    superseder: expat parser throws Memory Error when parsing multiple files
    messages: + msg143241
2011-08-30 16:39:52 terry.reedy set nosy: + barry, ronaldoussoren, ned.deily
    messages: + msg143224
    assignee: ronaldoussoren
    components: + macOS
2011-08-30 16:35:15 terry.reedy set messages: + msg143223
2011-08-30 04:58:45 dhgutteridge set files: + pyexpat_crash_isolation_nb.py
    messages: + msg143200
    versions: + Python 3.1, Python 3.2
2011-08-30 04:37:29 dhgutteridge set messages: + msg143198
2011-08-30 04:22:38 dhgutteridge set messages: + msg143197
2011-08-28 18:32:32 terry.reedy set messages: + msg143116
2011-08-28 18:27:13 terry.reedy set nosy: + terry.reedy
    messages: + msg143115
2011-08-23 23:15:54 dhgutteridge set messages: + msg142870
2011-08-23 23:10:55 dhgutteridge set messages: + msg142869
2011-08-23 22:45:03 dhgutteridge create