Title: Multithreaded XML parsing causes segfault
Type: crash Stage: needs patch
Components: XML Versions: Python 3.11, Python 3.10, Python 3.9
Created on 2013-05-13 13:52 by mrDoctorWho0.., last changed 2022-04-11 14:57 by admin.

Author: mrDoctorWho0 . (mrDoctorWho0..) Date: 2013-05-13 13:52
Linux i386, Python 2.7.4. Multithreaded XML parsing via pyexpat causes a segmentation fault.
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) Date: 2013-05-13 18:28
Expat is not thread-safe at the object level: a single Parser cannot be used from multiple threads.
Pyexpat could add locks to Parser objects.
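
As a rough user-level illustration of the locking idea (not the pyexpat patch itself), the sketch below wraps one parser in a `threading.Lock` so that only one thread is inside `Parse` at a time. The class name `LockedParser` and its `feed` method are hypothetical, introduced only for this example.

```python
import threading
import xml.parsers.expat


class LockedParser:
    """Hypothetical wrapper that serializes access to one expat parser."""

    def __init__(self):
        self._parser = xml.parsers.expat.ParserCreate()
        self._lock = threading.Lock()
        self.elements = []  # element names seen so far
        self._parser.StartElementHandler = self._on_start

    def _on_start(self, name, attrs):
        # Runs while the lock is held, so it must not call feed() itself.
        self.elements.append(name)

    def feed(self, data, is_final=False):
        # Only one thread at a time may be inside Parse().
        with self._lock:
            self._parser.Parse(data, is_final)


if __name__ == "__main__":
    parser = LockedParser()
    parser.feed("<root>")
    workers = [
        threading.Thread(target=parser.feed, args=("<item/>",))
        for _ in range(4)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    parser.feed("</root>", True)
    print(parser.elements)
```

The lock only prevents concurrent entry; the order in which chunks from different threads reach the parser is still up to the caller.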
Author: Alex Gaynor (alex) Date: 2013-05-13 18:32
It could also track tids and raise an error if you attempt to use it from multiple threads.
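
A minimal sketch of the thread-id idea, again at user level and under the assumption that the parser is bound to the thread that created it. `ThreadBoundParser` is a hypothetical name, and the error type is an arbitrary choice for illustration.

```python
import threading
import xml.parsers.expat


class ThreadBoundParser:
    """Hypothetical wrapper that rejects use from a non-owning thread."""

    def __init__(self):
        self._parser = xml.parsers.expat.ParserCreate()
        # Remember which thread created the parser.
        self._owner = threading.get_ident()

    def feed(self, data, is_final=False):
        if threading.get_ident() != self._owner:
            raise RuntimeError(
                "parser used from a thread other than its creator"
            )
        return self._parser.Parse(data, is_final)
```

With this check, misuse fails with a Python exception instead of reaching Expat, at the cost of also rejecting callers that hand a parser between threads safely.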
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) Date: 2013-05-13 18:35
But this would break working code that already uses locks correctly (or some kind of pool of cached parsers).
Author: Christian Heimes (christian.heimes) Date: 2013-05-14 14:58
In my opinion it's fine to document Python's XML parser as not thread-safe and leave locking to the user. Any fancy locking or tracking is going to make it slower for users. And it takes a lot of effort to implement the feature, too. lxml offers a faster XML parser with multi-threading support.
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) Date: 2013-05-14 16:22
In my opinion it's not fine to let Python crash.
The implementation could be similar to the one in bufferedio.c; it's quite lightweight.
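
The general shape of such a guard, as I understand the bufferedio.c approach, is a lock plus a record of the owning thread, so that a reentrant call from the same thread raises an error instead of deadlocking. The Python sketch below is only an approximation of that pattern; `GuardedParser`, `feed` and the error message are assumptions for illustration and are not taken from the C code.

```python
import threading
import xml.parsers.expat


class GuardedParser:
    """Hypothetical lock-plus-owner guard around one expat parser."""

    def __init__(self):
        self._parser = xml.parsers.expat.ParserCreate()
        self._lock = threading.Lock()
        self._owner = None  # ident of the thread currently inside Parse()

    def feed(self, data, is_final=False):
        me = threading.get_ident()
        # Same thread re-entering (for example from a handler): fail
        # instead of deadlocking on the non-reentrant lock.
        if self._owner == me:
            raise RuntimeError("reentrant call inside GuardedParser.feed")
        # Other threads simply wait for their turn.
        with self._lock:
            self._owner = me
            try:
                return self._parser.Parse(data, is_final)
            finally:
                self._owner = None
```

The uncontended cost is one lock acquire and release per `Parse` call, not per handler callback, which is why this style of guard can stay lightweight.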
Author: Antoine Pitrou (pitrou) Date: 2013-05-18 16:20
I agree with Amaury, multi-threaded parsing should definitely not crash. Adding a lock should be quite easy. I wonder what the effect on performance would be if there is a lot of back-and-forth between expat and Python.
Author: Irit Katriel (iritkatriel) Date: 2021-09-07 12:55
I've reproduced the segfault on 3.11.
