classification
Title: re functions never release GIL
Type: performance Stage:
Components: Regular Expressions Versions: Python 3.4
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: abacabadabacaba, ezio.melotti, mrabarnett, vstinner
Priority: normal Keywords:

Created on 2015-03-17 17:04 by abacabadabacaba, last changed 2015-03-17 21:32 by vstinner.

Messages (4)
msg238316 - (view) Author: Evgeny Kapun (abacabadabacaba) Date: 2015-03-17 17:04
It looks like the functions in the re module (match, fullmatch and so on) don't release the GIL, even though these operations can take a long time. As a result, other threads can't run while a pattern is being matched, and thread switching doesn't happen either.
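
One way to observe the reported behaviour is to count how often a second thread gets to run while the main thread is busy. The sketch below is only an illustration: the helper names (`ticks_while`, `slow_match`) and the backtracking-heavy pattern are assumptions made for this example and are not part of the report.

```python
import re
import threading
import time


def ticks_while(func):
    """Run func() in this thread; count iterations of a background thread meanwhile."""
    ticks = 0
    stop = threading.Event()

    def ticker():
        nonlocal ticks
        while not stop.is_set():
            ticks += 1
            time.sleep(0.001)  # sleeping releases the GIL, so the main thread can run

    worker = threading.Thread(target=ticker)
    worker.start()
    start = time.perf_counter()
    try:
        func()
    finally:
        elapsed = time.perf_counter() - start
        stop.set()
        worker.join()
    return ticks, elapsed


def slow_match(n=20):
    # Nested quantifiers force heavy backtracking on a non-matching subject.
    # The cost grows roughly exponentially with n, so keep n small.
    return re.match(r'(a+)+$', 'a' * n + 'b')


if __name__ == '__main__':
    match_ticks, match_time = ticks_while(slow_match)
    # Sleep for the same wall-clock time as a baseline that does release the GIL.
    sleep_ticks, sleep_time = ticks_while(lambda: time.sleep(match_time))
    print('re.match  : %d ticks in %.3f s' % (match_ticks, match_time))
    print('time.sleep: %d ticks in %.3f s' % (sleep_ticks, sleep_time))
```

If the match holds the GIL for its whole duration, the tick count for `re.match` should stay far below the count for the `time.sleep` baseline. The exact numbers depend on the interpreter build and the machine.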
msg238321 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-03-17 17:13
Releasing the GIL would require redesigning the _sre module.

For example, getstring() gets a "view" of a Python string; it doesn't copy the string. So we must hold the GIL, otherwise the Python string could be modified by other threads. Copying a very long string may be slower than just matching the pattern :-/

During pattern matching, other Python functions are called, and these functions require the GIL to be held. Example: PyObject_Malloc().
msg238325 - (view) Author: Evgeny Kapun (abacabadabacaba) Date: 2015-03-17 17:31
Aren't Python strings immutable?

Also, match functions still permit execution of signal handlers, which can execute any Python code.

If GIL is needed during matching, can it be released temporarily to permit thread switching?
msg238342 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-03-17 21:32
> Aren't Python strings immutable?

Yes. But the re module supports more types than just str and bytes. For example, bytearray is also accepted:

>>> re.match(b'^abc', b'abc')
<_sre.SRE_Match object; span=(0, 3), match=b'abc'>
>>> re.match(b'^abc', bytearray(b'abc'))
<_sre.SRE_Match object; span=(0, 3), match=b'abc'>

> Also, match functions still permit execution of signal handlers, which can execute any Python code.

Correct, signal handlers are called. If you mutate the string currently used in the pattern matching, you can probably crash Python. I hope that nobody does such ugly things in Python signal handlers :-)
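
The mutability difference can be seen from plain Python without touching a subject during a match. This is a minimal sketch with made-up variable names; it only shows that a bytearray subject can change in place between two matches, while bytes cannot change at all.

```python
import re

PATTERN = re.compile(b'^abc')

immutable = b'abc'
mutable = bytearray(b'abc')

# The bytearray matches at first ...
matched_before = PATTERN.match(mutable) is not None

# ... but it can be changed in place, so a later match sees different data.
mutable[0] = ord('x')
matched_after = PATTERN.match(mutable) is not None

# bytes objects reject in-place modification.
try:
    immutable[0] = ord('x')
    bytes_are_mutable = True
except TypeError:
    bytes_are_mutable = False

print(matched_before, matched_after, bytes_are_mutable)  # True False False
```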

> If GIL is needed during matching, can it be released temporarily to permit thread switching?

It's possible to modify the _sre module to release the GIL in some cases: release the GIL for immutable strings, and keep the GIL for mutable strings. To do this, you have to audit the source code. First, ensure that no global variable is used. For example, the "state" must not be shared (it's OK: it's allocated on the stack, and thread stacks are not shared).
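
The "immutable versus mutable" split could be expressed as a simple type check. The sketch below is written in Python purely for illustration; the real decision would have to be made in the C code of _sre, and `may_release_gil` is a hypothetical name that does not exist anywhere in CPython.

```python
def may_release_gil(subject):
    """Hypothetical policy: True only for subjects whose contents cannot change."""
    # Exact type check on purpose: this sketch is deliberately conservative
    # and treats anything that is not exactly str or bytes as possibly
    # mutable (bytearray, memoryview, array.array, subclasses, ...).
    return type(subject) in (str, bytes)


for subject in ('abc', b'abc', bytearray(b'abc'), memoryview(b'abc')):
    print(type(subject).__name__, may_release_gil(subject))
```

A memoryview is rejected here even when it wraps a bytes object, because in general it may expose a writable buffer.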

If you start to release the GIL, you have to search for all functions which must be called with the GIL held. For example, memory allocators, but also all functions manipulating Python objects. Hint: search for "PyObject*". For example, getslice() must be called with the GIL held.

Since the GIL is a lock, you should benchmark to ensure that repeatedly acquiring and releasing the GIL doesn't kill performance, both with a single thread and with multiple threads. Either way, a benchmark will be needed.
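
A benchmark of this kind could start from something like the sketch below, which times the same total amount of matching work with one thread and with several. The pattern, the subject text, and the function names are assumptions chosen for this example; a real benchmark would need representative patterns and repeated runs.

```python
import re
import threading
import time

PATTERN = re.compile(r'\d+')
TEXT = 'a' * 10000 + '42'  # the digits sit at the end, so each search scans the text


def work(repeats):
    for _ in range(repeats):
        PATTERN.search(TEXT)


def timed(num_threads, repeats_per_thread):
    """Return the wall-clock time for num_threads threads each running work()."""
    threads = [
        threading.Thread(target=work, args=(repeats_per_thread,))
        for _ in range(num_threads)
    ]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


if __name__ == '__main__':
    total = 2000
    for num_threads in (1, 2, 4):
        # Keep the total work constant so the timings are comparable.
        elapsed = timed(num_threads, total // num_threads)
        print('%d thread(s): %.3f s' % (num_threads, elapsed))
```

Comparing these timings before and after a change to _sre would show both the single-thread overhead of extra acquire/release cycles and any gain from parallel matching.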

To be clear: I'm *not* interested in optimizing the _sre module to release the GIL (to support parallel execution).
History
Date User Action Args
2017-11-16 13:51:53 serhiy.storchaka link issue24555 superseder
2015-03-17 21:32:37 vstinner set messages: + msg238342
2015-03-17 17:31:39 abacabadabacaba set messages: + msg238325
2015-03-17 17:13:13 vstinner set type: resource usage -> performance; messages: + msg238321; nosy: + vstinner
2015-03-17 17:04:35 abacabadabacaba create