This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: Python logic error when deal with re and muti-threading
Type: behavior
Stage: resolved
Components: Regular Expressions
Versions: Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 2.7
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: re functions never release GIL (View: 23690)
Assigned To:
Nosy List: bee13oy, ezio.melotti, mrabarnett, r.david.murray
Priority: normal
Keywords:

Created on 2015-07-03 06:20 by bee13oy, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name               Uploaded                   Description
python_logic_error.pdf  bee13oy, 2015-07-03 06:20  bug detail information
poc&bug_detail.zip      bee13oy, 2015-07-03 15:51  1. normal_case.py 2. poc.py 3. bug details
Messages (4)
msg246138 - (view) Author: bee13oy (bee13oy) Date: 2015-07-03 06:20
Bug 0x01 is the main problem.

t.start()
t.join(timeout)
In the normal case, where the sub-thread runs a while() loop, the main thread regains control of the program after join() times out.
But in our POC, even after the sub-thread has timed out, the main thread still cannot continue executing. After analyzing it, I found that the main thread is trapped in an infinite loop, as described in the PDF.
msg246180 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-07-03 14:27
If you re-post your bug information in a plain text and/or test program format it might get faster attention.
msg246188 - (view) Author: bee13oy (bee13oy) Date: 2015-07-03 15:51
# Python logic error when dealing with re and multi-threading
## Bug Description

Using re together with multi-threading triggers the bug.

Bug type:   `Logic Error`

Test Environment:

* `Windows 7 SP1 x64 + python 3.4.3`
* `Linux kali 3.14-kali1-amd64 + python 2.7.3`

-----------------------------Normal Case------------------------
- 1. main-thread: join(timeout), wait for sub-thread to finish -
- 2. sub-thread: while(1), an infinite loop                    -
----------------------------------------------------------------

Test Code:

#!/usr/bin/python
__author__ = 'bee13oy'
import re
import threading

timeout = 2
source = "(.*(.)?)*bcd\\t\\n\\r\\f\\a\\e\\071\\x3b\\$\\\\\?caxyz"

def run(source):
    # Pure-Python infinite loop; the pattern is passed in but not used here.
    while(1):
        print("test1")

def handle():
    try:
        t = threading.Thread(target=run, args=(source,))
        t.setDaemon(True)   # daemon thread: does not keep the process alive
        t.start()
        t.join(timeout)     # wait at most `timeout` seconds
        print("thread finished...It's a normal case!\n")
    except:
        print("exception ...\n")

handle()

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----------------------------Bug Case-----------------------------------------------------------------------------
- 1. main-thread: join(timeout), wait for sub-thread to finish                                                   -
- 2. sub-thread: 1) we construct the special pattern "(.*(.)?)*bcd\\t\\n\\r\\f\\a\\e\\071\\x3b\\$\\\\\?caxyz"    -
-                2) regexp.search() can't deal with it, and hangs                                                -
-                3) join(timeout) times out; at this point the main-thread should have regained                  -
-                   control of the program, but it didn't.                                                       -
------------------------------------------------------------------------------------------------------------------

POC:

#!/usr/bin/python
__author__ = 'bee13oy'
import re
import os
import threading

timeout = 2
source = "(.*(.)?)*bcd\\t\\n\\r\\f\\a\\e\\071\\x3b\\$\\\\\?caxyz"

def run(source):
    # The same string is used both as the pattern and as the text to search.
    regexp = re.compile(source)
    sgroup = regexp.search(source)   # the search hangs here

def handle():
    try:
        t = threading.Thread(target=run, args=(source,))
        t.setDaemon(True)
        t.start()
        t.join(timeout)     # should return after `timeout` seconds
        print("finished...\n")
    except:
        print("exception ...\n")

handle()

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
----------------------------------------------------------------
-                          Bug Analysis                        -
----------------------------------------------------------------
When using Python multithreading, we call `join(timeout)` to wait until the **thread terminates** or the call **times out**.
	1. In the normal case, where the sub-thread runs a while() loop, the main thread regains control of the program after join() times out.
	2. In our POC, even after the sub-thread has timed out, the main thread still cannot continue executing. After analyzing it, I found that the main thread is trapped in an infinite loop.
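
The difference between the two cases can be made visible with `Thread.is_alive()`. The following is a minimal sketch of the normal case only, assuming a hypothetical `looping_worker` function and `stop` event that are not part of the original POC. `join(timeout)` returns once the timeout elapses whether or not the thread has finished, so the caller has to check `is_alive()` to tell a timeout from a normal exit.

```python
import threading

# Hypothetical names for illustration; not part of the original POC.
stop = threading.Event()

def looping_worker():
    # Pure-Python loop: the interpreter can switch threads between bytecodes,
    # so the main thread regains control when join() times out.
    while not stop.is_set():
        pass

def join_with_timeout(timeout):
    t = threading.Thread(target=looping_worker)
    t.daemon = True
    t.start()
    t.join(timeout)              # returns None whether or not t has finished
    timed_out = t.is_alive()     # True means join() gave up waiting
    return t, timed_out

thread, timed_out = join_with_timeout(0.2)
print("timed out:", timed_out)
stop.set()                       # let the worker exit cleanly
thread.join()
```

In the bug case the worker is inside a single long-running C call rather than a Python loop, which is why the same `join(timeout)` call behaves differently there.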


At first, execution enters the sub-thread, but the sub-thread cannot finish normally.
Meanwhile, join(timeout) waits for the sub-thread to return or for the timeout to expire, after which the main thread should regain control of the program.

The bug is that the sub-thread enters an infinite loop and the main thread enters an infinite loop too, which causes the program to hang.

By analyzing the Python source code, we found that:

- the sub-thread is stuck in an infinite loop   (code block 0)
- the main thread is stuck in an infinite loop  (code block 1)
  
-----------------------------code block 0----------------------------------
- code where the sub-thread is trapped in an infinite loop:               -
---------------------------------------------------------------------------
```
LOCAL(Py_ssize_t)
SRE(match)(SRE_STATE* state, SRE_CODE* pattern, int match_all)
{
    SRE_CHAR* end = (SRE_CHAR *)state->end;
    Py_ssize_t alloc_pos, ctx_pos = -1;
    Py_ssize_t i, ret = 0;
    Py_ssize_t jump;
    unsigned int sigcount=0;
    SRE(match_context)* ctx;
    SRE(match_context)* nextctx;
    TRACE(("|%p|%p|ENTER\n", pattern, state->ptr));
    DATA_ALLOC(SRE(match_context), ctx);
    ctx->last_ctx_pos = -1;
    ctx->jump = JUMP_NONE;
    ctx->pattern = pattern;
    ctx->match_all = match_all;
    ctx_pos = alloc_pos;
    .....
    /* Loop that never returns */
    for (;;) {
        ++sigcount;
        if ((0 == (sigcount & 0xfff)) && PyErr_CheckSignals())
            RETURN_ERROR(SRE_ERROR_INTERRUPTED);

        switch (*ctx->pattern++) {
        case SRE_OP_MARK:
            /* set mark */
            /* <MARK> <gid> */
            TRACE(("|%p|%p|MARK %d\n", ctx->pattern,
                   ctx->ptr, ctx->pattern[0]));
    .....
}
```


  
-----------------------------code block 1----------------------------------
- code where the main-thread is trapped in an infinite loop:              -
---------------------------------------------------------------------------
static void take_gil(PyThreadState *tstate)
{
    int err;
    if (tstate == NULL)
        Py_FatalError("take_gil: NULL tstate");
    
    err = errno;
    MUTEX_LOCK(gil_mutex);    
    if (!_Py_atomic_load_relaxed(&gil_locked))
        goto _ready;
    /* Loop that never returns */
    while (_Py_atomic_load_relaxed(&gil_locked)) {
        int timed_out = 0;
        unsigned long saved_switchnum;
        saved_switchnum = gil_switch_number;
        COND_TIMED_WAIT(gil_cond, gil_mutex, INTERVAL, timed_out);
        /* If we timed out and no switch occurred in the meantime, it is time
           to ask the GIL-holding thread to drop it. */
        if (timed_out &&
            _Py_atomic_load_relaxed(&gil_locked) &&
            gil_switch_number == saved_switchnum) {
            SET_GIL_DROP_REQUEST();
        }
    }
    .....
}
msg246193 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2015-07-03 17:20
Your regex is a pathological case: it suffers from catastrophic backtracking and can take a long time to finish.
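
To illustrate the effect on a smaller scale, here is a rough sketch using a simpler nested-quantifier pattern, `(a+)+$`. This is not the pattern from the POC, only a commonly cited example of the same class of problem, and the helper name `time_failed_match` is made up for this example. With a backtracking engine such as `re`, the time for a failing match tends to grow very quickly with input length, so the sizes are kept small.

```python
import re
import time

# Illustrative pattern with nested quantifiers; NOT the pattern from the POC.
NESTED = re.compile(r"(a+)+$")

def time_failed_match(n):
    """Time a match attempt that must fail on n 'a' characters plus 'b'."""
    text = "a" * n + "b"          # the trailing 'b' prevents '$' from matching
    start = time.time()
    result = NESTED.match(text)
    return result, time.time() - start

results = {}
for n in (10, 14, 18):
    result, seconds = time_failed_match(n)
    results[n] = result
    print("n=%d matched=%s seconds=%.4f" % (n, result is not None, seconds))
```

Each attempt fails, but the engine has to try many ways of splitting the run of `a` characters between the inner and outer `+` before giving up.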

The other problem is that the re module never releases the GIL, so while it's performing the search in the low-level C code, other Python threads don't get a chance to run.
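
One possible workaround, sketched here under assumptions, is to run the search in a separate process rather than a thread. A child process has its own interpreter and its own GIL, and it can be killed when the timeout expires. The names `CHILD_CODE` and `search_with_timeout` and the exit-code convention are hypothetical, and the sketch relies on `subprocess.run` (Python 3.5 or later). It is not something proposed in this thread.

```python
import subprocess
import sys

# Hypothetical helper script run in the child interpreter.
# Exit code 0 means "match", 3 means "no match"; anything else is an error.
CHILD_CODE = (
    "import re, sys\n"
    "pattern, text = sys.argv[1], sys.argv[2]\n"
    "sys.exit(0 if re.search(pattern, text) else 3)\n"
)

def search_with_timeout(pattern, text, timeout):
    """Return True/False for match/no match, or None if the search timed out."""
    cmd = [sys.executable, "-c", CHILD_CODE, pattern, text]
    try:
        # subprocess.run kills the child and waits for it if the timeout expires.
        completed = subprocess.run(cmd, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    if completed.returncode == 0:
        return True
    if completed.returncode == 3:
        return False
    raise RuntimeError("child failed with exit code %d" % completed.returncode)

print(search_with_timeout(r"b+c", "aabbc", 30))
print(search_with_timeout(r"(a+)+$", "a" * 40 + "b", 1))
```

Starting a new interpreter for every search is expensive, so this only makes sense where the pattern or the input is untrusted and a hang is worse than the startup cost.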
History
Date                 User              Action  Args
2022-04-11 14:58:18  admin             set     github: 68743
2017-11-16 13:51:53  serhiy.storchaka  set     status: open -> closed; superseder: re functions never release GIL; resolution: duplicate; stage: resolved
2015-07-03 17:20:29  mrabarnett        set     messages: + msg246193
2015-07-03 15:51:38  bee13oy           set     files: + poc&bug_detail.zip; messages: + msg246188
2015-07-03 14:27:27  r.david.murray    set     nosy: + r.david.murray; messages: + msg246180
2015-07-03 06:20:15  bee13oy           create