classification
Title: Py3.6 threading/reference counting issues with `numexpr`
Type: crash
Stage: resolved
Components: Extension Modules, Library (Lib)
Versions: Python 3.6
process
Status: closed
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Robert McLeod, mark.dickinson, pitrou
Priority: normal
Keywords:

Created on 2017-08-03 21:04 by Robert McLeod, last changed 2018-11-21 20:52 by mark.dickinson. This issue is now closed.

Messages (4)
msg299722 - Author: Robert McLeod (Robert McLeod) Date: 2017-08-03 21:04
I'm working on the development branch of the `numexpr` module and I've run into problems on Python 3.6, where I get a variety of errors that appear to relate to threading or reference counting. This module is commonly used for accelerating NumPy code: it parses an expression into a program for a C-extension virtual machine and breaks the calculation down into blocks, which are then dispatched over multiple threads. I get similar errors on both Ubuntu 16.04 with GCC 5.4 and Windows 7 with MSVC 2015. I created an issue here where I provide some of the `gcc` error outputs.

https://github.com/pydata/numexpr/issues/252

Typically I'm getting a different error on every run. 

We use a wrapper for Windows threads that emulates pthreads, and on Windows I found that the crashes occur when calling the Windows system function `WaitForSingleObject(cond->sema, INFINITE);` in the file `numexpr3/win32/pthread.c`.

I cannot replicate this problem in Python 2.7/3.4/3.5 on Windows or Linux. I'm using Anaconda in both instances, with nomkl NumPy on Linux and mkl NumPy on Windows.  

I ran valgrind under Python 3.5 and 3.6. With 3.6 I get numerous errors coming from places like pickle and ast (which the new NumExpr uses), while 3.5 is basically clean. The logs are attached to the issue linked above.
msg299762 - Author: Robert McLeod (Robert McLeod) Date: 2017-08-04 21:54
After building against Python 3.7 I was able to get a useful error message: `PyMem` functions were being called while the GIL was released. I will replace these with their C equivalents and retry with Python 3.6.

    Fatal Python error: Python memory allocator called without holding the GIL

    Thread 0x0000000004044e00 (most recent call first):
      File "/home/rmcleod/py37/lib/python3.7/site-packages/numexpr3-3.0==19176== 
    ==19176== Process terminating with default action of signal 6 (SIGABRT)
    ==19176==    at 0x579C428: raise (raise.c:54)
    ==19176==    by 0x579E029: abort (abort.c:89)
    ==19176==    by 0x422DF7: Py_FatalError (pylifecycle.c:1849)
    ==19176==    by 0x41ED4C: _PyMem_DebugCheckGIL (obmalloc.c:1972)
    ==19176==    by 0x41ED23: _PyMem_DebugMalloc (obmalloc.c:1980)
    ==19176==    by 0x41FCAC: PyMem_Malloc (obmalloc.c:418)
    ==19176==    by 0xCEA1920: NumExprObject_copy_threadsafe(NumExprObject const*) (interpreter.cpp:147)
    ==19176==    by 0xCEA77CE: th_worker(void*) (module.cpp:73)
    ==19176==    by 0x4E416B9: start_thread (pthread_create.c:333)
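
For readers following along, here is a minimal sketch of the kind of replacement described above: a worker thread deep-copies its per-thread state using plain `malloc`/`free` instead of `PyMem_Malloc`. Everything named here (`vm_state`, `vm_state_copy_threadsafe`, `worker_main`, `copy_in_worker`) is hypothetical and is not numexpr's actual code. A likely reason the problem only surfaced in 3.6 is that `PyMem_Malloc` switched to the pymalloc allocator in that release, and pymalloc requires the GIL. `PyMem_RawMalloc` is another option that the C API documents as safe without the GIL.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for per-thread interpreter state (NOT numexpr's
   real NumExprObject): a byte buffer holding a VM program. */
typedef struct {
    size_t n_bytes;          /* length of the program buffer */
    unsigned char *program;  /* bytecode for the virtual machine */
} vm_state;

/* Deep-copy `src` using only the C allocator, so the function may be
   called from a thread that does not hold the GIL.
   Returns NULL on allocation failure. */
static vm_state *vm_state_copy_threadsafe(const vm_state *src)
{
    assert(src != NULL);

    vm_state *dst = malloc(sizeof *dst);  /* previously a PyMem_Malloc call */
    if (dst == NULL)
        return NULL;

    dst->n_bytes = src->n_bytes;
    /* Allocate at least one byte so a zero-length program still gets a
       valid, freeable pointer. */
    dst->program = malloc(src->n_bytes > 0 ? src->n_bytes : 1);
    if (dst->program == NULL) {
        free(dst);
        return NULL;
    }
    if (src->n_bytes > 0)
        memcpy(dst->program, src->program, src->n_bytes);
    return dst;
}

/* Release a copy made by vm_state_copy_threadsafe(). */
static void vm_state_free(vm_state *s)
{
    if (s != NULL) {
        free(s->program);
        free(s);
    }
}

/* Worker entry point. It runs without the GIL, so it must not call
   any Python C API function that requires it. */
static void *worker_main(void *arg)
{
    return vm_state_copy_threadsafe((const vm_state *)arg);
}

/* Spawn one worker, wait for it, and return the copy it produced
   (or NULL if thread creation, joining, or allocation failed). */
static vm_state *copy_in_worker(const vm_state *src)
{
    pthread_t tid;
    void *result = NULL;

    if (pthread_create(&tid, NULL, worker_main, (void *)src) != 0)
        return NULL;
    if (pthread_join(tid, &result) != 0)
        return NULL;
    return result;
}
```

Memory obtained from `malloc` must be released with `free`, not `PyMem_Free`, so both the allocation and the release sites need to change together.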
msg299778 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-08-05 12:48
If, judging by https://github.com/pydata/numexpr/commit/07d9245d88759f0c3dcabd88e6edefadc3061ee3, you are really calling a bunch of C API functions without holding the GIL, then it's not surprising you may get crashes all over the place.
msg330215 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2018-11-21 20:52
Judging by the current state of https://github.com/pydata/numexpr/issues/252, it does seem as though this is resolved downstream, so probably safe to close here. Robert: feel free to reopen if I misunderstood.
History
Date | User | Action | Args
2018-11-21 20:52:54 | mark.dickinson | set | status: pending -> closed; nosy: + mark.dickinson; messages: + msg330215; stage: resolved
2018-11-21 08:48:36 | serhiy.storchaka | set | status: open -> pending
2017-08-05 12:48:00 | pitrou | set | nosy: + pitrou; messages: + msg299778
2017-08-04 21:54:36 | Robert McLeod | set | messages: + msg299762
2017-08-03 21:04:49 | Robert McLeod | create |