classification
Title: Segmentation fault on Linux with multiprocess queue
Type: crash Stage: resolved
Components: Library (Lib) Versions: Python 3.8
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: Nosy List: davin, iritkatriel, julien.stegle, pitrou
Priority: normal Keywords:

Created on 2020-12-07 11:08 by julien.stegle, last changed 2022-01-24 11:02 by iritkatriel. This issue is now closed.

Messages (3)
msg382629 - (view) Author: Stegle, Julien (julien.stegle) Date: 2020-12-07 11:08
Hi,

I'm experiencing segmentation fault issues when running inside a Docker container (tested with python:3.8.6, python:3.8.6-slim, python:3.6.8 & python:3.6.8-slim).

On windows everything works fine, but when running on Docker when I try to put a string into my queue I'm experiencing the following error:

GNU gdb (Debian 8.2.1-2+b3) 8.2.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...(no debugging symbols found)...done.
Starting program: /usr/local/bin/python -m ge.xxx.core.runners.exec basic reader /home/executor/xxx/exemple1.ini
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Detaching after fork from child process 12]
[New Thread 0x7f11f66c7700 (LWP 13)]
[New Thread 0x7f11f3ec6700 (LWP 14)]
[New Thread 0x7f11f16c5700 (LWP 15)]
[New Thread 0x7f11eeec4700 (LWP 16)]
[New Thread 0x7f11ec6c3700 (LWP 17)]
[New Thread 0x7f11e9ec2700 (LWP 18)]
[New Thread 0x7f11e76c1700 (LWP 19)]
[New Thread 0x7f11e4ec0700 (LWP 20)]
[New Thread 0x7f11e26bf700 (LWP 21)]
[New Thread 0x7f11dfebe700 (LWP 22)]
[New Thread 0x7f11df6bd700 (LWP 23)]
[Thread 0x7f11e9ec2700 (LWP 18) exited]
[Thread 0x7f11e76c1700 (LWP 19) exited]
[Thread 0x7f11ec6c3700 (LWP 17) exited]
[Thread 0x7f11eeec4700 (LWP 16) exited]
[Thread 0x7f11f16c5700 (LWP 15) exited]
[Thread 0x7f11f3ec6700 (LWP 14) exited]
[Thread 0x7f11f66c7700 (LWP 13) exited]
[Thread 0x7f11e26bf700 (LWP 21) exited]
[Thread 0x7f11dfebe700 (LWP 22) exited]
[Thread 0x7f11e4ec0700 (LWP 20) exited]
[Thread 0x7f11df6bd700 (LWP 23) exited]
[Detaching after fork from child process 24]
[Detaching after fork from child process 25]
[New Thread 0x7f11df6bd700 (LWP 26)]
[New Thread 0x7f11dfebe700 (LWP 27)]
[New Thread 0x7f11e26bf700 (LWP 28)]
[New Thread 0x7f11e4ec0700 (LWP 29)]
[New Thread 0x7f11cc893700 (LWP 30)]
[New Thread 0x7f11ca092700 (LWP 31)]
[New Thread 0x7f11c7891700 (LWP 32)]
[New Thread 0x7f11c5090700 (LWP 33)]
[New Thread 0x7f11c488f700 (LWP 34)]
[New Thread 0x7f11c208e700 (LWP 35)]
[New Thread 0x7f11bd88d700 (LWP 36)]
[New Thread 0x7f11b5def700 (LWP 37)]
[Thread 0x7f11e4ec0700 (LWP 29) exited]
[Thread 0x7f11c7891700 (LWP 32) exited]
[Thread 0x7f11ca092700 (LWP 31) exited]
[Thread 0x7f11c488f700 (LWP 34) exited]
[Thread 0x7f11c5090700 (LWP 33) exited]
[Thread 0x7f11cc893700 (LWP 30) exited]
[Thread 0x7f11e26bf700 (LWP 28) exited]
[Thread 0x7f11dfebe700 (LWP 27) exited]
[Thread 0x7f11df6bd700 (LWP 26) exited]
[Thread 0x7f11bd88d700 (LWP 36) exited]
[Thread 0x7f11c208e700 (LWP 35) exited]
[Detaching after fork from child process 38]
[Thread 0x7f11b5def700 (LWP 37) exited]
2020-12-07T09:16:12.760+0000 - ge.xxx.inputs - INFO - Filesystem input "ge.xxx.core.inputs.reader.dicom" initialized, target: /data/dicoms
2020-12-07T09:16:12.780+0000 - ge.xxx.parsers - INFO - Dicom data parser initialized
2020-12-07T09:16:12.918+0000 - alembic.env - INFO - Migrating database engine_data
2020-12-07T09:16:12.927+0000 - alembic.env - INFO - Migrating database engine_application
2020-12-07T09:16:12.933+0000 - alembic.env - INFO - Migrating database engine_enhancement
2020-12-07T09:16:12.956+0000 - ge.xxx.core - INFO - Duct 'default' initialized
2020-12-07T09:16:12.957+0000 - ge.xxx.inputs - INFO - Start filesystem input thread "ge.xxx.core.inputs.reader.dicom"
[Detaching after fork from child process 39]
[Detaching after fork from child process 40]
2020-12-07T09:16:13.639+0000 - ge.xxx.inputs - DEBUG - Start reading file system
2020-12-07T09:16:13.666+0000 - ge.xxx.inputs - DEBUG - File /data/dicoms/1/XXXXXXXXXXX.dcm did not match filters
2020-12-07T09:16:15.881+0000 - ge.xxx.inputs - DEBUG - ADDING DICOMREADER W/ TO QUEUE <multiprocessing.queues.Queue object at 0x7f0cafcc8c50>
Fatal Python error: Segmentation fault

Current thread 0x00007f0cb732f740 (most recent call first):
  File "/usr/local/lib/python3.6/multiprocessing/queues.py", line 82 in put
  File "/home/executor/ge.xxx.extractor/ge/xxx/extractor/input/filesystem/reader.py", line 36 in feed
  File "/home/executor/ge.xxx.duct/ge/xxx/duct/input/filesystem/reader.py", line 58 in scan
  File "/usr/local/lib/python3.6/multiprocessing/process.py", line 93 in run
  File "/usr/local/lib/python3.6/multiprocessing/process.py", line 258 in _bootstrap
  File "/usr/local/lib/python3.6/multiprocessing/spawn.py", line 118 in _main
  File "/usr/local/lib/python3.6/multiprocessing/spawn.py", line 105 in spawn_main
  File "<string>", line 1 in <module>
^C
Thread 1 "python" received signal SIGINT, Interrupt.
0x00007f11fac97037 in __GI___select (nfds=0, readfds=0x0, writefds=0x0, 
    exceptfds=0x0, timeout=0x7fff0c5d6970)
    at ../sysdeps/unix/sysv/linux/select.c:41
41      ../sysdeps/unix/sysv/linux/select.c: No such file or directory.
(gdb) bt full
#0  0x00007f11fac97037 in __GI___select (nfds=0, readfds=0x0, writefds=0x0, 
    exceptfds=0x0, timeout=0x7fff0c5d6970)
    at ../sysdeps/unix/sysv/linux/select.c:41
        resultvar = 18446744073709551102
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x00007f11fb11360e in ?? () from /usr/local/lib/libpython3.6m.so.1.0
No symbol table info available.
#2  0x00007f11fb03f079 in _PyCFunction_FastCallDict ()
   from /usr/local/lib/libpython3.6m.so.1.0
No symbol table info available.
#3  0x00007f11fb08558d in ?? () from /usr/local/lib/libpython3.6m.so.1.0
No symbol table info available.
#4  0x00007f11fb07f49a in _PyEval_EvalFrameDefault ()
   from /usr/local/lib/libpython3.6m.so.1.0
No symbol table info available.
#5  0x00007f11fb08585a in ?? () from /usr/local/lib/libpython3.6m.so.1.0
No symbol table info available.
#6  0x00007f11fb0856b1 in ?? () from /usr/local/lib/libpython3.6m.so.1.0
No symbol table info available.
#7  0x00007f11fb07f49a in _PyEval_EvalFrameDefault ()
   from /usr/local/lib/libpython3.6m.so.1.0
No symbol table info available.
--Type <RET> for more, q to quit, c to continue without paging--c
#8  0x00007f11fb07eae5 in ?? () from /usr/local/lib/libpython3.6m.so.1.0
No symbol table info available.
#9  0x00007f11fb085c4b in _PyFunction_FastCallDict () from /usr/local/lib/libpython3.6m.so.1.0
No symbol table info available.
#10 0x00007f11fb0097be in _PyObject_FastCallDict () from /usr/local/lib/libpython3.6m.so.1.0
No symbol table info available.
#11 0x00007f11fb00a126 in _PyObject_Call_Prepend () from /usr/local/lib/libpython3.6m.so.1.0
No symbol table info available.
#12 0x00007f11fb009b07 in PyObject_Call () from /usr/local/lib/libpython3.6m.so.1.0
#8  0x00007f11fb07eae5 in ?? () from /usr/local/lib/libpython3.6m.so.1.0
No symbol table info available.
#9  0x00007f11fb085c4b in _PyFunction_FastCallDict () from /usr/local/lib/libpython3.6m.so.1.0
No symbol table info available.
#10 0x00007f11fb0097be in _PyObject_FastCallDict () from /usr/local/lib/libpython3.6m.so.1.0
No symbol table info available.
#11 0x00007f11fb00a126 in _PyObject_Call_Prepend () from /usr/local/lib/libpython3.6m.so.1.0
No symbol table info available.
#12 0x00007f11fb009b07 in PyObject_Call () from /usr/local/lib/libpython3.6m.so.1.0
No symbol table info available.
#13 0x00007f11fb0545ad in ?? () from /usr/local/lib/libpython3.6m.so.1.0
No symbol table info available.
#14 0x00007f11fb05236b in ?? () from /usr/local/lib/libpython3.6m.so.1.0
No symbol table info available.
#23 0x00007f11fb07e719 in ?? () from /usr/local/lib/libpython3.6m.so.1.0
No symbol table info available.
#24 0x00007f11fb0858e2 in ?? () from /usr/local/lib/libpython3.6m.so.1.0
No symbol table info available.
#25 0x00007f11fb0856b1 in ?? () from /usr/local/lib/libpython3.6m.so.1.0
No symbol table info available.
#26 0x00007f11fb07f49a in _PyEval_EvalFrameDefault () from /usr/local/lib/libpython3.6m.so.1.0
No symbol table info available.
#27 0x00007f11fb07da81 in PyEval_EvalCodeEx () from /usr/local/lib/libpython3.6m.so.1.0
No symbol table info available.
#28 0x00007f11fb01fa23 in ?? () from /usr/local/lib/libpython3.6m.so.1.0
No symbol table info available.
#29 0x00007f11fb009b07 in PyObject_Call () from /usr/local/lib/libpython3.6m.so.1.0
No symbol table info available.
#30 0x00007f11fb1063a1 in ?? () from /usr/local/lib/libpython3.6m.so.1.0
No symbol table info available.
#31 0x00007f11fb10620b in Py_Main () from /usr/local/lib/libpython3.6m.so.1.0
No symbol table info available.
#32 0x0000557b7762a37b in main ()
No symbol table info available.

Faulty code:

def feed(logger: logging.Logger, name: str, root: Path, filters: List[Filter], supported_extensions: List[str], queues: List[Queue]):
    if supported_extensions:
        for extension in supported_extensions:
            for filepath in root.rglob(f'*.{extension}'):
                if any([not x.validate(filepath) for x in filters]):
                    logger.debug(f'File {filepath} did not match filters')
                    continue
                for queue in queues:
                    logger.debug(f'ADDING DICOMREADER W/ TO QUEUE {queue}')
                    queue.put(str(filepath))                                           # CRASH HAPPENS HERE
                    logger.debug('DONE ADDING TO QUEUE')
                logger.debug(f'[{name}] New file created and add to queues {filepath}')
    else:
        for filepath in root.rglob(f'*.*'):
            if any([not x.validate(filepath) for x in filters]):
                logger.debug(f'File {filepath} did not match filters')
                continue
            for queue in queues:
                logger.debug('ADDING DICOMREADER W/O TO QUEUE')
                queue.put(str(filepath))
                logger.debug('DONE ADDING TO QUEUE')
            logger.debug(f'[{name}] New file created and add to queues {filepath}')

def scan(logger_name: str, logging_configuration: str, name: str,
         feed: Callable[[logging.Logger, Path, List[Filter], List[str], List[Queue]], None],
         directory: str, filters: List[Filter], supported_extensions: List[str], queues: List[Queue], done_event: Event):
    if logging_configuration:
        setup_logging(logging_configuration)
    logger = logging.getLogger(logger_name)
    logger.debug('Start reading file system')

    root = Path(directory)
    feed(logger, name, root, filters, supported_extensions, queues)

    logger.info("All folder read. Reader function ends now")
    done_event.clear()

Here's how the process is created:

def get_executor(self, args: tuple = (), kwargs: dict = {}):
    return get_context("spawn").Process(target=self._executor_target, args=args, kwargs=kwargs, daemon=True)

And here are the args used:

from multiprocessing import Queue
from ge.XXX import Filter

class Filter(ABC):

    def __init__(self, logger_name: str, logging_configuration: str):
        self.logger_name = logger_name
        self.logging_configuration = logging_configuration

    def get_logger(self):
        return logging.getLogger(self.logger_name)

    def validate(self, data: Any) -> bool:
        raise NotImplementedError

class Input(FileSystemBase):

    def __init__(self, logger_name: str, logging_configuration: str, name: str, output_queues: List[Queue],
                 directory: str, filters: List[Filter] = [], supported_extensions: List[str] = [],
                 feed: Callable[[logging.Logger, Path, List[Filter], List[str], List[Queue]], None] = feed):
        super().__init__(logger_name, logging_configuration, name, output_queues, directory, filters=filters)
        self._feed = feed
        self._executor_target = scan
        self._supported_extensions = supported_extensions
        self._executor = self.get_executor(
            args=(logger_name, logging_configuration, name,
                  feed, directory, filters, supported_extensions,
                  output_queues, self._running)
        )

After a quick lookup in the source code the issue seems to be related to this line in multiprocessing Queue:

File "/usr/local/lib/python3.6/multiprocessing/queues.py", line 82 in put

    def put(self, obj, block=True, timeout=None):
        assert not self._closed
        if not self._sem.acquire(block, timeout):     # THIS ONE
            raise Full

        with self._notempty:
            if self._thread is None:
                self._start_thread()
            self._buffer.append(obj)
            self._notempty.notify()

Is there a specific Linux behavior that i'm missing here ?
Just to be sure it wasn't coming from my consumer I didn't start the process consuming from the queue (in case it came from a Lock issue) so only this process runs and no other code is used for the test.

Best regards,
Julien Stegle
msg382633 - (view) Author: Stegle, Julien (julien.stegle) Date: 2020-12-07 13:31
By removing the use the of "spawn" context for multiprocessing everything works fine.
Does the "spawn" context have a different behavior on Linux ?
msg411463 - (view) Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2022-01-24 11:02
Python 3.8 is no longer maintained, and there have been changes in this area since then. Please create a new issue if you are having this problem with a more recent python version. 

To answer you last question, yes - there are platform dependent differences in the behavior of spawn.
History
Date User Action Args
2022-01-24 11:02:26iritkatrielsetstatus: open -> closed

nosy: + iritkatriel
messages: + msg411463

resolution: out of date
stage: resolved
2020-12-07 19:26:22izbyshevsetnosy: + pitrou, davin

components: + Library (Lib)
versions: - Python 3.6
2020-12-07 13:31:06julien.steglesetmessages: + msg382633
2020-12-07 11:08:45julien.steglecreate