classification
Title: multiprocessing.Queue left in an inconsistent state and a traceback silently suppressed when an unpicklable object is put and the process's target function has finished
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: davin, pitrou, szobov
Priority: normal Keywords:

Created on 2018-11-14 05:26 by szobov, last changed 2018-12-03 08:16 by szobov.

Messages (6)
msg329889 - (view) Author: Sergei Zobov (szobov) * Date: 2018-11-14 05:26
If the process's target function puts something unpicklable into the
`multiprocessing.Queue` and finishes before the `ForkingPickler`'s exception is handled, we get two weird situations:
* The queue's internal semaphore stays acquired and is never released, so `qsize()` always returns 1 (or more, if more than one process did this) even though there is nothing in the queue's buffer and `empty()` returns `True`.
* The exception's traceback is silently suppressed.

To reproduce this behavior I wrote a test in the cpython repo (be aware it is flaky: sometimes it passes, sometimes it doesn't):
https://github.com/szobov/cpython/commit/95524f10a7b6510e9ab4647223cc67af85ebff86
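For reference, a minimal standalone sketch of the reproduction (this assumes Linux with the default fork start method; note that `qsize()` raises NotImplementedError on macOS):

```python
import multiprocessing as mp
import time

def target(q):
    # put() returns immediately; the pickling happens later, in the
    # queue's feeder thread, which fails on the lambda while the child
    # process is already exiting
    q.put(lambda x: x)

if __name__ == "__main__":
    q = mp.Queue()
    p = mp.Process(target=target, args=(q,))
    p.start()
    p.join()
    time.sleep(0.5)       # give the feeder thread time to fail
    print(q.qsize())      # 1 -- the shared semaphore was never released
    print(q.empty())      # True -- nothing actually reached the pipe
```

The child's `put()` acquires the shared semaphore, the feeder thread drops the item when pickling fails, and nothing ever releases the semaphore again, so the parent observes `qsize() == 1` with an empty queue.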

I suppose it happens because of this:

* The function `util._exit_function()` is called every time a process's target function finishes. Inside this function the global (per-process) flag `util._exiting` is set to `True`:
https://github.com/python/cpython/blob/master/Lib/multiprocessing/process.py#L300
https://github.com/python/cpython/blob/master/Lib/multiprocessing/util.py#L294-L295
* When an exception happens while pickling an object in the `Queue._feed` function, it is ignored because of the `is_exiting()` clause, which checks the `util._exiting` flag described above:
https://github.com/python/cpython/blob/master/Lib/multiprocessing/queues.py#L257-L259

So we end up with an unreleased semaphore and an ignored exception.
I think this is quite frustrating for users, because they are left with a broken queue and no hint as to why it happened.

I'm not entirely sure this is a bug, but I found nothing about it in the documentation.
msg329946 - (view) Author: Sergei Zobov (szobov) * Date: 2018-11-15 09:14
I've decided to make the test cleaner, so I added a `time.sleep` call; now we can be sure there is enough time for the queue's feeder thread to synchronize all variables:
https://github.com/szobov/cpython/compare/master...regression-test-on-multiprocessing-queue-feeder-ignore-exception

It shows that sometimes `Queue._sem` is never released:
https://github.com/szobov/cpython/blob/master/Lib/multiprocessing/queues.py#L266
If this behavior is undesirable, would it be better to remove these lines: https://github.com/szobov/cpython/blob/master/Lib/multiprocessing/queues.py#L257-L259 ?

But in my opinion the real problem lies in the functions `util._exit_function` and `util.is_exiting`. It looks like `Queue._thread` shares the variable `util._exiting` with the process that started the thread. If you print it, you can see that once `util._exit_function` is executed for the process, it sets the `util._exiting` flag to `True`, and from then on, as seen from `Queue._thread`, the flag stays `True`. But it's a daemon thread and it should still be running, shouldn't it? Or am I missing something important here?
msg330170 - (view) Author: Sergei Zobov (szobov) * Date: 2018-11-21 04:25
Yes, I missed that a daemon thread won't keep running once the parent process has finished. The similarities between the multiprocessing and threading modules are a bit confusing.
Anyway, could I make a patch to fix the problem with the silently ignored exception and the broken semaphore?
msg330188 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-11-21 10:12
Thanks for reporting this.  This sounds like an extreme corner case and I'm not sure how easy it will be to fix.

For the record, nowadays concurrent.futures should have more graceful behaviour in the face of errors.  Though there probably always will be corner cases like this.
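As an illustration of that more graceful behaviour (a sketch, assuming Linux with the fork start method): when a worker's return value cannot be pickled, concurrent.futures delivers the failure back through the future instead of swallowing it:

```python
import concurrent.futures

def make_unpicklable():
    # the returned lambda cannot be pickled to send back to the parent
    return lambda x: x

if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
        future = executor.submit(make_unpicklable)
        try:
            future.result()
        except Exception as exc:
            # the pickling error surfaces here rather than being suppressed
            print(type(exc).__name__)
```

Compare this with the multiprocessing.Queue case above, where the same pickling failure leaves no trace in the caller.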
msg330195 - (view) Author: Sergei Zobov (szobov) * Date: 2018-11-21 12:36
Thanks for the answer, Antoine!
I see what you mean, but I'm not sure it's such an extreme corner case. We are talking about Python, and it's easy to create and put an unpicklable object into a queue.
I apologize for the persistence, but I just want to make this case friendlier to users. I spent plenty of time debugging it, and I'd like to spare others from having to do the same.
If it's difficult to fix, we could at least add a note somewhere in the multiprocessing module documentation describing this behavior.
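In the meantime, one workaround is to pickle eagerly on the producer side, so failures surface in the caller instead of in the feeder thread. A minimal sketch (`safe_put`/`safe_get` are just illustrative helper names, not part of the module):

```python
import pickle
import multiprocessing as mp

def safe_put(queue, obj):
    # pickle.dumps raises right here if obj is unpicklable,
    # instead of failing later in the queue's feeder thread
    data = pickle.dumps(obj)
    queue.put(data)  # bytes always pickle cleanly

def safe_get(queue):
    return pickle.loads(queue.get())

if __name__ == "__main__":
    q = mp.Queue()
    safe_put(q, {"answer": 42})
    print(safe_get(q))  # {'answer': 42}
    try:
        safe_put(q, lambda x: x)
    except (pickle.PicklingError, AttributeError):
        print("rejected before touching the queue")
```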
msg330908 - (view) Author: Sergei Zobov (szobov) * Date: 2018-12-03 08:16
So, should I just close this issue, given that we've decided it's not a big problem?
History
Date User Action Args
2018-12-03 08:16:12  szobov  set     messages: + msg330908
2018-11-21 12:36:47  szobov  set     messages: + msg330195
2018-11-21 10:12:19  pitrou  set     messages: + msg330188
2018-11-21 04:25:07  szobov  set     messages: + msg330170
2018-11-15 11:56:39  szobov  set     nosy: + pitrou
2018-11-15 09:14:49  szobov  set     messages: + msg329946
2018-11-14 17:29:34  davin   set     nosy: + davin
2018-11-14 12:53:26  szobov  set     components: + Library (Lib)
2018-11-14 05:26:21  szobov  create