Classification
Title: Multiprocessing shared_memory ValueError on race with ShareableList
Type: behavior
Stage:
Components:
Versions: Python 3.9, Python 3.8

Process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: bjs, davin, mental, pierreglaser, pitrou
Priority: normal
Keywords:

Created on 2019-07-22 15:13 by bjs, last changed 2019-09-11 17:44 by davin.

Files
test.py (uploaded by bjs, 2019-07-22 15:13)
Messages (9)
msg348299 - (view) Author: Ben (bjs) * Date: 2019-07-22 15:13
When running the attached on 3.8 and 3.9 (master) I get the following:

Process Process-3:
Traceback (most recent call last):
  File "/home/bjs/.pyenv/versions/3.9-dev/lib/python3.9/multiprocessing/process.py", line 313, in _bootstrap
    self.run()
  File "/home/bjs/.pyenv/versions/3.9-dev/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/bjs/Downloads/e4671450a50df5a0648c6dda0c7b642a-3db67c29d8d9e6a6d629c2c016f5853ec22000ed/test.py", line 14, in g
    X[0]
  File "/home/bjs/.pyenv/versions/3.9-dev/lib/python3.9/multiprocessing/shared_memory.py", line 413, in __getitem__
    (v,) = struct.unpack_from(
ValueError: not enough values to unpack (expected 1, got 0)

(Tested on Windows and Linux)

Either this is a documentation error (the shared_memory docs should state that ShareableList operations are not atomic, and that this usage is therefore unsafe), or this is genuinely suspicious behaviour. I'm not sure which.

I could also just be using the library totally incorrectly.
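The attached test.py is not reproduced in this report. A minimal sketch that triggers the same traceback on 3.8/3.9 might look like the following; only the names `g` and `X` and the `X[0]` read come from the traceback above, everything else is assumed:

```python
# Hypothetical reconstruction of test.py: one process repeatedly rewrites
# index 0 of a ShareableList while another repeatedly reads it, with no
# lock around either operation.  On 3.8/3.9 the reader can intermittently
# die with "ValueError: not enough values to unpack (expected 1, got 0)".
from multiprocessing import Process
from multiprocessing.shared_memory import ShareableList


def f(name, n):
    X = ShareableList(name=name)
    for _ in range(n):
        X[0] = True          # rewrite index 0 with an identical value
    X.shm.close()


def g(name, n):
    X = ShareableList(name=name)
    for _ in range(n):
        X[0]                 # concurrent read may observe a torn write
    X.shm.close()


if __name__ == "__main__":
    X = ShareableList([True])
    ps = [Process(target=f, args=(X.shm.name, 10_000)),
          Process(target=g, args=(X.shm.name, 10_000))]
    for p in ps:
        p.start()
    for p in ps:
        p.join()
    X.shm.close()
    X.shm.unlink()
```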
msg348443 - (view) Author: (mental) * Date: 2019-07-25 16:19
I have been able to reproduce the issue.

I'm currently investigating the source of ShareableList to see exactly what is causing what appears to be missing synchronization.

Currently I think the issue is the documentation failing to mention that ShareableList does not perform atomic operations.

But I'm curious if a patch that adds some internal sync primitive is out of the question here?
msg348449 - (view) Author: (mental) * Date: 2019-07-25 19:01
Interestingly enough, the race only seems to occur when the mutator rewrites the index with an identical value (try replacing the __setitem__ value with `not X[0]`, or populating from `[b'hi']` and using `X[0][::1]` in the mutator code).

I can't help but think of implementing an internal cache that stores the hash of the last written value; when another set occurs, the two hashes would be compared and the operation made a no-op if they match (thoughts?).

In the meantime I'll continue investigating to see what exactly causes this quirk :)
msg348487 - (view) Author: Pierre Glaser (pierreglaser) * Date: 2019-07-26 11:23
The root of the error is that struct.pack_into starts by memsetting the underlying memory area with NULL bytes before filling the data with memcpy. If ShareableList._get_packing_format is called between the two operations (through a concurrent __getitem__ call from another process), struct.unpack_from will return an empty tuple which is the direct cause of the error you're seeing.

In the general case though, memcpy is not atomic so even without the memset call before, results of struct.unpack_from may be invalid in a concurrent setting.

shared_memory is a low-level Python module. Precautions should be taken when handling shared_memory objects concurrently, for example by using synchronization primitives. I'm not sure this should be done internally in the SharedMemory class; in particular, we don't want to slow down concurrent READ access. +1 for a documentation addition.
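As a concrete illustration of the external-synchronization approach suggested here (a sketch, not part of the stdlib or of any patch in this report), a `multiprocessing.Lock` around each read-modify-write makes concurrent ShareableList updates safe:

```python
# Sketch: callers serialize ShareableList access with an external
# multiprocessing.Lock, since ShareableList itself provides no locking.
from multiprocessing import Lock, Process
from multiprocessing.shared_memory import ShareableList


def add_one(name, lock, n):
    sl = ShareableList(name=name)
    for _ in range(n):
        with lock:                    # serialize the read-modify-write
            sl[0] = sl[0] + 1
    sl.shm.close()


def run_counter(n_procs=2, n_iters=1000):
    sl = ShareableList([0])
    lock = Lock()
    procs = [Process(target=add_one, args=(sl.shm.name, lock, n_iters))
             for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    total = sl[0]
    sl.shm.close()
    sl.shm.unlink()
    return total                      # n_procs * n_iters with the lock held


if __name__ == "__main__":
    print(run_counter())
```

Without the `with lock:` block, the final count would be unreliable and the reads could hit the ValueError from this report.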
msg348536 - (view) Author: (mental) * Date: 2019-07-27 01:27
> The root of the error is that struct.pack_into starts by memsetting the underlying memory area with NULL bytes before filling the data with memcpy.

I've had a fix spinning for about a day now; it introduces a `multiprocessing.Lock` that is simply wrapped around all of the struct packing and unpacking calls.

I've been reluctant to submit anything due to a suspicious resource warning I kept seeing about leaked shared_memory objects appearing spuriously and I wanted to rule out the possibility that other tests were causing a side effect.

Also, I wanted to hear an expert from the nosy list share their thoughts.

> I'm not sure this should be done internally

I agree, even with my patch not reproducing the issue I didn't like placing a lock around various components in the class.

> +1 For a documentation addition.

If there are no objections, I'll submit a PR with a doc update :)

@Pierre mind if I cc' you for a review?
msg348604 - (view) Author: Pierre Glaser (pierreglaser) * Date: 2019-07-29 08:53
Sure, although I won't be able to merge it. Make sure you ping a core-dev such as pitrou or davin :-)
msg348606 - (view) Author: Ben (bjs) * Date: 2019-07-29 10:50
It would be nice to get davin to clarify the API for this module.

What are the use cases for SharedMemory and ShareableList? 
Are you supposed to ever use a raw SharedMemory buffer directly?
What atomicity guarantees are there for ShareableList operations and read/write to the SharedMemory buffer?
msg351998 - (view) Author: Davin Potts (davin) * (Python committer) Date: 2019-09-11 17:38
Short responses to questions/comments from @bjs, followed by hopefully helpful further comments:

> Are you supposed to ever use a raw SharedMemory buffer directly?

Yes.


> What atomicity guarantees are there for ShareableList operations and read/write to the SharedMemory buffer?

None.


> I've had a fix spinning for about a day now, it introduced a `multiprocessing.Lock` and it was simply wrapped around any struct packing and unpacking calls.

That sounds like a nice general-purpose fix for situations where it is impossible to know ahead of time when one or more processes will need to modify the ShareableList/SharedMemory.buf.  When the code can be designed so that, by its execution flow, no two processes/threads ever modify, or modify and access, the same location in memory at the same time, locks become unnecessary.  Locks are great tools, but they generally result in slower-executing code.
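The lock-free design described above can be sketched as follows (an assumed structure for illustration, not code from this report): each worker writes only its own slot of the ShareableList, so no two processes ever touch the same memory location and no lock is needed:

```python
# Sketch: partition the ShareableList so each process owns one index.
# Writes never overlap, so no synchronization primitive is required.
from multiprocessing import Process
from multiprocessing.shared_memory import ShareableList


def worker(name, slot, n):
    sl = ShareableList(name=name)
    total = sum(range(n))             # stand-in for real per-worker work
    sl[slot] = total                  # only this process writes `slot`
    sl.shm.close()


def run(n_workers=4, n=100):
    sl = ShareableList([0] * n_workers)
    procs = [Process(target=worker, args=(sl.shm.name, i, n))
             for i in range(n_workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    results = list(sl)                # parent reads only after all joins
    sl.shm.close()
    sl.shm.unlink()
    return results


if __name__ == "__main__":
    print(run())
```

The parent reads results only after joining every worker, so the "no concurrent access to the same location" property holds by execution flow alone.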


> What are the use cases for SharedMemory and ShareableList?

Speed.  If we don't care about speed, we can use distributed shared memory through the SyncManager -- this keeps one copy of a dict/list/whatever in the process memory space of a single process and all other processes may modify or access it through two-sided communication with that "owner" process.  If we do care about speed, we use SharedMemory and ShareableList and other things created on top of SharedMemory -- this effectively gives us fast, communal memory access where we avoid the cost of communication except for when we truly need to synchronize (where multiprocessing.Lock can help).

Reduced memory footprint.  If I have a "very large" amount of data consuming a significant percentage of the available memory on my system, I can make it available to multiple processes without duplication.  This provides processes with fast access to that data, as fast as if each were accessing data in its own process memory space.  It is one thing to imagine using this in parallel-executing code, but this can be just as useful in the Python interactive shell.  One such scenario:  after starting a time-consuming, non-parallel calculation in one Python shell, it is possible to open a new Python shell in another window and attach to the data through shared memory to continue work while the calculation runs in the first window.
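The two-shells scenario above can be sketched in a single script (hedged illustration; in practice the two halves would run in two separate interpreters, passing the block's name between them out of band):

```python
# Sketch: create a SharedMemory block in one interpreter and attach to the
# same bytes by name from another, with no copying.
from multiprocessing import shared_memory

# "Shell 1": create a block and fill in some data.
shm = shared_memory.SharedMemory(create=True, size=16)
shm.buf[:5] = b"hello"

# "Shell 2": a separate interpreter would attach using the name printed
# by shell 1 (shm.name); both views address the same memory.
other = shared_memory.SharedMemory(name=shm.name)
print(bytes(other.buf[:5]))          # same bytes, no duplication

other.close()
shm.close()
shm.unlink()                         # free the block once everyone is done
```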
msg352001 - (view) Author: Davin Potts (davin) * (Python committer) Date: 2019-09-11 17:44
Apologies, one of the quotes in my previous response should have been attributed to @mental.

I think @pierreglaser phrased it very nicely:
> shared_memory is a low level python module. Precautions should be made when handling concurrently the shared_memory objects using synchronization primitives for example. I'm not sure this should be done internally in the SharedMemory class -- especially, we don't want to slow down concurrent READ access.


Per the further suggestion:
> +1 For a documentation addition.

I can take a crack at adding something more along the lines of this discussion, but I would very much welcome suggestions (@bjs, @mental, @pierreglaser)...
History
Date User Action Args
2019-09-11 17:44:09 davin set messages: + msg352001
2019-09-11 17:38:50 davin set messages: + msg351998
2019-07-29 10:50:32 bjs set messages: + msg348606
2019-07-29 08:53:12 pierreglaser set messages: + msg348604
2019-07-27 01:27:06 mental set messages: + msg348536
2019-07-26 11:23:47 pierreglaser set messages: + msg348487
2019-07-25 19:01:31 mental set messages: + msg348449
2019-07-25 16:19:06 mental set nosy: + mental; messages: + msg348443
2019-07-25 15:46:25 pitrou set nosy: + pierreglaser
2019-07-22 15:14:26 xtreak set nosy: + pitrou, davin
2019-07-22 15:13:18 bjs create