Issue 39959: Bug on multiprocessing.shared_memory
Created on 2020-03-13 19:43 by dxflores, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Pull Requests

| URL | Status | Linked |
|---|---|---|
| PR 20136 | closed | python-dev, 2020-05-16 21:19 |
Messages (11)
msg364121 - Author: Diogo Flores (dxflores) | Date: 2020-03-13 19:43

Hello,

I came across what seems like a bug (or at least a disagreement with the current documentation).

Discussion: I expected that after creating a numpy array backed by shared memory (on terminal 1), I would be able to use it in other terminals until I called `shm.unlink()` (on terminal 1), at which point the memory block would be released and no longer accessible.

What happened is that after accessing the numpy array from terminal 2, I called `close()` on the local `existing_shm` instance and exited the interpreter, which displayed the warning seen below. After that, I tried to access the same shared memory block from terminal 3, and a FileNotFoundError was raised. (The same error was also raised when I tried to call `shm.unlink()` on terminal 1, after calling `close()` on terminal 2.)

It seems that calling `close()` on an instance destroys further access to the shared memory block from any point, while what I expected was to be able to access the array (e.g. on terminal 2), modify it, "close" my access to it, and afterwards still be able to access the modified array (e.g. on terminal 3).

If the error is on my side I apologize for raising this issue, and I would appreciate clarification on what I am doing wrong.

Thank you.

Diogo

Please check below for the commands issued:

```
## Terminal 1

>>> from multiprocessing import shared_memory
>>> import numpy as np
>>>
>>> a = np.array([x for x in range(10)])
>>> shm = shared_memory.SharedMemory(create=True, size=a.nbytes)
>>> b = np.ndarray(a.shape, dtype=a.dtype, buffer=shm.buf)
>>> b[:] = a[:]
>>>
>>> shm.name
'psm_592ec635'

## Terminal 2

>>> from multiprocessing import shared_memory
>>> import numpy as np
>>>
>>> existing_shm = shared_memory.SharedMemory('psm_592ec635')
>>> c = np.ndarray((10,), dtype=np.int64, buffer=existing_shm.buf)
>>>
>>> c
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>
>>> del c
>>> existing_shm.close()
>>>
>>> exit()
~: /usr/lib/python3.8/multiprocessing/resource_tracker.py:203: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

## Finally, on terminal 3

>>> from multiprocessing import shared_memory
>>> import numpy as np
>>>
>>> existing_shm = shared_memory.SharedMemory('psm_592ec635')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/multiprocessing/shared_memory.py", line 100, in __init__
    self._fd = _posixshmem.shm_open(
FileNotFoundError: [Errno 2] No such file or directory: '/psm_592ec635'
```
msg364351 - Author: Jeff Fischer (jfischer) | Date: 2020-03-16 18:48

I've run into the same problem. It appears that the SharedMemory class is assuming that all clients of a segment are child processes from a single parent, and that they inherit the same resource_tracker. If you run separate, unrelated processes, you get a separate resource_tracker for each process. Then, when a process does a close() followed by a sys.exit(), its resource tracker detects a leak and unlinks the segment.

In my application, segment names are stored on the local filesystem and a specific process is responsible for unlinking the segment when it is shut down. I was able to get this model to work with the current SharedMemory implementation by having processes that are just doing a close() also call resource_tracker.unregister() directly, to prevent their local resource trackers from destroying the segment.

I imagine the documentation needs some discussion of the assumed process model and either:
1) a statement that you need to inherit the resource tracker from a parent process,
2) a blessed way to call the resource tracker to manually unregister, or
3) a way to disable the resource tracker when creating the SharedMemory object.
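A minimal sketch of this workaround, assuming Python 3.8 on a POSIX system and a segment name known out of band (the name 'psm_592ec635' below is just the example from msg364121). It relies on the private multiprocessing.resource_tracker module and on the private _name attribute matching the name that SharedMemory registered, so treat it as illustrative rather than a supported API:

```python
# Sketch of the workaround above: a purely "consuming" process attaches to an
# existing segment, then tells its own resource tracker to forget about it
# before detaching, so the tracker does not unlink the segment at exit.
from multiprocessing import shared_memory, resource_tracker

shm = shared_memory.SharedMemory('psm_592ec635')   # attach; __init__ registers it

# Undo the registration made in SharedMemory.__init__; shm._name matches the
# name that was registered (it keeps the leading '/' on Linux).
resource_tracker.unregister(shm._name, 'shared_memory')

data = bytes(shm.buf[:10])   # ... read/write shm.buf as needed ...

shm.close()   # detach locally; the segment itself stays alive for other processes
```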
msg364481 - Author: Diogo Flores (dxflores) | Date: 2020-03-17 21:43

Follow-up: tested on Linux (the first solution).

The solution presented below fixes the problem, with the caveat that the base process (the one that creates the shared memory object) must outlive any process that uses the shared memory. The rationale is that unless *this* process is creating a new shared memory object (as opposed to attaching itself to an already existing one), there is no point in registering it to be tracked. With this small change, the problem I mentioned when I opened this issue disappears.

```python
# https://github.com/python/cpython/blob/master/Lib/multiprocessing/shared_memory.py#L116
# Change:
from .resource_tracker import register
register(self._name, "shared_memory")

# To:
if create:
    from .resource_tracker import register
    register(self._name, "shared_memory")
```

To retain the ability for the base process to exit before the processes that use the shared memory object it created (the current, problematic behaviour), as well as to fix this issue, I suggest the following approach: when (and only when) a new shared memory object is created, register it on a new class variable of the resource tracker, so that it can always be accessed and closed/unlinked by any process later on. This differs from the current approach, where every process that wants to access the shared memory object registers itself with the resource tracker.

I look forward to any discussion on the subject.

Thank you,
Diogo
msg366687 - Author: Diogo Flores (dxflores) | Date: 2020-04-17 23:49

Any update on this issue?
msg368770 - Author: Floris (fvdnabee) | Date: 2020-05-13 10:03

I confirm the same issue as Diogo. The provided workaround of unregistering the shared memory segment in the 'consuming' process, as suggested by Jeff, solves the issue where exiting the consuming process causes the tracker to incorrectly free the shared memory. Diogo's fix to shared_memory.py#L116 does just that (actually, it avoids registering it in the first place) and therefore seems OK to me.
msg369127 - Author: Rauan Mukhamejanov (rauanargyn) | Date: 2020-05-17 15:25

Not sure about "it can always be accessed and closed/unlinked by any process later on", as each process will spawn its own resource_tracker, using a separate pipe. Thus, unregister calls from other processes will not have any effect.

The documentation is indeed unclear that processes must share the resource_tracker.

Can we introduce a new flag, "persist", that would indicate no resource tracking is needed? Registering would only happen if create=True and persist=False, meaning the user accepts that the creating process must outlive all other processes that could connect to the shared memory. If persist=True, the user accepts responsibility for manually cleaning up the allocated memory.

This would allow catering to a wider range of use cases, where reader/writer processes can exit and re-connect to shared memory as they see fit.
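For illustration only: the persist flag is just a proposal in this message and does not exist in CPython. Building on the patch excerpt shown in msg364481, the registration guard might read something like:

```python
# Hypothetical sketch; `persist` is NOT a real SharedMemory parameter.
# Register with the resource tracker only when this process created the
# segment and did not opt out of tracking via the proposed flag.
if create and not persist:
    from .resource_tracker import register
    register(self._name, "shared_memory")
```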
msg373475 - Author: David Parks (davidparks21) | Date: 2020-07-10 17:43

Having a flag seems like a good solution to me. I've also encountered this issue and posted on Stack Overflow about it here: https://stackoverflow.com/questions/62748654/python-3-8-shared-memory-resource-tracker-producing-unexpected-warnings-at-appli
msg373800 - Author: Vinay Sharma (vinay0410) | Date: 2020-07-17 08:23

Hi,

shared_memory has a lot of issues, which are mainly caused by resource tracking. Initially, resource tracking was implemented to keep track of semaphores only, but for some reason the resource tracker also started to keep track of shared_memory.

This makes shared memory practically useless when used by unrelated processes, because the segment will be unlinked as soon as one of the tracking processes dies, even while other processes (including ones yet to be spawned) still need it.

There is already a PR open to fix this, https://github.com/python/cpython/pull/15989/files, by applio (a core developer), but for some reason it hasn't been merged yet. I will try to fix the conflicts and request that it be merged.

Now, this will fix most of the issues in shared memory, but the current Linux implementation of shared memory still won't be consistent with Windows (which it isn't at the moment either). You can read more about that here: https://bugs.python.org/issue38119#msg352050
msg373801 - Author: Vinay Sharma (vinay0410) | Date: 2020-07-17 08:28

@rauanargyn, a persist flag won't be a good idea because it cannot be supported easily on Windows, since Windows uses a reference-counting mechanism to keep track of shared memory and frees it as soon as all the processes using it are done.
msg374414 - Author: Diogo Flores (dxflores) | Date: 2020-07-27 18:47

I have tried a different approach using https://gitlab.com/tenzing/shared-array and got it to perform well on Linux.

Basically, the code above places all numpy arrays in /dev/shm, which allows you to access and modify them from any number of processes without creating any copies. Deleting is equally simple: the library provides SharedArray.list() to list all the objects it has placed in /dev/shm, so one can iterate over the list and delete each element. (An easier approach is to use pathlib and just unlink all shared memory objects in /dev/shm, as sketched below.)

I guess a solution based on Mat's code could be adapted to try and solve the shared-memory problems.

I look forward to further discussion on the subject.

Diogo
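A minimal sketch of that pathlib-based cleanup, assuming Linux, where POSIX shared memory segments appear as regular files under /dev/shm; the 'psm_' prefix filter is an assumption based on the default names multiprocessing.shared_memory generates (e.g. 'psm_592ec635' above):

```python
# Linux-only sketch: leftover shared memory segments can be listed and removed
# through the /dev/shm filesystem view. The 'psm_' prefix (an assumption) keeps
# this from touching segments created by unrelated software.
from pathlib import Path

for seg in Path('/dev/shm').glob('psm_*'):
    print(f'unlinking leftover segment {seg.name}')
    seg.unlink()
```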
msg374433 - Author: Guido van Rossum (gvanrossum) | Date: 2020-07-27 22:41

I declare this a duplicate of issue 38119.
History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:59:28 | admin | set | github: 84140 |
| 2020-07-27 22:41:33 | gvanrossum | set | status: open -> closed; superseder: resource tracker destroys shared memory segments when other processes should still have valid access; nosy: + gvanrossum; messages: + msg374433; resolution: duplicate; stage: patch review -> resolved |
| 2020-07-27 18:47:15 | dxflores | set | messages: + msg374414 |
| 2020-07-17 08:28:26 | vinay0410 | set | messages: + msg373801 |
| 2020-07-17 08:23:43 | vinay0410 | set | nosy: + vinay0410; messages: + msg373800 |
| 2020-07-10 17:43:47 | davidparks21 | set | nosy: + davidparks21; messages: + msg373475 |
| 2020-05-17 15:25:54 | rauanargyn | set | nosy: + rauanargyn; messages: + msg369127 |
| 2020-05-16 21:19:30 | python-dev | set | keywords: + patch; nosy: + python-dev; pull_requests: + pull_request19441; stage: patch review |
| 2020-05-13 10:03:32 | fvdnabee | set | nosy: + fvdnabee; messages: + msg368770 |
| 2020-04-17 23:49:38 | dxflores | set | messages: + msg366687; title: (Possible) bug on multiprocessing.shared_memory -> Bug on multiprocessing.shared_memory |
| 2020-03-17 21:43:19 | dxflores | set | messages: + msg364481 |
| 2020-03-16 18:48:32 | jfischer | set | nosy: + jfischer; messages: + msg364351 |
| 2020-03-14 05:45:17 | rhettinger | set | nosy: + pitrou, davin |
| 2020-03-13 19:43:03 | dxflores | create | |