Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

shared memory construct to avoid need for serialization between processes #79994

Open
applio opened this issue Jan 24, 2019 · 59 comments
Open

shared memory construct to avoid need for serialization between processes #79994

applio opened this issue Jan 24, 2019 · 59 comments
Assignees
Labels
3.8 only security fixes pending The issue will be closed if no feedback is provided stdlib Python modules in the Lib dir topic-multiprocessing type-feature A feature request or enhancement

Comments

@applio
Copy link
Member

applio commented Jan 24, 2019

BPO 35813
Nosy @brettcannon, @nascheme, @rhettinger, @terryjreedy, @ronaldoussoren, @pitrou, @osvenskan, @giampaolo, @skrah, @pmp-p, @ambv, @ericsnowcurrently, @1st1, @applio
PRs
  • bpo-35813: Added shared_memory submodule of multiprocessing. #11664
  • bpo-35813: Added shared_memory submodule of multiprocessing. #11664
  • bpo-35813: Added shared_memory submodule of multiprocessing. #11664
  • bpo-35813: Tests and docs for shared_memory #11816
  • bpo-35813: Tests and docs for shared_memory #11816
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/applio'
    closed_at = None
    created_at = <Date 2019-01-24.04:02:05.382>
    labels = ['3.8', 'type-feature', 'library']
    title = 'shared memory construct to avoid need for serialization between processes'
    updated_at = <Date 2019-12-10.07:52:25.564>
    user = 'https://github.com/applio'

    bugs.python.org fields:

    activity = <Date 2019-12-10.07:52:25.564>
    actor = 'xdegaye'
    assignee = 'davin'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2019-01-24.04:02:05.382>
    creator = 'davin'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 35813
    keywords = ['patch', 'patch', 'patch']
    message_count = 57.0
    messages = ['334278', '334737', '334789', '334797', '334798', '334799', '334805', '334806', '334809', '334817', '334820', '334821', '334822', '334835', '334934', '334945', '334957', '334963', '335030', '335032', '335194', '335195', '335262', '335273', '335282', '335284', '335360', '335362', '335366', '335622', '335660', '335682', '335689', '335694', '335696', '335697', '335700', '335711', '335715', '335719', '335731', '335789', '335797', '336182', '336294', '336298', '336336', '336383', '336386', '336388', '336395', '336400', '336423', '336429', '336430', '338180', '341920']
    nosy_count = 15.0
    nosy_names = ['brett.cannon', 'nascheme', 'rhettinger', 'terry.reedy', 'ronaldoussoren', 'pitrou', 'osvenskan', 'giampaolo.rodola', 'skrah', 'pmpp', 'lukasz.langa', 'eric.snow', 'Pam.McANulty', 'yselivanov', 'davin']
    pr_nums = ['11664', '11664', '11664', '11816', '11816']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue35813'
    versions = ['Python 3.8']

    @applio
    Copy link
    Member Author

    applio commented Jan 24, 2019

    A facility for using shared memory would permit direct, zero-copy access to data across distinct processes (especially when created via multiprocessing) without the need for serialization, thus eliminating the primary performance bottleneck in the most common use cases for multiprocessing.

    Currently, multiprocessing communicates data from one process to another by first serializing it (by default via pickle) on the sender's end then de-serializing it on the receiver's end. Because distinct processes possess their own process memory space, no data in memory is common across processes and thus any information to be shared must be communicated over a socket/pipe/other mechanism. Serialization via tools like pickle is convenient especially when supporting processes on physically distinct hardware with potentially different architectures (which multiprocessing does also support). Such serialization is wasteful and potentially unnecessary when multiple multiprocessing.Process instances are running on the same machine. The cost of this serialization is believed to be a non-trivial drag on performance when using multiprocessing on multi-core and/or SMP machines.

    While not a new concept (System V Shared Memory has been around for quite some time), the proliferation of support for shared memory segments on modern operating systems (Windows, Linux, *BSDs, and more) provides a means for exposing a consistent interface and api to a shared memory construct usable across platforms despite technical differences in the underlying implementation details of POSIX shared memory versus Native Shared Memory (Windows).

    For further reading/reference: Tools such as the posix_ipc module have provided fairly mature apis around POSIX shared memory and seen use in other projects. The "shared-array", "shared_ndarray", and "sharedmem-numpy" packages all have interesting implementations for exposing NumPy arrays via shared memory segments. PostgreSQL has a consistent internal API for offering shared memory across Windows/Unix platforms based on System V, enabling use on NetBSD/OpenBSD before those platforms supported POSIX shared memory.

    At least initially, objects which support the buffer protocol can be most readily shared across processes via shared memory. From a design standpoint, the use of a Manager instance is likely recommended to enforce access rules in different processes via proxy objects as well as cleanup of shared memory segments once an object is no longer referenced. The documentation around multiprocessing's existing sharedctypes submodule (which uses a single memory segment through the heap submodule with its own memory management implementation to "malloc" space for allowed ctypes and then "free" that space when no longer used, recycling it for use again from the shared memory segment) will need to be updated to avoid confusion over concepts.

    Ultimately, the primary motivation is to provide a path for better parallel execution performance by eliminating the need to transmit data between distinct processes on a single system (not for use in distributed memory architectures). Secondary use cases have been suggested including a means for sharing data across concurrent Python interactive shells, potential use with subinterpreters, and other traditional uses for shared memory since the first introduction of System V Shared Memory onwards.

    @applio applio added the 3.8 only security fixes label Jan 24, 2019
    @applio applio self-assigned this Jan 24, 2019
    @applio applio added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Jan 24, 2019
    @applio
    Copy link
    Member Author

    applio commented Feb 2, 2019

    New changeset e5ef45b by Davin Potts in branch 'master':
    bpo-35813: Added shared_memory submodule of multiprocessing. (bpo-11664)
    e5ef45b

    @ronaldoussoren
    Copy link
    Contributor

    Was this merged too soon? This is new functionality without any docs and tests.

    I've also left review comments on the pull request.

    @pitrou
    Copy link
    Member

    pitrou commented Feb 3, 2019

    Yes, I think this was merged too soon. Davin, the convention is that we only merge completed work, not some work-in-progress.

    You already did this once in https://bugs.python.org/issue28053 . You still haven't completed that piece of work.

    I'm going to ask for this to be reverted.

    @ronaldoussoren
    Copy link
    Contributor

    FWIW I agree with reverting this pull request, the work is clearly not finished yet.

    @pitrou
    Copy link
    Member

    pitrou commented Feb 3, 2019

    And to make things clear, I'm not saying anything about the functionality. Easier shared memory + multiprocessing is definitely an interesting endeavour (and a non-trivial one).

    @terryjreedy
    Copy link
    Member

    I think there is also a license problem. posixshmem.c contains "Copyright 2012 Philip Semanchuk, 2018-2019 Davin Potts"

    Ronald commented "The only other files with a copyright attribute are parser, optparse and platform. I'd prefer to avoid adding new copyright attributes to stdlib modules."

    (turtle.py has a problematic copyright notice in the docstring.)

    I think we definitely should not add new copyright notices.

    Copyright notices are redundant: all contributors retain copyright in their contributions.

    Copyright notices are deceptive: 1. since they are rare, they imply that there is something special about a particular module and the listed authors; 2. even if a module were special, the notice becomes obsolete as soon as anyone else contributes to the module.

    Copyright notices are not needed: contributors authorize PSF to distribute the collective work under a rather liberal license. If anyone want to make a use of Python code not covered by that license, and wants to bypass PSF, they would have to look at git log and and git blame to find the relevant contributors.

    In this case, part of the work is attributed to Philip Semanchuk as a current copyright owner. According to https://bugs.python.org/user2567, he has not signed the contributor agreement, so his work should not have been merged until he has. Even if he had, he would have to specifically agree to his work being contributed. Sorry to be a grinch, but aside from anything else, I think this should be reverted until the legal question is clarified.

    Some of this might need discussion on pydev.

    @applio
    Copy link
    Member Author

    applio commented Feb 4, 2019

    This work is the result of ~1.5 years of development effort, much of it accomplished at the last two core dev sprints. The code behind it has been stable since September 2018 and tested as an independently installable package by multiple people.

    I was encouraged by Lukasz, Yury, and others to check in this code early, not waiting for tests and docs, in order to both solicit more feedback and provide for broader testing. I understand that doing such a thing is not at all a novelty. Thankfully it is doing that -- I hope that feedback remains constructive and supportive.

    There are some tests to be found in a branch (enh-tests-shmem) of github.com/applio/cpython which I think should become more comprehensive before inclusion. Temporarily deferring and not including them as part of the first alpha should reduce the complexity of that release.

    Regarding the BSD license on the C code being adopted, my conversations with Brett and subsequently Van have not raised concerns, far from it -- there is a process which is being followed to the letter. If there are other reasons to object to the thoughtful adoption of code licensed like this one, that deserves a decoupled and larger discussion first.

    @ronaldoussoren
    Copy link
    Contributor

    David,

    I don't agree with merging at this point.

    The lack of documentation makes it hard to ask for more feedback and broader testing, because people won't know how to use the library and provide constructive feedback (that's why my review comments on GitHub are mostly nitpicking).

    Furthermore, distributing this as a package on PyPI is much better way to solicit further testing and feedback, because then it can be used right now instead of having to wait for a release. Releasing on PyPI is feasible for this new functionality because it is a standalone library.

    It would be better to revert this changeset until it is much closer to finished and there has been time for better code review.

    BTW. My comment about copyright on the pull request was about the __copyright__ attribute of the C extension, that is rarely used in the stdlib and is IMHO something to avoid.

    @ambv
    Copy link
    Contributor

    ambv commented Feb 4, 2019

    @davin, in what time can you fill in the missing tests and documentation? If this is something you finish do before alpha2, I'm inclined to leave the change in.

    As it stands, I missed the controversy yesterday as I was busy making my first release. So the merge *got released* in alpha1. I would prefer to fix the missing pieces forward instead of reverting and re-submitting which will only thrash blame and history at this point.

    FTR, I do agree with Antoine, Ronald and others that in the future such big changes should be as close to their ready state at merge time.

    @applio
    Copy link
    Member Author

    applio commented Feb 4, 2019

    @lukasz.langa: Missing tests and documentation will be in by alpha2.

    @ronaldoussoren
    Copy link
    Contributor

    Please also address my review comments.

    @pitrou
    Copy link
    Member

    pitrou commented Feb 4, 2019

    Also, I will want to review this as well. Since Alpha 2 is scheduled for 2019-02-24, and this is a sizable piece of work, I would prefer if a full PR was available, say, before 2019-02-15.

    @ericsnowcurrently
    Copy link
    Member

    FTR, I've had a number of extensive conversations with Davin (over the last year and a half) about this feature. His choices seemed reasonable (caveat: I'm not an expert on multiprocessing) and the feature seemed desirable. I did not review his PR.

    @osvenskan
    Copy link
    Mannequin

    osvenskan mannequin commented Feb 6, 2019

    Hi all, I'm the author of posix_ipc on which some of this code is based. I'd be happy to sign a contributor agreement in order to erase any concerns on that front.

    @terryjreedy
    Copy link
    Member

    I would prefer that we be consistent. In any case, I think you should be added to Misc/ACKS in the PR.

    @giampaolo
    Copy link
    Contributor

    Unit-tests at https://bugs.python.org/issue35917.

    @giampaolo
    Copy link
    Contributor

    Also, for completeness (since discussion is getting split), please see my proposal to move SharedMemoryManager and SharedMemoryServer into multiprocessing.managers namespace and rename shared_memory.py to _shared_memory.py:
    https://mail.python.org/pipermail/python-dev/2019-February/156235.html
    Also, it appears ShareableList should be register()ed against SharedMemoryManager.

    @nascheme
    Copy link
    Member

    nascheme commented Feb 7, 2019

    I didn't finish reviewing completely yet but here are some comments.

    I think the random filename generation can be pulled out of
    _posixshmem. Make it require that a filename is passed into it.
    That moves a fair bit of complexity out of C and into Python. In
    the Python module, I suggest that you could use secrets.token_hex()
    to build the filename. Put the loop that retries because of name
    collisions in the Python module as well.

    If you like, I can try to make a patch that does the above.

    When looking at at how the Python code would handle a name collision,
    I see this code:

    + switch (errno) {
    + case EACCES:
    + PyErr_Format(pPermissionsException,
    + "No permission to %s this segment",
    + (flags & O_TRUNC) ? "truncate" : "access"
    + );
    + break;
    +
    + case EEXIST:
    + PyErr_SetString(pExistentialException,
    + "Shared memory with the specified name already exists");
    + break;
    +
    + case ENOENT:
    + PyErr_SetString(pExistentialException,
    + "No shared memory exists with the specified name");
    + break;
    +
    + case EINVAL:
    + PyErr_SetString(PyExc_ValueError, "Invalid parameter(s)");
    + break;
    +
    + case EMFILE:
    + PyErr_SetString(PyExc_OSError,
    + "This process already has the maximum number of files open");
    + break;
    +
    + case ENFILE:
    + PyErr_SetString(PyExc_OSError,
    + "The system limit on the total number of open files has been reached");
    + break;
    +
    + case ENAMETOOLONG:
    + PyErr_SetString(PyExc_ValueError,
    + "The name is too long");
    + break;
    +
    + default:
    + PyErr_SetFromErrno(PyExc_OSError);
    + break;
    + }

    I think it might be cleaner just to make it:

            PyErr_SetFromErrno(PyExc_OSError);

    Then, if you need a customized exception or message, you can
    re-raise inside the Python code. To me, it seems simpler and more
    direct to just preserve the errno and always raise OSError. Changing things
    to ValueError means that callers need to look at the message text to
    differentiate between some errno values.

    Is it the case that _posixshmem started life as a module that would
    be used directly and not something hidden by another layer of
    abstraction? If so, having these customized exceptions and having
    it do the filename generation itself makes sense. However, it is an
    internal implementation detail of shared_memory.py, I think it
    should be simplified to do only what is needed. E.g. a thin layer
    of the system calls.

    @pitrou
    Copy link
    Member

    pitrou commented Feb 7, 2019

    (also, OSError will automatically convert to more specific subclasses for some errnos, see PEP-3151:

    >>> raise OSError(errno.ENOENT, "foobar")                                                                                                                              
    Traceback (most recent call last):
      File "<ipython-input-3-e08f9c9a179c>", line 1, in <module>
        raise OSError(errno.ENOENT, "foobar")
    FileNotFoundError: [Errno 2] foobar

    )

    @applio
    Copy link
    Member Author

    applio commented Feb 11, 2019

    Docs and tests are now available in a new PR. I have stayed focused on getting these docs and tests to everyone without delay but that means I have not yet had an opportunity to respond to the helpful comments, thoughtful questions, and threads that have popped up in the last few days. I will follow up with all comments as quickly as possible starting in the morning.

    There are two topics in particular that I hope will trigger a wider discussion: the api around the SharedMemory class and the inclusion-worthiness of the shareable_wrap function.

    Regarding the api of SharedMemory, the docs explain that not all of the current input parameters are supportable/enforceable across platforms. I believe we want an api that is relevant across all platforms but at the same time we do not want to unnecessarily suppress/hide functionality that would be useful on some platforms -- there needs to be a balance between these motivations but where do we strike that balance?

    Regarding the inclusion-worthiness of the shareable_wrap function, I deliberately did not include it in the docs but its docstring in the code explains its purpose. If included, it would drastically simplify working with NumPy arrays; please see the code example in the docs demonstrating the use of NumPy arrays without the aid of the shareable_wrap function. I have received feedback from others using this function also worth discussing.

    Thank you to everyone who has already looked at the code and shared helpful thoughts -- please have a look at the tests and docs.

    @applio
    Copy link
    Member Author

    applio commented Feb 11, 2019

    @giampaolo.rodola: Your patch from 3 days ago in bpo-35917 included additional tests around the SharedMemoryManager which are now causing test failures in my new PR. This is my fault because I altered SharedMemoryManager to no longer support functionality from SyncManager that I thought could be confusing to include. I am just now discovering this and am not immediately sure if simply removing the SharedMemoryManager-relevant lines from your patch is the right solution but I wanted to mention this thought right away.

    Thank you for discovering that SyncManager was being overlooked in the tests and the nice patch in bpo-35917.

    @applio
    Copy link
    Member Author

    applio commented Feb 11, 2019

    @terry.reedy and @ronaldoussoren: I have asked Van again to provide comments here clarifying the topics of (1) copyright notices and (2) requiring the BSD-licensed-work's author to sign a contributor agreement.

    Specifically regarding the appearance of __copyright__, I added my agreement to your comments on #56025 on this.

    @giampaolo
    Copy link
    Contributor

    I submitted an initial review / comments in the PR. I think this is good for a first iteration in order to understand what APIs to expose publicly (also, keep in mind I may not have a full picture of how this is intended to be used precisely).

    As for the doc, I find it a bit too verbose. If the namespace change is accepted consider doing the following:

    • document SharedMemoryManager right after SyncManager. That would let you reuse the description for Semaphore, Barrier, Lock, etc.
    • within SharedMemoryManager doc link/point to a different section of the doc where you explain the whole thing in more details, and also document the remaining APIs

    Just an idea. Writing good doc is not easy and it requires actually putting hands on it.

    Hope this helps.

    @rhettinger
    Copy link
    Contributor

    FWIW, I read the draft docs and found them to be at the right level throughout. I definitely wouldn't want anything more terse.

    @applio
    Copy link
    Member Author

    applio commented Feb 12, 2019

    @giampaolo.rodola: It definitely helps.

    Conceptually, SyncManager provides "distributed shared memory" where lists, dicts, etc. are held in memory by one process but may be accessed remotely from another via a Proxy Object. Mutating a dict from one process requires sending a message to some other process to request the change be made.

    In contrast, SharedMemoryManager provides non-distributed shared memory where a special region of memory is held by the OS kernel (not a process) and made directly addressable to many processes simultaneously. Modifying any data in this special region of memory requires zero process-to-process communication; any of the processes may modify the data directly.

    In a speed contest, the SharedMemoryManager wins in every use case -- and it is not a close race. There are other advantages and disadvantages to each, but speed is the key differentiator.

    Thinking ahead to the future of SharedMemoryManager, there is the potential for a POSIX shared memory based semaphore. The performance of this semaphore across processes should drastically outperform SyncManager's semaphore. It might be something we will want to support in the future. SharedMemoryManager needs a synchronization mechanism now (in support of common use cases) to coordinate across processes, which is why I initially thought SharedMemoryManager should expose the Lock, Semaphore, Event, Barrier, etc. powered by distributed shared memory. I am no longer sure this is the right choice for three reasons:
    (1) it unnecessarily complicates and confuses the separation of what is powered by fast SystemV-style shared memory and what is powered by slow distributed shared memory,
    (2) it would be a very simple example in the docs to show how to add our existing Lock or Semaphore to SharedMemoryManager via register(),
    (3) if we one day implement POSIX shared memory semaphores (and equivalent where POSIX is not supported), we will have the burden of an existing lock/semaphore creation methods and apis with behavioral differences.

    I propose that it would be clearer but no less usable if we drop these registered object types (created via calls to register()) from SharedMemoryManager. It is one line of code for a user to add "Lock" to SharedMemoryManager, which I think we can demonstrate well with a simple example.

    @giampaolo
    Copy link
    Contributor

    Because shared memory blocks are not "owned" by a single process...
    [...]
    I propose a simpler, consistent-across-platforms API like:
    SharedMemory(name=None, create=False, size=0)

    Maybe something like this instead?

          SharedMemory(name=None, attach_if_exists=False, size=0)

    The use case I'm thinking about is 2 distinct processes/apps which agree on a common fixed name.

    @giampaolo
    Copy link
    Contributor

    Hopefully my last iteration: =)

    1. As for SharedMemoryManager, I believe it should live in multiprocessing.managers, not shared_memory.py. It's a subclass of Manager and behaves like a manager (start(), shutdown(), get_server(), etc.), so IMO that's its natural place.

    2. Same for SharedMemoryServer (which is a subclass of multiprocessing.managers.Server). SharedMemoryTracker appears to be just a support class for it, so IMO it should either be private or not documented.

    3. ShareableList name is kinda inconsistent with other classes (they all have a "Shared" prefix). I'd call it SharedList instead.

    4. Ideally the (public) APIs I have in mind are:

      multiprocessing.managers.SharedMemoryManager
      multiprocessing.managers._SharedMemoryTracker
      multiprocessing.managers.SharedMemoryServer (not documented)
      multiprocessing.shared_memory.SharedMemory
      multiprocessing.shared_memory.SharedList
      multiprocessing.shared_memory.WindowsNamedSharedMemory (maybe)
      multiprocessing.shared_memory.PosixSharedMemory (maybe)

    AFAIU there are 2 distinct use-cases at play:

    • two separate apps agreeing on a specific fixed memory *name* to attach to, which will use SharedMemory/List directly
    • one single app using a manager and a worker, which will use SharedMemoryManager, and get SharedMemory/List directly from it (see https://github.com/python/cpython/pull/11816/files#r257458137)
      IMO the 2 different namespaces would reflect and enforce this separation of use cases.
    1. I have some reservations about SharedMemory's "flags" and "mode" args.
    • flags (O_CREAT, O_EXCL, O_CREX, O_TRUNC, O_RDONLY): it seems it may conflict with "read_only" arg. I wonder if we could achieve the same goal with more high-level named args instead (e.g. "create" / "attach_if_exists"). If in doubt, I would recommend to simply drop it for now.
    • mode: same considerations as above. Doc says "Its specification is not enforceable on all platforms" which makes me think it's probably better to drop it (also not sure why it defaults to 384).

    Hope this helps.

    @applio
    Copy link
    Member Author

    applio commented Feb 16, 2019

    @giampaolo:

    1. it seems SharedMemory.close() does not destroy the memory region
      (I'm able to re-attach to it via name). If I'm not mistaken only
      the manager can do that.

    Correct, close() does not and should not destroy the memory region because other processes may still be using it. Only a call to unlink() triggers the destruction of the memory region and so unlink() should only be called once across all the processes with access to that shared memory block.

    The unlink() method is available on the SharedMemory class. No manager is required. This is also captured in the docs.

    1. I suggest to turn SharedMemory.buf in a read-onl property

    Good idea! I will make this change today, updating #56025.

    1. it seems "size" kwarg cannot be zero (current default)

    From the docs:
    When attaching to an existing shared memory block, set to 0 (which is the default).

    This permits attaching to an existing shared memory block by name without needing to also already know its size.

    1. I wonder if we should have multiprocessing.active_memory_children() or something

    I also think this would be helpful but...

    I'm not sure if active_memory_children() can return meaningful results with a brand new process (I suppose it can't).

    You are right. As an aside, I think it interesting that in the implementation of "System V Shared Memory", its specification called for something like a system-wide registry where all still-allocated shared memory blocks were listed. Despite the substantial influence System V Shared Memory had on the more modern implementations of "POSIX Shared Memory" and Windows' "Named Shared Memory", neither chose to make it part of their specification.

    By encouraging the use of SharedMemoryManager to track and ensure cleanup, we are providing a reliable and cross-platform supportable best practice. If something more low-level is needed by a user, they can choose to manage cleanup themselves. This seems to parallel how we might encourage, "when opening a file, always use a with statement", yet users can still choose to call open() and later close() when they wish.

    @applio
    Copy link
    Member Author

    applio commented Feb 16, 2019

    @giampaolo:

    Maybe something like this instead?
    SharedMemory(name=None, attach_if_exists=False, size=0)

    I think this misses the use case when wanting to ensure we only attach to an existing shared memory block and if it does not exist, we should raise an exception because we can not continue. (If the shared memory block should already be there but it is not, this means something bad happened earlier and we might not know how to recover here.)

    I believe the two dominant use cases to address are:

    1. I want to create a shared memory block (either with or without a pre-conceived name).
    2. I want to attach to an existing shared memory block by its unique name.

    @applio
    Copy link
    Member Author

    applio commented Feb 16, 2019

    @giampaolo:

    1. As for SharedMemoryManager, I believe it should live in multiprocessing.managers, not shared_memory.py.

    I am happy to see it live in multiprocessing.managers so long as we can provide a clean way of handling what happens on a platform where we can not support shared memory blocks. (We have implementations for PosixSharedMemory and NamedSharedMemory which together cover Windows, Linux, MacOS, the *BSDs, and possibly others but that does not cover everything.)

    @neil has already raised this question of what do we want the behavior to be on these unsupported platforms on import? If everything dependent upon shared memory blocks remains inside shared_memory.py, then we could raise a ModuleNotFoundError or ImportError or similar when attempting to import shared_memory. If we move SharedMemoryManager to live in multiprocessing.managers, we need to decide how to handle (and communicate to the user appropriately) its potential absence. So far, I am unable to find a good example of another module where they have chosen to split up such code rather than keeping it all bottled up inside a single module, but perhaps I have missed something?

    1. Same for SharedMemoryServer (which is a subclass of multiprocessing.managers.Server).

    Same thing as above. If we decide how to handle the unsupported platforms on import, we can re-organize appropriately.

    1. ShareableList name is kinda inconsistent with other classes (they all have a "Shared" prefix). I'd call it SharedList instead.

    Oooh, interesting. I am happy to see a name change here. To share how I came up with its current name: I had thought to deliberately break the naming pattern here to make it stand out. The others, SharedMemory, SharedMemoryManager, and SharedMemoryServer, are all focused on the shared memory block itself which is something of a more primitive concept (like accessing SharedMemory.buf as a memoryview) compared to working with something like a list (a less primitive, more widely familiar concept). Likewise, I thought a dict backed by shared memory might be called a ShareableDict and other things like a NumPy array backed by shared memory might be called a ShareableNDArray or similar. I was hoping to find a different pattern for the names of these objects-backed-by-shared-memory-blocks, but I am uncertain I found the best name.

    1. I have some reservations about SharedMemory's "flags" and "mode" args.

    It sounds like you are agreeing with what I advocated in msg335660 (up above). Great!

    @giampaolo
    Copy link
    Contributor

    The unlink() method is available on the SharedMemory class. No manager is required. This is also captured in the docs.

    I missed that, sorry.

    > 3) it seems "size" kwarg cannot be zero (current default)
    From the docs: When attaching to an existing shared memory block, set to 0

    Mmm... this makes the API a bit weird because it's a arg which is optional only for existing memory object, not new ones. I see that internally *size* is passed to ftruncate(). Can't you just avoid calling ftruncate() if size is not passed (None)? Also, what happens if you alter the size of an existing object with a smaller value? Is the memory region overwritten?

        a = shared_memory.SharedMemory(None, size=10)
        b = shared_memory.SharedMemory(a.name, size=4)

    > Maybe something like this instead?
    > SharedMemory(name=None, attach_if_exists=False, size=0)
    I think this misses the use case when wanting to ensure we only attach to an existing shared memory block and if it does not exist, we should raise an exception because we can not continue.

    It appears this is already covered:

        >>> SharedMemory(name="non-existent")
        Traceback (most recent call last):
        ...
        _posixshmem.ExistentialError: No shared memory exists with the specified name

    I believe the two dominant use cases to address are:

    1. I want to create a shared memory block (either with or without a pre-conceived name).
    2. I want to attach to an existing shared memory block by its unique name.

    Don't you also want to "create if it doesn't exist, else attach" as a single, atomic operation? It seems to me the only way to achieve that would be this, but it's racy:

    try:
        m = shared_memory.SharedMemory("foo", size=10)  # create
    except ExistentialError:
        m = shared_memory.SharedMemory("foo")  # attach (race condition)
    

    Note that here I'm entering into an unknown territory, because I'm not sure if there are or should be sync primitives to "wait for another memory to join me" etc. Just mentioning it for the sake of mental lucubration. =)

    If we move SharedMemoryManager to live in multiprocessing.managers, we need to decide how to handle (and communicate to the user appropriately) its potential absence.

    How about:

    # managers.py
    try:
        from .shared_memory import SharedMemory, SharedList 
    except ImportError:
        pass
    else:
        class SharedMemoryManager: ...
        class SharedMemoryServer: ...
        del SharedMemory, SharedList
    

    > 4) I have some reservations about SharedMemory's "flags" and "mode" args.
    It sounds like you are agreeing with what I advocated in msg335660 (up above). Great!

    If you mean drop those 2 args entirely then it probably there's no point in exposing WindowsNamedSharedMemory and PosixSharedMemory . If you mean to keep them then it seems "read_only=True" would conflict with "flags=O_RDWR". The fact that the Windows counterpart uses different constants makes me think this bitwise stuff should probably be handled internally, and exposed via friendly named-args like "read_only" / "attach_if_exists" or similar (but it's still not clear what those named args should be exactly).

    @giampaolo
    Copy link
    Contributor

    As for *flags* argument "man shm_open" mentions 5 possible flags:

    • O_RDONLY, OR_RDWR: these can be served by existing *read_only* arg
    • O_CREAT, O_EXCL: dictates whether to raise error if file exists; could be served via *attach_if_exists* arg, if added, or internally if not added
    • O_TRUNC: related to *size* argument (it seems it should be set when size=0)

    As such I think "flags" should not be exposed. As for Windows, *read_only* is currently exposed but ignored, and no other flags are contemplated. It seems you can implement *read_only* by passing FILE_MAP_READ or FILE_MAP_WRITE to OpenFileMappingW() here:

    h_map = kernel32.OpenFileMappingW(FILE_MAP_READ, False, name)

    Not sure how to reliably check if a file mapping already exists, but possibly POSIX and Windows should behave the same in this regard (and some test cases to exercise all the possible combinations would be good to have).

    @applio
    Copy link
    Member Author

    applio commented Feb 16, 2019

    @giampaolo:

    Also, what happens if you alter the size of an existing object with a smaller value? Is the memory region overwritten?

    Attaching to an existing shared memory block with a size=N which is smaller than its allocated size (say it was created with size=M and N<M) will succeed in giving you access to at least the first N bytes of that shared memory block. The shared memory block allocated size will still be M bytes. That means future attempts to attach to it with size=M will continue to work as well. One motivation for why this is supported is, I believe, to deliberately limit access to part of the shared memory in some parts of a user's code, avoiding potential coding mistakes by design; for example, say the first 100KB contains unchanging reference data that many processes need to access but the second 100KB contains rapidly changing data that only a few processes should ever access or manipulate. I expect the most common use is to simply attach to the whole of a shared memory block, but I would not want to unnecessarily limit other established use cases.

    This behavior needed to be captured in the docs but I see it has not been! I have now added to the description of the size parameter and it should show up in #56025 shortly.

    Can't you just avoid calling ftruncate() if size is not passed (None)?

    It looks like it does skip calling ftruncate() if size is 0.  From posixshmem.c:
            if (size) {
                DPRINTF("calling ftruncate, fd = %d, size = %ld\n", self->fd, size);
                if (-1 == ftruncate(self->fd, (off_t)size)) {

    > I think this misses the ...
    It appears this is already covered:

    Sorry for any confusion; I was interpreting your proposed parameter name, attach_if_exists, in the following way:

    • attach_if_exists=True: If exists, attach to it otherwise create one
    • attach_if_exists=False: Create a new one but do not attach to an existing with the same name
      I did not see a way to indicate a desire to *only* attach without creation. I need a way to test to see if a shared memory block already exists or not without risk of creating one. At least this is how I was interpreting "attach if exists".

    Don't you also want to "create if it doesn't exist, else attach" as a single, atomic operation?

    Yes, I do! This was part of my description for the parameter named "create" in msg335660:
    When set to True, a new shared memory block will be created unless
    one already exists with the supplied unique name, in which case that block
    will be attached to and used.

    I'm not sure if there are or should be sync primitives to "wait for another memory to join me" etc.

    In the case of shared memory, I do not think so. I think such signaling between processes, when needed, can be accomplished by our existing signaling mechanisms (like, via the Proxy Objects for Event or Semaphore).

    @giampaolo
    Copy link
    Contributor

    It looks like it does skip calling ftruncate() if size is 0. From posixshmem.c:

    Let me rephrase: are we forced to specify a value (aka call ftruncate()) on create ? If we are as I think, could size have a reasonable default value instead of 0? Basically I'm wondering if we can relieve the common user from thinking about what size to use, mostly because it's sort of a low level detail. Could it perhaps default to mmap.PAGESIZE?

    @nascheme
    Copy link
    Member

    Some thoughts on this API.

    I think we need the "create with exclusive behavior" option, even though we don't know how to implement it on Windows right now. To me, there are two cases when calling SharedMemory:

    1. You want to create a new shared memory object. In this case, if your chosen name conflicts with an existing name, you should get an error. If you don't, you could be sharing data between unrelated processes. Chaos will ensue.

    2. You want to attach to an existing shared memory object. You have a name, passed to you by some means. In this case, you don't want to create a new object if it doesn't exist. You should get an error.

    I thought there might be a 3rd case where you have "co-equal" processes and any one of them could create and the others would attach. That dosen't seem safe though, if you can't ensure a unique name is used. You might attach to some unrelated shared memory object due to a name conflict. So, I think the API should not support this case.

    To support 1 & 2, we could just have 'create'. When true, it would act like O_CREX. When false, you would get an error if the name doesn't already exist.

    Regarding 'size', I think it is a bit weird how it currently works. Maybe 'size' should only be valid if you are creating a new shared memory object. If you are attaching to an existing one, size would be found from the existing object (if that's possible on all platforms). SharedMemory could grow a "resize()" method that lets you enlarge the underlying memory object. I don't know if that's useful in practice so maybe better to leave it out and keep the API simple.

    Should 'size' be a property that always does fstat() to find the size of the underlying file? Or, should it be the size passed in (for create=True) or the size of the file when the mmap is created. I'm not sure but maybe it is better to not always do the fstat().

    The 'read_only' keyword doesn't make sense if you care creating a new object and need to call ftruncate() on it. I think the OS will not allow that. Maybe 'read_only' should not be allowed if you are creating. Or, you should have a method that takes a read-write object and makes it read-only.

    On Linux at least, shm_open() is just a thin layer over open(). So the shared memory file is really just a regular file that is stored in /var/run/shm. If you realize that, it seems SharedMemory() is much like tempfile.NamedTemporaryFile(). E.g. if you create a NamedTemporaryFile under /var/run/shm it will behave like the file descriptor created by SharedMemory(). SharedMemory() objects have the difference that they don't clean themselves up. Also, they don't set the close-on-fork flag on the file descriptor. Maybe they should? It seems unclear to me how you should avoid cluttering /var/run/shm with shared memory objects that people forget to cleanup. I guess the plan is that people need to call unlink() at the right time. That seems error prone though.

    @applio
    Copy link
    Member Author

    applio commented Feb 18, 2019

    I think we need the "create with exclusive behavior" option, even
    though we don't know how to implement it on Windows right now.

    A fix to avoid the potential race condition on Windows is now part of #56025.

    To support 1 & 2, we could just have 'create'. When true, it would
    act like O_CREX. When false, you would get an error if the name
    doesn't already exist.

    I am good with this and now it can be supported.

    a 3rd case where you have "co-equal" processes and any one of them
    could create and the others would attach.

    There are some practical use cases motivating this. Rather than debate the merits of those use cases, given the concern raised, perhaps we should forego supporting this 3rd case for now.

    Regarding 'size', I think it is a bit weird how it currently works.
    Maybe 'size' should only be valid if you are creating a new shared
    memory object.

    This would avoid potential confusion in the details of how attempts to resize do/don't work on different platforms. I would prefer to not need to explain that on MacOS, requesting a smaller size is disallowed. This defers such issues until considering a "resize()" method as you suggest.

    I like this.

    Should 'size' be a property that always does fstat() to find the
    size of the underlying file?

    The potential exists for non-Python code to attach to these same shared memory blocks and alter their size via ftruncate() (only on certain Unix platforms).

    We could choose to not support such "external" changes and let size be a fixed value from the time of instantiation. But I would like to believe we can be more effective and safely use fstat() behind our reporting of 'size'.

    It seems unclear to me how you should avoid cluttering /var/run/shm
    with shared memory objects that people forget to cleanup.

    This is the primary purpose of the SharedMemoryManager. Admittedly, we will not convince everyone to use it when they should, just like we are not able to convince everyone to use NamedTemporaryFile for their temp files.

    To update the proposed change to the API:

    • We go with this simpler API: SharedMemory(name=None, create=False, size=0)
    • 'size' is ignored when create=False
    • create=True acts like O_CREX and create=False only attaches to existing shared memory blocks

    Remaining question: do PosixSharedMemory and WindowsNamedSharedMemory mirror this simplified API or do we expose the added functionality each offers, permitting informed users to use things like 'mode' when they know it is enforced on a particular platform?

    @applio
    Copy link
    Member Author

    applio commented Feb 21, 2019

    The simpler API is now implemented in #56025 as discussed previously. Notably:

    • We go with this simpler API: SharedMemory(name=None, create=False, size=0)
    • 'size' is ignored when create=False
    • create=True acts like O_CREX and create=False only attaches to existing shared memory blocks

    As part of this change, the PosixSharedMemory and WindowsNamedSharedMemory classes are no more; they have been consolidated into the SharedMemory class with a single, simpler, consistent-across-platforms API.

    On the SharedMemory class, 'size' is now stored by the __init__ and does not use fstat() as part of its property.

    Also, SharedMemoryManager (and its close friends) has been relocated to the multiprocessing.managers submodule, matching the organization @giampaolo outlined previously:
    multiprocessing.managers.SharedMemoryManager
    multiprocessing.managers._SharedMemoryTracker
    multiprocessing.managers.SharedMemoryServer (not documented)
    multiprocessing.shared_memory.SharedMemory
    multiprocessing.shared_memory.SharedList
    multiprocessing.shared_memory.WindowsNamedSharedMemory (REMOVED)
    multiprocessing.shared_memory.PosixSharedMemory (REMOVED)

    I believe this addresses all of the significant discussion topics in a way that brings together all of the excellent points being made. Apologies if I have missed something -- I did not think so but I will go back through all of the discussions tomorrow to double-check.

    @giampaolo
    Copy link
    Contributor

    Code looks much better now. I'm still not convinced "SharedMemory(name=None, create=False, size=0)" is the best API. How are you supposed to "create or attach" atomically? You can do that with O_EXCL but as it stands this is not togglable via the API.

    Also, could you address my comment about size?
    https://bugs.python.org/issue35813#msg335731

    @giampaolo
    Copy link
    Contributor

    Also, there is no way to delete/unwrap memory without using an existing SharedMemory instance, which is something we may not have on startup. Perhaps we should have a "shared_memory.unlink(name)" function similar to os.unlink() which simply calls C shm_unlink().

    @applio
    Copy link
    Member Author

    applio commented Feb 22, 2019

    Code looks much better now. I'm still not convinced
    "SharedMemory(name=None, create=False, size=0)" is the best API.
    How are you supposed to "create or attach" atomically?

    We are consciously choosing to not support an atomic "create or attach". This significantly simplifies the API and avoids the valid concerns raised around user confusion relating to that behavior (including the use of different specified 'size' values in a race) but does not preclude our potentially introducing this as a feature in the future.

    This simpler API still supports a "try: create; except: attach" which is not atomic but effectively covers the primary use cases for "create or attach". Combined with a SyncManager.Lock, users can already achieve an atomic "create or attach" using this simpler API.

    Also, could you address my comment about size?
    https://bugs.python.org/issue35813#msg335731
    > Let me rephrase: are we forced to specify a value (aka call
    > ftruncate()) on create ? If we are as I think, could size have a
    > reasonable default value instead of 0? Basically I'm wondering if we
    > can relieve the common user from thinking about what size to use,
    > mostly because it's sort of a low level detail. Could it perhaps
    > default to mmap.PAGESIZE?

    Apologies for not responding to your question already, Giampaolo.

    For the same reasons that (in C) malloc does not provide a default size, I do not think we should attempt to provide a default here. Not all platforms allocate shared memory blocks in chunks of mmap.PAGESIZE, thus on some platforms we would unnecessarily over-allocate no matter what default size we might choose. I do not think we should expect users to know what mmap.PAGESIZE is on their system. I think it is important that if a user requests a new allocation of memory, that they first consider how much memory will be needed. When attaching to an existing shared memory block, its size is already defined.

    I think this even fits with CPython's over-allocation strategies behind things like list, where an empty list triggers no malloc at all. We will not allocate memory until the user tells us how much to allocate.

    Also, there is no way to delete/unwrap memory without using an
    existing SharedMemory instance, which is something we may not have
    on startup. Perhaps we should have a "shared_memory.unlink(name)"
    function similar to os.unlink() which simply calls C shm_unlink().

    It is not really possible to offer this on non-POSIX platforms so I think we should not attempt to offer a public "shared_memory.unlink(name)". It is possible to invoke "shm_unlink" with the name of a shared memory block (for those who really need it) on platforms with POSIX Shared Memory support via:
    shared_memory._posixshmem.shm_unlink('name')

    @giampaolo
    Copy link
    Contributor

    We are consciously choosing to not support an atomic "create or attach". This significantly simplifies the API and avoids the valid concerns raised around user confusion relating to that behavior (including the use of different specified 'size' values in a race) but does not preclude our potentially introducing this as a feature in the future.

    I understand that because of *size* we cannot solve the race condition issue unless the user uses some sort of synchronization mechanism. FWIW I bumped into this lib:
    http://semanchuk.com/philip/sysv_ipc/
    ...which provides two separate APIs to "create" and "attach":

    >> SharedMemory("name", IPC_CREX)
    >> attach("name")

    At this point I'm agnostic about the API, which is probably just a matter of personal taste (e.g. one may prefer a separate SharedMemory.attach() classmethod or a *mode* argument accepting "x" and "a"). I see that that lib use shmat() on attach and shmdt() on detach. I'm not sure if that makes a difference, just mentioning it because your implementation doesn't do that on close() and perhaps it should.

    Combined with a SyncManager.Lock, users can already achieve an atomic "create or attach" using this simpler API.

    That assumes a single app/process which spawns a child (the "worker"). In that case SyncManager.Lock/others is indeed compatible with SharedMemory and can be used to implement non-racy "create or attach" and also to synchronize memory access on read and write. But AFAICT there are two uses cases for this functionality, and there currently is no mechanism to do any that if you have two unrelated apps/processes relying on a common shared memory *name* and *size*. I'm taking a peek at "man smh_overview" which says:

    <<Typically, processes must synchronize their access to a shared memory object, using, for example, POSIX semaphores. System V shared memory (shmget(2), shmop(2), etc.) is an older shared memory API. POSIX shared memory provides a simpler, and better designed interface; on the other hand POSIX shared memory is somewhat less widely available (especially on older systems) than System V shared memory.>>

    That would translate into a new Semaphore(name=None, create=False) class which (possibly?) would also provide better performances compared to SyncManager.Semaphore. Not sure if we can do the same on Windows though.

    Extra 1: apparently there are also POSIX msgget(), msgrcv() and msgsnd() syscalls which could be used to implement a System-V message Queue similar to SyncManager.Queue later on.

    Extra 2: given the 2 distinct use-cases I wonder if the low-level component (shared_memory.py) really belongs to multiprocessing module. Perhaps this should be provided as a separate "sharedmemory" module with multiprocessing.managers.SharedMemoryMemory being the high-level interface.

    @osvenskan
    Copy link
    Mannequin

    osvenskan mannequin commented Feb 23, 2019

    On Feb 23, 2019, at 10:40 AM, Giampaolo Rodola' <report@bugs.python.org> wrote:

    Giampaolo Rodola' <g.rodola@gmail.com> added the comment:

    > We are consciously choosing to not support an atomic "create or attach". This significantly simplifies the API and avoids the valid concerns raised around user confusion relating to that behavior (including the use of different specified 'size' values in a race) but does not preclude our potentially introducing this as a feature in the future.

    I understand that because of *size* we cannot solve the race condition issue unless the user uses some sort of synchronization mechanism. FWIW I bumped into this lib:
    http://semanchuk.com/philip/sysv_ipc/
    ...which provides two separate APIs to "create" and "attach":

    >>> SharedMemory("name", IPC_CREX)
    >>> attach("name")

    At this point I'm agnostic about the API, which is probably just a matter of personal taste (e.g. one may prefer a separate SharedMemory.attach() classmethod or a *mode* argument accepting "x" and "a"). I see that that lib use shmat() on attach and shmdt() on detach. I'm not sure if that makes a difference, just mentioning it because your implementation doesn't do that on close() and perhaps it should.

    attach() and detach() are particular to SysV IPC which is different from the POSIX IPC that’s being used here. There’s no need for attach() and detach() with POSIX shared memory.

    POSIX IPC is generally simpler than SysV IPC, in part because it was developed after SysV IPC so the developers had the benefit of experience with the older API.

    Side note: I’m the author of the sysv_ipc package you found, as well as the posix_ipc package.

    @applio
    Copy link
    Member Author

    applio commented Feb 23, 2019

    FWIW I bumped into this lib: http://semanchuk.com/philip/sysv_ipc/

    The author of that lib, Philip Semanchuk, is one of the people participating in this effort -- he has posted above in msg334934 here on b.p.o. and has helped review the PR in #56025.

    He is also the author of the posix_ipc package which was the original basis for our POSIX Shared Memory implementation here.

    The decision to base our Unix platform support upon POSIX and not SystemV libraries came after considerable research and there are important differences between the two. To oversimplify: POSIX Shared Memory support has now been available for some time on Linux, *BSD, MacOS, and others and is something of a successor to the SystemV.

    That assumes a single app/process which spawns a child (the "worker").

    Not true. A manager started by one process can be connected to by another process that is not a child. This is covered in the docs here: https://docs.python.org/3/library/multiprocessing.html#using-a-remote-manager That child can then request that shared memory blocks it creates be remotely tracked and managed by that remote process's manager. While I would not expect this to be a common use case, this is a feature of BaseManager that we inherit into SharedMemoryManager.

    The SyncManager.Lock can be used as part of this as well. Thus, two unrelated apps/processes *can* coordinate their management of shared memory blocks through the SharedMemoryManager.

    That would translate into a new Semaphore(name=None, create=False)
    class which (possibly?) would also provide better performances
    compared to SyncManager.Semaphore

    Right! You might have noticed that Philip has such a semaphore construct in his posix_ipc lib.

    I opted to not attempt to add this feature as part of this effort to both (1) keep focused on the core needs to work with shared memory, and (2) to take more time in the future to work out how to get cross-platform support for the semaphore right (as you point out, there are complications to work through).

    Extra 1: apparently there are also POSIX msgget(), msgrcv() and
    msgsnd() syscalls which could be used to implement a System-V message
    Queue similar to SyncManager.Queue later on.

    Right! This is also something Philip has in his posix_ipc lib. This should be part of the roadmap for what we do next with SharedMemory. This one may be complicated by the fact that not all platforms that implement POSIX Shared Memory chose to also implement these functions in the same way. We will need time to work out what we can or can not reasonably do here.

    Extra 2: given the 2 distinct use-cases I wonder if the low-level
    component (shared_memory.py) really belongs to multiprocessing module

    Given what I wrote above about how multiprocessing.managers does enable these use cases and the existing "distributed shared memory" support in multiprocessing, I think it logically belongs in multiprocessing. I suggest that "shm_open" and "shm_unlink" are our low-level tools, which appropriately are in _posixshmem, but SharedMemory and the rest are high-level tools; SharedMemoryManager will not be able to cover all life-cycle management use cases thus SharedMemory will be needed by many and in contrast, "shm_open" and "shm_unlink" will be needed only by those wishing to do something wacky. (Note: I am not trying to make "wacky" sound like a bad thing because wacky can be very cool sometimes.)

    Philip's ears should now be burning, I mentioned him so many times in this post. Ah! He beat me to it while I was writing this. Awesome!

    We would not be where we are with SharedMemory without his efforts over many years with his posix_ipc lib.

    @giampaolo
    Copy link
    Contributor

    Side note: I’m the author of the sysv_ipc package you found, as well as the posix_ipc package.

    Sorry, I didn't realize it was you.

    Not true. A manager started by one process can be connected to by another process that is not a child. This is covered in the docs here: https://docs.python.org/3/library/multiprocessing.html#using-a-remote-manager

    Ah nice! OK then. With this + Philip's explanation on why shmat()/shmdt() are not needed I guess I ran out of API-related complaints now. =)

    @rhettinger
    Copy link
    Contributor

    Any objections to getting this patch applied today so that it will be in the second alpha?

    Of course, further adjustments can still be made afterwards if needed.

    @giampaolo
    Copy link
    Contributor

    IMHO given the size of the change and how quickly this evolved I would probably feel safer to mark the API as experimental (no pun intended, this is great work), at least in alpha2. I believe we now have a good explanation in the docs but it still needs to be integrated better into the larger, existing multiprocessing docs (I will try to add some comments on the PR tomorrow).

    @applio
    Copy link
    Member Author

    applio commented Feb 24, 2019

    New changeset e895de3 by Davin Potts in branch 'master':
    bpo-35813: Tests and docs for shared_memory (bpo-11816)
    e895de3

    @applio
    Copy link
    Member Author

    applio commented Feb 24, 2019

    @giampaolo: The docstring in the shared_memory module currently marks the API as experimental. (You read my mind...)

    I will start a new PR where we can work on the better-integration-into-the-larger-multiprocessing-docs and add comments there.

    @xdegaye
    Copy link
    Mannequin

    xdegaye mannequin commented Mar 18, 2019

    After changeset e895de3, test_all fails on platforms that lack the _posixshmem extension module (Android for example):

    ======================================================================
    FAIL: test_all (test.test___all__.AllTest) (module='multiprocessing.managers')
    ----------------------------------------------------------------------

    Traceback (most recent call last):
    File "/data/local/tmp/python/lib/python3.8/test/test___all__.py", line 34, in check_all
      exec("from %s import *" % modname, names)
    AttributeError: module 'multiprocessing.managers' has no attribute 'SharedMemoryManager'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
    File "/data/local/tmp/python/lib/python3.8/test/test___all__.py", line 37, in check_all
      self.fail("__all__ failure in {}: {}: {}".format(
    AssertionError: __all__ failure in multiprocessing.managers: AttributeError: module 'multiprocessing.managers' has no attribute 'SharedMemoryManager'

    The following patch fixes the problem:

    diff --git a/Lib/multiprocessing/managers.py b/Lib/multiprocessing/managers.py
    index 7973012b98..3fdd60fff7 100644
    --- a/Lib/multiprocessing/managers.py
    +++ b/Lib/multiprocessing/managers.py
    @@ -8,8 +8,7 @@
    # Licensed to PSF under a Contributor Agreement.

    -all = [ 'BaseManager', 'SyncManager', 'BaseProxy', 'Token',

    •        'SharedMemoryManager' ]
      

    +all = [ 'BaseManager', 'SyncManager', 'BaseProxy', 'Token' ]

    #
    # Imports
    @@ -33,6 +32,7 @@ from . import get_context
    try:
       from . import shared_memory
       HAS_SHMEM = True
    +    __all__.append('SharedMemoryManager')
    except ImportError:
       HAS_SHMEM = False

    @PamMcANulty
    Copy link
    Mannequin

    PamMcANulty mannequin commented May 8, 2019

    Reviewing docs and am writing a queue implementation for usability testing. I think ShareableList needs to support close() and unlink() directly. The .shm attribute should be considered an implementation detail, and not be exposed.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @furkanonder
    Copy link
    Sponsor Contributor

    @applio Is there anything else to do for this ticket?

    @terryjreedy terryjreedy added the pending The issue will be closed if no feedback is provided label May 11, 2023
    @nascheme
    Copy link
    Member

    nascheme commented Aug 3, 2023

    It looks to me like this could be closed. The request for ShareableList.close() and ShareableList.unlink() could be extracted into a new issue. It's a reasonable enhancement request. Without digging deep into the code, it seems reasonable to me to consider .shm an internal detail.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes pending The issue will be closed if no feedback is provided stdlib Python modules in the Lib dir topic-multiprocessing type-feature A feature request or enhancement
    Projects
    Status: No status
    Development

    No branches or pull requests