
Failure to create multiprocessing shared arrays larger than 50% of memory size under linux #65315

Closed
mboquien mannequin opened this issue Mar 31, 2014 · 18 comments

Labels: performance (Performance or resource usage), stdlib (Python modules in the Lib dir)

Comments

mboquien mannequin commented Mar 31, 2014

BPO 21116
Nosy @pitrou, @serhiy-storchaka
Files
  • shared_array.diff
  • shared_array.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2015-04-13.18:55:22.032>
    created_at = <Date 2014-03-31.19:37:47.936>
    labels = ['library', 'performance']
    title = 'Failure to create multiprocessing shared arrays larger than 50% of memory size under linux'
    updated_at = <Date 2015-04-14.20:58:22.873>
    user = 'https://bugs.python.org/mboquien'

    bugs.python.org fields:

    activity = <Date 2015-04-14.20:58:22.873>
    actor = 'pitrou'
    assignee = 'none'
    closed = True
    closed_date = <Date 2015-04-13.18:55:22.032>
    closer = 'pitrou'
    components = ['Library (Lib)']
    creation = <Date 2014-03-31.19:37:47.936>
    creator = 'mboquien'
    dependencies = []
    files = ['34686', '34687']
    hgrepos = []
    issue_num = 21116
    keywords = ['patch']
    message_count = 18.0
    messages = ['215258', '215260', '215264', '215268', '215296', '215404', '215407', '215408', '215425', '215433', '215460', '215481', '215494', '215583', '240704', '240705', '240874', '241027']
    nosy_count = 6.0
    nosy_names = ['pitrou', 'neologix', 'python-dev', 'sbt', 'serhiy.storchaka', 'mboquien']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue21116'
    versions = ['Python 3.5']

    mboquien mannequin commented Mar 31, 2014

    It is currently impossible to create multiprocessing shared arrays larger than 50% of the memory size under Linux (and I assume other Unices). A simple test case would be the following:

    from multiprocessing.sharedctypes import RawArray
    import ctypes
    
    foo = RawArray(ctypes.c_double, 10*1024**3//8)  # Allocate 10GB array

    If the array is larger than 50% of the total memory size, the process gets SIGKILLed by the OS. Deactivate the swap to reproduce this more reliably.

    Naturally this requires that the tmpfs maximum size is large enough, which is the case here: 15 GB max with 16 GB of RAM.

    I have tracked down the problem to multiprocessing/heap.py. The guilty line is f.write(b'\0'*size). Indeed, for very large sizes it creates a large intermediate bytes object (10 GB in my test case), and as much memory again is allocated to the new shared array, pushing memory consumption over the limit.

    To solve the problem, I have split the zeroing of the shared array into blocks of 1 MB. I can now allocate arrays as large as the tmpfs maximum size. It also runs a bit faster: on a test case of a 6 GB RawArray, 3.4.0 takes a total time of 3.930 s, whereas it goes down to 3.061 s with the attached patch.
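
    Roughly, the idea is the following (a sketch only, not the attached patch itself; the helper name and the 1 MB default block size are illustrative assumptions):

        def write_zeros(f, size, blocksize=1024 * 1024):
            """Write `size` zero bytes to `f` in `blocksize` chunks.

            Avoids materializing a single `size`-byte bytes object the way
            f.write(b'\0' * size) does.
            """
            block = b'\0' * blocksize
            written = 0
            while written + blocksize <= size:
                f.write(block)
                written += blocksize
            if written < size:
                f.write(b'\0' * (size - written))

    Here f stands for the temporary file that backs the shared memory mmap.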

    mboquien mannequin added the stdlib (Python modules in the Lib dir) label Mar 31, 2014

    mboquien mannequin commented Mar 31, 2014

    Updated the patch so that it does not create a uselessly large block if the size is smaller than the block size.

    pitrou added the performance (Performance or resource usage) label Mar 31, 2014

    mboquien mannequin commented Mar 31, 2014

    New update of the patch following Antoine Pitrou's comments. PEP-8 does not complain anymore.

    pitrou commented Mar 31, 2014

    You overlooked the part where I was suggesting to add a unit test :-)
    Also, you'll have to sign a contributor's agreement at https://www.python.org/psf/contrib/contrib-form/

    Thanks!

    mboquien mannequin commented Apr 1, 2014

    I have now signed the contributor's agreement.

    As for the unit test, I was looking into it. However, I was wondering how to write a test that would have triggered the problem: it only shows up for very large arrays, and it depends on the occupied memory and the configuration of the temp dir. Or should I simply write a test creating, for instance, a 100 MB array and checking that it has the right length?
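
    Something like the following, perhaps (a sketch only; the class and test names are made up, and a 100 MB array would of course not have triggered the original SIGKILL):

        import ctypes
        import unittest
        from multiprocessing.sharedctypes import RawArray

        class TestRawArrayAllocation(unittest.TestCase):
            def test_size_and_zero_fill(self):
                n = 100 * 1024 ** 2 // 8          # ~100 MB worth of doubles
                arr = RawArray(ctypes.c_double, n)
                self.assertEqual(len(arr), n)
                # The documentation promises zero-initialized memory.
                self.assertEqual(arr[0], 0.0)
                self.assertEqual(arr[n // 2], 0.0)
                self.assertEqual(arr[n - 1], 0.0)

        if __name__ == "__main__":
            unittest.main()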

    neologix mannequin commented Apr 2, 2014

    Zero-filling mmap's backing file isn't really optimal: why not use truncate() instead? This way, it'll completely avoid I/O on filesystems that support sparse files, and it should still work on filesystems that don't.

    mboquien mannequin commented Apr 2, 2014

    If I remember correctly, the problem is that some OSes like Linux (and probably others) do not really allocate space until something is written. If that's the case, then the process may get killed later on when it writes something into the array.

    Here is a quick example:

    $ truncate -s 1T test.file
    $ ls -lh test.file 
    -rw-r--r-- 1 mederic users 1.0T Apr  2 23:10 test.file
    $ df -h
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/sdb1       110G   46G   59G  44% /home

    sbt mannequin commented Apr 2, 2014

    Using truncate() to zero extend is not really portable: it is only guaranteed on XSI-compliant POSIX systems.

    Also, the FreeBSD man page for mmap() has the following warning:

    WARNING! Extending a file with ftruncate(2), thus creating a big
    hole, and then filling the hole by modifying a shared mmap() can
    lead to severe file fragmentation. In order to avoid such
    fragmentation you should always pre-allocate the file's backing
    store by write()ing zero's into the newly extended area prior to
    modifying the area via your mmap(). The fragmentation problem is
    especially sensitive to MAP_NOSYNC pages, because pages may be
    flushed to disk in a totally random order.

    neologix mannequin commented Apr 3, 2014

    If I remember correctly the problem is that some OS like linux (and
    probably others) do not really allocate space until something is written.
    If that's the case then the process may get killed later on when it writes
    something in the array.

    Yes, it's called overcommitting, and it's a good thing. It's exactly the
    same thing for memory: malloc() can return non-NULL, and the process will
    get killed when first writing to the page in case of memory pressure.

    mboquien mannequin commented Apr 3, 2014

    "the process will get killed when first writing to the page in case of memory pressure."

    According to the documentation, the returned shared array is zeroed. https://docs.python.org/3.4/library/multiprocessing.html#module-multiprocessing.sharedctypes

    In that case, because the entire array is written at allocation, the process is expected to get killed if it allocates more memory than is available. Unless I am misunderstanding something, which is entirely possible.

    neologix mannequin commented Apr 3, 2014

    Also, the FreeBSD man page for mmap() has the following warning:

    That's mostly important for a real file-backed mapping.
    In our case, we don't want a file-backed mmap: we expect the mapping to fit
    entirely in memory, so the writeback/read performance isn't that important
    to us.

    Using truncate() to zero extend is not really portable: it is only
    guaranteed on XSI-compliant POSIX systems.

    Now that's annoying.
    How about trying file.truncate() within a try block, and if an error is
    raised, falling back to the zero-filling?
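
    Something along these lines, perhaps (a sketch only; the helper name is made up, and the caller would fall back to the buffered zero-filling whenever it returns False):

        def try_sparse_extend(f, size):
            """Try to zero-extend the mmap backing file without writing data.

            On filesystems with sparse-file support this avoids the I/O
            entirely. Returning False lets the caller fall back to the
            explicit block-wise zero-filling.
            """
            try:
                # Note: success does not guarantee zero-extension on
                # non-XSI systems, as Richard points out above.
                f.truncate(size)
                return True
            except OSError:
                return False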

    Doing a lot of IO for an object which is supposed to be used for shared
    memory is sad.

    Or maybe it's time to add an API to access shared memory from Python (since
    that's really what we're trying to achieve here).

    According to the documentation, the returned shared array is zeroed. In that
    case, because the entire array is written at allocation, the process is
    expected to get killed if it allocates more memory than is available.
    Unless I am misunderstanding something, which is entirely possible.

    Having the memory zero-filled doesn't require a write at all: when you do an
    anonymous memory mapping of, let's say, 1 GB, the kernel doesn't pre-emptively
    zero-fill it; that would be way too slow. Usually it just sets up the process
    page table to make this area a COW of a single zero page: upon read, you'll
    read zeros, and upon write, it'll duplicate the page as needed.

    The only reason the code currently zero-fills the file is to avoid the
    portability issues detailed by Richard.
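
    As a small illustration of that lazy behaviour with an anonymous mapping (the 1 GiB size is arbitrary):

        import mmap

        # Map 1 GiB anonymously. The kernel allocates backing pages lazily,
        # and untouched pages read back as zeros: no explicit writes needed.
        size = 1024 ** 3
        m = mmap.mmap(-1, size)

        assert m[0] == 0 and m[size - 1] == 0   # zeros, without any zero-filling
        m[0:1] = b'\x01'                        # pages get dirtied only on write
        m.close()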

    pitrou commented Apr 3, 2014

    Or maybe it's time to add an API to access shared memory from Python
    (since
    that's really what we're trying to achieve here).

    That sounds like a good idea. Especially since we now have the memoryview type.
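
    For instance, something in the spirit of the following sketch (purely illustrative of exposing shared pages through a memoryview, not a proposed API):

        import mmap

        # Anonymous shared pages exposed as a memoryview for zero-copy access.
        buf = mmap.mmap(-1, 1024 ** 2)
        view = memoryview(buf)

        view[:4] = b'\x01\x02\x03\x04'
        assert bytes(view[:4]) == b'\x01\x02\x03\x04'

        view.release()
        buf.close()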

    mboquien mannequin commented Apr 4, 2014

    Thanks for the explanations, Charles-François. I guess the new API would not land before 3.5 at the earliest. Is there still a chance to integrate my patch (or any other) to improve the situation for the 3.4 series, though?

    neologix mannequin commented Apr 5, 2014

    Indeed, I think it would make sense to consider this for 3.4, and even 2.7
    if we opt for a simple fix.

    As for the best way to fix it in the meantime, I'm fine with a buffered
    zero-filling (the mere fact that no one ever complained until now probably
    means that the performance isn't a show-stopper for users).

    python-dev mannequin commented Apr 13, 2015

    New changeset 0f944e424d67 by Antoine Pitrou in branch 'default':
    Issue bpo-21116: Avoid blowing memory when allocating a multiprocessing shared
    https://hg.python.org/cpython/rev/0f944e424d67

    pitrou commented Apr 13, 2015

    Ok, I've committed the patch. If desired, the generic API for shared memory can be tackled in a separate issue. Thank you Médéric!

    pitrou closed this as completed Apr 13, 2015
    serhiy-storchaka (Member) commented:

    Instead of the loop you can use writelines():

        f.writelines([b'\0' * bs] * (size // bs))

    It would be nice to add a comment that explains why os.ftruncate() or seek+write can't be used here. At least a link to this issue with a short explanation.
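
    For completeness, a runnable sketch of the writelines() form (the block size bs and the total size are arbitrary; note that the size % bs tail still needs its own write):

        import tempfile

        bs = 1024 * 1024                      # assumed block size
        size = 10 * 1024 ** 2 + 123           # arbitrary size with a ragged tail

        with tempfile.TemporaryFile() as f:
            f.writelines([b'\0' * bs] * (size // bs))
            f.write(b'\0' * (size % bs))      # tail not covered by whole blocks
            assert f.tell() == size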

    pitrou commented Apr 14, 2015

    Actually, recent POSIX states unconditionally that:

    « If the file previously was smaller than this size, ftruncate() shall increase the size of the file. If the file size is increased, the extended area shall appear as if it were zero-filled. »

    (from http://pubs.opengroup.org/onlinepubs/9699919799/functions/ftruncate.html)

    ezio-melotti transferred this issue from another repository Apr 10, 2022