
Increase shutil.COPY_BUFSIZE #80284

Closed
methane opened this issue Feb 25, 2019 · 10 comments
Labels
3.8 (only security fixes) · performance (Performance or resource usage) · stdlib (Python modules in the Lib dir)

Comments

@methane
Member

methane commented Feb 25, 2019

BPO 36103
Nosy @giampaolo, @methane, @desbma
PRs
  • bpo-36103: change default buffer size of shutil.copyfileobj() #12115
    Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2019-03-02.04:32:31.529>
    created_at = <Date 2019-02-25.09:27:56.404>
    labels = ['3.8', 'library', 'performance']
    title = 'Increase shutil.COPY_BUFSIZE'
    updated_at = <Date 2019-03-02.04:32:31.529>
    user = 'https://github.com/methane'

    bugs.python.org fields:

    activity = <Date 2019-03-02.04:32:31.529>
    actor = 'methane'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-03-02.04:32:31.529>
    closer = 'methane'
    components = ['Library (Lib)']
    creation = <Date 2019-02-25.09:27:56.404>
    creator = 'methane'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 36103
    keywords = ['patch']
    message_count = 10.0
    messages = ['336505', '336523', '336533', '336546', '336623', '336643', '336648', '336685', '336987', '336988']
    nosy_count = 3.0
    nosy_names = ['giampaolo.rodola', 'methane', 'desbma']
    pr_nums = ['12115']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue36103'
    versions = ['Python 3.8']

    @methane
    Member Author

    methane commented Feb 25, 2019

    shutil.COPY_BUFSIZE is 16 KiB on non-Windows platforms, but that seems a bit small for performance.

    According to this article [1], 128 KiB gives the best copy performance on common systems.
    [1]: https://eklitzke.org/efficient-file-copying-on-linux

    Another resource: the EBS documentation [2] measures throughput using 128 KiB I/O.
    [2]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSOptimized.html

    Can we increase shutil.COPY_BUFSIZE to 128 KiB by default?

    Note that 128 KiB is still small compared with the Windows default (1 MB).
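
    For context, shutil.copyfileobj() is essentially the following read/write loop, with length defaulting to COPY_BUFSIZE; this is a simplified sketch of the stdlib code, not a verbatim copy:

        import shutil

        # Every copy that goes through copyfileobj() issues one read() and
        # one write() per COPY_BUFSIZE-sized chunk, so the constant directly
        # controls syscall frequency.
        def copyfileobj_sketch(fsrc, fdst, length=shutil.COPY_BUFSIZE):
            while True:
                buf = fsrc.read(length)
                if not buf:
                    break
                fdst.write(buf)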

    @methane methane added the stdlib (Python modules in the Lib dir) and 3.8 (only security fixes) labels Feb 25, 2019
    @methane methane changed the title from "Increase" to "Increase shutil.COPY_BUFSIZE" Feb 25, 2019
    @methane methane added the performance (Performance or resource usage) label Feb 25, 2019
    @desbma
    Mannequin

    desbma mannequin commented Feb 25, 2019

    Your first link explains why a 128 kB buffer size is faster in the context of cp: it's due to fadvise and kernel readahead.

    None of the shutil functions call fadvise, so the benchmark and its conclusions are irrelevant to the Python buffer size IMO.

    In general, the bigger the buffer, the better, to reduce syscall frequency (also explained in the article), but going from 16 kB to 128 kB is clearly in the micro-optimization range and unlikely to make any significant difference.

    Also, with 3.8, in many typical file copy cases (but not all), sendfile will be used, which makes the buffer size even less important.
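
    For reference, the readahead hint that cp issues (and shutil does not) is available from Python as os.posix_fadvise(); a minimal POSIX-only sketch, with a hypothetical file name:

        import os

        # Hint sequential access for the whole file before reading; on Linux
        # this enlarges the kernel readahead window. shutil itself never
        # makes this call.
        with open("f1", "rb") as f:
            os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_SEQUENTIAL)
            while f.read(128 * 1024):
                pass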

    @methane
    Member Author

    methane commented Feb 25, 2019

    > Your first link explains why a 128 kB buffer size is faster in the context of cp: it's due to fadvise and kernel readahead.
    >
    > None of the shutil functions call fadvise, so the benchmark and its conclusions are irrelevant to the Python buffer size IMO.

    Even without fadvise, readahead happens automatically; fadvise merely doubles the readahead size on Linux. But I don't know whether it really doubles the readahead size when the block device advertises its own readahead size.

    > In general, the bigger the buffer, the better, to reduce syscall frequency (also explained in the article), but going from 16 kB to 128 kB is clearly in the micro-optimization range and unlikely to make any significant difference.
    >
    > Also, with 3.8, in many typical file copy cases (but not all), sendfile will be used, which makes the buffer size even less important.

    The buffer size is still used by copyfileobj(), so a better default value is worthwhile there.

    My Linux box uses a SATA SSD (Samsung 860 EVO 500 GB), which has unstable sequential write performance.

    Here is a quick test:

    $ dd if=/dev/urandom of=f1 bs=1M count=1k
    
    $ ./python -m timeit -n1 -r5 -v -s 'import shutil;' -- 'f1=open("f1","rb"); f2=open("/dev/null", "wb"); shutil.copyfileobj(f1, f2, 8*1024); f1.close(); f2.close()'
    raw times: 301 msec, 302 msec, 301 msec, 301 msec, 300 msec

    1 loop, best of 5: 300 msec per loop

    $ ./python -m timeit -n1 -r5 -v -s 'import shutil;' -- 'f1=open("f1","rb"); f2=open("/dev/null", "wb"); shutil.copyfileobj(f1, f2, 16*1024); f1.close(); f2.close()'
    raw times: 194 msec, 194 msec, 193 msec, 193 msec, 193 msec

    1 loop, best of 5: 193 msec per loop

    $ ./python -m timeit -n1 -r5 -v -s 'import shutil;' -- 'f1=open("f1","rb"); f2=open("/dev/null", "wb"); shutil.copyfileobj(f1, f2, 32*1024); f1.close(); f2.close()'
    raw times: 140 msec, 140 msec, 140 msec, 140 msec, 140 msec

    1 loop, best of 5: 140 msec per loop

    $ ./python -m timeit -n1 -r5 -v -s 'import shutil;' -- 'f1=open("f1","rb"); f2=open("/dev/null", "wb"); shutil.copyfileobj(f1, f2, 64*1024); f1.close(); f2.close()'
    raw times: 112 msec, 112 msec, 112 msec, 112 msec, 112 msec

    1 loop, best of 5: 112 msec per loop

    $ ./python -m timeit -n1 -r5 -v -s 'import shutil;' -- 'f1=open("f1","rb"); f2=open("/dev/null", "wb"); shutil.copyfileobj(f1, f2, 128*1024); f1.close(); f2.close()'
    raw times: 101 msec, 101 msec, 101 msec, 101 msec, 101 msec

    Judging from these results, I think 64 KiB is the best balance.
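
    The one-liners above can also be written as a single script; a sketch that reproduces the same measurement, assuming the 1 GiB file f1 created by the dd command above:

        import shutil
        import timeit

        def copy_once(bufsize):
            # One full pass over f1, discarding the output, as in the
            # timeit commands above.
            with open("f1", "rb") as src, open("/dev/null", "wb") as dst:
                shutil.copyfileobj(src, dst, bufsize)

        for kib in (8, 16, 32, 64, 128):
            best = min(timeit.repeat(lambda: copy_once(kib * 1024),
                                     number=1, repeat=5))
            print(f"{kib:4d} KiB: {best * 1000:.0f} msec")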

    @desbma
    Mannequin

    desbma mannequin commented Feb 25, 2019

    If you benchmark by reading from a file and then writing to /dev/null several times, without clearing caches, you are measuring *only* the syscall overhead:

    • input data is read from the Linux page cache, not from the file on your SSD itself
    • no data is written (obviously, because the output is /dev/null)

    Your current command line also measures open/close timings. Without those, I think the speed should increase linearly when doubling the buffer size, but of course this is misleading, because it's a synthetic benchmark.

    Also, if you clear caches between tests and write the output file to the SSD itself, sendfile will be used, and should be even faster.

    So again, I'm not sure this means much compared to real-world usage.

    @methane
    Member Author

    methane commented Feb 26, 2019

    > If you benchmark by reading from a file and then writing to /dev/null several times, without clearing caches, you are measuring *only* the syscall overhead:
    >
    > • input data is read from the Linux page cache, not from the file on your SSD itself

    Yes. I am measuring syscall overhead in order to determine a reasonable buffer size. shutil may well be used while the page cache is warm.

    > • no data is written (obviously, because the output is /dev/null)

    As I said before, my SSD doesn't have stable write performance (which is typical for a consumer SSD), so this is intentional. And there are use cases that copy from/to io.BytesIO or other file-like objects (see the sketch after this comment).

    > Your current command line also measures open/close timings. Without those, I think the speed should increase linearly when doubling the buffer size, but of course this is misleading, because it's a synthetic benchmark.

    I'm not measuring the speed of my cheap SSD; the goal of this benchmark is to find a reasonable buffer size. Real usage varies widely, so reducing syscall overhead with a reasonable buffer size is worthwhile.

    > Also, if you clear caches between tests and write the output file to the SSD itself, sendfile will be used, and should be even faster.

    No. sendfile is not used by shutil.copyfileobj(), even if dst is a real file on disk.

    > So again, I'm not sure this means much compared to real-world usage.

    "Real-world usage" varies. Sometimes buffer size doesn't affect it; sometimes it does.

    On the other hand, what are the cons of changing 16 KiB to 64 KiB? Windows already uses 1 MiB, and the CPython runtime itself uses a few MiB of memory anyway.
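
    To illustrate the file-like-object case mentioned above, where no disk or page cache is involved and the buffer size only affects per-chunk interpreter overhead; a hypothetical round-trip:

        import io
        import shutil

        # Copy 8 MiB between in-memory buffers; there is no file I/O here,
        # but the chunk size still sets how many Python-level iterations
        # the copy loop performs.
        src = io.BytesIO(b"x" * (8 * 1024 * 1024))
        dst = io.BytesIO()
        shutil.copyfileobj(src, dst, 64 * 1024)
        assert dst.getvalue() == src.getvalue()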

    @giampaolo
    Contributor

    @inada: having played with this in the past, I seem to remember that on Linux a bigger bufsize doesn't make a noticeable difference (but I may be wrong); that's why I suggest trying some benchmarks. In bpo-33671 I pasted some one-liners you can use (and you should target copyfileobj() instead of copyfile() in order to skip the os.sendfile() path). Also, on Linux, "echo 3 | sudo tee /proc/sys/vm/drop_caches" is supposed to drop the caches.
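
    To make the distinction concrete: copyfile() may take a zero-copy fast path (os.sendfile() on Linux in 3.8), while copyfileobj() always runs the Python read/write loop, so only the latter exercises COPY_BUFSIZE. A small sketch with hypothetical file names:

        import shutil

        # May bypass the user-space buffer entirely via os.sendfile() on Linux.
        shutil.copyfile("f1", "f2")

        # Always uses the buffered Python loop, regardless of platform.
        with open("f1", "rb") as src, open("f3", "wb") as dst:
            shutil.copyfileobj(src, dst)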

    @methane
    Member Author

    methane commented Feb 26, 2019

    > Also, on Linux, "echo 3 | sudo tee /proc/sys/vm/drop_caches" is supposed to drop the caches.

    As I said already, shutil is not used only with a cold cache.

    If the cache is cold, disk speed will be the upper bound in most cases. But when the cache is hot, or with a very fast NVMe disk, syscall overhead can be non-negligible.

    @methane
    Member Author

    methane commented Feb 26, 2019

    Read this file too:
    http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/ioblksize.h

    coreutils chose 128 KiB as the *minimal* buffer size to reduce syscall overhead. In the case of shutil, Python interpreter overhead is added on top of the syscall overhead. Who has deeper insight than the coreutils authors?

    I think 128 KiB is best, but I'm OK with 64 KiB as a conservative decision.
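
    Based on the description above (128 KiB as a minimum, not a fixed value), a POSIX-only Python approximation of the same policy; the helper name io_blksize is hypothetical:

        import os

        # Mirror the coreutils idea: use the filesystem's preferred I/O
        # block size, but never go below a 128 KiB floor.
        def io_blksize(path, floor=128 * 1024):
            return max(floor, os.stat(path).st_blksize)

        print(io_blksize("."))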

    @methane
    Member Author

    methane commented Mar 2, 2019

    New changeset 4f19030 by Inada Naoki in branch 'master':
    bpo-36103: change default buffer size of shutil.copyfileobj() (GH-12115)

    @methane
    Member Author

    methane commented Mar 2, 2019

    I chose 64 KiB because the performance difference I can see between 64 KiB and 128 KiB is only up to 5%.
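
    The merged change amounts to one constant in Lib/shutil.py; as committed in GH-12115, it reads essentially:

        import os

        _WINDOWS = os.name == 'nt'
        # 1 MiB on Windows (unchanged), 64 KiB elsewhere (previously 16 KiB).
        COPY_BUFSIZE = 1024 * 1024 if _WINDOWS else 64 * 1024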

    @methane methane closed this as completed Mar 2, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022