
Author giampaolo.rodola
Recipients giampaolo.rodola
Date 2018-05-30.12:22:35
Message-id <1527682955.69.0.682650639539.issue33695@psf.upfronthosting.co.za>
Content
The attached patch makes shutil.copytree() use os.scandir() and, unlike #33414, passes DirEntry instances around so that their cached stat() results are also used from within copy2() and copystat(). This considerably reduces the number of times the filesystem is accessed via os.stat(). A similar improvement could be made for rmtree(), but that is for another ticket. The patch and the benchmark script are attached.
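
To illustrate the general idea (this is not the attached patch), here is a minimal sketch of a scandir()-based tree copy. The function name copytree_sketch is hypothetical, and the sketch assumes a tree containing only regular files and directories: no symlink handling, no ignore callback, no error collection. The point is that each DirEntry is queried directly instead of calling os.stat() again on its path.

```python
import os
import shutil


def copytree_sketch(src, dst):
    """Hypothetical, simplified copytree built on os.scandir().

    Assumes only regular files and directories. Returns the number
    of files copied.
    """
    os.makedirs(dst)
    copied = 0
    with os.scandir(src) as entries:
        for entry in entries:
            target = os.path.join(dst, entry.name)
            # is_dir() can usually be answered from the directory
            # listing itself, without an extra stat() call.
            if entry.is_dir(follow_symlinks=False):
                copied += copytree_sketch(entry.path, target)
            else:
                shutil.copyfile(entry.path, target)
                # The stat result is cached on the DirEntry object, so
                # later entry.stat() calls do not touch the filesystem.
                st = entry.stat()
                os.utime(target, ns=(st.st_atime_ns, st.st_mtime_ns))
                copied += 1
    return copied
```

A helper that receives the DirEntry rather than a path string can reuse that cached result, which is where the saved stat() calls would come from.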

Linux (+13.5% speedup)
======================

--- without patch:

    $ ./python  bench.py
    Priming the system's cache...
    7956 files and dirs, repeat 1/3... min = 0.551s
    7956 files and dirs, repeat 2/3... min = 0.548s
    7956 files and dirs, repeat 3/3... min = 0.548s
    best result = 0.548s

--- with patch:

    $ ./python  bench.py 
    Priming the system's cache...
    7956 files and dirs, repeat 1/3... min = 0.481s
    7956 files and dirs, repeat 2/3... min = 0.479s
    7956 files and dirs, repeat 3/3... min = 0.474s
    best result = 0.474s

Windows (+17% speedup)
======================

--- without patch:

    $ ./python  bench.py
    Priming the system's cache...
    7956 files and dirs, repeat 1/3... min = 9.015s
    7956 files and dirs, repeat 2/3... min = 8.747s
    7956 files and dirs, repeat 3/3... min = 8.614s
    best result = 8.614s

--- with patch:

    $ ./python  bench.py 
    Priming the system's cache...
    7956 files and dirs, repeat 1/3... min = 7.827s
    7956 files and dirs, repeat 2/3... min = 7.369s
    7956 files and dirs, repeat 3/3... min = 7.153s
    best result = 7.153s

Windows SMB share (+30%)
========================

--- without patch:

    C:\Users\user\Desktop\cpython>PCbuild\win32\python.exe bench.py
    Priming the system's cache...
    7956 files and dirs, repeat 1/3... min = 46.853s
    7956 files and dirs, repeat 2/3... min = 46.330s
    7956 files and dirs, repeat 3/3... min = 44.720s
    best result = 44.720s

--- with patch:

    C:\Users\user\Desktop\cpython>PCbuild\win32\python.exe bench.py
    Priming the system's cache...
    7956 files and dirs, repeat 1/3... min = 31.729s
    7956 files and dirs, repeat 2/3... min = 30.936s
    7956 files and dirs, repeat 3/3... min = 30.936s
    best result = 30.936s

Number of stat() syscalls (-38%)
================================

--- without patch:

    $ strace ./python bench.py  2>&1 | grep "stat(" | wc -l
    324808
    
--- with patch:

    $ strace ./python bench.py  2>&1 | grep "stat(" | wc -l
    198768
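
The percentages in the section headings match the reduction relative to the unpatched run, i.e. (before - after) / before. A small check that uses only the figures reported above (the helper name reduction_pct is illustrative):

```python
def reduction_pct(before, after):
    """Relative reduction of `after` with respect to `before`, in percent."""
    return (before - after) / before * 100


# (before, after) pairs taken from the "best result" lines and the
# stat() counts reported in this message.
results = {
    "Linux": (0.548, 0.474),
    "Windows": (8.614, 7.153),
    "Windows SMB share": (44.720, 30.936),
    "stat() syscalls": (324808, 198768),
}

for name, (before, after) in results.items():
    print(f"{name}: {reduction_pct(before, after):.1f}%")
```

This gives about 13.5%, 17.0%, 30.8% and 38.8%, so the headings for the SMB share and the syscall count are rounded down.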