
Author giampaolo.rodola
Recipients benhoyt, benjamin.peterson, brett.cannon, giampaolo.rodola, ncoghlan, serhiy.storchaka, stutzbach, tarek, vstinner, yselivanov
Date 2018-08-02.16:10:19
Message-id <1533226220.0.0.56676864532.issue33695@psf.upfronthosting.co.za>
In-reply-to
Content
Yes, file copy (open() + read() + write()) is of course more expensive than just "reading" a tree (os.walk(), glob()) or deleting it (rmtree()), and the "pure file copy" time adds to the benchmark. And indeed it's not a coincidence that #33671 (which replaced read() + write() with sendfile()) yielded a 5% gain on the benchmark I posted initially for Linux.
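
For illustration, the two copy strategies mentioned above can be sketched roughly as follows. This is not the code from #33671: the function names, buffer sizes and the fallback logic are assumptions made for the example, and os.sendfile() between two regular files is only expected to work on Linux.

```python
import os


def copy_with_read_write(src, dst, bufsize=64 * 1024):
    """Plain copy: every chunk passes through a user-space buffer."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(bufsize)
            if not chunk:
                break
            fout.write(chunk)


def copy_with_sendfile(src, dst, blocksize=1024 * 1024):
    """Copy via os.sendfile() where available (hypothetical helper).

    The kernel moves the data between the two descriptors directly.
    If sendfile() is missing, or fails before any byte was sent,
    fall back to a read()/write() loop.
    """
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        if hasattr(os, "sendfile"):
            offset = 0
            try:
                while True:
                    sent = os.sendfile(fout.fileno(), fin.fileno(),
                                       offset, blocksize)
                    if sent == 0:  # end of file reached
                        return
                    offset += sent
            except OSError:
                if offset:
                    # Part of the file was already copied: do not retry.
                    raise
        # Fallback path: nothing has been copied yet.
        while True:
            chunk = fin.read(64 * 1024)
            if not chunk:
                break
            fout.write(chunk)
```

Either function produces the same destination file; the difference is only in how many times the data crosses the kernel/user-space boundary.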

Still, in an 8k small-files-tree scenario we're seeing a ~9% gain on Linux, 20% on Windows and 30% on an SMB share on localhost vs. VirtualBox. I do not consider this a "hardly noticeable gain" as you imply: it is noticeable, exponential and measurable, even with cache being involved (as it is).

Note that the number of stat() syscalls per file is being reduced from 6 to 1 (or more if follow_symlinks=False), and that is the real gist here. That *does* make a difference on a regular Windows fs and makes a huge difference with network filesystems in general, as a simple stat() call implies access to the network, not the disk.
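
A minimal sketch of where the extra stat() calls come from, assuming the reduction is obtained by reusing the information cached on os.scandir() entries. The two helper names are hypothetical and this is not the actual shutil code; it only contrasts the two access patterns.

```python
import os


def list_files_with_listdir(path):
    """os.listdir() approach: each check below is a separate syscall."""
    result = []
    for name in os.listdir(path):
        full = os.path.join(path, name)
        if os.path.islink(full):   # one lstat()
            continue
        if os.path.isdir(full):    # one stat()
            continue
        size = os.stat(full).st_size  # another stat()
        result.append((name, size))
    return sorted(result)


def list_files_with_scandir(path):
    """os.scandir() approach: type checks are usually answered from the
    directory listing itself, and entry.stat() is cached on the entry."""
    result = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_symlink() or entry.is_dir():
                continue
            # At most one stat() per entry on POSIX; repeated calls
            # on the same entry reuse the cached result.
            result.append((entry.name, entry.stat().st_size))
    return sorted(result)
```

Both functions return the same list of (name, size) pairs for regular files. On a network filesystem each avoided stat() is an avoided round trip, which is why the difference grows with the number of files.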