Message322975
Yes, file copy (open() + read() + write()) is of course more expensive than just "reading" a tree (os.walk(), glob()) or deleting it (rmtree()), and the "pure file copy" time adds to the benchmark. And indeed it's not a coincidence that #33671 (which replaced read() + write() with sendfile()) yielded a 5% gain on the benchmark I posted initially for Linux.
Still, in an 8k small-files-tree scenario we're seeing a ~9% gain on Linux, 20% on Windows and 30% on an SMB share on localhost vs. VirtualBox. I do not consider this a "hardly noticeable gain" as you imply: it is noticeable, exponential and measurable, even with cache being involved (as it is).
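To illustrate the read() + write() vs. sendfile() distinction mentioned above, here is a minimal sketch. It is not the code from #33671: the function name copy_file and the blocksize default are hypothetical, and it assumes a platform where os.sendfile() accepts a regular file as the destination (recent Linux kernels), falling back to a plain read/write loop otherwise.

```python
import os


def copy_file(src, dst, blocksize=1024 * 1024):
    """Copy src to dst; return the name of the strategy that was used.

    Hypothetical helper for illustration only.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        if hasattr(os, "sendfile"):
            try:
                offset = 0
                while True:
                    # The kernel moves the bytes; no userspace buffer involved.
                    sent = os.sendfile(fdst.fileno(), fsrc.fileno(),
                                       offset, blocksize)
                    if sent == 0:
                        return "sendfile"
                    offset += sent
            except OSError:
                # sendfile() is not usable for this pair of files here:
                # start over with the portable loop below.
                fsrc.seek(0)
                fdst.seek(0)
                fdst.truncate()
        # Portable fallback: each chunk is read into and written from Python.
        while True:
            chunk = fsrc.read(blocksize)
            if not chunk:
                return "read/write"
            fdst.write(chunk)
```

Either path produces the same destination file; the difference is only in how many times the data crosses the kernel/userspace boundary.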
Note that the number of stat() syscalls per file is being reduced from 6 to 1 (or more if follow_symlinks=False), and that is the real gist here. That *does* make a difference on a regular Windows fs and makes a huge difference with network filesystems in general, as a simple stat() call implies access to the network, not the disk.
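As a rough sketch of where the stat() savings come from (not the actual patch): os.scandir() yields DirEntry objects that cache file type and stat information, so repeated is_dir() / stat() queries on the same entry do not each trigger a new syscall. The function name iter_tree is hypothetical.

```python
import os


def iter_tree(root):
    """Yield (path, is_dir, size) for every entry below root.

    Hypothetical helper for illustration only.
    """
    with os.scandir(root) as entries:
        for entry in entries:
            # The file type usually comes from the directory listing itself,
            # so this typically costs no extra syscall.
            is_dir = entry.is_dir(follow_symlinks=False)
            # At most one stat per entry on POSIX (usually none on Windows);
            # the result is cached on the DirEntry object.
            size = entry.stat(follow_symlinks=False).st_size
            yield entry.path, is_dir, size
            if is_dir:
                yield from iter_tree(entry.path)
```

A walker built on os.listdir() plus os.path.isdir(), os.path.islink() and os.stat() would instead issue a separate call for each of those checks on every file.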

Date | User | Action | Args
2018-08-02 16:10:20 | giampaolo.rodola | set | recipients: + giampaolo.rodola, brett.cannon, ncoghlan, vstinner, benjamin.peterson, tarek, stutzbach, benhoyt, serhiy.storchaka, yselivanov
2018-08-02 16:10:20 | giampaolo.rodola | set | messageid: <1533226220.0.0.56676864532.issue33695@psf.upfronthosting.co.za>
2018-08-02 16:10:19 | giampaolo.rodola | link | issue33695 messages
2018-08-02 16:10:19 | giampaolo.rodola | create |