
Author vstinner
Recipients abacabadabacaba, akira, benhoyt, giampaolo.rodola, josh.r, pitrou, socketpair, tebeka, tim.golden, vstinner
Date 2015-02-13.00:38:24
Message-id <1423787904.82.0.874829610084.issue22524@psf.upfronthosting.co.za>
In-reply-to
Content
I enhanced bench_scandir2.py so that it has one command to create a test directory and separate commands to run the benchmarks.

All commands:
- create: create the directory for tests (you don't need this command, you can also use an existing directory)
- bench: compare scandir+is_dir to listdir+stat, cached
- bench_nocache: compare scandir+is_dir to listdir+stat, flush disk caches
- bench_nostat: compare scandir to listdir, cached
- bench_nostat_nocache: compare scandir to listdir, flush disk caches
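
To illustrate what the "bench" commands compare, here is a minimal sketch of the two approaches, written against the os.scandir() API as it later shipped in Python 3.6+ (not the patch under review). The function names and the demo() helper are hypothetical and are not taken from bench_scandir2.py.

```python
import os
import stat
import tempfile


def count_dirs_listdir(path):
    """listdir + stat: one extra stat() call per entry."""
    count = 0
    for name in os.listdir(path):
        st = os.stat(os.path.join(path, name))
        if stat.S_ISDIR(st.st_mode):
            count += 1
    return count


def count_dirs_scandir(path):
    """scandir + is_dir: the entry type usually comes with the listing."""
    with os.scandir(path) as entries:
        return sum(1 for entry in entries if entry.is_dir())


def demo():
    """Hypothetical helper: build a tiny tree and count with both methods."""
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(3):
            os.mkdir(os.path.join(tmp, "dir%d" % i))
        for i in range(5):
            open(os.path.join(tmp, "file%d" % i), "w").close()
        return count_dirs_listdir(tmp), count_dirs_scandir(tmp)
```

Both functions follow symlinks, so they should agree on any directory; only the number of system calls differs.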

--

New patch, version 6, written for performance. Changes:

- On POSIX, decode the filename in C
- The _scandir() iterator now yields a list of items instead of a single item

In my benchmarks, yielding 10 items at a time reduces the overhead of scandir on Linux (creating DirEntry objects). On Windows, the number of items has no effect, but I prefer to fetch entries 10 at a time there too, to mimic POSIX. Later, on POSIX, we may call getdents() directly and yield the full getdents() result at once. According to strace, it's currently around 800 entries per getdents() syscall.
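
The batching idea can be sketched in pure Python. This is only an illustration of the shape of the change, not the C code from the patch: the low-level iterator yields lists of up to 10 items, and a thin wrapper flattens them back into single items. The names iter_batches and iter_flat are hypothetical.

```python
from itertools import islice


def iter_batches(entries, batch_size=10):
    """Yield lists of up to batch_size items from any iterable."""
    it = iter(entries)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def iter_flat(entries, batch_size=10):
    """Present the batched stream as single items again."""
    for batch in iter_batches(entries, batch_size):
        yield from batch
```

With 25 input items and the default batch size, iter_batches() produces lists of 10, 10 and 5 items, while iter_flat() returns the same 25 items in order.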


Results of bench_scandir2.py on my laptop using SSD and ext4 filesystem:

- 110,100 entries (100,000 files, 100 symlinks, 10,000 directories)
- bench: 1.3x faster (scandir: 164.9 ms, listdir: 216.3 ms)
- bench_nostat: 0.4x faster (scandir: 104.0 ms, listdir: 38.5 ms)
- bench_nocache: 2.1x faster (scandir: 460.2 ms, listdir: 983.2 ms)
- bench_nostat_nocache: 2.2x faster (scandir: 480.4 ms, listdir: 1055.6 ms)
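
The "Nx faster" figures in these results appear to be the listdir time divided by the scandir time. That is an inference from the reported numbers, not something stated by the script. Under that assumption, a value below 1 means scandir was slower than listdir. A minimal sketch, with a hypothetical function name:

```python
def speedup(scandir_ms, listdir_ms):
    """Ratio of listdir time to scandir time (assumed definition of 'Nx faster')."""
    return listdir_ms / scandir_ms


# Timings copied from the SSD/ext4 results above.
bench = round(speedup(164.9, 216.3), 1)         # scandir faster
bench_nostat = round(speedup(104.0, 38.5), 1)   # scandir slower
```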

Results of bench_scandir2.py on my laptop using NFS share (server: ext4 filesystem) and slow wifi:

- 11,100 entries (10,000 files, 100 symlinks, 1,000 directories)
- bench: 1.3x faster (scandir: 22.5 ms, listdir: 28.9 ms)
- bench_nostat: 0.2x faster (scandir: 14.3 ms, listdir: 3.2 ms)

*** Timings with NFS are not reliable. Sometimes, a directory listing takes more than 30 seconds, but then it takes less than 100 ms. ***

Results of bench_scandir2.py on a Windows 7 VM using NTFS:

- 11,100 entries (10,000 files, 1,000 directories, 100 symlinks)
- bench: 9.9x faster (scandir: 58.3 ms, listdir: 578.5 ms)
- bench_nostat: 0.3x faster (scandir: 28.5 ms, listdir: 7.6 ms)

Results of bench_scandir2.py on my desktop PC using tmpfs (/tmp):

- 110,100 entries (100,000 files, 100 symlinks, 10,000 directories)
- bench: 1.3x faster (scandir: 149.2 ms, listdir: 189.2 ms)
- bench_nostat: 0.3x faster (scandir: 91.9 ms, listdir: 27.1 ms)

Results of bench_scandir2.py on my desktop PC using HDD and ext4:

- 110,100 entries (100,000 files, 100 symlinks, 10,000 directories)
- bench: 1.4x faster (scandir: 168.5 ms, listdir: 238.9 ms)
- bench_nostat: 0.4x faster (scandir: 107.5 ms, listdir: 41.9 ms)