
Author vstinner
Recipients abacabadabacaba, akira, benhoyt, giampaolo.rodola, josh.r, pitrou, socketpair, tebeka, tim.golden, vstinner
Date 2015-02-13.09:08:49
Message-id <1423818529.93.0.529574950994.issue22524@psf.upfronthosting.co.za>
In-reply-to
Content
Note: bench_scandir2.py is a micro-benchmark. Ben's benchmark using walk() is more realistic, but I'm interested in the micro-benchmark results.

scandir-2.patch is faster than scandir-6.patch, and much faster on Windows.

Result of bench (cached): scandir-6.patch => scandir-2.patch

* Windows 7 VM using NTFS: 14.0x faster => 44.6x faster
* laptop using NFS share: 1.3x faster => 5.2x faster   *** warning: unstable results ***
* desktop PC using /tmp: 1.3x faster => 3.8x faster
* laptop using SSD and ext4: 1.3x faster => 2.8x faster
* desktop PC using HDD and ext4: 1.4x faster => 1.4x faster


Benchmark using scandir-2.patch
-------------------------------


Benchmark results with the full C implementation, scandir-2.patch.
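For context, a micro-benchmark of this kind can be sketched as follows. This is not bench_scandir2.py itself: the helper names (`count_dirs_listdir`, `count_dirs_scandir`, `time_ms`, `make_tree`) are hypothetical, the tree is much smaller than the 110,100 entries used here, and it assumes the `os.scandir()` API as it later shipped in Python (3.6+ for the context-manager form). The idea is to compare one stat() call per name against the file type information returned with each directory entry.

```python
import os
import tempfile
import time


def count_dirs_listdir(path):
    # listdir() returns names only, so os.path.isdir() needs one stat() per entry.
    return sum(1 for name in os.listdir(path)
               if os.path.isdir(os.path.join(path, name)))


def count_dirs_scandir(path):
    # scandir() yields DirEntry objects; is_dir() can usually answer from the
    # directory entry itself without an extra system call.
    with os.scandir(path) as entries:
        return sum(1 for entry in entries if entry.is_dir())


def time_ms(func, path, repeat=3):
    # Best of `repeat` runs, in milliseconds.
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(path)
        best = min(best, (time.perf_counter() - start) * 1000.0)
    return best


def make_tree(path, nfiles=1000, ndirs=100):
    # Hypothetical test tree: empty files and empty directories, no symlinks.
    for i in range(nfiles):
        open(os.path.join(path, "file%d" % i), "w").close()
    for i in range(ndirs):
        os.mkdir(os.path.join(path, "dir%d" % i))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_tree(tmp)
        listdir_ms = time_ms(count_dirs_listdir, tmp)
        scandir_ms = time_ms(count_dirs_scandir, tmp)
        print("scandir: %.1f ms, listdir: %.1f ms" % (scandir_ms, listdir_ms))
```

Timings from such a sketch depend heavily on the filesystem and on whether the directory is already in the OS cache, which is why the results below are reported per machine and per cache state.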

[ C implementation ] Results of bench_scandir2.py on my desktop PC using HDD and ext4:

- 110,100 entries (100,000 files, 100 symlinks, 10,000 directories)
- bench: 3.5x faster than listdir (scandir: 63.6 ms, listdir: 219.9 ms)
- bench_nostat: 0.8x faster than listdir (scandir: 52.8 ms, listdir: 42.4 ms)
- bench_nocache: 1.4x faster than listdir (scandir: 3745.2 ms, listdir: 5217.6 ms)
- bench_nostat_nocache: 1.4x faster than listdir (scandir: 3834.1 ms, listdir: 5380.7 ms)
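The "Nx faster" figures appear to be the ratio of the listdir time to the scandir time, so a value below 1.0 (for example "0.8x faster") means scandir was slower in that run. A minimal check of that reading against the desktop HDD numbers above, with `speedup` as a hypothetical helper name:

```python
def speedup(listdir_ms, scandir_ms):
    # Ratio > 1.0: scandir is faster; ratio < 1.0: scandir is slower.
    return listdir_ms / scandir_ms


# (scandir ms, listdir ms) copied from the desktop PC / HDD / ext4 results.
results = {
    "bench": (63.6, 219.9),
    "bench_nostat": (52.8, 42.4),
    "bench_nocache": (3745.2, 5217.6),
    "bench_nostat_nocache": (3834.1, 5380.7),
}

for name, (scandir_ms, listdir_ms) in results.items():
    print("%s: %.1fx faster than listdir" % (name, speedup(listdir_ms, scandir_ms)))
```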

[ C implementation ] Results of bench_scandir2.py on my desktop PC using /tmp (tmpfs):

- 110,100 entries (100,000 files, 100 symlinks, 10,000 directories)
- bench: 3.8x faster than listdir (scandir: 46.7 ms, listdir: 176.4 ms)
- bench_nostat: 0.7x faster than listdir (scandir: 38.6 ms, listdir: 28.6 ms)

[ C implementation ] Results of bench_scandir2.py on my Windows 7 VM using NTFS:

- 110,100 entries (100,000 files, 100 symlinks, 10,000 directories)
- bench: 44.6x faster than listdir (scandir: 125.0 ms, listdir: 5574.9 ms)
- bench_nostat: 0.8x faster than listdir (scandir: 92.4 ms, listdir: 74.7 ms)

[ C implementation ] Results of bench_scandir2.py on my laptop using SSD and ext4:

- 110,100 entries (100,000 files, 100 symlinks, 10,000 directories)
- bench: 3.6x faster than listdir (scandir: 59.4 ms, listdir: 213.3 ms)
- bench_nostat: 0.8x faster than listdir (scandir: 50.0 ms, listdir: 38.6 ms)
- bench_nocache: 2.8x faster than listdir (scandir: 377.5 ms, listdir: 1073.1 ms)
- bench_nostat_nocache: 2.8x faster than listdir (scandir: 370.9 ms, listdir: 1055.0 ms)

[ C implementation ] Results of bench_scandir2.py on my laptop using tmpfs:

- 110,100 entries (100,000 files, 100 symlinks, 10,000 directories)
- bench: 4.0x faster than listdir (scandir: 43.7 ms, listdir: 174.1 ms)
- bench_nostat: 0.7x faster than listdir (scandir: 35.2 ms, listdir: 24.5 ms)

[ C implementation ] Results of bench_scandir2.py on my laptop using NFS share and slow wifi:

- 11,010 entries (10,000 files, 10 symlinks, 1,000 directories)
- bench: 5.2x faster than listdir (scandir: 4.2 ms, listdir: 21.7 ms)
- bench_nostat: 0.6x faster than listdir (scandir: 3.3 ms, listdir: 1.9 ms)


*** Again, results with NFS are not reliable. Sometimes listing a directory's content takes 40 seconds. It may be a network issue. ***

It looks like d_type can be DT_UNKNOWN on NFS.


Benchmark using scandir-6.patch
-------------------------------

I reran the benchmark with scandir-6.patch using more files, so that the results of the two patches can be compared.

[ C implementation ] Results of bench_scandir2.py on my Windows 7 VM using NTFS:

- 110,100 entries (100,000 files, 100 symlinks, 10,000 directories)
- bench: 14.0x faster than listdir (scandir: 399.0 ms, listdir: 5578.7 ms)
- bench_nostat: 0.3x faster than listdir (scandir: 279.2 ms, listdir: 76.1 ms)

[ C implementation ] Results of bench_scandir2.py on my laptop using NFS share and slow wifi:

- 11,010 entries (10,000 files, 10 symlinks, 1,000 directories)
- bench: 1.5x faster than listdir (scandir: 14.8 ms, listdir: 21.4 ms)
- bench_nostat: 0.2x faster than listdir (scandir: 10.6 ms, listdir: 2.2 ms)