
Author scott.dial
Recipients benhoyt, scott.dial, vstinner
Date 2015-03-10.02:57:22
Message-id <1425956244.38.0.823690217539.issue23605@psf.upfronthosting.co.za>
Content
I cloned https://github.com/benhoyt/scandir @ 494f34d784 and ran benchmark.py on a few Linux systems, each backed by a different NFS server of varying quality.

First, a Linux VM backed by a Mac OS X NFS server backed by an SSD:

$ python benchmark.py
Using slower ctypes version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on benchtree, repeat 1/3...
Benchmarking walks on benchtree, repeat 2/3...
Benchmarking walks on benchtree, repeat 3/3...
os.walk took 0.088s, scandir.walk took 0.084s -- 1.1x as fast
$ python benchmark.py -s
Using slower ctypes version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on benchtree, repeat 1/3...
Benchmarking walks on benchtree, repeat 2/3...
Benchmarking walks on benchtree, repeat 3/3...
os.walk size 23400, scandir.walk size 23400 -- equal
os.walk took 0.142s, scandir.walk took 0.145s -- 1.0x as fast

Second, a Linux VM backed by a Linux NFS server backed by a NAS with big, slow drives:

$ python benchmark.py
Using slower ctypes version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on benchtree, repeat 1/3...
Benchmarking walks on benchtree, repeat 2/3...
Benchmarking walks on benchtree, repeat 3/3...
os.walk took 0.071s, scandir.walk took 0.063s -- 1.1x as fast
$ python benchmark.py -s
Using slower ctypes version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on benchtree, repeat 1/3...
Benchmarking walks on benchtree, repeat 2/3...
Benchmarking walks on benchtree, repeat 3/3...
os.walk size 23400, scandir.walk size 23400 -- equal
os.walk took 0.118s, scandir.walk took 0.141s -- 0.8x as fast

Finally, a Linux VM backed by a Linux NFS server backed by a NAS with small, fast SAS drives:

$ python benchmark.py
Using slower ctypes version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on benchtree, repeat 1/3...
Benchmarking walks on benchtree, repeat 2/3...
Benchmarking walks on benchtree, repeat 3/3...
os.walk took 0.159s, scandir.walk took 0.119s -- 1.3x as fast
$ python benchmark.py -s
Using slower ctypes version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on benchtree, repeat 1/3...
Benchmarking walks on benchtree, repeat 2/3...
Benchmarking walks on benchtree, repeat 3/3...
os.walk size 23400, scandir.walk size 23400 -- equal
os.walk took 0.229s, scandir.walk took 0.232s -- 1.0x as fast

A major factor not addressed above is that performance is dramatically different when the metadata cache for the NFS mount is disabled, which is not the default. The first system above is normally configured with the cache disabled in order to keep the filesystem coherent. With the cache disabled, its results are much more dramatic:

$ python benchmark.py
Using slower ctypes version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on benchtree, repeat 1/3...
Benchmarking walks on benchtree, repeat 2/3...
Benchmarking walks on benchtree, repeat 3/3...
os.walk took 4.835s, scandir.walk took 0.097s -- 49.9x as fast
$ python benchmark.py -s
Using slower ctypes version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on benchtree, repeat 1/3...
Benchmarking walks on benchtree, repeat 2/3...
Benchmarking walks on benchtree, repeat 3/3...
os.walk size 23400, scandir.walk size 23400 -- equal
os.walk took 9.945s, scandir.walk took 5.373s -- 1.9x as fast
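
To illustrate why the uncached case diverges so much, here is a minimal sketch of the two walking strategies. It is not the code from benchmark.py. It assumes Python 3.6+, where os.scandir() is in the standard library and usable as a context manager (the runs above used the ctypes version of the separate scandir module), and the helper names walk_with_listdir, walk_with_scandir and make_tree are hypothetical. The listdir-based walk needs one stat-style call per entry just to tell files from directories, and with the NFS metadata cache disabled each of those calls likely goes to the server.

```python
import os
import tempfile


def walk_with_listdir(top):
    """Count files using os.listdir plus one stat-style call per entry."""
    files = 0
    stat_calls = 0
    stack = [top]
    while stack:
        current = stack.pop()
        for name in os.listdir(current):
            path = os.path.join(current, name)
            stat_calls += 1  # os.path.isdir() issues a stat() for each entry
            if os.path.isdir(path):
                stack.append(path)
            else:
                files += 1
    return files, stat_calls


def walk_with_scandir(top):
    """Count files using os.scandir, relying on the entry type it returns."""
    files = 0
    stack = [top]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                # is_dir() normally uses the type reported by the directory
                # listing and only falls back to stat() if that is unknown.
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                else:
                    files += 1
    return files


def make_tree(root, dirs=3, files_per_dir=4):
    """Build a small hypothetical tree (not the benchtree from benchmark.py)."""
    for d in range(dirs):
        sub = os.path.join(root, "dir%d" % d)
        os.mkdir(sub)
        for f in range(files_per_dir):
            with open(os.path.join(sub, "file%d" % f), "w") as fh:
                fh.write("x")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_tree(root)
        print(walk_with_listdir(root))
        print(walk_with_scandir(root))
```

The -s runs also need each file's size, which on Linux requires a stat() per file even with scandir. That is probably why the gap narrows to 1.9x in the last run.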