Message235874
To continue the actual "which implementation" discussion: as I mentioned last week in http://bugs.python.org/msg235458, I think the benchmarks above show pretty clearly we should use the all-C version.
For background: PEP 471 doesn't add any new functionality, and especially with the new pathlib module, it doesn't make directory iteration syntax nicer either: os.scandir() is all about letting the OS give you whatever info it can *for performance*. Most of the Rationale given in PEP 471 for adding scandir is that it can be so much faster than listdir + stat.
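To make the "listdir + stat" versus scandir contrast concrete, here is a minimal sketch. It assumes os.scandir() as specified in PEP 471 (available in the standard library from Python 3.5, with context-manager support from 3.6); the two function names are made up for illustration and are not part of any patch on this issue.

```python
import os


def subdirs_listdir(path):
    """List subdirectory names with listdir + stat (hypothetical helper)."""
    result = []
    for name in os.listdir(path):
        # os.path.isdir() performs a stat() call for every single entry.
        if os.path.isdir(os.path.join(path, name)):
            result.append(name)
    return sorted(result)


def subdirs_scandir(path):
    """List subdirectory names with os.scandir() (hypothetical helper)."""
    result = []
    with os.scandir(path) as entries:
        for entry in entries:
            # is_dir() can usually be answered from data the OS already
            # returned while reading the directory, without an extra stat().
            if entry.is_dir():
                result.append(entry.name)
    return sorted(result)
```

Both helpers should return the same names; the difference is only in how many system calls are typically needed to get there.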
My original all-C implementation is definitely more code to review (roughly 800 lines of C vs scandir-6.patch's 400), but it's also more than twice as fast. On my Windows 7 SSD just now, running benchmark.py:
Original scandir-2.patch version:
os.walk took 0.509s, scandir.walk took 0.020s -- 25.4x as fast
New scandir-6.patch version:
os.walk took 0.455s, scandir.walk took 0.046s -- 10.0x as fast
So the all-C implementation is literally 2.5x as fast on Windows. (After both tests, just for a sanity check, I ran the ctypes version as well, and it said about 8x as fast for both runs.)
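The arithmetic behind these figures can be rechecked with a short sketch. The helper name is hypothetical, and the inputs are the rounded timings printed above, so the recomputed ratios may differ slightly from what benchmark.py reported from its unrounded measurements.

```python
def times_as_fast(baseline_seconds, new_seconds):
    """Return how many times as fast the new timing is versus the baseline."""
    return baseline_seconds / new_seconds


# Rounded Windows timings quoted above: (os.walk, scandir.walk) in seconds.
all_c = times_as_fast(0.509, 0.020)        # scandir-2.patch (all-C)
half_python = times_as_fast(0.455, 0.046)  # scandir-6.patch

# Ratio of the two speedups: how the all-C version compares to scandir-6,
# with each measured against os.walk in its own run.
relative = all_c / half_python

print(f"all-C: {all_c:.1f}x, scandir-6: {half_python:.1f}x, "
      f"all-C vs scandir-6: {relative:.1f}x")
```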
Then on Linux, it's not a perfect comparison (different benchmarks), but it shows the same kind of trend:
Original scandir-2.patch benchmark (http://bugs.python.org/msg228857):
os.walk took 0.860s, scandir.walk took 0.268s -- 3.2x as fast
New scandir-6.patch benchmark (http://bugs.python.org/msg235865) -- note that "1.3x faster" should actually read "1.3x as fast" here:
bench: 1.3x faster (scandir: 164.9 ms, listdir: 216.3 ms)
So again, the all-C implementation is about 2.5x as fast on Linux (3.2 / 1.3).
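The "1.3x faster" versus "1.3x as fast" distinction noted above is easy to see in numbers. This is only a sketch using the Linux timings quoted in this message; both helper names are hypothetical.

```python
def times_as_fast(baseline, new):
    """Ratio of baseline time to new time (2.0 means twice as fast)."""
    return baseline / new


def percent_faster(baseline, new):
    """Speed gain as a percentage (100.0 means twice as fast)."""
    return (baseline / new - 1.0) * 100.0


# Timings from the scandir-6.patch Linux benchmark line, in milliseconds.
listdir_ms = 216.3
scandir_ms = 164.9

ratio = times_as_fast(listdir_ms, scandir_ms)
gain = percent_faster(listdir_ms, scandir_ms)

print(f"{ratio:.2f}x as fast, i.e. about {gain:.0f}% faster")
```

Read this way, "1.3x as fast" means roughly a 30% gain, whereas "1.3x faster" taken literally would mean 2.3x as fast.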
And on Linux, the incremental improvement provided by scandir-6 over listdir is hardly worth it -- I'd use a new directory listing API for 3.2x as fast, but not for 1.3x as fast.
Admittedly a 10x speed gain (!) on Windows is still very much worth going for, so I'm positive about scandir even with a half-Python implementation, but hopefully the above shows fairly clearly why the all-C implementation is important, especially on Linux.
Also, if the consensus is in favour of a slower implementation with less C code, I think there are further tweaks we can make to the Python part of the code to improve things a bit more.
Date | User | Action | Args
2015-02-13 04:39:25 | benhoyt | set | recipients: + benhoyt, tebeka, pitrou, vstinner, giampaolo.rodola, tim.golden, abacabadabacaba, akira, socketpair, josh.r
2015-02-13 04:39:25 | benhoyt | set | messageid: <1423802365.04.0.822831901186.issue22524@psf.upfronthosting.co.za>
2015-02-13 04:39:25 | benhoyt | link | issue22524 messages
2015-02-13 04:39:24 | benhoyt | create |