
Author benhoyt
Recipients abacabadabacaba, akira, benhoyt, giampaolo.rodola, pitrou, socketpair, tim.golden, vstinner
Date 2014-10-09.12:35:40
Message-id <1412858141.51.0.983204385418.issue22524@psf.upfronthosting.co.za>
Content
Thanks, Victor and Antoine. I'm somewhat surprised at the 2-3x numbers you're seeing, as I was consistently getting 4-5x in the Linux tests I did. But it does depend quite a bit on what file system you're running, what hardware, whether you're running in a VM, etc. Still, 2-3x faster is a good speedup!

The numbers are significantly better on Windows, as you can see. Even the smallest numbers I've seen with "--scandir os" are in the 12x range on Windows.
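For anyone who wants to get a rough feel for this kind of ratio on their own machine, here is a minimal sketch. It is not the benchmark.py discussed in this thread; the function names are made up for illustration, and it assumes Python 3.5 or later, where os.scandir() is available in the standard library. It measures a tree-size walk rather than os.walk(), so the ratio it reports will not match the figures quoted above.

```python
import os
import stat
import time


def tree_size_listdir(path):
    """Sum regular-file sizes using os.listdir() plus one os.lstat() per entry."""
    total = 0
    for name in os.listdir(path):
        full = os.path.join(path, name)
        st = os.lstat(full)
        if stat.S_ISDIR(st.st_mode):
            total += tree_size_listdir(full)
        elif stat.S_ISREG(st.st_mode):
            total += st.st_size
    return total


def tree_size_scandir(path):
    """Same walk, but using the file-type info that os.scandir() returns."""
    total = 0
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            total += tree_size_scandir(entry.path)
        elif entry.is_file(follow_symlinks=False):
            total += entry.stat(follow_symlinks=False).st_size
    return total


def speedup(path, repeat=3):
    """Return best listdir time divided by best scandir time (hypothetical helper)."""
    def best(func):
        times = []
        for _ in range(repeat):
            start = time.perf_counter()
            func(path)
            times.append(time.perf_counter() - start)
        return min(times)

    return best(tree_size_listdir) / best(tree_size_scandir)
```

Calling speedup() on a large directory tree gives one data point; as noted above, the result depends heavily on file system, hardware, and OS caching, so it is worth repeating the run a few times.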

In any case, Victor's last tests are "right" -- I presume we'll have *some* C, so what we want to be comparing is "benchmark.py --scandir c" versus "benchmark.py --scandir os": the some-C version versus the all-C version in the attached CPython 3.5 patch.

BTW, Victor, "Generic" isn't really useful. I just used it as a test case that calls listdir() and os.stat() to implement the scandir/DirEntry interface. So it's going to be strictly slower than listdir + stat due to using listdir and creating all those DirEntry objects.
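To make the "Generic" idea concrete, here is a minimal sketch of a scandir-like interface built only on os.listdir() and os.stat(). This is not the actual code from the patch; the names GenericDirEntry and generic_scandir are hypothetical, and the caching behaviour is an assumption about how such a wrapper might reasonably work.

```python
import os
import stat


class GenericDirEntry:
    """Hypothetical DirEntry look-alike that gets everything from os.stat()."""

    def __init__(self, dirpath, name):
        self.name = name
        self.path = os.path.join(dirpath, name)
        self._stat = None   # cached os.stat() result (follows symlinks)
        self._lstat = None  # cached os.lstat() result (does not follow)

    def stat(self, follow_symlinks=True):
        if follow_symlinks:
            if self._stat is None:
                self._stat = os.stat(self.path)
            return self._stat
        if self._lstat is None:
            self._lstat = os.lstat(self.path)
        return self._lstat

    def is_dir(self, follow_symlinks=True):
        try:
            return stat.S_ISDIR(self.stat(follow_symlinks).st_mode)
        except OSError:
            return False  # e.g. broken symlink or entry removed meanwhile

    def is_file(self, follow_symlinks=True):
        try:
            return stat.S_ISREG(self.stat(follow_symlinks).st_mode)
        except OSError:
            return False

    def is_symlink(self):
        try:
            return stat.S_ISLNK(self.stat(follow_symlinks=False).st_mode)
        except OSError:
            return False


def generic_scandir(path="."):
    """Yield one GenericDirEntry per name returned by os.listdir()."""
    for name in os.listdir(path):
        yield GenericDirEntry(path, name)
```

Every is_dir() or is_file() call here costs a stat system call on first use, plus the overhead of building an object per entry, which is why this approach cannot beat plain listdir + stat.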

Anyway, where to from here? Are we agreed, given the numbers, that -- especially on Linux -- it makes good performance sense to use an all-C approach?
History
Date                 User     Action  Args
2014-10-09 12:35:41  benhoyt  set     recipients: + benhoyt, pitrou, vstinner, giampaolo.rodola, tim.golden, abacabadabacaba, akira, socketpair
2014-10-09 12:35:41  benhoyt  set     messageid: <1412858141.51.0.983204385418.issue22524@psf.upfronthosting.co.za>
2014-10-09 12:35:41  benhoyt  link    issue22524 messages
2014-10-09 12:35:40  benhoyt  create