
Author vstinner
Recipients abacabadabacaba, akira, benhoyt, giampaolo.rodola, pitrou, socketpair, tim.golden, vstinner
Date 2014-10-09.12:59:37
Message-id <1412859577.46.0.351555434823.issue22524@psf.upfronthosting.co.za>
In-reply-to
Content
> I'm somewhat surprised at the 2-3x numbers you're seeing, as I was consistently getting 4-5x in the Linux tests I did. But it does depend quite a bit on what file system you're running, what hardware, whether you're running in a VM, etc. Still, 2-3x faster is a good speedup!

I don't think that hardware matters. As I wrote, I expect the whole /usr/share tree to fit in memory. It sounds more like optimizations in the Linux kernel. I ran the benchmarks on Fedora 20 with Linux kernel 3.14.

> Anyway, where to from here? Are we agreed given the numbers that -- especially on Linux -- it makes good performance sense to use an all-C approach?

We haven't yet tried calling readdir() multiple times in the C iterator and using a small cache (e.g. between 10 and 1000 items; I don't know which size is best yet) to also limit the number of readdir() calls. The cache would be an array of dirent on Linux.

For example, scandir_helper() could return an array of items instead of a single item.

I can try to implement it if you want.
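
To make the caching idea concrete, here is a rough sketch in Python rather than C. It is only an illustration under assumptions: scandir_helper_batched(), scandir_cached() and the batch_size parameter are hypothetical names, and os.scandir() (Python 3.6+ when used as a context manager) stands in for the C-level readdir() loop. The helper returns a list of up to batch_size entries, and the iterator drains that list before asking for the next batch.

```python
import os
from itertools import islice


def scandir_helper_batched(entries, batch_size):
    """Hypothetical helper: return up to batch_size entries as a list.

    An empty list means the underlying iterator is exhausted.
    """
    return list(islice(entries, batch_size))


def scandir_cached(path, batch_size=100):
    """Yield directory entries, refilling a small cache one batch at a time.

    os.scandir() is used here as a stand-in for the readdir() loop;
    batch_size is an assumed tuning knob, not a measured optimum.
    """
    with os.scandir(path) as entries:
        while True:
            cache = scandir_helper_batched(entries, batch_size)
            if not cache:
                return
            yield from cache
```

With this shape, the number of helper calls is roughly the entry count divided by batch_size, so the best cache size could be found by benchmarking a few values in the 10 to 1000 range.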
History
Date User Action Args
2014-10-09 12:59:37 vstinner set recipients: + vstinner, pitrou, giampaolo.rodola, tim.golden, benhoyt, abacabadabacaba, akira, socketpair
2014-10-09 12:59:37 vstinner set messageid: <1412859577.46.0.351555434823.issue22524@psf.upfronthosting.co.za>
2014-10-09 12:59:37 vstinner link issue22524 messages
2014-10-09 12:59:37 vstinner create