
Author benhoyt
Recipients abacabadabacaba, akira, benhoyt, giampaolo.rodola, pitrou, socketpair, tim.golden, vstinner
Date 2014-10-08.12:26:43
Message-id <1412771204.71.0.376514042329.issue22524@psf.upfronthosting.co.za>
Content
Thanks for the initial response and code review, Victor. I'll take a look and respond further in the next few days.

In the meantime, however, I'm definitely open to splitting scandir out into its own C file. This will mean a little refactoring (making some functions public/non-static).

Based on the numbers so far, I'm not keen on implementing just the system calls in C and the rest in Python. I already do basically this with ctypes in the scandir module, and it's slowish. I'll send proper numbers through soon, but here's what I remember from running benchmark.py on my Windows laptop with an SSD:

ctypes version: os.walk() 9x faster with scandir
CPython 3.5 C version (debug): os.walk() 24x faster with scandir
CPython 3.5 C version (release): os.walk() 55x faster with scandir

So you do get a lot of speedup from just the ctypes version, but you get a lot more (55/9 ≈ 6x more here) by using the all-C version. Again, these numbers are from memory; I'll send exact ones later.
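
For readers who want to try a comparison of this kind themselves, here is a minimal sketch. It is not benchmark.py and will not reproduce the figures above. It assumes a Python version where os.scandir() is in the standard library (3.5 or later; the "with" form used here needs 3.6). The function names split_with_listdir and split_with_scandir are made up for illustration.

```python
import os
import timeit


def split_with_listdir(path):
    """Split entries into (dirs, files), paying one stat call per name."""
    dirs, files = [], []
    for name in os.listdir(path):
        if os.path.isdir(os.path.join(path, name)):
            dirs.append(name)
        else:
            files.append(name)
    return dirs, files


def split_with_scandir(path):
    """Same split, but using the file type information scandir provides."""
    dirs, files = [], []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir():
                dirs.append(entry.name)
            else:
                files.append(entry.name)
    return dirs, files


if __name__ == "__main__":
    target = os.getcwd()  # assumption: time the current directory
    for func in (split_with_listdir, split_with_scandir):
        seconds = timeit.timeit(lambda: func(target), number=100)
        print(f"{func.__name__}: {seconds:.4f}s for 100 runs")
```

The ratio between the two timings depends heavily on the platform, the file system, and the size of the directory, so results should be treated as indicative only.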

One of the problems is that creating the DirEntry objects and calling their methods is fairly expensive, and if this is all done in Python, you lose a lot of the speedup. I believe scandir() would end up being slower than listdir() in many cases.
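
To make the per-entry cost concrete, here is a rough sketch of what a pure-Python entry object might look like. PyDirEntry and py_scandir are hypothetical names, and this is not the scandir module's ctypes code or the proposed C implementation. In particular, this sketch falls back to os.stat() for is_dir(), whereas a real implementation would normally take the file type from the directory listing itself.

```python
import os
from stat import S_ISDIR


class PyDirEntry:
    """Hypothetical pure-Python stand-in for a DirEntry (illustration only)."""

    __slots__ = ("name", "path", "_stat")

    def __init__(self, directory, name):
        self.name = name
        self.path = os.path.join(directory, name)
        self._stat = None  # filled in lazily on first use, then cached

    def stat(self):
        if self._stat is None:
            self._stat = os.stat(self.path)
        return self._stat

    def is_dir(self):
        try:
            return S_ISDIR(self.stat().st_mode)
        except OSError:
            # Entry vanished or is unreadable: treat it as not a directory.
            return False


def py_scandir(directory):
    """Yield one PyDirEntry per name; each costs an object allocation
    plus Python-level method calls when it is queried."""
    for name in os.listdir(directory):
        yield PyDirEntry(directory, name)
```

Every entry here costs an interpreted __init__ call, a path join, and further interpreted method calls, all on top of the listdir() call that produced the names. That is the kind of overhead the paragraph above is concerned about.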
History
Date                 User     Action  Args
2014-10-08 12:26:44  benhoyt  set     recipients: + benhoyt, pitrou, vstinner, giampaolo.rodola, tim.golden, abacabadabacaba, akira, socketpair
2014-10-08 12:26:44  benhoyt  set     messageid: <1412771204.71.0.376514042329.issue22524@psf.upfronthosting.co.za>
2014-10-08 12:26:44  benhoyt  link    issue22524 messages
2014-10-08 12:26:43  benhoyt  create