This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author vstinner
Recipients abacabadabacaba, akira, benhoyt, giampaolo.rodola, josh.r, pitrou, socketpair, tebeka, tim.golden, vstinner
Date 2015-02-13.09:28:29
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <CAMpsgwa1WRKEPNwDesYJ3Y5ZN2wrUtPLLXTWu16vW6qtvShNsw@mail.gmail.com>
In-reply-to <1423802365.04.0.822831901186.issue22524@psf.upfronthosting.co.za>
Content
Hi Ben,

2015-02-13 4:51 GMT+01:00 Ben Hoyt <report@bugs.python.org>:
> Hi Victor, I thank you for your efforts here, especially your addition of DirEntry.inode() and your work on the tests.

The addition of inode() should still be discussed on python-dev. The
remaining question is if the inode number (st_ino) is useful even if
st_dev is missing. I didn't added the inode() method yet to the PEP
because of that.

> However, I'm a bit frustrated that you just re-implemented the whole thing without discussion:
(...)
> So it was a bit of a let down for a first-time Python contributor. Even a simple question would have been helpful here: "Ben, I really think this much C code for a first version is bug-prone; what do you think about me re-implementing it and comparing?"

I'm really sorry, I didn't want to frustrate you :-( Sorry for not
being available, I also expected someone else to review your code, but
it didn't happen. And I *want* your opinon on design choices. Sorry if
I wasn't explicit about that in my last emails.

Sorry, I was very busy last months on other projects (like asyncio). I
want scandir() to be included in Python 3.5! The3.5 alpha 1 release
was a reminder for me.

I didn't reimplement everything. Almost all code comes from your work.
I just adapted it to Python 3.5.

> I've been behind scandir and written the first couple of implementations, and I asked if you could review my code, but instead of reviewing it or interacting with my fairly clear desire for the all-C version, you simply turn up and say "I've rewritten it, can you please review?"

I'm not happy of the benchmark results of the C+Python implementation
(scandir-6.patch is supposed to be the fasted implementation of
C+Python).

I tried to produce as much benchmark results as possible to be able to
take a decision. I used:

* hardware: SSD, HDD and tmpfs (RAM)
* file system: ext4, tmpfs, NFS/ext4
* operating systems: Linux (Fedora 21), Windows 7

I will try to summarize benchmark results to write an email to
python-dev, to make a choice between the full C implementation vs
C+Python implementation. (ctypes is not an option, it's not reliable,
portability matters.)

> To continue the actual "which implementation" discussion: as I mentioned last week in http://bugs.python.org/msg235458, I think the benchmarks above show pretty clearly we should use the all-C version.

It's quite clear that the C implementation is always faster than C+Python.

My point is that we have to make a choice, C+Python is a nice
compromise because it uses less C code, and C code is more expensive
to *maintain*.

> For background: PEP 471 doesn't add any new functionality, and especially with the new pathlib module, it doesn't make directory iteration syntax nicer either: os.scandir() is all about letting the OS give you whatever info it can *for performance*. Most of the Rationale for adding scandir given in PEP 471 is because it can be so so much faster than listdir + stat.

Good point :-)

In many setup (hardware, operating system, file system), I see a low
speedup (less than 2x faster). The whole purpose of the PEP 471
becomes unclear if the speedup is not at least 2x.

Quicky summary of benchmarks:

* using C+Python (scandir-6.patch), the benchmark ("bench") is only
faster than 2x on Windows, in the other cases it's between 1.3x and
1.4x faster
* using C (scandir-2.patch), the benchmark ("bench") is at least 3.5x
faster, it's between 3.5x and 44.6x faster, the most interesting
speedup is on Windows (44.6x faster!)
History
Date User Action Args
2015-02-13 09:28:29vstinnersetrecipients: + vstinner, tebeka, pitrou, giampaolo.rodola, tim.golden, benhoyt, abacabadabacaba, akira, socketpair, josh.r
2015-02-13 09:28:29vstinnerlinkissue22524 messages
2015-02-13 09:28:29vstinnercreate