Message235884
Hi Ben,
2015-02-13 4:51 GMT+01:00 Ben Hoyt <report@bugs.python.org>:
> Hi Victor, I thank you for your efforts here, especially your addition of DirEntry.inode() and your work on the tests.
The addition of inode() should still be discussed on python-dev. The
remaining question is if the inode number (st_ino) is useful even if
st_dev is missing. I didn't added the inode() method yet to the PEP
because of that.
> However, I'm a bit frustrated that you just re-implemented the whole thing without discussion:
(...)
> So it was a bit of a let down for a first-time Python contributor. Even a simple question would have been helpful here: "Ben, I really think this much C code for a first version is bug-prone; what do you think about me re-implementing it and comparing?"
I'm really sorry, I didn't want to frustrate you :-( Sorry for not
being available, I also expected someone else to review your code, but
it didn't happen. And I *want* your opinon on design choices. Sorry if
I wasn't explicit about that in my last emails.
Sorry, I was very busy last months on other projects (like asyncio). I
want scandir() to be included in Python 3.5! The3.5 alpha 1 release
was a reminder for me.
I didn't reimplement everything. Almost all code comes from your work.
I just adapted it to Python 3.5.
> I've been behind scandir and written the first couple of implementations, and I asked if you could review my code, but instead of reviewing it or interacting with my fairly clear desire for the all-C version, you simply turn up and say "I've rewritten it, can you please review?"
I'm not happy of the benchmark results of the C+Python implementation
(scandir-6.patch is supposed to be the fasted implementation of
C+Python).
I tried to produce as much benchmark results as possible to be able to
take a decision. I used:
* hardware: SSD, HDD and tmpfs (RAM)
* file system: ext4, tmpfs, NFS/ext4
* operating systems: Linux (Fedora 21), Windows 7
I will try to summarize benchmark results to write an email to
python-dev, to make a choice between the full C implementation vs
C+Python implementation. (ctypes is not an option, it's not reliable,
portability matters.)
> To continue the actual "which implementation" discussion: as I mentioned last week in http://bugs.python.org/msg235458, I think the benchmarks above show pretty clearly we should use the all-C version.
It's quite clear that the C implementation is always faster than C+Python.
My point is that we have to make a choice, C+Python is a nice
compromise because it uses less C code, and C code is more expensive
to *maintain*.
> For background: PEP 471 doesn't add any new functionality, and especially with the new pathlib module, it doesn't make directory iteration syntax nicer either: os.scandir() is all about letting the OS give you whatever info it can *for performance*. Most of the Rationale for adding scandir given in PEP 471 is because it can be so so much faster than listdir + stat.
Good point :-)
In many setup (hardware, operating system, file system), I see a low
speedup (less than 2x faster). The whole purpose of the PEP 471
becomes unclear if the speedup is not at least 2x.
Quicky summary of benchmarks:
* using C+Python (scandir-6.patch), the benchmark ("bench") is only
faster than 2x on Windows, in the other cases it's between 1.3x and
1.4x faster
* using C (scandir-2.patch), the benchmark ("bench") is at least 3.5x
faster, it's between 3.5x and 44.6x faster, the most interesting
speedup is on Windows (44.6x faster!) |
|
Date |
User |
Action |
Args |
2015-02-13 09:28:29 | vstinner | set | recipients:
+ vstinner, tebeka, pitrou, giampaolo.rodola, tim.golden, benhoyt, abacabadabacaba, akira, socketpair, josh.r |
2015-02-13 09:28:29 | vstinner | link | issue22524 messages |
2015-02-13 09:28:29 | vstinner | create | |
|