classification
Title: PEP 471 implementation: os.scandir() directory scanning function
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: abacabadabacaba, akira, alexei.romanov, benhoyt, giampaolo.rodola, josh.r, pitrou, python-dev, socketpair, tebeka, tim.golden, vstinner
Priority: normal Keywords: patch

Created on 2014-09-30 14:42 by benhoyt, last changed 2015-03-10 13:24 by benhoyt. This issue is now closed.

Files
File name Uploaded Description Edit
scandir-1.patch benhoyt, 2014-10-07 01:21 Patch to document, implement, and test os.scandir() review
clear_system_cache.patch vstinner, 2014-10-09 10:56
do_os_walk_getsize.patch vstinner, 2014-10-09 11:24
scandir-2.patch benhoyt, 2014-10-18 20:53 review
get_tree_size_listdir.diff akira, 2014-11-27 04:31
scandir-3.patch vstinner, 2015-02-12 11:15 review
scandir-4.patch vstinner, 2015-02-12 15:20 review
scandir-5.patch vstinner, 2015-02-12 17:25 review
bench_scandir.py vstinner, 2015-02-12 18:00
bench_scandir2.py vstinner, 2015-02-12 23:41
scandir-6.patch vstinner, 2015-02-13 00:38 review
scandir-7.patch benhoyt, 2015-02-27 11:45 Updated all-C implementation review
scandir-8.patch benhoyt, 2015-03-07 13:25 Updated all-C implementation per Victor's review review
whatsnew.patch benhoyt, 2015-03-10 11:36 review
Messages (56)
msg227933 - (view) Author: Ben Hoyt (benhoyt) * Date: 2014-09-30 14:42
Opening this to track the implementation of PEP 471: os.scandir() [1]. This supercedes Issue #11406 (and possibly others)?

The implementation is most of the way there, but not yet done as a CPythono 3.5 patch. Before I have a proper patch, it's available on GitHub [2] -- see posixmodule_scandir*.c, test/test_scandir.py, and os.rst.

[1] http://legacy.python.org/dev/peps/pep-0471/
[2] https://github.com/benhoyt/scandir
msg228751 - (view) Author: Ben Hoyt (benhoyt) * Date: 2014-10-07 01:21
Attaching my first patch here. It includes docs, the implementation of scandir and DirEntry in posixmodule.c, and tests in test_scandir.py. It does not yet include the changes for os.walk() to use scandir; I'm hoping to do that in a separate patch for easier code review.

I'd love some code and documentation review, as well as some thoughts on these open questions:

1) posixmodule.c is getting kinda huge and unwieldy; it'd be nice to implement this in a separate C file, but that comes with its own problems, as several functions would need to be made available externally.

2) While writing the tests, I was getting issues with Windows (apparently) not deleting the symlinks immediately in my teardown function, so for the moment I just don't call the teardown function at all. Not great; I think we'll want a better solution.

3) Partly due to #2, I haven't yet added tests for the file-not-found condition if a file is found by the directory scanning but then deleted before a DirEntry.is_dir() or DirEntry.is_file() call. I intend to add these tests, and there's a TODO comment in test_scandir.py so I remember.

4) test_scandir.py will need some further cleanup for inclusion in CPython; it's currently taken straight from my scandir module, which supports Python down to Python 2.6.
msg228775 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-10-07 21:33
Hi Ben,

I started to review your patch, sorry for the delay :-( It's convinient to have the patch on Rietveld to comment it inline.

I didn't expect so much C code. The posixmodule.c is really huge. Well, it's just the longest C file of Python 3.5. Your patch adds 781 more lines to it. It's maybe time to split the file into smaller files.

At the same time, reading C code is boring and it's too easy to "implement bugs". I need to be convinced that the speedup announced in the PEP requires so much C code. In the PEP and https://github.com/benhoyt/scandir#benchmarks it's not clear if the speedup comes from the C code (instead of implementing scandir in pure Python, for example using ctypes) or the lower number of syscalls.

To have a fair benchmark, the best would be to have a C part only implementing opendir/readir and FirstFindFile/FindFileXXX (I'm not sure that ctypes is as fast as C code written with the Python C API), and a Python part which implements the nice Python API.

If the speedup is 10% or lower, I would prefer to not add so much C code and take the compromise of a thin C wrapper and implement scandir in Python.

If we implement os.scandir() in Python, should we still try to share code with os.listdir()? I see that importlib uses os.listdir(), so we need a listdir() function implemented in C at least for importlib.
msg228780 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-10-08 09:14
I agree that having a high-level wrapper around a low-level C primitive would be cool, but someone has to experiment on that to find out how much performance it would cost.

You may want to have the C primitive return results in batches (of e.g. 10 or 100 entries) to limit the overhead, btw.
msg228784 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-10-08 10:24
> You may want to have the C primitive return results in batches (of e.g. 10 or 100 entries) to limit the overhead, btw.

Ah yes, good idea. I read that internally readdir() also fetchs many
entries in a single syscall (prefetch" / readahead, I don't know how
to call that).
msg228786 - (view) Author: Ben Hoyt (benhoyt) * Date: 2014-10-08 12:26
Thanks for the initial response and code review, Victor. I'll take a look and respond further in the next few days.

In the meantime, however, I'm definitely open to splitting scandir out into its own C file. This will mean a little refactoring (making some functions public/non-static).

Based on the numbers so far, I'm not so keen on implementing just the sys calls in C and the rest in Python. I already do basically this with ctypes in the scandir module, and it's slowish. I'll send proper numbers through soon, but here's what I remember from running benchmark.py on my Windows laptop with SSD drive:

ctypes version: os.walk() 9x faster with scandir
CPython 3.5 C version (debug): os.walk() 24x faster with scandir
CPython 3.5 C version (release): os.walk() 55x faster with scandir

So you do get a lot of speedup from just the ctypes version, but you get a lot more (55/9 = 6x more here) by using the all-C version. Again, these numbers are from memory -- I'll send exact ones later.

One of the problems is that creating the DirEntry objects and calling their methods is fairly expensive, and if this is all done in Python, you lose a lot. I believe scandir() would end up being slower than listdir() in many cases.
msg228818 - (view) Author: Ben Hoyt (benhoyt) * Date: 2014-10-08 21:55
Here are the actual numbers (instead of just from memory) running on my Windows laptop, which happens to have an SSD drive, so os.walk() with scandir is particularly good here.

python 3.4: benchmark.py -c python (all implemented in Python using ctypes):

os.walk took 1.011s, scandir.walk took 0.057s -- 17.8x as fast

python 3.4: benchmark.py -c c (using _scandir.c, so the iteration implemented in C, the DirEntry object in Python):

os.walk took 1.014s, scandir.walk took 0.039s -- 25.8x as fast

python 3.5: benchmark.py -c os (using the new all-C version in posixmodule.c):

os.walk took 0.983s, scandir.walk took 0.019s -- 52.3x as fast

So as you can see, there's still another 2x speedup to be gained from having everything in C. I know it's a bit more to review, but my thinking is if we're going to speed this baby up, let's speed it right up!

I haven't done these tests on Linux yet.
msg228819 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-10-08 22:16
Ben, how can I run the benchmark on my own machine?

Thanks!
msg228821 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-10-08 23:03
Ok, I've found it. The numbers don't look that good here (Linux, SSD):

$ ~/cpython/opt/python benchmark.py -c python /usr/share/ 
[...]
os.walk took 1.266s, scandir.walk took 1.852s -- 0.7x as fast

$ ~/cpython/opt/python benchmark.py -c c /usr/share/ 
[...]
os.walk took 1.276s, scandir.walk took 1.266s -- 1.0x as fast

$ ~/cpython/opt/python benchmark.py -c os /usr/share/ 
[...]
os.walk took 1.278s, scandir.walk took 0.585s -- 2.2x as fast
msg228854 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-10-09 10:56
I cloned https://github.com/benhoyt/scandir. I understand that the --scandir command line option of benchmark.py are these choices:

- generic = call listdir() and then use "yield GenericDirEntry" which caches os.stat() and os.lstat() results
- python = ctypes implemented calling opendir/readdir and yields PosixDirEntry objects which uses d_type field from readdir() in is_dir(), is_file() and is_symlink(). Cache the result of os.stat() and os.lstat()
- c = "scandir_helper" (iterator) implemented in C (Python C API) yielding PosixDirEntry objects (same class than the "python" benchmark)


I checked with an assertion: d_type of readdir() is never DT_UNKNOWN on my Linux Fedora 20. Statistics of PosixDirEntry on my /usr/share tree:

- 155544 PosixDirEntry instances
- fast-path (use d_type) taken 466632 times in is_dir/is_symlink
- slow-path (need to call os.stat or os.lstat) taken 7828 times in is_dir/is_symlink
- os.stat called 7832 times
- os.stat called 0 times

7832 is the number of symbolic links in my /usr/share tree. 95% of entries don't need stat() in scandir.walk() when using readdir().

So is_dir() and is_symlink() are approximatively called 3 times per entry: scandir.walk() calls is_dir() and is_symlink() on each entry, but is_dir() also calls is_symlink() by default (because the default value of the follow_symlinks parameter is True).


I ran benchmark.py on my Linux Fedora 20 (Linux kernel 3.14). I have two HDD configured as RAID0. I don't think that my disk config is revelant: I also have 12 GB of memory, I hope that /usr/share tree is fully cached. For example, "free -m" tells me that 8.8 GB are cached.

The generic implementation looks inefficient: it is 2 times slower. Is there a bug? GenericDirEntry caches os.stat() and os.lstat() result, it should be as fast or faster than os.walk(), no? Or is it the cost of a generator?

The "c" implementation is 35% faster than the "python" implementation (python=1.170 sec, c=0.762 sec).


Result of benchmark:

haypo@smithers$ python3 setup.py build && for scandir in generic python c; do echo; echo "=== $scandir ==="; PYTHONPATH=build/lib.linux-x86_64-3.3/ python3 benchmark.py /usr/share -c $scandir || break; done
running build
running build_py
running build_ext

=== generic ===
Using very slow generic version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on /usr/share, repeat 1/3...
Benchmarking walks on /usr/share, repeat 2/3...
Benchmarking walks on /usr/share, repeat 3/3...
os.walk took 1.340s, scandir.walk took 2.471s -- 0.5x as fast

=== python ===
Using slower ctypes version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on /usr/share, repeat 1/3...
Benchmarking walks on /usr/share, repeat 2/3...
Benchmarking walks on /usr/share, repeat 3/3...
os.walk took 1.318s, scandir.walk took 1.170s -- 1.1x as fast

=== c ===
Using fast C version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on /usr/share, repeat 1/3...
Benchmarking walks on /usr/share, repeat 2/3...
Benchmarking walks on /usr/share, repeat 3/3...
os.walk took 1.317s, scandir.walk took 0.762s -- 1.7x as fast
msg228855 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-10-09 10:56
I also ran the benchmark without cache on disk. It looks like my hard drive is too slow to see a real speedup of scandir(). The maximum speedup is 2.7 seconds lesser when testing the "c" scandir (24.089 => 21.374 seconds): "1.1x as fast".

I modified the benchmark to flush all caches, I added the following line into do_scandir_walk():

   os.system("sudo bash -c 'echo 3 > /proc/sys/vm/drop_caches'")

I also commented the first call to do_scandir_walk() ("Priming the system's cache..."), just to make the benchmark faster.

(See attached clear_system_cache.patch for all changes.)

Result of the modified benchmark without cache:

haypo@smithers$ python3 setup.py build && for scandir in generic python c; do echo; echo "=== $scandir ==="; PYTHONPATH=build/lib.linux-x86_64-3.3/ python3 benchmark.py /usr/share -c $scandir || break; done
running build
running build_py
running build_ext

=== generic ===
Using very slow generic version of scandir
Comparing against builtin version of os.walk()
Benchmarking walks on /usr/share, repeat 1/3...
Benchmarking walks on /usr/share, repeat 2/3...
Benchmarking walks on /usr/share, repeat 3/3...
os.walk took 24.324s, scandir.walk took 24.215s -- 1.0x as fast

=== python ===
Using slower ctypes version of scandir
Comparing against builtin version of os.walk()
Benchmarking walks on /usr/share, repeat 1/3...
Benchmarking walks on /usr/share, repeat 2/3...
Benchmarking walks on /usr/share, repeat 3/3...
os.walk took 24.089s, scandir.walk took 21.374s -- 1.1x as fast

=== c ===
Using fast C version of scandir
Comparing against builtin version of os.walk()
Benchmarking walks on /usr/share, repeat 1/3...
Benchmarking walks on /usr/share, repeat 2/3...
Benchmarking walks on /usr/share, repeat 3/3...
os.walk took 24.225s, scandir.walk took 21.390s -- 1.1x as fast
msg228857 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-10-09 11:16
My previous benchmark runs were with the system Python 3.3.

New benchmark with a Python 3.5 patched with scandir-1.patch (compiled in release mode: ./configure && make).

"os" (os.scandir) is 2 times faster than "c" (_scandir.scandir_helper): c=0.533 sec, os=0.268 sec.

On all implementations, scandir.walk() is only much faster than os.walk() when "os" (os.scandir) is used: "3.2x as fast" (860 ms => 268 ms).

I guess that on Linux the speedup highly depends on the number of symbolic links.

It would help if benchmark.py created the tree to have more reliable numbers, and being able to compare trees without symlinks, with a few symlinks (ex: 10%), and with a lot of symlinks (ex: 99%).

Benchmark results:

haypo@smithers$ ~/prog/python/default/python setup.py build && for scandir in generic python c os; do echo; echo "=== $scandir ==="; PYTHONPATH=build/lib.linux-x86_64-3.5/ ~/prog/python/default/python benchmark.py /usr/share -c $scandir || break; done
running build
running build_py
copying scandir.py -> build/lib.linux-x86_64-3.5
running build_ext

=== generic ===
Using very slow generic version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on /usr/share, repeat 1/3...
Benchmarking walks on /usr/share, repeat 2/3...
Benchmarking walks on /usr/share, repeat 3/3...
os.walk took 0.857s, scandir.walk took 1.627s -- 0.5x as fast

=== python ===
Using slower ctypes version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on /usr/share, repeat 1/3...
Benchmarking walks on /usr/share, repeat 2/3...
Benchmarking walks on /usr/share, repeat 3/3...
os.walk took 0.856s, scandir.walk took 0.915s -- 0.9x as fast

=== c ===
Using fast C version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on /usr/share, repeat 1/3...
Benchmarking walks on /usr/share, repeat 2/3...
Benchmarking walks on /usr/share, repeat 3/3...
os.walk took 0.857s, scandir.walk took 0.533s -- 1.6x as fast

=== os ===
Using Python 3.5's builtin os.scandir()
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on /usr/share, repeat 1/3...
Benchmarking walks on /usr/share, repeat 2/3...
Benchmarking walks on /usr/share, repeat 3/3...
os.walk took 0.860s, scandir.walk took 0.268s -- 3.2x as fast
msg228858 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-10-09 11:24
On Windows, I guess that "benchmark.py --size" is faster with scandir() than with os.walk(), because os.stat() is never called.

benchmark.py has a bug in do_os_walk() when the --size option is used: attached do_os_walk_getsize.patch is needed.

Sizes returned by os.walk() and scandir.walk() are different. I guess that the behaviour of symbolic links to directory is different. Because of that, I'm not sure that benchmark timings are reliable, but well, it should give us an idea of performances.

To compute the size of a tree, scandir() is twice faster (2.1x as fast) than os.walk(): os.walk=1.435 sec, scandir.walk=0.675 sec.

"os" is 41% faster than "c": c=1150 ms, os=675 ms.


Results of "benchmark.py --size" on my Linux Fedora 20:

haypo@smithers$ ~/prog/python/default/python setup.py build && for scandir in generic python c os; do echo; echo "=== $scandir ==="; PYTHONPATH=build/lib.linux-x86_64-3.5/ ~/prog/python/default/python benchmark.py -s /usr/share -c $scandir || break; done
running build
running build_py
running build_ext

=== generic ===
Using very slow generic version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on /usr/share, repeat 1/3...
Benchmarking walks on /usr/share, repeat 2/3...
Benchmarking walks on /usr/share, repeat 3/3...
os.walk size 3064748475, scandir.walk size 2924332540 -- NOT EQUAL!
os.walk took 1.425s, scandir.walk took 1.147s -- 1.2x as fast

=== python ===
Using slower ctypes version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on /usr/share, repeat 1/3...
Benchmarking walks on /usr/share, repeat 2/3...
Benchmarking walks on /usr/share, repeat 3/3...
os.walk size 3064748475, scandir.walk size 2924332540 -- NOT EQUAL!
os.walk took 1.421s, scandir.walk took 1.651s -- 0.9x as fast

=== c ===
Using fast C version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on /usr/share, repeat 1/3...
Benchmarking walks on /usr/share, repeat 2/3...
Benchmarking walks on /usr/share, repeat 3/3...
os.walk size 3064748475, scandir.walk size 2924332540 -- NOT EQUAL!
os.walk took 1.426s, scandir.walk took 1.150s -- 1.2x as fast

=== os ===
Using Python 3.5's builtin os.scandir()
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on /usr/share, repeat 1/3...
Benchmarking walks on /usr/share, repeat 2/3...
Benchmarking walks on /usr/share, repeat 3/3...
os.walk size 3064748475, scandir.walk size 2924332540 -- NOT EQUAL!
os.walk took 1.435s, scandir.walk took 0.675s -- 2.1x as fast
msg228866 - (view) Author: Ben Hoyt (benhoyt) * Date: 2014-10-09 12:35
Thanks, Victor and Antone. I'm somewhat surprised at the 2-3x numbers you're seeing, as I was consistently getting 4-5x in the Linux tests I did. But it does depend quite a bit on what file system you're running, what hardware, whether you're running in a VM, etc. Still, 2-3x faster is a good speedup!

The numbers are significantly better on Windows, as you can see. Even the smallest numbers I've seen with "--scandir os" are around 12x range on Windows.

In any case, Victor's last tests are "right" -- I presume we'll have *some* C, so what we want to be comparing is "benchmark.py --scandir c" versus "benchmark.py --scandir os": the some C version versus the all C version in the attached CPython 3.5 patch.

BTW, Victor, "Generic" isn't really useful. I just used it as a test case that calls listdir() and os.stat() to implement the scandir/DirEntry interface. So it's going to be strictly slower than listdir + stat due to using listdir and creating all those DirEntry objects.

Anyway, where to from here? Are we agreed given the numbers that -- especially on Linux -- it makes good performance sense to use an all-C approach?
msg228867 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-10-09 12:38
Le 09/10/2014 14:35, Ben Hoyt a écrit :
> 
> Anyway, where to from here? Are we agreed given the numbers that --
especially on Linux -- it makes good performance sense to use an all-C
approach?

I think so, yes.
msg228872 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-10-09 12:59
> I'm somewhat surprised at the 2-3x numbers you're seeing, as I was consistently getting 4-5x in the Linux tests I did. But it does depend quite a bit on what file system you're running, what hardware, whether you're running in a VM, etc. Still, 2-3x faster is a good speedup!

I don't think that hardware matters. As I wrote, I expect the whole /usr/share tree to fit in memory. It's sounds more like optimizations in the Linux kernel. I ran benchmarks on Fedora 20 with the Linux kernel 3.14.

> Anyway, where to from here? Are we agreed given the numbers that -- especially on Linux -- it makes good performance sense to use an all-C approach?

We didn't try yet to call readdir() multiple times in the C iterator and use a small cache (ex: between 10 and 1000 items, I don't know which size is the best yet) to also limit the number of readdir() calls. The cache would be an array of dirent on Linux.

scandir_helper() can return an array of items instead of a single item for example.

I can try to implement it if you want.
msg228873 - (view) Author: Ben Hoyt (benhoyt) * Date: 2014-10-09 13:02
I dunno, I'd be happy if you implement this, but it does mean *more* C code, not less. :-) I feel this would be a nice-to-have, but we should get the thing working first, and then do the multiple-OS-calls-in-one optimization.

In any case, if you implement this, I think you should do so against the CPython 3.5 patch, not the _scandir.c/scandir_helper code.
msg229659 - (view) Author: Ben Hoyt (benhoyt) * Date: 2014-10-18 20:53
Attaching updated patch (scandir-2.patch) per Victor's code review here: http://bugs.python.org/review/22524/diff/13005/Modules/posixmodule.c

Note that I haven't included test_scandir.py this time, as I haven't made any changes there, and it's not really ready for Python 3.5 yet.
msg231673 - (view) Author: Miki Tebeka (tebeka) * Date: 2014-11-25 17:18
Can we also update iglob [1] as well?

[1] https://docs.python.org/2/library/glob.html#glob.iglob
msg231677 - (view) Author: Akira Li (akira) * Date: 2014-11-25 18:31
scandir is slower on my machine:

  $ git clone https://github.com/benhoyt/scandir 
  $ cd scandir/
  $ ../cpython/python benchmark.py /usr/
  Using slower ctypes version of scandir
  Comparing against builtin version of os.walk()
  Priming the system's cache...
  Benchmarking walks on /usr/, repeat 1/3...
  Benchmarking walks on /usr/, repeat 2/3...
  Benchmarking walks on /usr/, repeat 3/3...
  os.walk took 7.761s, scandir.walk took 10.420s -- 0.7x as fast

What commands should I run to benchmark the attached patch (scandir-2.patch)?
msg231682 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-11-25 20:51
> scandir is slower on my machine:

Please share more information about your config: OS, disk type (hard
drive, SSD, something else), filesystem, etc.
msg231684 - (view) Author: Ben Hoyt (benhoyt) * Date: 2014-11-25 20:54
Akira, note the "Using slower ctypes version of scandir" -- this is the older, ctypes implementation of scandir. On Linux, depending on file system etc, that can often be slower. You need to at least be using the "fast C version" in _scandir.c, which is then half C, half Python. But ideally you'd use the all-C version that I've provided as a CPython 3.5 patch.
msg231703 - (view) Author: Akira Li (akira) * Date: 2014-11-26 10:59
> STINNER Victor added the comment:
>
>> scandir is slower on my machine:
>
> Please share more information about your config: OS, disk type (hard
> drive, SSD, something else), filesystem, etc.
>
Ubuntu 14.04, SSD, ext4 filesystem. Results for different python
versions are the same (scandir on pypy is even slower due to ctype
usage).

What other information could be useful?

> Ben Hoyt added the comment:
>
> Akira, note the "Using slower ctypes version of scandir" -- this is
> the older, ctypes implementation of scandir. On Linux, depending on
> file system etc, that can often be slower. You need to at least be
> using the "fast C version" in _scandir.c, which is then half C, half
> Python. But ideally you'd use the all-C version that I've provided as
> a CPython 3.5 patch.

Yes. I've noticed it, that is why I asked :) *"What commands should I
run to benchmark the attached patch (scandir-2.patch)?"*

I've read benchmark.py's source; It happens that it already supports
benchmarking of os.scandir. In my python checkout:

  cpython$ hg import --no-commit https://bugs.python.org/file36963/scandir-2.patch
  cpython$ make && cd ../scandir/
  scandir$ ../cpython/python benchmark.py -s /usr/
  Using Python 3.5's builtin os.scandir()
  Comparing against builtin version of os.walk()
  Priming the system's cache...
  Benchmarking walks on /usr/, repeat 1/3...
  Benchmarking walks on /usr/, repeat 2/3...
  Benchmarking walks on /usr/, repeat 3/3...
  os.walk size 7925376343, scandir.walk size 5534939617 -- NOT EQUAL!
  os.walk took 13.552s, scandir.walk took 6.032s -- 2.2x as fast

It seems os.walk and scandir.walk do a different work here.

I've written get_tree_size_listdir_fd() that mimics get_tree_size() 
(see get_tree_size_listdir.diff patch) to get the same size:

  scandir$ ../cpython/python benchmark.py -s /usr
  Using Python 3.5's builtin os.scandir()
  Comparing against builtin version of os.walk()
  Comparing against get_tree_size_listdir_fd
  Priming the system's cache...
  Benchmarking walks on /usr, repeat 1/3...
  Benchmarking walks on /usr, repeat 2/3...
  Benchmarking walks on /usr, repeat 3/3...
  os.walk size 5534939617, scandir.walk size 5534939617 -- equal
  os.walk took 5.697s, scandir.walk took 5.621s -- 1.0x as fast

scandir is not noticeably faster but scandir-based code is much nicer here.
msg231745 - (view) Author: Akira Li (akira) * Date: 2014-11-27 04:31
To see what happens at syscall level, I've run various implementations
of get_tree_size() functions (see get_tree_size_listdir.diff) with
strace:

get_tree_size_listdir_fd   -- os.listdir(fd) + os.lstat
get_tree_size              -- os.scandir(path) + entry.lstat
get_tree_size_listdir_stat -- os.listdir(path) + os.lstat
get_tree_size_listdir      -- os.listdir(path) + os.path.isdir + os.lstat

Summary: 

- os.listdir() and os.scandir()-based variants use the same number of getdents() syscalls
- and the number of openat, lstat (newfstatat) syscalls is also comparable
  (except for the variant that uses os.path.isdir)

Log:

scandir$ /usr/bin/time strace -fco py.strace ../cpython/python -c 'from benchmark import get_tree_size_listdir_fd as f; import os; print(f(os.open("/usr/", os.O_RDONLY)))' && head -7 py.strace
5535240217
11.29user 8.14system 0:17.78elapsed 109%CPU (0avgtext+0avgdata 13460maxresident)k
0inputs+8outputs (0major+6781minor)pagefaults 0swaps
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 46.51    0.075252           0    264839           newfstatat
 16.88    0.027315           1     50460           getdents
 11.53    0.018660           0     75621           fcntl
  9.74    0.015758           0     50531           close
  6.87    0.011116           0     25214           openat
scandir$ /usr/bin/time strace -fco py.strace ../cpython/python -c 'from benchmark import get_tree_size as f; print(f("/usr/"))' && head -7 py.strace
5535240217
22.56user 8.47system 0:29.77elapsed 104%CPU (0avgtext+0avgdata 13280maxresident)k
0inputs+8outputs (0major+6306minor)pagefaults 0swaps
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 62.00    0.032822           0    239644           lstat
 19.97    0.010570           0     50460           getdents
  8.52    0.004510           0     25215           openat
  6.09    0.003224           0     25326           close
  0.55    0.000292           3        95           mmap
scandir$ /usr/bin/time strace -fco py.strace ../cpython/python -c 'from benchmark import get_tree_size_listdir_stat as f; print(f("/usr/"))' && head -7 py.strace
5535240217
13.70user 6.30system 0:18.84elapsed 106%CPU (0avgtext+0avgdata 13456maxresident)k
0inputs+8outputs (0major+6769minor)pagefaults 0swaps
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 64.79    0.050217           0    264849           lstat
 19.37    0.015011           0     50460           getdents
  8.22    0.006367           0     25215           openat
  5.76    0.004465           0     25326           close
  0.32    0.000247           2       114           mmap
scandir$ /usr/bin/time strace -fco py.strace ../cpython/python -c 'from benchmark import get_tree_size_listdir as f; print(f("/usr/"))' && head -7 py.strace
5535240217
19.53user 10.61system 0:28.16elapsed 107%CPU (0avgtext+0avgdata 13452maxresident)k
0inputs+8outputs (0major+6733minor)pagefaults 0swaps
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 42.04    0.063050           0    265195     37259 stat
 37.40    0.056086           0    265381           lstat
 11.46    0.017187           0     50460           getdents
  4.82    0.007232           0     25215           openat
  3.43    0.005139           0     25326           close
msg235458 - (view) Author: Ben Hoyt (benhoyt) * Date: 2015-02-06 01:37
Victor, I'd love to push forward with this (I doubt it's going to make alpha 1 on February 8, but hopefully alpha 2 on March 8).

It seems pretty settled that we need to use the all-C version I've created, especially for Linux.

Can you please review scandir-2.patch? It's on Rietveld here: http://bugs.python.org/review/22524/#ps13104

You can see what changed between scandir-1.patch and scandir-2.patch more easily here: https://github.com/benhoyt/scandir/commit/d4e2d7083319ed8988aef6ed02d670fb36a00a33

Also, here is just the scandir C code in its own file: https://github.com/benhoyt/scandir/blob/d4e2d7083319ed8988aef6ed02d670fb36a00a33/posixmodule_scandir_main.c

One other open question is: should it go into CPython's posixmodule.c as I have it in the patch, or should I try to separate it out?
msg235806 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-02-12 11:15
scandir-3.patch: New implementation based on scandir-2.patch on Ben's github repository.

Main changes with scandir-2.patch:

* new DirEntry.inode() method

* os.scandir() doesn't support bytes on Windows anymore: it's deprecated since python 3.3 and using bytes gives unexpected results on Windows


As discussed with Ben Hoyt, I added a inode() method to solve the use case described here:
https://www.reddit.com/r/Python/comments/2synry/so_8_peps_are_currently_being_proposed_for_python/cnvnz1w

I will update the PEP 471 to add the inode() method.


Notes:

* os.scandir() doesn't accept a file descriptor, as decided in the PEP 471.

* It may be nice to modify Scandir.close() to close the handle/directory; Scandir is the iterator returned by os._scandir()

* My patch doesn't modify os.walk(): it prefer to do that in a new issue

* DirEntry methods have no docstring


Changes with scandir-2.patch:

* C code is added to posixmodule.c, not into a new _scandir.c file, to avoid code duplication (all Windows code to handle attributes)
* C code is restricted to the minimum: it's now only a wrapper to opendir+readdir and FindFirstFileW/FindNextFileW
* os._scandir() directly calls opendir(), it's no more delayed to the first call to next(), to report errors earlier. In practice, I don't think that anymore will notify :-p
* don't allocate a buffer to store a HANDLE: use directly a HANDLE
* C code: use #ifdef inside functions, not outside
* On Windows, os._scandir() appends "*.*" instead of "*" to the path, to mimic os.listdir()
* put information about cache and syscall directly in the doc of DirEntry methods
* remove promise of performances from scandir doc: be more factual, explain when syscalls are required or not
* expose DT_UNKOWN, DT_DIR, DT_REG, DT_LNK constants in the posix module; but I prefer to not document them: use directly scandir!
* rewrite completly unit test:

  - reuse test.support
  - compare DirEntry attributes with the result of functions (ex: os.stat() or os.path.islink())

* add tests on removed directory, removed file and broken symbolic link
* remove ":" from repr(DirEntry) result, it's now "<DirEntry 'xxx'>"; drop __str__ method (by default, __str__ calls __repr__)
* use new OSError subclasses (FileNotFoundError)
* DirEntry methods: use stat.S_ISxxx() methods instead of "st.st_mode & 0o170000 == S_IFxxx"
msg235833 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-02-12 14:57
Updated patch. Thanks for the review Serhiy.
msg235835 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-02-12 15:20
(I regenerated scandir-4.patch, I had a local private changeset. I removed it.)
msg235848 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-02-12 17:25
Patch version 5:

- Use None value for the d_type instead of DT_UNKNOW to *prepare* support for platforms where the dirent structure has no d_type field. Serhiy guess that such platform have no DT_xxx constant neither. DirEntry doesn't DT_xxx constants if d_type is None

- Fix test_removed_file() and test_removed_dir() tests if d_type is unknown (I tested manually by forcing d_type to None)

- Use os.lstat() instead of os.stat(follow_symlinks=False) in DirEntry.inode(), it's a little bit faster even if it's the same syscall. The cost probably comes from the code parsing Python parameters, especially the keyword
msg235849 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-02-12 18:00
bench_scandir.py: dummy benchmark to compare listdir+stat vs scandir+is_dir.

os.scandir() is always slower than os.listdir() on tmpfs and ext4 partitions of a local hard driver.

I will try with NFS.


Results with scandir-5.patch on Fedora 21 (Linux).

--- Using /home/haypo (ext4, hard drive) ---

Test listdir+stat vs scandir+is_dir
Temporary directory: /home/haypo/tmpji8uviyl
Create 100000 files+symlinks...
Create 10000 directories...
# entries: 210000
Benchmark...
listdir: 2187.3 ms
scandir: 1047.2 ms
listdir: 494.4 ms
scandir: 1048.1 ms
listdir: 493.0 ms
scandir: 1042.6 ms

Result:
listdir: min=493.0 ms (2.3 us per file), max=2187.3 ms (10.4 us per file)
scandir: min=1042.6 ms (5.0 us per file), max=1048.1 ms (5.0 us per file)
scandir is between 0.5x and 2.1x faster


--- Using /tmp (tmpfs, full in memory) ---

Test listdir+stat vs scandir+is_dir
Temporary directory: /tmp/tmp6_zk3mqo
Create 100000 files+symlinks...
Create 10000 directories...
# entries: 210000
Benchmark...
listdir: 405.4 ms
scandir: 1001.3 ms
listdir: 403.3 ms
scandir: 1024.2 ms
listdir: 408.1 ms
scandir: 1013.5 ms

Remove the temporary directory...

Result:
listdir: min=403.3 ms (1.9 us per file), max=408.1 ms (1.9 us per file)
scandir: min=1001.3 ms (4.8 us per file), max=1024.2 ms (4.9 us per file)
scandir is between 0.4x and 0.4x faster
msg235859 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-02-12 22:09
Similar benchmark result on my laptop which has a SSD (ext4 filesystem tool, but I guess that the directory is small and fits into the memory).

Note: I'm not sure that the "between ...x and ...x faster" are revelant, I'm not sure that my computation is correct.

Test listdir+stat vs scandir+is_dir
Temporary directory: /home/haypo/prog/python/default/tmpbrn4r2tv
Create 100000 files+symlinks...
Create 10000 directories...
# entries: 210000
Benchmark...
listdir: 1730.7 ms
scandir: 1029.4 ms
listdir: 476.8 ms
scandir: 1058.3 ms
listdir: 485.4 ms
scandir: 1041.1 ms

Remove the temporary directory...

Result:
listdir: min=476.8 ms (2.3 us per file), max=1730.7 ms (8.2 us per file)
scandir: min=1029.4 ms (4.9 us per file), max=1058.3 ms (5.0 us per file)
scandir is between 0.5x and 1.7x faster
msg235860 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-02-12 22:19
Benchmark on NFS. Client: my laptop, connected to the LAN by wifi. Server: desktop, connected to the LAN by PLC. For an unknown reason, the creation of files, symlinks and directories is very slow (more than 30 seconds while I reduced the number of files & directories).

Test listdir+stat vs scandir+is_dir
Temporary directory: /home/haypo/mnt/tmp5aee0eic
Create 1000 files+symlinks...
Create 1000 directories...
# entries: 3000
Benchmark...
listdir: 14478.0 ms
scandir: 732.1 ms
listdir: 9.9 ms
scandir: 14.9 ms
listdir: 7.5 ms
scandir: 12.9 ms

Remove the temporary directory...

Result:
listdir: min=7.5 ms (2.5 us per file), max=14478.0 ms (4826.0 us per file)
scandir: min=12.9 ms (4.3 us per file), max=732.1 ms (244.0 us per file)
scandir is between 0.0x and 1119.6x faster
msg235865 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-02-13 00:38
I enhanced bench_scandir2.py to have one command to create a directory or a different command to run the benchmark.

All commands:
- create: create the directory for tests (you don't need this command, you can also use an existing directory)
- bench: compare scandir+is_dir to listdir+stat, cached
- bench_nocache: compare scandir+is_dir to listdir+stat, flush disk caches
- bench_nostat: compare scandir to listdir, cached
- bench_nostat_nocache: compare scandir to listdir, flush disk caches

--

New patch version 6 written for performances, changes:

- On POSIX, decode the filename in C
- _scandir() iterator now yields list of items, instead of an single item

With my benchmarks, I see that yielding 10 items reduces the overhead of scandir on Linux (creating DirEntry objects). On Windows, the number of items has no effect. I prefer to also fetch entries 10 per 10 to mimic POSIX. Later, on POSIX, we may use directly getdents() and yield the full getdents() result at once. according to strace, it's currently around 800 entries per getdents() syscall.


Results of bench_scandir2.py on my laptop using SSD and ext4 filesystem:

- 110,100 entries (100,000 files, 100 symlinks, 10,000 directories)
- bench: 1.3x faster (scandir: 164.9 ms, listdir: 216.3 ms)
- bench_nostat: 0.4x faster (scandir: 104.0 ms, listdir: 38.5 ms)
- bench_nocache: 2.1x faster (scandir: 460.2 ms, listdir: 983.2 ms)
- bench_nostat_nocache: 2.2x faster (scandir: 480.4 ms, listdir: 1055.6 ms)

Results of bench_scandir2.py on my laptop using NFS share (server: ext4 filesystem) and slow wifi:

- 11,100 entries (1,0000 files, 100 symlinks, 1000 directories)
- bench: 1.3x faster (scandir: 22.5 ms, listdir: 28.9 ms)
- bench_nostat: 0.2x faster (scandir: 14.3 ms, listdir: 3.2 ms)

*** Timings with NFS are not reliable. Sometimes, a directory listing takes more than 30 seconds, but then it takes less than 100 ms. ***

Results of bench_scandir2.py on a Windows 7 VM using NTFS:

- 11,100 entries (10,000 files, 1,000 directories, 100 symlinks)
- bench: 9.9x faster (scandir: 58.3 ms, listdir: 578.5 ms)
- bench_nostat: 0.3x faster (scandir: 28.5 ms, listdir: 7.6 ms)

Results of bench_scandir2.py on my desktop PC using tmpfs (/tmp):

- 110,100 entries (100,000 files, 100 symlinks, 10,000 directories)
- bench: 1.3x faster (scandir: 149.2 ms, listdir: 189.2 ms)
- bench_nostat: 0.3x faster (scandir: 91.9 ms, listdir: 27.1 ms)

Results of bench_scandir2.py on my desktop PC using HDD and ext4:

- 110,100 entries (100000 files, 100 symlinks, 10000 directories)
- bench: 1.4x faster (scandir: 168.5 ms, listdir: 238.9 ms)
- bench_nostat: 0.4x faster (scandir: 107.5 ms, listdir: 41.9 ms)
msg235873 - (view) Author: Ben Hoyt (benhoyt) * Date: 2015-02-13 03:51
Hi Victor, I thank you for your efforts here, especially your addition of DirEntry.inode() and your work on the tests.

However, I'm a bit frustrated that you just re-implemented the whole thing without discussion: I've been behind scandir and written the first couple of implementations, and I asked if you could review my code, but instead of reviewing it or interacting with my fairly clear desire for the all-C version, you simply turn up and say "I've rewritten it, can you please review?"

You did express concern off list about an all-C vs part-Python implementation, but you said "it's still unclear to me which implementation should be added to Python" and "I don't feel able today to take the right decision. We may use python-dev to discuss that". But I didn't see that discussion at all, and I had expressed fairly clearly (both here and off list), with benchmarks, why I thought it should be the all-C version.

So it was a bit of a let down for a first-time Python contributor. Even a simple question would have been helpful here: "Ben, I really think this much C code for a first version is bug-prone; what do you think about me re-implementing it and comparing?"

TLDR: I'd like to see os.scandir() in Python even if it means the slower implementation, but some discussion would have been nice. :-)

So if it's not too late, I'll continue that discussion in my next message.
msg235874 - (view) Author: Ben Hoyt (benhoyt) * Date: 2015-02-13 04:39
To continue the actual "which implementation" discussion: as I mentioned last week in http://bugs.python.org/msg235458, I think the benchmarks above show pretty clearly we should use the all-C version.

For background: PEP 471 doesn't add any new functionality, and especially with the new pathlib module, it doesn't make directory iteration syntax nicer either: os.scandir() is all about letting the OS give you whatever info it can *for performance*. Most of the Rationale for adding scandir given in PEP 471 is because it can be so so much faster than listdir + stat.

My original all-C implementation is definitely more code to review (roughly 800 lines of C vs scandir-6.patch's 400), but it's also more than twice as fast. On my Windows 7 SSD just now, running benchmark.py:

    Original scandir-2.patch version:
    os.walk took 0.509s, scandir.walk took 0.020s -- 25.4x as fast

    New scandir-6.patch version:
    os.walk took 0.455s, scandir.walk took 0.046s -- 10.0x as fast

So the all-C implementation is literally 2.5x as fast on Windows. (After both tests, just for a sanity check, I ran the ctypes version as well, and it said about 8x as fast for both runs.)

Then on Linux, not a perfect comparison (different benchmarks) but shows the same kind of trend:

    Original scandir-2.patch benchmark (http://bugs.python.org/msg228857):
    os.walk took 0.860s, scandir.walk took 0.268s -- 3.2x as fast

    New scandir-6.patch benchmark (http://bugs.python.org/msg235865) -- note that "1.3x faster" should actually read "1.3x as fast" here:
    bench: 1.3x faster (scandir: 164.9 ms, listdir: 216.3 ms)

So again, the all-C implementation is 2.5x as fast on Linux too.

And on Linux, the incremental improvement provided by scandir-6 over listdir is hardly worth it -- I'd use a new directory listing API for 3.2x as fast, but not for 1.3x as fast.

Admittedly a 10x speed gain (!) on Windows is still very much worth going for, so I'm positive about scandir even with a half-Python implementation, but hopefully the above shows fairly clearly why the all-C implementation is important, especially on Linux.

Also, if the consensus is in favour of slow but less C code, I think there are further tweaks we can make to the Python part of the code to improve things a bit more.
msg235883 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-02-13 09:08
Note: bench_scandir2.py is a micro-benchmark. Ben's benchmark using walk() is more realistic, but I'm interested by micro-benchmark results.

scandir-2.patch is faster than scandir-6.patch, much fast on Windows.

Result of bench (cached): scandir-6.patch => scandir-2.patch

* Windows 7 VM using NTFS: 14.0x faster => 44.6x faster
* laptop using NFS share: 1.3x faster => 5.2x faster   *** warning: unstable results ***
* desktop PC using /tmp: 1.3x faster => 3.8x faster
* laptop using SSD and ext4: 1.3x faster => 2.8x faster
* desktop PC using HDD and ext4: 1.4x faster => 1.4x faster


Benchmark using scandir-2.patch
-------------------------------


Benchmark results with the full C implementation, scandir-2.patch.

[ C implementation ] Results of bench_scandir2.py on my desktop PC using HDD and ext4:

- 110,100 entries (100,000 files, 100 symlinks, 10,000 directories)
- bench: 3.5x faster than listdir (scandir: 63.6 ms, listdir: 219.9 ms)
- bench_nostat: 0.8x faster than listdir (scandir: 52.8 ms, listdir: 42.4 ms)
- bench_nocache: 1.4x faster than listdir (scandir: 3745.2 ms, listdir: 5217.6 ms)
- bench_nostat_nocache: 1.4x faster than listdir (scandir: 3834.1 ms, listdir: 5380.7 ms)

[ C implementation ] Results of bench_scandir2.py on my desktop PC using /tmp (tmpfs):

- 110,100 entries (100,000 files, 100 symlinks, 10,000 directories)
- bench: 3.8x faster than listdir (scandir: 46.7 ms, listdir: 176.4 ms)
- bench_nostat: 0.7x faster than listdir (scandir: 38.6 ms, listdir: 28.6v)

[ C implementation ] Results of bench_scandir2.py on my Windows 7 VM using NTFS:

- 110,100 entries (100,000 files, 100 symlinks, 10,000 directories)
- bench: 44.6x faster than listdir (scandir: 125.0 ms, listdir: 5574.9 ms)
- bench_nostat: 0.8x faster than listdir (scandir: 92.4 ms, listdir: 74.7 ms)

[ C implementation ] Results of bench_scandir2.py on my laptop using SSD and ext4:

- 110,100 entries (100,000 files, 100 symlinks, 10,000 directories)
- bench: 3.6x faster (scandir: 59.4 ms, listdir: 213.3 ms)
- bench_nostat: 0.8x faster than listdir (scandir: 50.0 ms, listdir: 38.6)
- bench_nocache: 2.8x faster than listdir (scandir: 377.5 ms, listdir: 1073.1)
- bench_nostat_nocache: 2.8x faster than listdir (scandir: 370.9 ms, listdir: 1055.0)

[ C implementation ] Results of bench_scandir2.py on my laptop using tmpfs:

- 110,100 entries (100,000 files, 100 symlinks, 10,000 directories)
- bench: 4.0x faster than listdir (scandir: 43.7 ms, listdir: 174.1)
- bench_nostat: 0.7x faster than listdir (scandir: 35.2 ms, listdir: 24.5)

[ C implementation ] Results of bench_scandir2.py on my laptop using NFS share and slow wifi:

- 11,010 entries (10,000 files, 10 symlinks, 1,000 directories)
- bench: 5.2x faster than listdir (scandir: 4.2 ms, listdir: 21.7 ms)
- bench_nostat: 0.6x faster than listdir (scandir: 3.3 ms, listdir: 1.9 ms)


*** Again, results with NFS are not reliable. Sometimes listing a directory conten takes 40 seconds. It's maybe a network issue. ***

It looks like d_type can be DT_UNKNOWN on NFS.


Benchmark using scandir-6.patch
-------------------------------

I rerun benchmark with scandir-6.patch with more files to compare the two benchmarks.

[ C implementation ] Results of bench_scandir2.py on my Windows 7 VM using NTFS:

- 110,100 entries (100,000 files, 100 symlinks, 10,000 directories)
- bench: 14.0x faster than listdir (scandir: 399.0 ms, listdir: 5578.7 ms)
- bench_nostat: 0.3x faster than listdir (scandir: 279.2 ms, listdir: 76.1 ms)

[ C implementation ] Results of bench_scandir2.py on my laptop using NFS share and slow wifi:

- 11,010 entries (10,000 files, 10 symlinks, 1,000 directories)
- bench: 1.5x faster than listdir (scandir: 14.8 ms, listdir: 21.4 ms)
- bench_nostat: 0.2x faster than listdir (scandir: 10.6 ms, listdir: 2.2 ms)
msg235884 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-02-13 09:28
Hi Ben,

2015-02-13 4:51 GMT+01:00 Ben Hoyt <report@bugs.python.org>:
> Hi Victor, I thank you for your efforts here, especially your addition of DirEntry.inode() and your work on the tests.

The addition of inode() should still be discussed on python-dev. The
remaining question is if the inode number (st_ino) is useful even if
st_dev is missing. I didn't added the inode() method yet to the PEP
because of that.

> However, I'm a bit frustrated that you just re-implemented the whole thing without discussion:
(...)
> So it was a bit of a let down for a first-time Python contributor. Even a simple question would have been helpful here: "Ben, I really think this much C code for a first version is bug-prone; what do you think about me re-implementing it and comparing?"

I'm really sorry, I didn't want to frustrate you :-( Sorry for not
being available, I also expected someone else to review your code, but
it didn't happen. And I *want* your opinon on design choices. Sorry if
I wasn't explicit about that in my last emails.

Sorry, I was very busy last months on other projects (like asyncio). I
want scandir() to be included in Python 3.5! The3.5 alpha 1 release
was a reminder for me.

I didn't reimplement everything. Almost all code comes from your work.
I just adapted it to Python 3.5.

> I've been behind scandir and written the first couple of implementations, and I asked if you could review my code, but instead of reviewing it or interacting with my fairly clear desire for the all-C version, you simply turn up and say "I've rewritten it, can you please review?"

I'm not happy of the benchmark results of the C+Python implementation
(scandir-6.patch is supposed to be the fasted implementation of
C+Python).

I tried to produce as much benchmark results as possible to be able to
take a decision. I used:

* hardware: SSD, HDD and tmpfs (RAM)
* file system: ext4, tmpfs, NFS/ext4
* operating systems: Linux (Fedora 21), Windows 7

I will try to summarize benchmark results to write an email to
python-dev, to make a choice between the full C implementation vs
C+Python implementation. (ctypes is not an option, it's not reliable,
portability matters.)

> To continue the actual "which implementation" discussion: as I mentioned last week in http://bugs.python.org/msg235458, I think the benchmarks above show pretty clearly we should use the all-C version.

It's quite clear that the C implementation is always faster than C+Python.

My point is that we have to make a choice, C+Python is a nice
compromise because it uses less C code, and C code is more expensive
to *maintain*.

> For background: PEP 471 doesn't add any new functionality, and especially with the new pathlib module, it doesn't make directory iteration syntax nicer either: os.scandir() is all about letting the OS give you whatever info it can *for performance*. Most of the Rationale for adding scandir given in PEP 471 is because it can be so so much faster than listdir + stat.

Good point :-)

In many setup (hardware, operating system, file system), I see a low
speedup (less than 2x faster). The whole purpose of the PEP 471
becomes unclear if the speedup is not at least 2x.

Quicky summary of benchmarks:

* using C+Python (scandir-6.patch), the benchmark ("bench") is only
faster than 2x on Windows, in the other cases it's between 1.3x and
1.4x faster
* using C (scandir-2.patch), the benchmark ("bench") is at least 3.5x
faster, it's between 3.5x and 44.6x faster, the most interesting
speedup is on Windows (44.6x faster!)
msg235888 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-02-13 11:37
> Result of bench (cached): scandir-6.patch => scandir-2.patch
> (...)
> laptop using SSD and ext4: 1.3x faster => 2.8x faster
> desktop PC using HDD and ext4: 1.4x faster => 1.4x faster

Oops, I copied the wrong numbers. scandir-2.patch is faster than that!

* laptop using SSD and ext4: 1.3x faster => 3.6x faster
* desktop PC using HDD and ext4: 1.4x faster => 3.5x faster
msg235920 - (view) Author: Akira Li (akira) * Date: 2015-02-13 21:58
As I've mentioned in http://bugs.python.org/issue22524#msg231703

  os.walk size 7925376343, scandir.walk size 5534939617 -- NOT EQUAL!

os.walk and scandir.walk do a different work here.

I don't see that it is acknowledged so I assume the benchmark is not fixed.

If the fixed (same work) benchmark is used then there is no performance difference on Linux. See the attached file http://bugs.python.org/file37292/get_tree_size_listdir.diff 

Note: the performance improvements on Windows along should be enough to accept os.scandir() even if Linux version won't be improved (if the behavior is documented).
msg235940 - (view) Author: Ben Hoyt (benhoyt) * Date: 2015-02-14 01:26
Akari: yes, I'm aware of this, and it's definitely a concern to me -- it may be that scandir has a bug with counting the size of certain non-file directory entries, or my implementation of os.walk() via scandir is not quite correct. I'll look at it before too long!
msg236205 - (view) Author: Ben Hoyt (benhoyt) * Date: 2015-02-19 01:55
BTW, I replied to Victor privately, but for the thread: no worries, and apology accepted! :-) I'm working on updating my C patch with his feedback, and will update this issue hopefully in the next few days.
msg236738 - (view) Author: Ben Hoyt (benhoyt) * Date: 2015-02-27 11:45
Okay, here's the next version of the patch. I've updated scandir-2.patch with a number of modifications and several fixes to the C code.

This includes Victor's scandir-6.patch test suite (with minor mods) and my original documentation. Note that I haven't looked at either the documentation or the tests too closely, but I'm uploading the patch in any case so Victor (or others) can start reviewing the C code.
msg237328 - (view) Author: Ben Hoyt (benhoyt) * Date: 2015-03-06 04:53
Updated version of the patch after Victor's code review of scandir-7 and a couple of his comments offline. Full commit log is available on my GitHub project in posixmodule_scandir_main.c if anyone's interested. Again, I haven't looked closely at the docs or tests yet, but hopefully the C code is getting pretty close now!

KNOWN ISSUE: There's a reference leak in the POSIX version (found with "python -m test -R 3:2 test_os"). I'm a Windows guy and this is my first serious Python C extension code, so I'm not sure the best way to track this down. Would someone like to either point out the bug :-) or help me with how to track this down?
msg237446 - (view) Author: Ben Hoyt (benhoyt) * Date: 2015-03-07 13:25
Oops, I'm sorry re previous comment -- looks like I forgot to attach scandir-8.patch. Now attached. Please re-read my previous comment and review. :-)
msg237489 - (view) Author: Roundup Robot (python-dev) Date: 2015-03-08 01:09
New changeset d04d5b9c29f6 by Victor Stinner in branch 'default':
Issue #22524: New os.scandir() function, part of the PEP 471: "os.scandir()
https://hg.python.org/cpython/rev/d04d5b9c29f6
msg237490 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-03-08 01:09
> KNOWN ISSUE: There's a reference leak in the POSIX version (found with "python -m test -R 3:2 test_os").

Don't worry, it was a simple refleak, I fixed it:

diff -r 392d3214fc23 Modules/posixmodule.c
--- a/Modules/posixmodule.c     Sun Mar 08 01:58:04 2015 +0100
+++ b/Modules/posixmodule.c     Sun Mar 08 02:08:05 2015 +0100
@@ -16442,6 +16442,7 @@ DirEntry_fetch_stat(DirEntry *self, int 
         result = STAT(path, &st);
     else
         result = LSTAT(path, &st);
+    Py_DECREF(bytes);
 
     if (result != 0)
         return PyErr_SetFromErrnoWithFilenameObject(PyExc_OSError, self->path);
msg237495 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-03-08 01:55
I commited  scandir-8.patch with the documentation of scandir-5.patch. I wanted os.scandir() to be part of Python 3.5 alpha 2.

Thanks for your patience Ben ;-)

We should now watch buildbots to check how they appreciate os.scandir().
http://buildbot.python.org/all/waterfall?category=3.x.stable&category=3.x.unstable

Right now, test_os fails on Windows buildbots because of a regression of #23524. test_os.TestScandir succeeded on AMD64 Windows7 SP1 3.x.

We can maybe keep the issue open if someone still wants to enhance the C code, tests or the doc.
msg237496 - (view) Author: Roundup Robot (python-dev) Date: 2015-03-08 02:01
New changeset 60e5c34ec53a by Victor Stinner in branch 'default':
Issue #22524: Fix os.scandir() for platforms which don't have a d_type field in
https://hg.python.org/cpython/rev/60e5c34ec53a
msg237497 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-03-08 02:02
Oh oh, OpenIndiana doesn't support d_type: the dirent structure has no d_type field and DT_xxx constants like DT_UNKNOWN are not defined.

gcc -Wsign-compare -g -O0 -Wall -Wstrict-prototypes -I/usr/local/include/ncursesw -m64   -Werror=declaration-after-statement   -I. -IInclude -I./Include    -DPy_BUILD_CORE  -c ./Modules/_collectionsmodule.c -o Modules/_collectionsmodule.o
./Modules/posixmodule.c: In function 'DirEntry_is_symlink':
./Modules/posixmodule.c:16393: error: 'DT_UNKNOWN' undeclared (first use in this function)
./Modules/posixmodule.c:16393: error: (Each undeclared identifier is reported only once
./Modules/posixmodule.c:16393: error: for each function it appears in.)
./Modules/posixmodule.c:16394: error: 'DT_LNK' undeclared (first use in this function)
./Modules/posixmodule.c:16398: warning: control reaches end of non-void function
./Modules/posixmodule.c: In function 'DirEntry_test_mode':
./Modules/posixmodule.c:16519: error: 'DT_LNK' undeclared (first use in this function)
./Modules/posixmodule.c:16520: error: 'DT_UNKNOWN' undeclared (first use in this function)
./Modules/posixmodule.c:16559: error: 'DT_DIR' undeclared (first use in this function)
./Modules/posixmodule.c:16561: error: 'DT_REG' undeclared (first use in this function)
./Modules/posixmodule.c: In function 'ScandirIterator_iternext':
./Modules/posixmodule.c:16973: error: 'struct dirent' has no member named 'd_type'
make: *** [Modules/posixmodule.o] Error 1
make: *** Waiting for unfinished jobs....
msg237505 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-03-08 02:36
I opened the issue #23605: "Use the new os.scandir() function in os.walk()".
msg237624 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-03-09 09:52
It looks like test_os now pass on all buildbots, including OpenIndiana. I close the issue.
msg237627 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-03-09 09:57
FYI os.scandir() is part of Python 3.5 alpha 2 which is now available
(including installers for Windows):
https://www.python.org/downloads/release/python-350a2/
msg237753 - (view) Author: Ben Hoyt (benhoyt) * Date: 2015-03-10 11:36
Hi Victor. I'm attaching a patch to the What's New doc. I know it's a minor thing, but I really read the What's New in Python x.y page whenever it comes out, so making this as specific as possible is good. The changes are:

1. Be more specific and flesh it out slightly: for example, instead of just saying "PEP 471 adds a new directory iteration function", say very briefly why it was added (to speed up os.walk) and what it achieves.
2. Move "PEP and implementation by..." under correct correct PEP -- it's currently under PEP 475.
3. Be more specific (as per point 1) under os.scandir(), and link to issue
4. Add a note about the os.walk() performance improvement under Optimizations. I realize the changes to os.walk() haven't been committed yet, but hopefully they will be quite soon.

Any feedback? Are you happy to commit these changes?

I intend to do some review of the scandir/DirEntry docs as well. I'll send those in the next few days.
msg237758 - (view) Author: Roundup Robot (python-dev) Date: 2015-03-10 12:29
New changeset 8d76dabd40f6 by Victor Stinner in branch 'default':
Issue #22524: Rephrase scandir addition in What's New in Python 3.5
https://hg.python.org/cpython/rev/8d76dabd40f6
msg237761 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-03-10 12:40
2015-03-10 12:36 GMT+01:00 Ben Hoyt <report@bugs.python.org>:
> I intend to do some review of the scandir/DirEntry docs as well. I'll send those in the next few days.

Open maybe a new issue if you want to enhance the doc.
msg237765 - (view) Author: Ben Hoyt (benhoyt) * Date: 2015-03-10 13:24
Thanks. Will do!
History
Date User Action Args
2015-03-10 13:24:41benhoytsetmessages: + msg237765
2015-03-10 12:40:11vstinnersetmessages: + msg237761
2015-03-10 12:29:53python-devsetmessages: + msg237758
2015-03-10 11:36:49benhoytsetfiles: + whatsnew.patch

messages: + msg237753
2015-03-09 09:57:03vstinnersetmessages: + msg237627
2015-03-09 09:52:18vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg237624
2015-03-08 02:36:32vstinnersetmessages: + msg237505
2015-03-08 02:02:15vstinnersetmessages: + msg237497
2015-03-08 02:01:28python-devsetmessages: + msg237496
2015-03-08 01:55:45vstinnersetmessages: + msg237495
2015-03-08 01:09:57vstinnersetmessages: + msg237490
2015-03-08 01:09:38python-devsetnosy: + python-dev
messages: + msg237489
2015-03-07 13:25:35benhoytsetfiles: + scandir-8.patch

messages: + msg237446
2015-03-06 04:53:49benhoytsetmessages: + msg237328
2015-02-27 11:45:40benhoytsetfiles: + scandir-7.patch

messages: + msg236738
2015-02-19 01:55:36benhoytsetmessages: + msg236205
2015-02-14 01:26:08benhoytsetmessages: + msg235940
2015-02-13 21:58:25akirasetmessages: + msg235920
2015-02-13 11:37:34vstinnersetmessages: + msg235888
2015-02-13 09:53:09alexei.romanovsetnosy: + alexei.romanov
2015-02-13 09:28:29vstinnersetmessages: + msg235884
2015-02-13 09:08:49vstinnersetmessages: + msg235883
2015-02-13 04:39:25benhoytsetmessages: + msg235874
2015-02-13 03:51:47benhoytsetmessages: + msg235873
2015-02-13 00:38:24vstinnersetfiles: + scandir-6.patch

messages: + msg235865
2015-02-12 23:41:10vstinnersetfiles: + bench_scandir2.py
2015-02-12 22:19:54vstinnersetmessages: + msg235860
2015-02-12 22:09:04vstinnersetmessages: + msg235859
2015-02-12 18:00:54vstinnersetfiles: + bench_scandir.py

messages: + msg235849
2015-02-12 17:25:51vstinnersetfiles: + scandir-5.patch

messages: + msg235848
2015-02-12 15:20:42vstinnersetfiles: - scandir-4.patch
2015-02-12 15:20:30vstinnersetfiles: + scandir-4.patch

messages: + msg235835
2015-02-12 14:57:09vstinnersetfiles: + scandir-4.patch

messages: + msg235833
2015-02-12 11:16:01vstinnersetfiles: + scandir-3.patch

messages: + msg235806
2015-02-06 01:37:17benhoytsetmessages: + msg235458
2014-11-27 04:31:21akirasetfiles: - get_tree_size_listdir.diff
2014-11-27 04:31:13akirasetfiles: + get_tree_size_listdir.diff

messages: + msg231745
2014-11-26 10:59:02akirasetfiles: + get_tree_size_listdir.diff

messages: + msg231703
2014-11-25 20:54:01benhoytsetmessages: + msg231684
2014-11-25 20:51:23vstinnersetmessages: + msg231682
2014-11-25 18:31:52akirasetmessages: + msg231677
2014-11-25 17:18:56tebekasetnosy: + tebeka
messages: + msg231673
2014-10-18 20:53:24benhoytsetfiles: + scandir-2.patch

messages: + msg229659
2014-10-09 22:40:44josh.rsetnosy: + josh.r
2014-10-09 13:02:35benhoytsetmessages: + msg228873
2014-10-09 12:59:37vstinnersetmessages: + msg228872
2014-10-09 12:38:30pitrousetmessages: + msg228867
2014-10-09 12:35:41benhoytsetmessages: + msg228866
2014-10-09 11:24:45vstinnersetfiles: + do_os_walk_getsize.patch

messages: + msg228858
2014-10-09 11:16:25vstinnersetmessages: + msg228857
2014-10-09 10:56:59vstinnersetfiles: + clear_system_cache.patch

messages: + msg228855
2014-10-09 10:56:16vstinnersetmessages: + msg228854
2014-10-08 23:03:51pitrousetmessages: + msg228821
2014-10-08 22:16:11pitrousetmessages: + msg228819
2014-10-08 21:55:08benhoytsetmessages: + msg228818
2014-10-08 12:26:44benhoytsetmessages: + msg228786
2014-10-08 10:24:08vstinnersetmessages: + msg228784
2014-10-08 09:14:01pitrousetmessages: + msg228780
2014-10-07 21:33:35vstinnersetmessages: + msg228775
2014-10-07 02:38:43akirasetnosy: + akira
2014-10-07 01:21:14benhoytsetfiles: + scandir-1.patch
keywords: + patch
messages: + msg228751
2014-09-30 15:19:10abacabadabacabasetnosy: + abacabadabacaba
2014-09-30 14:53:57giampaolo.rodolasetnosy: + giampaolo.rodola
2014-09-30 14:47:15ncoghlanlinkissue11406 superseder
2014-09-30 14:42:42benhoytcreate