This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author benhoyt
Recipients Trundle, benhoyt, brian.curtin, christian.heimes, eric.araujo, giampaolo.rodola, gregory.p.smith, loewis, ncoghlan, neologix, nvetoshkin, pitrou, rhettinger, serhiy.storchaka, socketpair, terry.reedy, tim.golden, torsten, twouters, vstinner
Date 2013-05-03.10:06:04
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <CAL9jXCE4hmPPoydr=ADVNE+RZ6gbJ8qxdUNnaumdgok7yZdOWQ@mail.gmail.com>
In-reply-to <CAH_1eM1f70xAakhRMcdMVVZGRBKGai_Q5NcqF0vZuHdRZc=WhA@mail.gmail.com>
Content
Thanks. I thought about that -- but I think I *want* to benchmark it when
they're cached, so that we're comparing apples with apples, cached system
calls with cached systems calls. The benchmark would almost certainly be a
lot "better" (BetterWalk would be even faster) if I was comparing the
non-cached results. I'll think about it some more though.

Thoughts?

-Ben

On Fri, May 3, 2013 at 7:03 PM, Charles-François Natali <
report@bugs.python.org> wrote:

>
> Charles-François Natali added the comment:
>
> > However, the reason I'm keen on iterdir_stat() is that I'm seeing it
> speed up os.walk() by a factor of 10 in my recent tests (note that I've
> made local mods, so these results aren't reproducible for others yet). This
> is doing a walk on a dir tree with 7800 files and 155 dirs:
> >
> > Using fast _betterwalk
> > Priming the system's cache...
> > Benchmarking walks on C:\Work\betterwalk\benchtree, repeat 1/3...
> > Benchmarking walks on C:\Work\betterwalk\benchtree, repeat 2/3...
> > Benchmarking walks on C:\Work\betterwalk\benchtree, repeat 3/3...
> > os.walk took 0.178s, BetterWalk took 0.017s -- 10.5x as fast
> >
> > Sometimes Windows will go into this "I'm really caching stat results
> good" mode -- I don't know what heuristic determines this -- and then I'm
> seeing a 40x speed increase. And no, you didn't read that wrong. :-)
>
> I/O benchmarks shouldn't use timeit or repeated calls: after the first
> run, most of your data is in cache, so subsequent runs are
> meaningless.
>
> I don't know about Windows, but on Linux you should do something like:
> # echo 3 > /proc/sys/vm/drop_caches
>
> to start out clean.
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue11406>
> _______________________________________
>
History
Date User Action Args
2013-05-03 10:06:04benhoytsetrecipients: + benhoyt, loewis, twouters, rhettinger, terry.reedy, gregory.p.smith, ncoghlan, pitrou, vstinner, giampaolo.rodola, christian.heimes, tim.golden, eric.araujo, Trundle, brian.curtin, torsten, nvetoshkin, neologix, socketpair, serhiy.storchaka
2013-05-03 10:06:04benhoytlinkissue11406 messages
2013-05-03 10:06:04benhoytcreate