Message 188298 - Python tracker

Author	benhoyt
Recipients	Trundle, benhoyt, brian.curtin, christian.heimes, eric.araujo, giampaolo.rodola, gregory.p.smith, loewis, ncoghlan, neologix, nvetoshkin, pitrou, rhettinger, serhiy.storchaka, socketpair, terry.reedy, tim.golden, torsten, twouters, vstinner
Date	2013-05-03.10:06:04
Message-id	<CAL9jXCE4hmPPoydr=ADVNE+RZ6gbJ8qxdUNnaumdgok7yZdOWQ@mail.gmail.com>
In-reply-to	<CAH_1eM1f70xAakhRMcdMVVZGRBKGai_Q5NcqF0vZuHdRZc=WhA@mail.gmail.com>

Content
Thanks. I thought about that -- but I think I *want* to benchmark it when
they're cached, so that we're comparing apples with apples, cached system
calls with cached system calls. The benchmark would almost certainly be a
lot "better" (BetterWalk would be even faster) if I were comparing the
non-cached results. I'll think about it some more though.
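
For illustration only, here is a minimal sketch of the warm-cache approach described above: one untimed walk primes the OS cache, then the best of a few timed walks is reported. The helper names `count_entries` and `benchmark_walk` are hypothetical and are not taken from the BetterWalk benchmark script; any function with the same interface as `os.walk` can be passed in.

```python
import os
import time


def count_entries(walk_func, top):
    """Walk the whole tree and return (dir_count, file_count)."""
    dirs = files = 0
    for _root, dirnames, filenames in walk_func(top):
        dirs += len(dirnames)
        files += len(filenames)
    return dirs, files


def benchmark_walk(walk_func, top, repeats=3, prime=True):
    """Return the best wall-clock time in seconds over `repeats` full walks.

    With prime=True, one untimed walk runs first so that every timed
    run starts with a warm OS cache.
    """
    if prime:
        count_entries(walk_func, top)  # untimed priming pass
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        count_entries(walk_func, top)
        best = min(best, time.perf_counter() - start)
    return best
```

Calling `benchmark_walk(os.walk, path)` and then the same with a second walk function gives two warm-cache timings that can be compared directly. Passing `prime=False` only skips the priming pass; it does not clear the OS cache.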

Thoughts?

-Ben

On Fri, May 3, 2013 at 7:03 PM, Charles-François Natali <report@bugs.python.org> wrote:

>
> Charles-François Natali added the comment:
>
> > However, the reason I'm keen on iterdir_stat() is that I'm seeing it
> > speed up os.walk() by a factor of 10 in my recent tests (note that I've
> > made local mods, so these results aren't reproducible for others yet). This
> > is doing a walk on a dir tree with 7800 files and 155 dirs:
> >
> > Using fast _betterwalk
> > Priming the system's cache...
> > Benchmarking walks on C:\Work\betterwalk\benchtree, repeat 1/3...
> > Benchmarking walks on C:\Work\betterwalk\benchtree, repeat 2/3...
> > Benchmarking walks on C:\Work\betterwalk\benchtree, repeat 3/3...
> > os.walk took 0.178s, BetterWalk took 0.017s -- 10.5x as fast
> >
> > Sometimes Windows will go into this "I'm really caching stat results
> > good" mode -- I don't know what heuristic determines this -- and then I'm
> > seeing a 40x speed increase. And no, you didn't read that wrong. :-)
>
> I/O benchmarks shouldn't use timeit or repeated calls: after the first
> run, most of your data is in cache, so subsequent runs are
> meaningless.
>
> I don't know about Windows, but on Linux you should do something like:
> # echo 3 > /proc/sys/vm/drop_caches
>
> to start out clean.
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue11406>
> _______________________________________
>

History
Date	User	Action	Args
2013-05-03 10:06:04	benhoyt	set	recipients: + benhoyt, loewis, twouters, rhettinger, terry.reedy, gregory.p.smith, ncoghlan, pitrou, vstinner, giampaolo.rodola, christian.heimes, tim.golden, eric.araujo, Trundle, brian.curtin, torsten, nvetoshkin, neologix, socketpair, serhiy.storchaka
2013-05-03 10:06:04	benhoyt	link	issue11406 messages
2013-05-03 10:06:04	benhoyt	create