
Author timehorse
Recipients akitada, akuchling, amaury.forgeotdarc, collinwinter, doerwalter, ezio.melotti, georg.brandl, gregory.p.smith, jaylogan, jimjjewett, loewis, mark, moreati, mrabarnett, nneonneo, pitrou, r.david.murray, rsc, sjmachin, timehorse, vbr
Date 2009-08-12.12:04:08
Message-id <1250078650.91.0.50242524046.issue2636@psf.upfronthosting.co.za>
In-reply-to
Content
</lurk>
Re: timings

Thanks for the info, John.  First of all, I really like those tests;
could you please submit a patch or other document so that we can
combine them into the Python test suite?

The Python test suite can be run as part of 'make test'; IIRC there is
also a way to run JUST the 2 re test suites, though I seem to have
senior moment'd how.  The suite includes a built-in timing output over
some of the tests, though I don't recall which ones were being timed:
standard cases or pathological (rare) ones.  Either way, we should
include some timings that are of a standard nature in the test suite to
make Matthew's and any other developer's work easier.

So, John, if you are not familiar with the test suite, I can look into
adding the specific cases you've developed into the test suite so we can
have a more representative timing of things.  Remember, though, that
within a single Python instance, at least in the existing engine, the re
compiler caches recent compiles, so repeatedly compiling the same
expression reduces the overhead in a single run to one compile plus
cache lookups, whereas your tests recompile at each test (though I'm not
sure what timeit is doing: if it invokes a new instance of Python each
time, it is recompiling each time; if it is reusing the instance, it is
only compiling once).
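
To make the caching point concrete, here is a minimal sketch that times
the same compile with and without the re module's cache.  The pattern
and the function names are made up for illustration; re.purge() is the
documented way to clear the cache.  As far as I know, timeit runs the
statement repeatedly inside one interpreter, so the first function
should hit the cache after its first call.

```python
import re
import timeit

# Hypothetical pattern, chosen only for illustration.
PATTERN = r"(\w+)@(\w+)\.com"


def compile_cached():
    # After the first call, re.compile() finds the pattern in the
    # module's internal cache instead of compiling it again.
    return re.compile(PATTERN)


def compile_uncached():
    # re.purge() clears the cache, so every call pays the full
    # compile cost (plus the small cost of the purge itself).
    re.purge()
    return re.compile(PATTERN)


def measure(number=200):
    # Returns (seconds with cache, seconds without cache).
    cached = timeit.timeit(compile_cached, number=number)
    uncached = timeit.timeit(compile_uncached, number=number)
    return cached, uncached


if __name__ == "__main__":
    cached, uncached = measure()
    print("cached:   %.6f s" % cached)
    print("uncached: %.6f s" % uncached)
```

A timing test meant to measure the compiler itself would probably want
the second form; one meant to measure typical use would want the first.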

Having not looked at Matthew's regex code recently (nice name, BTW), I
don't know if it also contains the compiled expression cache; if not,
adding it in might help timings.  Originally, the cache worked by
storing ~100 entries and clearing itself when full; I have a modification
which increases this to 256 (IIRC) and removes only the 128 oldest
entries when full, to prevent thrashing at the boundary, which I think
is better if only for a particular pathological case.

In any case, don't despair at these numbers, Matthew: you have a lot of
time and potentially a lot of ways to make your engine faster by the
time 2.7 alpha is coined.  But also be forewarned, because, knowing what
I know about the current re engine and what it is further capable of, I
don't think your regex will be replacing re in 2.7 if it isn't at least
as fast as the existing engine for some standard set of agreed-upon
tests, no matter how many features you can add.  I have no doubt that,
with a little extra monkey grease, we could implement all new features
in the existing engine.  I don't want to have to reinvent the wheel, of
course, and if Matthew's engine can pick up some speed everybody wins!
So, keep up the good work, Matthew, as it's greatly appreciated!

Thanks all!

Jeffrey.

<lurk>