Message 152215 - Python tracker

Author	Devin Jeanpierre
Recipients	Devin Jeanpierre, akitada, akoumjian, alex, amaury.forgeotdarc, belopolsky, davide.rizzo, eric.snow, ezio.melotti, georg.brandl, giampaolo.rodola, gregory.p.smith, jacques, jaylogan, jhalcrow, jimjjewett, loewis, mark, mattchaput, moreati, mrabarnett, ncoghlan, nneonneo, pitrou, r.david.murray, ronnix, rsc, sjmachin, steven.daprano, stiv, timehorse, vbr, zdwiel
Date	2012-01-29.08:01:42
Message-id	<1327824104.2.0.639258920542.issue2636@psf.upfronthosting.co.za>

Content
> In practice, I expect that a pure Python implementation of a regular expression engine would only be fast enough to be usable on PyPy.

Not sure why this is necessarily true. I'd expect a pure-Python implementation to be maybe 200 times as slow. Many queries (those on relatively short strings that backtrack little) finish within microseconds. On this scale, a couple of orders of magnitude is not noticeable by humans (unless it adds up), and even where it becomes noticeable, it's better than having nothing at all or a non-working program (up to a point).

python -m timeit -n 1000000 -s "import re; x = re.compile(r'.*<\s*help\s*>([^<]*)<\s*/\s*help.*>'); data = ' '*1000 + '< help >' + 'abc'*100 + '</help>'" "x.match(data)"
1000000 loops, best of 3: 3.27 usec per loop
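
For anyone who wants to redo the arithmetic: taking the 3.27 usec figure above and the guessed 200x slowdown, a pure-Python engine would land at roughly 654 usec per match for this query. Below is a rough sketch of the same measurement using the timeit module from inside Python. The 200x factor is only the guess from this message, not something measured, and the helper names (projected_usec, measure_usec) are made up for illustration.

```python
import re
import timeit

# Same pattern and input as the command-line measurement above.
PATTERN = re.compile(r'.*<\s*help\s*>([^<]*)<\s*/\s*help.*>')
DATA = ' ' * 1000 + '< help >' + 'abc' * 100 + '</help>'

# Assumption: the "maybe 200 times as slow" guess, not a measured value.
ASSUMED_SLOWDOWN = 200


def projected_usec(measured_usec, slowdown=ASSUMED_SLOWDOWN):
    """Scale a measured per-match time by the assumed slowdown factor."""
    return measured_usec * slowdown


def measure_usec(number=10000, repeat=3):
    """Best-of-`repeat` time per PATTERN.match(DATA) call, in microseconds."""
    timings = timeit.repeat(lambda: PATTERN.match(DATA),
                            number=number, repeat=repeat)
    return min(timings) / number * 1e6


if __name__ == "__main__":
    native = measure_usec()
    print("C engine:              %.2f usec per match" % native)
    print("projected pure Python: %.2f usec per match" % projected_usec(native))
```

The lambda adds a little call overhead compared with the command-line form, so the numbers will not line up exactly with the output above, but the order of magnitude is what matters here.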