Author sjmachin
Recipients akitada, akuchling, amaury.forgeotdarc, collinwinter, doerwalter, ezio.melotti, georg.brandl, gregory.p.smith, jaylogan, jimjjewett, loewis, mark, moreati, mrabarnett, nneonneo, pitrou, r.david.murray, rsc, sjmachin, timehorse, vbr
Date 2009-08-12.03:00:19
Message-id <1250046022.36.0.0616708172034.issue2636@psf.upfronthosting.co.za>
In-reply-to
Content
What is the expected timing comparison with re? Running the Aug10#3
version on Win XP SP3 with Python 2.6.3, I see regex typically running
at only 20% to 50% of the speed of re in ASCII mode, with
not-very-atypical tests (find all Python identifiers in a line, failing
search for a Python identifier in an 80-byte text). Is the supplied
_regex.pyd from some sort of debug or unoptimised build? Here are some
results:

dos-prompt>\python26\python -mtimeit -s"import re as x;r=x.compile(r'[A-Za-z_][A-Za-z0-9_]+');t='    def __init__(self, arg1, arg2):\n'" "r.findall(t)"
100000 loops, best of 3: 5.32 usec per loop

dos-prompt>\python26\python -mtimeit -s"import regex as x;r=x.compile(r'[A-Za-z_][A-Za-z0-9_]+');t='    def __init__(self, arg1, arg2):\n'" "r.findall(t)"
100000 loops, best of 3: 12.2 usec per loop

dos-prompt>\python26\python -mtimeit -s"import re as x;r=x.compile(r'[A-Za-z_][A-Za-z0-9_]+');t='1234567890'*8" "r.search(t)"
1000000 loops, best of 3: 1.61 usec per loop

dos-prompt>\python26\python -mtimeit -s"import regex as x;r=x.compile(r'[A-Za-z_][A-Za-z0-9_]+');t='1234567890'*8" "r.search(t)"
100000 loops, best of 3: 7.62 usec per loop
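
For anyone who wants to repeat the two comparisons above without retyping the command lines, here is a rough sketch of the same measurements as a script. It assumes a current Python 3 rather than the 2.6 used above, the names best_usec, compare and CASES are made up for illustration, and the lambda wrapper adds a little call overhead that the -mtimeit statements do not have, so absolute figures will not match the ones quoted. If regex is not importable, its entry is reported as missing rather than raising.

```python
import importlib
import timeit

# Pattern and subject strings copied from the command lines above.
PATTERN = r'[A-Za-z_][A-Za-z0-9_]+'
CASES = {
    'findall': ('findall', '    def __init__(self, arg1, arg2):\n'),
    'failing search': ('search', '1234567890' * 8),
}


def best_usec(module_name, method, text, number=20000, repeat=3):
    """Best per-call time in microseconds, or None if the module is missing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    bound = getattr(module.compile(PATTERN), method)
    timer = timeit.Timer(lambda: bound(text))
    # timeit reports total seconds for `number` calls; keep the best repeat.
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6


def compare(number=20000):
    """Time every case with both modules; returns {case: {module: usec}}."""
    results = {}
    for label, (method, text) in CASES.items():
        results[label] = {
            name: best_usec(name, method, text, number=number)
            for name in ('re', 'regex')
        }
    return results


if __name__ == '__main__':
    for label, timings in compare().items():
        for name, usec in timings.items():
            shown = 'not installed' if usec is None else '%.2f usec per loop' % usec
            print('%-15s %-6s %s' % (label, name, shown))
```

Taking the minimum over the repeats mirrors the "best of 3" figure that -mtimeit prints.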

Here's the worst case that I've found so far:

dos-prompt>\python26\python -mtimeit -s"import re as x;r=x.compile(r'z{80}');t='z'*79" "r.search(t)"
1000000 loops, best of 3: 1.19 usec per loop

dos-prompt>\python26\python -mtimeit -s"import regex as x;r=x.compile(r'z{80}');t='z'*79" "r.search(t)"
1000 loops, best of 3: 334 usec per loop

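See Friedl: "length cognizance". A 79-character text can never satisfy a
pattern that needs at least 80 characters, so the search can fail without
attempting a match at any position. Corresponding figures for match() are
1.11 and 8.5 usec per loop.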
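
To make the length-cognizance idea concrete, here is a minimal sketch of the check as a thin wrapper around re. It is only an illustration of the optimisation Friedl describes, not a description of how re or regex implement it internally. The class name LengthAwarePattern and its min_length argument are hypothetical, and the caller is assumed to supply the correct minimum match length and a non-negative pos.

```python
import re


class LengthAwarePattern(object):
    """Hypothetical wrapper: reject subjects that are too short to match."""

    def __init__(self, pattern, min_length):
        self._compiled = re.compile(pattern)
        # Assumption: the caller knows the shortest string the pattern can match.
        self._min_length = min_length

    def search(self, text, pos=0):
        # Length check first: if fewer than min_length characters remain,
        # no start position can succeed, so skip the scan entirely.
        if len(text) - pos < self._min_length:
            return None
        return self._compiled.search(text, pos)


pattern = LengthAwarePattern(r'z{80}', min_length=80)
too_short = pattern.search('z' * 79)    # rejected by the length check
long_enough = pattern.search('z' * 80)  # handed to the real matcher
```

With the check in place, the cost of the failing z{80} case no longer depends on how many start positions the matcher would otherwise try.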
History
Date User Action Args
2009-08-12 03:00:22 sjmachin set recipients: + sjmachin, loewis, akuchling, doerwalter, georg.brandl, collinwinter, gregory.p.smith, jimjjewett, amaury.forgeotdarc, pitrou, nneonneo, rsc, timehorse, mark, vbr, ezio.melotti, mrabarnett, jaylogan, akitada, moreati, r.david.murray
2009-08-12 03:00:22 sjmachin set messageid: <1250046022.36.0.0616708172034.issue2636@psf.upfronthosting.co.za>
2009-08-12 03:00:20 sjmachin link issue2636 messages
2009-08-12 03:00:19 sjmachin create