
Author sjmachin
Recipients akitada, akuchling, amaury.forgeotdarc, collinwinter, doerwalter, ezio.melotti, georg.brandl, gregory.p.smith, jaylogan, jimjjewett, loewis, mark, moreati, mrabarnett, nneonneo, pitrou, r.david.murray, rsc, sjmachin, timehorse, vbr
Date 2009-08-12.03:00:19
Message-id <1250046022.36.0.0616708172034.issue2636@psf.upfronthosting.co.za>
Content
What is the expected timing comparison with re? Running the Aug10#3
version on Win XP SP3 with Python 2.6.3, I see regex typically running
at only 20% to 50% of the speed of re in ASCII mode, on fairly typical
tests (finding all Python identifiers in a line, and a failing search
for a Python identifier in an 80-byte text). Is the supplied _regex.pyd
from some sort of debug or unoptimised build? Here are some results:

dos-prompt>\python26\python -mtimeit -s"import re as x;r=x.compile(r'[A-Za-z_][A-Za-z0-9_]+');t='    def __init__(self, arg1, arg2):\n'" "r.findall(t)"
100000 loops, best of 3: 5.32 usec per loop

dos-prompt>\python26\python -mtimeit -s"import regex as x;r=x.compile(r'[A-Za-z_][A-Za-z0-9_]+');t='    def __init__(self, arg1, arg2):\n'" "r.findall(t)"
100000 loops, best of 3: 12.2 usec per loop

dos-prompt>\python26\python -mtimeit -s"import re as x;r=x.compile(r'[A-Za-z_][A-Za-z0-9_]+');t='1234567890'*8" "r.search(t)"
1000000 loops, best of 3: 1.61 usec per loop

dos-prompt>\python26\python -mtimeit -s"import regex as x;r=x.compile(r'[A-Za-z_][A-Za-z0-9_]+');t='1234567890'*8" "r.search(t)"
100000 loops, best of 3: 7.62 usec per loop
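The two typical tests above can be reproduced in one script with the standard `timeit` module. This is a sketch, assuming the third-party `regex` package from PyPI is installed under that name (if it is not, only `re` is timed); `best_usec` and `compare` are hypothetical helper names, and absolute timings will depend on the machine and Python version.

```python
import re
import timeit

# Workloads from the message: find all identifiers in one line of Python
# source, and a failing search in an 80-byte string containing only digits.
IDENT = r'[A-Za-z_][A-Za-z0-9_]+'
LINE = '    def __init__(self, arg1, arg2):\n'
DIGITS = '1234567890' * 8


def best_usec(stmt, env, number, repeat=3):
    """Best per-call time in microseconds, like `python -m timeit` reports."""
    times = timeit.repeat(stmt, globals=env, number=number, repeat=repeat)
    return min(times) / number * 1e6


def compare(modules, number=100_000):
    """Time findall() and a failing search() for each regex module given."""
    results = {}
    for mod in modules:
        env = {'r': mod.compile(IDENT), 'line': LINE, 'digits': DIGITS}
        results[mod.__name__] = {
            'findall': best_usec('r.findall(line)', env, number),
            'failing search': best_usec('r.search(digits)', env, number),
        }
    return results


if __name__ == '__main__':
    modules = [re]
    try:
        import regex  # third-party package; assumed installed, else skipped
        modules.append(regex)
    except ImportError:
        pass
    results = compare(modules)
    for name, timings in results.items():
        for test, usec in timings.items():
            print(f'{name:6} {test:15} {usec:6.2f} usec per loop')
    if 'regex' in results:
        for test, usec in results['regex'].items():
            # Speed of regex as a fraction of the speed of re (100% = same).
            print(f'{test}: regex runs at {results["re"][test] / usec:.0%} of re')
```

Taking the minimum of several repeats, as `python -m timeit` does ("best of 3"), reduces noise from other activity on the machine.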

Here's the worst case that I've found so far:

dos-prompt>\python26\python -mtimeit -s"import re as x;r=x.compile(r'z{80}');t='z'*79" "r.search(t)"
1000000 loops, best of 3: 1.19 usec per loop

dos-prompt>\python26\python -mtimeit -s"import regex as x;r=x.compile(r'z{80}');t='z'*79" "r.search(t)"
1000 loops, best of 3: 334 usec per loop

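See Friedl on "length cognizance": a pattern that needs at least 80
characters cannot match a 79-character string, so the search can fail
without trying any starting positions. The corresponding figures for
match() are 1.11 usec (re) and 8.5 usec (regex).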
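The length-cognizance idea can be sketched as follows. This is a toy illustration, not taken from either `re` or `regex`: the hypothetical helper `min_match_length` computes the minimum match length for a small subset of pattern syntax (single characters, `.`, escapes such as `\d` or `\.`, `[...]` classes, and the quantifiers `{m}`, `{m,}`, `{m,n}`, `+`, `*`, `?`), assuming the pattern is compiled without flags such as `re.VERBOSE`. The hypothetical `length_cognizant_search` then refuses to run the matcher when the subject is too short to contain a match.

```python
import re

# Hypothetical helper for a tiny subset of pattern syntax: one atom (a
# literal character, '.', an escape such as \d or \., or a [...] class),
# optionally followed by {m}, {m,}, {m,n}, +, * or ?. Anything else
# (groups, alternation, anchors, lazy quantifiers...) raises ValueError.
# Assumes no flags; under re.VERBOSE, whitespace would be miscounted.
_ATOM = re.compile(
    r'(\[[^\]]*\]|\\[dDsSwW]|\\[^A-Za-z0-9]|[^\\\[\]{}()|+*?^$])'
    r'(\{(\d+)(?:,\d*)?\}|[+*?])?'
)


def min_match_length(pattern):
    """Minimum number of characters any match of `pattern` must consume."""
    total = 0
    pos = 0
    while pos < len(pattern):
        m = _ATOM.match(pattern, pos)
        if m is None:
            raise ValueError(f'unsupported syntax at offset {pos}: {pattern!r}')
        quantifier = m.group(2)
        if quantifier is None or quantifier == '+':
            total += 1                  # exactly one, or one-or-more
        elif quantifier in ('*', '?'):
            pass                        # may match zero times
        else:
            total += int(m.group(3))    # {m}, {m,} or {m,n}: at least m
        pos = m.end()
    return total


def length_cognizant_search(compiled, min_len, text):
    """search(), but fail at once if `text` is too short to hold a match."""
    if len(text) < min_len:
        return None
    return compiled.search(text)


if __name__ == '__main__':
    pattern = r'z{80}'
    compiled = re.compile(pattern)
    need = min_match_length(pattern)
    print(need)                                                     # 80
    print(length_cognizant_search(compiled, need, 'z' * 79))        # None
    print(length_cognizant_search(compiled, need, 'z' * 80) is not None)  # True
```

A real engine would presumably derive the minimum length from its compiled program rather than re-parsing the pattern source, and could also use it to stop scanning once fewer than `min_len` characters remain after the current starting position.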