classification
Title: Optimize parsing of regular expressions
Type: performance
Stage: resolved
Components: Library (Lib), Regular Expressions
Versions: Python 3.5
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: serhiy.storchaka
Nosy List: ezio.melotti, haypo, josh.r, mrabarnett, pitrou, python-dev, serhiy.storchaka
Priority: normal
Keywords: needs review, patch

Created on 2013-10-24 20:14 by serhiy.storchaka, last changed 2014-10-10 08:46 by serhiy.storchaka. This issue is now closed.

Files
File name         Uploaded
re_parse.patch    serhiy.storchaka, 2013-10-24 20:14
re_parse_2.patch  serhiy.storchaka, 2013-10-24 21:30
re_parse_3.patch  serhiy.storchaka, 2014-09-18 09:11
re_parse_4.patch  serhiy.storchaka, 2014-10-05 18:04
re_parse_5.patch  serhiy.storchaka, 2014-10-09 08:23
Messages (13)
msg201177 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-24 20:14
The proposed patch optimizes parsing of regular expressions. The total running time of the re unit tests decreased by 10%.
msg201183 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-10-24 20:27
I don't think "+=" speeds up anything for ints; you might as well minimize code churn by avoiding such changes.
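The point about "+=" on ints can be checked with a small timing sketch. The function names `count_plain`, `count_inplace`, and `compare` below are hypothetical and not taken from the patch; the sketch only shows that both spellings compute the same result and lets a reader compare their timings on their own interpreter.

```python
import timeit


def count_plain(n=1000):
    """Increment an int with 'i = i + 1'."""
    i = 0
    for _ in range(n):
        i = i + 1
    return i


def count_inplace(n=1000):
    """Increment an int with 'i += 1'."""
    i = 0
    for _ in range(n):
        i += 1
    return i


def compare(number=200):
    """Return (plain_seconds, inplace_seconds) for 'number' calls of each."""
    plain = timeit.timeit(count_plain, number=number)
    inplace = timeit.timeit(count_inplace, number=number)
    return plain, inplace


if __name__ == "__main__":
    plain, inplace = compare()
    print(f"i = i + 1: {plain:.4f}s   i += 1: {inplace:.4f}s")
```

Ints are immutable, so both forms create a new object and rebind the name; any measured difference is expected to be within noise, but that is an assumption to verify locally.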
msg201191 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-24 21:30
Done.
msg201192 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-10-24 21:40
Do you have any benchmark figures (apart from the time of re unittests)?
msg201227 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-25 10:20
### regex_compile ###
Min: 2.897919 -> 2.577488: 1.12x faster
Avg: 3.066306 -> 2.681966: 1.14x faster
Significant (t=26.77)
Stddev: 0.08789 -> 0.05085: 1.7283x smaller
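A rough way to reproduce this kind of measurement locally is sketched below. It is not the regex_compile benchmark itself: the patterns and the names `PATTERNS`, `compile_all`, and `measure` are hypothetical, chosen only for illustration. The call to `re.purge()` clears the compiled-pattern cache so that every `re.compile()` call parses its pattern again.

```python
import re
import timeit

# Hypothetical sample patterns; the real benchmark uses its own set.
PATTERNS = [
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})",
    r"[A-Za-z_][A-Za-z0-9_]*",
    r"(?:a|b|c)+(?=d)",
]


def compile_all(patterns=PATTERNS):
    """Compile every pattern with an empty cache, forcing a fresh parse."""
    re.purge()
    return [re.compile(p) for p in patterns]


def measure(repeat=5, number=200):
    """Return (min, avg) wall-clock seconds over 'repeat' timing runs."""
    timings = timeit.repeat(compile_all, repeat=repeat, number=number)
    return min(timings), sum(timings) / len(timings)


if __name__ == "__main__":
    best, avg = measure()
    print(f"Min: {best:.6f}  Avg: {avg:.6f}")
```

Running the script under two interpreter builds (before and after the patch) gives Min and Avg figures comparable in spirit to the ones quoted above, though absolute numbers will differ.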
msg206557 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-18 22:02
Could someone please review this?
msg227032 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-09-18 09:11
Actually, "if x:" is slightly faster than "if x is not None:" in the current implementation.
msg227041 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-09-18 13:23
"is not None" is more readable, though. When using plain boolean testing, it's never obvious whether you can have a zero-length string, a null number, etc.
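The readability concern comes down to the two tests not being equivalent for falsy values other than None. A minimal sketch, with the hypothetical helper `describe` introduced only for illustration:

```python
def describe(value):
    """Return (truthiness, is-not-None) for a value."""
    return bool(value), value is not None


if __name__ == "__main__":
    # An empty string, zero, and an empty list are falsy but are not None,
    # so 'if x:' and 'if x is not None:' take different branches for them.
    for sample in (None, "", 0, [], "a", 1):
        truthy, not_none = describe(sample)
        print(f"{sample!r:>6}  if x: {truthy!s:<5}  if x is not None: {not_none}")
```

Replacing one test with the other is therefore only safe where the variable can never hold an empty string, zero, or another falsy non-None value.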
msg227053 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-09-18 16:54
Well, then please look at re_parse_2.patch (it still applies cleanly).
msg228605 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-10-05 18:04
Here is a patch that addresses Yury's and Josh's comments. It also discards a few minor changes.
msg228838 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-10-09 08:23
The updated patch implements Antoine's suggestions.
msg228964 - (view) Author: Roundup Robot (python-dev) Date: 2014-10-10 08:16
New changeset 1adeac2a8714 by Serhiy Storchaka in branch 'default':
Issue #19380: Optimized parsing of regular expressions.
https://hg.python.org/cpython/rev/1adeac2a8714
msg228971 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-10-10 08:46
Thank you for your reviews, Yury, Josh, and Antoine.
History
Date                 User              Action  Args
2014-10-10 08:46:47  serhiy.storchaka  set     status: open -> closed; resolution: fixed; messages: + msg228971; stage: patch review -> resolved
2014-10-10 08:16:35  python-dev        set     nosy: + python-dev; messages: + msg228964
2014-10-09 08:23:36  serhiy.storchaka  set     files: + re_parse_5.patch; messages: + msg228838
2014-10-05 18:04:46  serhiy.storchaka  set     files: + re_parse_4.patch; messages: + msg228605
2014-09-18 23:17:49  josh.r            set     nosy: + josh.r
2014-09-18 16:54:39  serhiy.storchaka  set     messages: + msg227053
2014-09-18 13:23:28  pitrou            set     messages: + msg227041
2014-09-18 09:11:40  serhiy.storchaka  set     files: + re_parse_3.patch; messages: + msg227032
2014-08-01 09:35:22  serhiy.storchaka  set     keywords: + needs review; versions: + Python 3.5, - Python 3.4
2013-12-21 14:28:28  haypo             set     nosy: + haypo
2013-12-18 22:02:08  serhiy.storchaka  set     messages: + msg206557
2013-10-25 10:20:20  serhiy.storchaka  set     messages: + msg201227
2013-10-24 21:40:51  pitrou            set     messages: + msg201192
2013-10-24 21:30:12  serhiy.storchaka  set     files: + re_parse_2.patch; messages: + msg201191
2013-10-24 20:27:27  pitrou            set     nosy: + pitrou; messages: + msg201183
2013-10-24 20:14:19  serhiy.storchaka  create