Author steve.newcomb
Recipients rhettinger, serhiy.storchaka, steve.newcomb
Date 2016-09-01.20:53:00
Message-id <25e15524-06ae-5739-bdb0-7dc7ae77371c@coolheads.com>
In-reply-to <1472575618.81.0.730978536893.issue27898@psf.upfronthosting.co.za>
Content
On 08/30/2016 12:46 PM, Raymond Hettinger wrote:
> Raymond Hettinger added the comment:
>
> It would be helpful if you ... make a small set of regular expressions that demonstrate the performance regression.
>
Done.  Attachments:

test.py : Code that exercises re.sub() and outputs a profile report.

test_output_2.7.6.txt : Output of test.py under Python 2.7.6.

test_output_2.7.12.txt : Output of test.py under Python 2.7.12.

p17-188.htm : Test data: public information from the U.S. Internal 
Revenue Service.

Equivalent hardware was used in both cases.
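
For readers who do not have the attachments, the following is a minimal sketch of the kind of script described above: it runs a few re.sub() calls under cProfile and prints a profile report. It is not the attached test.py. The sample text, the patterns, and the function names run_substitutions() and profile_report() are hypothetical stand-ins chosen for illustration.

```python
import cProfile
import pstats
import re

try:
    from StringIO import StringIO  # Python 2.7
except ImportError:
    from io import StringIO  # Python 3

# Hypothetical stand-ins: the real test.py uses p17-188.htm and the
# application's own patterns, neither of which is reproduced here.
SAMPLE_TEXT = "<p>Form 1040, line 7: wages &amp; salaries</p>\n" * 2000
PATTERNS = [
    (re.compile(r"<[^>]+>"), ""),   # strip tags
    (re.compile(r"&amp;"), "&"),    # unescape ampersands
    (re.compile(r"\s+"), " "),      # collapse runs of whitespace
]


def run_substitutions(text):
    """Apply each (pattern, replacement) pair to the text with re.sub()."""
    for pattern, replacement in PATTERNS:
        text = re.sub(pattern, replacement, text)
    return text


def profile_report(text):
    """Profile run_substitutions() and return (result, report string)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(run_substitutions, text)
    stream = StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


if __name__ == "__main__":
    _, report = profile_report(SAMPLE_TEXT)
    print(report)
```

Running the same script under each interpreter and saving the printed report gives two outputs that can be compared side by side, in the manner of the two test_output files.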

The outputs show that 2.7.12's re.sub() takes 1.2 times as long as 
2.7.6's.  It's a significant difference, but...

...it was not the dramatic degradation I expected to find in this 
exercise.  Therefore I attempted to tease what I was looking for out of 
the profile stats I already uploaded to this site, made from actual 
production runs.  My attempts are all found in an hg repository that can 
be downloaded from 
sftp://sftp@coolheads.com//files/py-re-perform-276-2712 using password 
bysIe20H .

I do not feel the latter work took me where I wanted to go.  I think 
the reason is that, at least for purposes of our application, Python 
2.7.12 has been extensively refactored since Python 2.7.6, so it is 
apparently an apples-to-oranges comparison.  Still, the performance 
difference for re.sub() is quite dramatic, and re.sub() is the only 
comparable function whose performance dramatically worsened: in our 
application, 2.7.12's re.sub() takes 3.04 times as long as 2.7.6's.
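
A ratio such as the one above can be derived from two profiles by summing the cumulative time recorded for re.sub() in each and dividing. The sketch below shows one way to do that. It is an illustration, not the procedure actually used for the figures in this message. The helper names cumulative_time() and slowdown() are hypothetical, and the code assumes the layout of the stats dictionary that pstats.Stats exposes, which is long-standing but not a documented guarantee.

```python
import cProfile
import pstats
import re

# re.sub() is defined in re.py on Python 2.7 and in re/__init__.py on
# recent Python 3 releases; both suffixes are accepted here.
RE_MODULE_FILES = ("/re.py", "/re/__init__.py")


def cumulative_time(stats, func_name, file_suffixes):
    """Sum the cumulative seconds of matching functions in a pstats.Stats."""
    total = 0.0
    # Assumed layout: stats.stats maps (filename, lineno, funcname) to
    # (primitive calls, total calls, own time, cumulative time, callers).
    for (filename, _lineno, name), entry in stats.stats.items():
        path = filename.replace("\\", "/")
        if name == func_name and path.endswith(file_suffixes):
            total += entry[3]
    return total


def slowdown(old_seconds, new_seconds):
    """Return how many times as long the new run took as the old one."""
    return new_seconds / old_seconds


if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.runcall(re.sub, r"\d+", "#", "a1b22c333" * 100000)
    seconds = cumulative_time(pstats.Stats(profiler), "sub", RE_MODULE_FILES)
    print("re.sub() cumulative seconds: %.4f" % seconds)
```

With one such figure collected under each interpreter, slowdown(old, new) gives the "times as long" number directly.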

The good news, of course, is that the performance of the other 
*comparable* functions largely improved, often dramatically.  But 
at least in our application, that improvement doesn't come close to 
making up for the degradation in re.sub().

My by-the-gut bottom line: somebody who really knows the re module 
should take a deep look at re.sub().  Why would re.sub(), unlike all 
the others, take so much longer to run, while *every* other function 
in the re module gets (often much) faster?  It feels like there's a 
bug somewhere in re.sub().

Steve Newcomb
Files
File name Uploaded
p17-188.htm steve.newcomb, 2016-09-01.20:52:59
test.py steve.newcomb, 2016-09-01.20:52:59
test_output_2.7.12.txt steve.newcomb, 2016-09-01.20:53:00
test_output_2.7.6.txt steve.newcomb, 2016-09-01.20:52:59
History
Date User Action Args
2016-09-01 20:53:04  steve.newcomb  set  recipients: + steve.newcomb, rhettinger, serhiy.storchaka
2016-09-01 20:53:00  steve.newcomb  link  issue27898 messages
2016-09-01 20:53:00  steve.newcomb  create