Message 274182 - Python tracker


Author	steve.newcomb
Recipients	rhettinger, serhiy.storchaka, steve.newcomb
Date	2016-09-01.20:53:00
Message-id	<25e15524-06ae-5739-bdb0-7dc7ae77371c@coolheads.com>
In-reply-to	<1472575618.81.0.730978536893.issue27898@psf.upfronthosting.co.za>

Content

On 08/30/2016 12:46 PM, Raymond Hettinger wrote:
> Raymond Hettinger added the comment:
>
> It would be helpful if you ... make a small set of regular expressions that demonstrate the performance regression.
>
Done.  Attachments:

test.py : Code that exercises re.sub() and outputs a profile report.

test_output_2.7.6.txt : Output of test.py under Python 2.7.6.

test_output_2.7.12.txt : Output of test.py under Python 2.7.12.

p17-188.htm : Test data (public information from the U.S. Internal
Revenue Service).

Equivalent hardware was used in both cases.
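
For readers who do not have the attachments, here is a minimal sketch of
the kind of script described above: it runs a few re.sub() calls over
some HTML-like text and prints a profile report. It is not the attached
test.py. The patterns, the sample text, and the names run_substitutions
and profile_report are all assumptions made for illustration, and the
sketch is written for a current Python 3 rather than 2.7.

```python
import cProfile
import io
import pstats
import re

# Hypothetical stand-ins for the real patterns and the IRS HTML test data.
SUBSTITUTIONS = [
    (re.compile(r"<[^>]+>"), " "),   # replace tags with a space
    (re.compile(r"&nbsp;"), " "),    # replace non-breaking-space entities
    (re.compile(r"\s+"), " "),       # collapse runs of whitespace
]
SAMPLE = "<p>Publication&nbsp;17 <b>Your Federal Income Tax</b></p>\n" * 2000


def run_substitutions(text):
    """Apply each substitution in turn via re.sub()."""
    for pattern, replacement in SUBSTITUTIONS:
        text = re.sub(pattern, replacement, text)
    return text


def profile_report(text):
    """Run the substitutions under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = run_substitutions(text)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_report(SAMPLE)
    print(report)
```

Running the same script under two interpreters and comparing the
cumulative time reported for sub() is one way to arrive at a ratio like
the one quoted below.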

The outputs show that 2.7.12's re.sub() takes 1.2 times as long as 
2.7.6's.  It's a significant difference, but...

...it was not the dramatic degradation I expected to find in this 
exercise.  Therefore I attempted to tease what I was looking for out of 
the profile stats I already uploaded to this site, made from actual 
production runs.  My attempts are all found in an hg repository that can 
be downloaded from 
sftp://sftp@coolheads.com//files/py-re-perform-276-2712 using password 
bysIe20H .
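
As an illustration of pulling a single function's numbers out of profile
data, the sketch below profiles a workload and looks up the entry for
re.sub() in the resulting statistics. It is not the code in the
repository mentioned above. The names re_sub_profile and workload are
hypothetical, and the lookup assumes that cProfile keys pure-Python
functions by (file name, first line number, function name) and that
pstats.Stats exposes those entries through its stats dictionary.

```python
import cProfile
import pstats
import re


def re_sub_profile(workload):
    """Profile workload() and return (calls, cumulative seconds) for re.sub."""
    profiler = cProfile.Profile()
    profiler.runcall(workload)
    stats = pstats.Stats(profiler)

    # Build the key under which the profiler records re.sub itself.
    code = re.sub.__code__
    key = (code.co_filename, code.co_firstlineno, code.co_name)

    # Each entry: (primitive calls, total calls, own time, cumulative time, callers).
    _, total_calls, _, cumulative, _ = stats.stats[key]
    return total_calls, cumulative


def workload():
    """Hypothetical workload: five re.sub() passes over a synthetic string."""
    text = "a1b22c333" * 1000
    for _ in range(5):
        text = re.sub(r"\d+", "#", text)


if __name__ == "__main__":
    calls, seconds = re_sub_profile(workload)
    print("re.sub calls:", calls)
    print("re.sub cumulative seconds:", seconds)
```

Dividing the cumulative time measured under one interpreter by the time
measured under the other, for the same workload, gives a slowdown factor
comparable to the figures in this message.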

I do not feel the latter work took me where I wanted to go, and I think
the reason is that, at least for purposes of our application, Python
2.7.12 has been extensively refactored since Python 2.7.6.  So it's
an apples-to-oranges comparison, apparently.  Still, the performance
difference for re.sub() is quite dramatic, and re.sub() is the only
comparable function whose performance dramatically worsened: in our
application, 2.7.12's re.sub() takes 3.04 times as long as 2.7.6's.

The good news, of course, is that the performance of the other
*comparable* functions largely improved, often dramatically.  But
at least in our application, that improvement doesn't come close to
making up for the degradation in re.sub().

My by-the-gut bottom line: somebody who really knows the re module
should take a deep look at re.sub().  Why would re.sub(), unlike all
others, take so much longer to run, while *every* other function in the
re module gets (often much) faster?  It feels like there's a bug
somewhere in re.sub().

Steve Newcomb

Files
File name	Uploaded
p17-188.htm	steve.newcomb, 2016-09-01.20:52:59
test.py	steve.newcomb, 2016-09-01.20:52:59
test_output_2.7.12.txt	steve.newcomb, 2016-09-01.20:53:00
test_output_2.7.6.txt	steve.newcomb, 2016-09-01.20:52:59

History
Date	User	Action	Args
2016-09-01 20:53:04	steve.newcomb	set	recipients: + steve.newcomb, rhettinger, serhiy.storchaka
2016-09-01 20:53:00	steve.newcomb	link	issue27898 messages
2016-09-01 20:53:00	steve.newcomb	create