Author steve.newcomb
Recipients rhettinger, serhiy.storchaka, steve.newcomb
Date 2016-09-01.21:01:25
Message-id <ade62137-4116-ac3c-a695-e3a3ea79aec4@coolheads.com>
In-reply-to <25e15524-06ae-5739-bdb0-7dc7ae77371c@coolheads.com>
Content
Oops.  The correct URL is sftp://coolheads.com/files/py-re-perform-276v2712/

On 09/01/2016 04:52 PM, Steve Newcomb wrote:
> On 08/30/2016 12:46 PM, Raymond Hettinger wrote:
>> Raymond Hettinger added the comment:
>>
>> It would be helpful if you ... make a small set of regular 
>> expressions that demonstrate the performance regression.
>>
> Done.  Attachments:
>
> test.py : Code that exercises re.sub() and outputs a profile report.
>
> test_output_2.7.6.txt : Output of test.py under Python 2.7.6.
>
> test_output_2.7.12.txt : Output of test.py under Python 2.7.12.
>
> p17.188.htm : Test data (public information from the U.S. Internal 
> Revenue Service).
>
> Equivalent hardware was used in both cases.
>
> The outputs show that 2.7.12's re.sub() takes 1.2 times as long as 
> 2.7.6's.  It's a significant difference, but...
>
> ...it was not the dramatic degradation I expected to find in this 
> exercise.  Therefore I attempted to tease what I was looking for out 
> of the profile stats I already uploaded to this site, made from actual 
> production runs.  My attempts are all found in an hg repository that 
> can be downloaded from 
> sftp://sftp@coolheads.com//files/py-re-perform-276-2712 using password 
> bysIe20H .
>
> I do not feel the latter work took me where I wanted to go, and I 
> think the reason is that, at least for purposes of our application, 
> Python 2.7.12 has been so extensively refactored since Python 2.7.6.  
> So it's an apples-to-oranges comparison, apparently.  Still, the 
> performance difference for re.sub() is quite dramatic, and re.sub() 
> is the only comparable function whose performance dramatically 
> worsened: in our application, 2.7.12's re.sub() takes 3.04 times as 
> long as 2.7.6's.
>
> The good news, of course, is that the performance of the other 
> *comparable* functions largely improved, often dramatically.  
> But at least in our application, that improvement doesn't come close 
> to making up for the degradation in re.sub().
>
> My by-the-gut bottom line: somebody who really knows the re module 
> should take a deep look at re.sub().  Why would re.sub(), unlike all 
> the others, take so much longer to run, while *every* other function 
> in the re module got (often much) faster?  It feels like there's a 
> bug somewhere in re.sub().
>
> Steve Newcomb
>
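
For readers without access to the attachments, here is a minimal sketch of the kind of script described above: code that exercises re.sub() and prints a profile report. It is not the attached test.py. The patterns, the function names run_substitutions and profile_substitutions, and the sample text are all hypothetical stand-ins, since neither the real patterns nor p17.188.htm are reproduced in this message.

```python
import cProfile
import pstats
import re

# Hypothetical stand-in for the test data; the real p17.188.htm is not shown here.
SAMPLE_TEXT = "<p>Form 1040, line 21: see Publication 17.</p>\n" * 2000

# Hypothetical (pattern, replacement) pairs; the real test.py patterns are unknown.
SUBSTITUTIONS = [
    (r"<[^>]+>", ""),    # strip markup tags
    (r"\s+", " "),       # collapse runs of whitespace
    (r"(\d+)", r"[\1]"), # bracket every run of digits
]


def run_substitutions(text):
    """Apply each substitution in turn through the module-level re.sub()."""
    for pattern, replacement in SUBSTITUTIONS:
        text = re.sub(pattern, replacement, text)
    return text


def profile_substitutions(text):
    """Run the substitutions under cProfile and print the top entries."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = run_substitutions(text)
    profiler.disable()
    # Sort by cumulative time so re.sub() and its callees appear near the top.
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
    return result


if __name__ == "__main__":
    profile_substitutions(SAMPLE_TEXT)
```

Running the same script under each interpreter and comparing the cumulative time reported for re.sub() gives a ratio comparable in kind to the 1.2x figure quoted above, though the numbers from this sketch say nothing about the original workload.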
History
Date                 User           Action  Args
2016-09-01 21:01:26  steve.newcomb  set     recipients: + steve.newcomb, rhettinger, serhiy.storchaka
2016-09-01 21:01:26  steve.newcomb  link    issue27898 messages
2016-09-01 21:01:25  steve.newcomb  create