Message274183
Oops. The correct URL is sftp://coolheads.com/files/py-re-perform-276v2712/
On 09/01/2016 04:52 PM, Steve Newcomb wrote:
> On 08/30/2016 12:46 PM, Raymond Hettinger wrote:
>> Raymond Hettinger added the comment:
>>
>> It would be helpful if you ... make a small set of regular
>> expressions that demonstrate the performance regression.
>>
> Done. Attachments:
>
> test.py : Code that exercises re.sub() and outputs a profile report.
>
> test_output_2.7.6.txt : Output of test.py under Python 2.7.6.
>
> test_output_2.7.12.txt : Output of test.py under Python 2.7.12.
>
> p17.188.htm : Test data (public information from the U.S. Internal
> Revenue Service).
>
> Equivalent hardware was used in both cases.
>
> The outputs show that 2.7.12's re.sub() takes 1.2 times as long as
> 2.7.6's. It's a significant difference, but...
>
> ...it was not the dramatic degradation I expected to find in this
> exercise. Therefore I attempted to tease what I was looking for out
> of the profile stats I already uploaded to this site, made from actual
> production runs. My attempts are all found in an hg repository that
> can be downloaded from
> sftp://sftp@coolheads.com//files/py-re-perform-276-2712 using password
> bysIe20H .
>
> I do not feel the latter work took me where I wanted to go, and I
> think the reason is that, at least for purposes of our application,
> Python 2.7.12 has been extensively refactored since Python 2.7.6.
> So it's an apples-to-oranges comparison, apparently. Still, the
> performance difference for re.sub() is quite dramatic, and re.sub()
> is the only comparable function whose performance dramatically
> worsened: in our application, 2.7.12's re.sub() takes 3.04 times as
> long as 2.7.6's.
>
> The good news, of course, is that the performance of the other
> *comparable* functions largely improved, often dramatically.
> But at least in our application, that doesn't come close to making up
> for the degradation in re.sub().
>
> My by-the-gut bottom line: somebody who really knows the re module
> should take a deep look at re.sub(). Why would re.sub(), unlike all
> the others, take so much longer to run, while *every* other function in
> the re module gets (often much) faster? It feels like there's a bug
> somewhere in re.sub().
>
> Steve Newcomb
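
For readers without access to the attachments, here is a minimal sketch of the kind of harness described above: it runs a few re.sub() calls under cProfile and prints a report. This is not the attached test.py. The sample text, the two patterns, and the names run_subs and profile_subs are all hypothetical stand-ins, and the sketch is written for Python 3 (under 2.7, io.StringIO would need replacing with StringIO.StringIO).

```python
import cProfile
import io
import pstats
import re

# Hypothetical stand-in for the attached HTML test data (p17.188.htm).
SAMPLE = "<p>Publication 17   covers  the general rules.</p>\n" * 2000

# Hypothetical patterns; the application's real expressions are not shown here.
PATTERNS = [
    (re.compile(r"\s+"), " "),    # collapse runs of whitespace
    (re.compile(r"<[^>]+>"), ""), # strip markup tags
]


def run_subs(text):
    """Apply each substitution in turn and return the rewritten text."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text


def profile_subs(text):
    """Run run_subs under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(run_subs, text)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(10)
    return result, buf.getvalue()


if __name__ == "__main__":
    _, report = profile_subs(SAMPLE)
    print(report)
```

Running the same script under each interpreter and comparing the cumulative time reported for the sub calls gives a rough ratio of the kind quoted above. Results will depend heavily on the patterns and input used.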
Date | User | Action | Args
2016-09-01 21:01:26 | steve.newcomb | set | recipients: + steve.newcomb, rhettinger, serhiy.storchaka
2016-09-01 21:01:26 | steve.newcomb | link | issue27898 messages
2016-09-01 21:01:25 | steve.newcomb | create |