Message274182
On 08/30/2016 12:46 PM, Raymond Hettinger wrote:
> Raymond Hettinger added the comment:
>
> It would be helpful if you ... make a small set of regular expressions that demonstrate the performance regression.
>
Done. Attachments:
test.py : Code that exercises re.sub() and outputs a profile report.
test_output_2.7.6.txt : Output of test.py under Python 2.7.6.
test_output_2.7.12.txt : Output of test.py under Python 2.7.12.
p17.188.htm : Test data: public information from the U.S. Internal
Revenue Service.
Equivalent hardware was used in both cases.
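For readers who do not open the attachments, the following is a minimal sketch of the kind of harness described above: it runs a batch of re.sub() calls under cProfile and returns a profile report. It is not the attached test.py. The patterns, the sample text, and the names SUBSTITUTIONS, run_substitutions and profile_report are assumptions made up for illustration.

```python
import cProfile
import pstats
import re
import sys

try:                                # Python 2.7
    from StringIO import StringIO
except ImportError:                 # Python 3
    from io import StringIO

# Hypothetical (pattern, replacement) pairs; NOT the ones used in test.py.
SUBSTITUTIONS = [
    (r"<[^>]+>", " "),   # replace HTML-like tags with a space
    (r"&nbsp;", " "),    # replace non-breaking-space entities
    (r"\s+", " "),       # collapse runs of whitespace
]


def run_substitutions(text, repeat=200):
    """Apply every substitution to `text`, `repeat` times over."""
    result = text
    for _ in range(repeat):
        result = text
        for pattern, replacement in SUBSTITUTIONS:
            result = re.sub(pattern, replacement, result)
    return result


def profile_report(text, repeat=200):
    """Profile run_substitutions(); return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(run_substitutions, text, repeat)
    stream = StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


if __name__ == "__main__":
    # Stand-in input; the real test data is the attached p17.188.htm.
    sample = "<p>Form&nbsp;1040   instructions</p>\n" * 500
    cleaned, report = profile_report(sample)
    sys.stdout.write(report)
```

Running the same script under both interpreters and comparing the cumulative time reported for re.sub() gives the kind of ratio quoted below.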
The outputs show that 2.7.12's re.sub() takes 1.2 times as long as
2.7.6's. It's a significant difference, but...
...it was not the dramatic degradation I expected to find in this
exercise. Therefore I attempted to tease what I was looking for out of
the profile stats I already uploaded to this site, made from actual
production runs. My attempts are all found in an hg repository that can
be downloaded from
sftp://sftp@coolheads.com//files/py-re-perform-276-2712 using password
bysIe20H .
I do not feel the latter work took me where I wanted to go. I think the
reason is that, at least for purposes of our application, Python has
been extensively refactored between 2.7.6 and 2.7.12, so it's apparently
an apples-to-oranges comparison. Still, the performance difference for
re.sub() is quite dramatic, and re.sub() is the only comparable function
whose performance dramatically worsened: in our application, 2.7.12's
re.sub() takes 3.04 times as long as 2.7.6's.
The good news, of course, is that the performance of the other
*comparable* functions largely improved, often dramatically. But at
least in our application, that doesn't come close to making up for the
degradation in re.sub().
My by-the-gut bottom line: somebody who really knows the re module
should take a deep look at re.sub(). Why would re.sub(), unlike all the
others, take so much longer to run, while *every* other function in the
re module gets (often much) faster? It feels like there's a bug
somewhere in re.sub().
Steve Newcomb

Date | User | Action | Args
2016-09-01 20:53:04 | steve.newcomb | set | recipients: + steve.newcomb, rhettinger, serhiy.storchaka
2016-09-01 20:53:00 | steve.newcomb | link | issue27898 messages
2016-09-01 20:53:00 | steve.newcomb | create |