Title: Performance regression in urllib.proxy_bypass_environment
Messages (4)
msg308924 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-12-22 06:14
Recently we updated our environment from Python 2.7.5 to Python 2.7.13, and one process's CPU usage grew from 15% to 70%. The cause is urllib.proxy_bypass_environment, specifically the commit I wrote in #26864. Our environment has a no_proxy environment variable that contains 4000+ items. See the performance difference:

cascading-controller:~ # time python2 -c 'import urllib; urllib.proxy_bypass_environment("")'

real	0m1.134s
user	0m1.126s
sys	0m0.007s
cascading-controller:~ # time python2 -c 'import urllib; urllib.proxy_bypass_environment("")'

real	0m0.037s
user	0m0.024s
sys	0m0.013s
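
For anyone who wants to reproduce this kind of measurement without touching their real environment, here is a rough sketch. It assumes Python 3 (urllib.request rather than the Python 2 urllib used above), the host names are made up, and time_bypass is a hypothetical helper. The list is passed through the proxies argument under the "no" key, which is the key getproxies_environment() uses for no_proxy.

```python
import time
import urllib.request

# Hypothetical no_proxy list: 4000 made-up host names, standing in for
# the 4000+ items mentioned above.
NO_PROXY = ",".join("host%d.example.com" % i for i in range(4000))
PROXIES = {"no": NO_PROXY}


def time_bypass(host, repeat=5):
    """Return (result, average seconds per call) for one host lookup."""
    start = time.perf_counter()
    for _ in range(repeat):
        result = urllib.request.proxy_bypass_environment(host, PROXIES)
    elapsed = time.perf_counter() - start
    return result, elapsed / repeat


if __name__ == "__main__":
    # A host that is not in the list forces a scan of every entry.
    result, seconds = time_bypass("unlisted.invalid")
    print("bypass=%r, %.6f s per call" % (bool(result), seconds))
```

The absolute numbers will depend on the interpreter version and the machine, so only the ratio between two interpreters is meaningful.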

As a temporary workaround, I increased the regex cache size to 6000, and the CPU usage and run time returned to a reasonable range.
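
One possible way to raise the regex cache limit is sketched below. It assumes CPython, where the re module keeps the limit in the private, undocumented module attribute _MAXCACHE; this is an implementation detail and may change between versions. set_regex_cache_limit is a hypothetical helper name, and it has to run before the hot code path is exercised.

```python
import re


def set_regex_cache_limit(size):
    """Set re's compiled-pattern cache limit and return the old value.

    Relies on re._MAXCACHE, a private CPython attribute (assumption:
    it is not part of the documented re API).
    """
    previous = re._MAXCACHE
    re._MAXCACHE = size
    return previous


# Make the cache large enough to hold one pattern per no_proxy item.
old_limit = set_regex_cache_limit(6000)
```

This only hides the cost of recompiling the patterns; it does not change how many patterns are matched per call.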
msg309025 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-12-25 02:51
Okay, the real performance is:

time python2 -c 'import urllib; urllib.proxy_bypass_environment("")'

real	0m0.661s
user	0m0.654s
sys	0m0.007s

I had compiled it with a wrong option for the specific GCC version. It is still really slow compared to before, though.
msg389957 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2021-04-01 06:54
I think this issue has already been solved by #39507. The time difference is:


time python3 -c 'import urllib.request; urllib.request.proxy_bypass_environment("")'

real    0m0.912s
user    0m0.902s
sys     0m0.010s


time python3 -c 'import urllib.request; urllib.request.proxy_bypass_environment("")'

real    0m0.105s
user    0m0.086s
sys     0m0.019s
msg389958 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2021-04-01 06:56
Sorry, it's #39057
