classification
Title: abs_paths() in site.py is slow
Type: performance Stage: resolved
Components: Library (Lib) Versions: Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: brett.cannon, eric.snow, fdrake, gregory.p.smith, inada.naoki, ncoghlan
Priority: normal Keywords:

Created on 2017-02-17 16:19 by inada.naoki, last changed 2017-03-24 22:19 by inada.naoki. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 167 merged inada.naoki, 2017-02-19 06:29
Messages (9)
msg288019 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-17 16:19
abs_paths() is the another most slow parts of site module.
We can speedup it if we have C implementation of os.path.abspath() or abs_paths().

But before trying it, is abs_paths() really needed for now?
It only rewrite `__file__` and `__cached__` of modules imported before site is imported.
What is difference between modules loaded befere / after site module?

Here is profile output. sysconfig dependency is removed by #29585 patch, and pstat is modified
to show time in ms instead of sec.

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      3/1    0.004    0.001    2.525    2.525 {built-in method builtins.exec}
        1    0.003    0.003    2.524    2.524 x.py:1(<module>)
        1    0.010    0.010    1.806    1.806 site.py:555(main)
      4/3    0.022    0.005    1.179    0.393 <frozen importlib._bootstrap>:958(_find_and_load)
      4/3    0.017    0.004    1.110    0.370 <frozen importlib._bootstrap>:931(_find_and_load_unlocked)
        1    0.195    0.195    0.928    0.928 site.py:99(abs_paths)
      108    0.098    0.001    0.776    0.007 posixpath.py:367(abspath)
        4    0.035    0.009    0.647    0.162 <frozen importlib._bootstrap>:861(_find_spec)
        4    0.005    0.001    0.589    0.147 <frozen importlib._bootstrap_external>:1150(find_spec)
        4    0.043    0.011    0.584    0.146 <frozen importlib._bootstrap_external>:1118(_get_spec)
      2/1    0.012    0.006    0.557    0.557 <frozen importlib._bootstrap>:641(_load_unlocked)
      2/1    0.006    0.003    0.511    0.511 <frozen importlib._bootstrap_external>:673(exec_module)
      108    0.461    0.004    0.493    0.005 posixpath.py:329(normpath)
       16    0.150    0.009    0.453    0.028 <frozen importlib._bootstrap_external>:1234(find_spec)
msg288049 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-18 01:29
path normalization was added by
https://github.com/python/cpython/commit/38cb9f1f174415d3b37fbaeb5d152d65525839d2
msg288052 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-18 02:12
https://github.com/python/cpython/blob/master/Lib/importlib/_bootstrap_external.py#L1089-L1095

While '' in sys.path, it is converted to `getcwd()` before calling PathHook.
Is there any chance relative / not normalized path is in sys.path before site is loaded?
msg288053 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-18 02:18
$ python2 -S
Python 2.7.12+ (default, Sep 17 2016, 12:08:02) 
[GCC 6.2.0 20160914] on linux2
>>> import x
>>> x.__file__
'x.py'

$ python3 -S
Python 3.6.0 (default, Dec 30 2016, 20:49:54) 
[GCC 6.2.0 20161005] on linux
>>> import x
>>> x.__file__
'/home/inada-n/x.py'

I think we can remove `abs_paths()` in site.py, thanks to _frozen_importlib_external.

I added all import experts to nosy list.
Please give me advice.
msg288113 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-02-19 06:39
Aye, I agree it should be redundant now - import system should always be making __file__ and __cached__ absolutely at import time these days.

So +1 for dropping this from 3.7 and getting a bit of startup time back.
msg288128 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-02-19 11:22
CI failure indicating it isn't redundant, but could still potentially be made faster since non-absolute paths should be relatively rare now:

======================================================================

FAIL: test_s_option (test.test_site.HelperFunctionsTests)

----------------------------------------------------------------------

Traceback (most recent call last):

  File "/home/travis/build/python/cpython/Lib/test/test_site.py", line 173, in test_s_option

    self.assertIn(usersite, sys.path)

AssertionError: '/home/travis/.local/lib/python3.7/site-packages' not found in ['', '/usr/local/lib/python37.zip', '/home/travis/build/python/cpython/Lib', '/home/travis/build/python/cpython/build/lib.linux-x86_64-3.7-pydebug']

======================================================================

FAIL: test_abs_paths (test.test_site.ImportSideEffectTests)

----------------------------------------------------------------------

Traceback (most recent call last):

  File "/home/travis/build/python/cpython/Lib/test/test_site.py", line 365, in test_abs_paths

    .format(os__file__.decode('ascii')))

AssertionError: False is not true : expected absolute path, got ../../Lib/os.py

----------------------------------------------------------------------
msg288130 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-02-19 11:27
Note that "os" doesn't get imported normally, it gets injected as part of the importlib bootstrapping process (since _bootstrap_external.py needs the os module in order to work).
msg288137 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-19 14:23
I got it.
removeduppaths() may change relpath in sys.path to absolute path.
abs_paths() changes __file__ and __cached__ for consistency with the changed sys.path.

I updated PR 167 to call abs_paths() only if removeduppaths() modified sys.path.
Strictly speaking, abs_paths() is required only when removeduppaths() converted relpath to absolute path.

But because duplicated paths are rare too, I think this approach is practical enough.
msg290185 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-03-24 22:19
New changeset 2e4e011795d26cab1a3843383d0539c12fea2458 by INADA Naoki in branch 'master':
bpo-29592: site: skip abs_paths() when it's redundant (GH-167)
https://github.com/python/cpython/commit/2e4e011795d26cab1a3843383d0539c12fea2458
History
Date User Action Args
2017-03-24 22:19:52inada.naokisetmessages: + msg290185
2017-03-14 15:52:43inada.naokisetstatus: open -> closed
resolution: fixed
stage: resolved
2017-02-20 23:28:52gregory.p.smithsetnosy: + gregory.p.smith
2017-02-19 14:23:49inada.naokisetmessages: + msg288137
2017-02-19 11:27:53ncoghlansetmessages: + msg288130
2017-02-19 11:22:53ncoghlansetmessages: + msg288128
2017-02-19 06:39:19ncoghlansetmessages: + msg288113
2017-02-19 06:29:14inada.naokisetpull_requests: + pull_request131
2017-02-18 02:18:25inada.naokisetnosy: + fdrake, brett.cannon, ncoghlan, eric.snow
messages: + msg288053
2017-02-18 02:12:54inada.naokisetmessages: + msg288052
2017-02-18 01:29:42inada.naokisetmessages: + msg288049
2017-02-17 16:19:42inada.naokicreate