classification
Title: site.py imports relatively large `sysconfig` module.
Type: performance
Stage: resolved
Components: Library (Lib)
Versions: Python 3.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: christian.heimes, eric.araujo, gregory.p.smith, inada.naoki, lemburg, ned.deily, vstinner
Priority: normal
Keywords:

Created on 2017-02-17 09:57 by inada.naoki, last changed 2017-07-28 12:35 by inada.naoki. This issue is now closed.

Pull Requests
URL Status Linked
PR 136 merged inada.naoki, 2017-02-17 09:57
PR 2476 closed inada.naoki, 2017-06-28 15:43
PR 2477 merged vstinner, 2017-06-28 16:02
PR 2478 closed vstinner, 2017-06-28 16:15
PR 2483 merged inada.naoki, 2017-06-29 05:59
PR 2927 merged ned.deily, 2017-07-28 06:43
PR 2928 merged inada.naoki, 2017-07-28 11:27
Messages (22)
msg287981 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-17 09:57
site.py uses the sysconfig module (and, through it, sysconfigdata and _osx_support) to locate the user site-packages directory.

But the sysconfig module is not lightweight, and it is very rarely used.
In fact, only tests and distutils use sysconfig in the stdlib.

It accounts for about 7% of startup time, just to find the user-site path.

I tried to port a minimal subset of sysconfig into site.py (GH-136).
But 'PYTHONFRAMEWORK' is only available in sysconfigdata, so I couldn't get rid of the sysconfig dependency completely.

How should I solve this?

a) Drop "osx_framework_user" (`~/Library/Python/3.7/`) support completely.
b) Add "sys._osx_framework" attribute
c) Create minimal sysconfigdata only for site.py
d) anything else?
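
A minimal sketch for checking the observation above on a given build: it starts a fresh interpreter and reports whether a module is already in `sys.modules` once startup has finished. The helper name `startup_imports` is made up for illustration, and the answer for `sysconfig` depends on the Python version and platform, so no particular result is assumed.

```python
import subprocess
import sys


def startup_imports(module_name, extra_args=()):
    """Return True if module_name is loaded after interpreter startup.

    Hypothetical helper: it runs a child interpreter so that the parent's
    own imports do not influence the answer.
    """
    code = "import sys; print({!r} in sys.modules)".format(module_name)
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", code],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    print("sysconfig imported at startup:", startup_imports("sysconfig"))
    # -S skips the automatic import of site, for comparison.
    print("sysconfig imported with -S:", startup_imports("sysconfig", ["-S"]))
```

Running the check in a child process keeps the measurement independent of whatever the calling script has imported itself.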
msg287983 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-02-17 10:20
Instead of using the slow sysconfig module and loading the big _sysconfigdata dictionary into memory, would it be possible to extract the minimum set of sysconfig values needed by the site module and put it in a builtin module? In site.py, I only found 4 variables:

    from sysconfig import get_config_var
    USER_BASE = get_config_var('userbase')

    from sysconfig import get_path
            USER_SITE = get_path('purelib', 'osx_framework_user')
    USER_SITE = get_path('purelib', '%s_user' % os.name)

            from sysconfig import get_config_var
            framework = get_config_var("PYTHONFRAMEWORK")

Because of the site module, the _sysconfigdata module dictionary is always loaded into memory, even for a dummy print("Hello World!").

I suggest starting a _site builtin module: a subset of site.py which would avoid sysconfig and reimplement things in C for best performance.

speed.python.org:
* python_startup: 14 ms
* python_startup_nosite: 8 ms

Importing site takes 6 ms: 42% of 14 ms...

I'm interested to know if it would be possible to reduce these 6 ms by rewriting some parts of site.py in C.
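
A rough way to reproduce the comparison above locally, as a sketch only: it times an empty program with and without `-S` (which skips `site`) and prints the difference. The function name `startup_ms` and the run count are assumptions for illustration, and this is much cruder than the benchmark used on speed.python.org, so the absolute numbers will differ.

```python
import statistics
import subprocess
import sys
import time


def startup_ms(extra_args=(), runs=10):
    """Median wall-clock time in milliseconds to run an empty program."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, *extra_args, "-c", "pass"], check=True)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


if __name__ == "__main__":
    with_site = startup_ms()
    no_site = startup_ms(["-S"])
    print("startup:       {:.1f} ms".format(with_site))
    print("startup (-S):  {:.1f} ms".format(no_site))
    print("site.py cost:  {:.1f} ms".format(with_site - no_site))
```

The median is used instead of the mean so that a single slow run (cold cache, busy machine) does not dominate the result.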
msg287984 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-02-17 10:23
Serhiy collected interesting numbers, copy/paste of this message:
http://bugs.python.org/issue28637#msg280380

On my computer:

Importing empty module: 160 us
Creating empty class: 30 us
Creating empty function: 0.16 us
Creating empty Enum/IntEnum: 125/150 us
Creating Enum/IntEnum member: 25/27 us
Creating empty namedtuple: 600 us
Creating namedtuple member: 50 us
Importing the itertools module: 40 us
Importing the io module: 900 us
Importing the os module: 1600 us
Importing the functools module: 2100 us
Importing the re module (with all sre submodules): 3300 us
Python startup time: 43000 us
msg287985 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2017-02-17 10:33
What's your platform, Inada? Are you running macOS? I optimized site.py for Linux and BSD users a couple of years ago.
msg287988 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-17 11:16
Christian: I'm using macOS at the office and Linux at home.

sysconfig is imported even on Linux:
https://github.com/python/cpython/blob/master/Lib/site.py#L247-L248
https://github.com/python/cpython/blob/master/Lib/site.py#L263-L271
msg287990 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2017-02-17 11:45
I don't think rewriting parts of site.py in C is a good idea. It's a rather maintenance-intensive module.

However, optimizing access is certainly something that's possible, e.g. by placing the few variables that are actually needed by site.py into a bootstrap module for sysconfig, which only contains the few variables needed by interpreter startup.

Alternatively, sysconfig data could be made available via a C lookup function; with the complete dictionary only being created on demand. get_config_var() already is such a lookup API which could be used as front-end.
msg287997 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-02-17 12:06
Marc-Andre Lemburg added the comment:
> I don't think rewriting parts of site.py in C is a good idea. It's a rather maintenance-intensive module.
>
> However, optimizing access is certainly something that's possible, e.g. by placing the few variables that are actually needed by site.py into a bootstrap module for sysconfig, which only contains the few variables needed by interpreter startup.

Right, I don't propose to rewrite the 598 lines of site.py in C, but
only rewrite the parts which have a huge impact on the startup time.
It seems like the minimum part would be to write a _site module which
provides the 4 variables currently read from sysconfig.

I'm proposing to add a new private module because I don't want to
pollute site which already contains too many things.

I looked at the site.py history: I don't see *major* changes in the last 2 years.
Only small enhancements, updates and fixes.

> Alternatively, sysconfig data could be made available via a C lookup function; with the complete dictionary only being created on demand. get_config_var() already is such a lookup API which could be used as front-end.

I don't think that it's worth it to partially reimplement sysconfig in
C. This module is huge, complex, and platform dependent.

Well, I'm not sure what the best approach is, but I'm sure that
we can do something to optimize site.py. 6 ms is a lot!

I never liked site.py. It seems like a huge workaround. I also dislike
having a different behaviour if site is imported or not.

That's why I asked Steve Dower to move the code that creates the
cpXXX alias for the mbcs codec from site.py to encodings/__init__.py:
see commit f5aba58480bb0dd45181f609487ac2ecfcc98673. I'm happy that
this code was removed from site.py!
msg287999 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2017-02-17 12:14
Instead of _site, would it make sense to include the four vars in sys, perhaps as a named structure like sys.flags?
msg288000 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-17 12:28
FYI, here is a profile of site:
https://gist.github.com/methane/1f1fe4385dad84f03eb429359f0f917b
msg288001 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-17 12:46
no-site:
python_startup_no_site: Median +- std dev: 9.13 ms +- 0.02 ms

default:
python_startup: Median +- std dev: 15.6 ms +- 0.0 ms

GH-136 + skip abs_paths().
python_startup: Median +- std dev: 14.2 ms +- 0.0 ms

profile of GH-136 + skip abs_paths():
https://gist.github.com/methane/26fc0a2382207655a6819a92f867620c

Most of the time is consumed by importlib.
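
For a per-module view of where import time goes, one option is the interpreter's `-X importtime` flag (available in Python 3.7 and newer), which writes one line per imported module to stderr. The sketch below parses that output; the helper name `import_times` is hypothetical, and the figures vary from run to run, so no expected output is shown.

```python
import subprocess
import sys


def import_times(code="pass"):
    """Return [(cumulative_us, module_name), ...] sorted slowest first."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", code],
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    prefix = "import time:"
    rows = []
    for line in result.stderr.splitlines():
        if not line.startswith(prefix):
            continue
        # Each line has three fields: self time | cumulative time | module.
        parts = line[len(prefix):].split("|")
        if len(parts) != 3:
            continue
        try:
            cumulative = int(parts[1])
        except ValueError:
            continue  # the header line has text in this column
        rows.append((cumulative, parts[2].strip()))
    return sorted(rows, reverse=True)


if __name__ == "__main__":
    for cumulative, name in import_times()[:10]:
        print("{:>8} us  {}".format(cumulative, name))
```

The cumulative column includes nested imports, so a high value for `site` also counts whatever `site` pulls in.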
msg288012 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2017-02-17 15:30
On 17.02.2017 13:06, STINNER Victor wrote:
>> Alternatively, sysconfig data could be made available via a C lookup function; with the complete dictionary only being created on demand. get_config_var() already is such a lookup API which could be used as front-end.
> 
> I don't think that it's worth it to reimplement partially sysconfig in
> C. This module is huge, complex, and platform dependent.

Sorry, I was just referring to the data part of sysconfig,
not sysconfig itself.

Having a lookup function much like we have for unicodedata
makes things much more manageable, since you don't need to
generate a dictionary in memory for all the values in the
config data. Creating that dictionary takes a while (in terms
of ms).
msg288020 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-17 16:42
I created #29592 for abs_paths().  Let's focus on sysconfig in this issue.

PR 136 already ports the parts of sysconfig that are really needed into site.py.
'PYTHONFRAMEWORK' on macOS is the only variable we still need to import from sysconfig.

Would adding a `site.cfg`, like `pyvenv.cfg`, make sense?
msg288057 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-18 05:06
PR 136 now adds `sys._framework` and a 'PYTHONFRAMEWORK' macro in pyconfig.h.
msg297192 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-06-28 15:31
New changeset a8f8d5b4bd30dbe0828550469d98f12d2ebb2ef4 by INADA Naoki in branch 'master':
bpo-29585: optimize site.py startup time (GH-136)
https://github.com/python/cpython/commit/a8f8d5b4bd30dbe0828550469d98f12d2ebb2ef4
msg297194 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-06-28 16:10
Test fails on macOS:

http://buildbot.python.org/all/builders/x86-64%20Sierra%203.x/builds/402/steps/test/logs/stdio

======================================================================
FAIL: test_getsitepackages (test.test_site.HelperFunctionsTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/buildbot/buildarea/3.x.billenstein-sierra/build/Lib/test/test_site.py", line 266, in test_getsitepackages
    self.assertEqual(len(dirs), 2)
AssertionError: 1 != 2
msg297197 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-06-28 16:34
New changeset b01c574ad6d796025152b5d605eceb7816e6f7a7 by Victor Stinner in branch 'master':
bpo-29585: Define PYTHONFRAMEWORK in PC/pyconfig.h (#2477)
https://github.com/python/cpython/commit/b01c574ad6d796025152b5d605eceb7816e6f7a7
msg297258 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-06-29 06:31
New changeset 6b42eb17649bed9615b6e6cecaefdb2f46990b2c by INADA Naoki in branch 'master':
bpo-29585: Fix sysconfig.get_config_var("PYTHONFRAMEWORK") (GH-2483)
https://github.com/python/cpython/commit/6b42eb17649bed9615b6e6cecaefdb2f46990b2c
msg299368 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2017-07-28 06:30
test_get_path fails on macOS installed framework builds:

======================================================================
FAIL: test_get_path (test.test_site.HelperFunctionsTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/nad/Projects/PyDev/active/dev/3x/root/fwd_macports/Library/Frameworks/pytest_10.12.framework/Versions/3.7/lib/python3.7/test/test_site.py", line 188, in test_get_path
    sysconfig.get_path('purelib', os.name + '_user'))
AssertionError: '/Users/nad/Library/pytest_10.12/3.7/lib/python/site-packages' != '/Users/nad/Library/pytest_10.12/3.7/lib/python3.7/site-packages'
- /Users/nad/Library/pytest_10.12/3.7/lib/python/site-packages
+ /Users/nad/Library/pytest_10.12/3.7/lib/python3.7/site-packages
?                                               +++


----------------------------------------------------------------------
Ran 27 tests in 0.471s

FAILED (failures=1, skipped=4)
msg299371 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2017-07-28 07:02
New changeset c22bd58d933efaec26d1f77f263b2845473b7e15 by Ned Deily in branch 'master':
bpo-28095: Re-enable temporarily disabled part of test_startup_imports on macOS (#2927)
https://github.com/python/cpython/commit/c22bd58d933efaec26d1f77f263b2845473b7e15
msg299379 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-07-28 11:16
https://docs.python.org/3.6/library/site.html#site.USER_SITE

> ~/Library/Python/X.Y/lib/python/site-packages for Mac framework builds

So it seems I broke sysconfig.get_path('purelib', 'posix_user').
msg299380 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-07-28 11:20
https://github.com/python/cpython/pull/136/files

 -    if sys.platform == 'darwin':
 -        from sysconfig import get_config_var
 -        if get_config_var('PYTHONFRAMEWORK'):
 -            USER_SITE = get_path('purelib', 'osx_framework_user')
 -            return USER_SITE
 +    if USER_SITE is None:
 +        USER_SITE = _get_path(userbase)

OK, I need to use `osx_framework_user` instead of os.name + '_user' on framework builds.
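
The scheme selection described above can be sketched as a pure function. This is an illustration under stated assumptions, not the code merged in GH-2928: the name `user_site_path` and its explicit parameters are hypothetical, and the path layouts follow the `nt_user`, `osx_framework_user` and `posix_user` install schemes as I understand them.

```python
def user_site_path(userbase, os_name, platform, framework, version):
    """Build the per-user site-packages path without importing sysconfig.

    All inputs are explicit so the function can be tried on any machine;
    in an interpreter they would come from os.name, sys.platform,
    sys._framework and sys.version_info.
    """
    major, minor = version[:2]
    if os_name == "nt":
        return "{}\\Python{}{}\\site-packages".format(userbase, major, minor)
    if platform == "darwin" and framework:
        # Framework builds use the unversioned osx_framework_user layout.
        return "{}/lib/python/site-packages".format(userbase)
    return "{}/lib/python{}.{}/site-packages".format(userbase, major, minor)


if __name__ == "__main__":
    base = "/Users/nad/Library/pytest_10.12/3.7"
    print(user_site_path(base, "posix", "darwin", "pytest_10.12", (3, 7)))
    print(user_site_path(base, "posix", "darwin", "", (3, 7)))
```

With a non-empty framework name the result has the unversioned `lib/python` form; with an empty one it has the versioned `lib/python3.7` form. These are the two paths that differ in the test_get_path failure above.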
msg299383 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-07-28 12:28
New changeset ba9ddb7eea39a651ba7f1ab3eb012e4129c03620 by INADA Naoki in branch 'master':
bpo-29585: fix test fail on macOS Framework build (GH-2928)
https://github.com/python/cpython/commit/ba9ddb7eea39a651ba7f1ab3eb012e4129c03620
History
Date User Action Args
2017-07-28 12:35:27 inada.naoki set status: open -> closed
    resolution: fixed
    stage: needs patch -> resolved
2017-07-28 12:28:22 inada.naoki set messages: + msg299383
2017-07-28 11:27:37 inada.naoki set pull_requests: + pull_request2980
2017-07-28 11:20:22 inada.naoki set messages: + msg299380
2017-07-28 11:16:21 inada.naoki set messages: + msg299379
2017-07-28 07:02:13 ned.deily set messages: + msg299371
2017-07-28 06:43:24 ned.deily set pull_requests: + pull_request2979
2017-07-28 06:30:57 ned.deily set status: closed -> open
    nosy: + ned.deily
    messages: + msg299368
    resolution: fixed -> (no value)
    stage: resolved -> needs patch
2017-06-30 22:03:22 ned.deily link issue30795 superseder
2017-06-29 06:32:11 inada.naoki set status: open -> closed
    resolution: fixed
    stage: resolved
2017-06-29 06:31:40 inada.naoki set messages: + msg297258
2017-06-29 05:59:46 inada.naoki set pull_requests: + pull_request2542
2017-06-28 16:34:44 vstinner set messages: + msg297197
2017-06-28 16:15:04 vstinner set pull_requests: + pull_request2532
2017-06-28 16:10:22 vstinner set messages: + msg297194
2017-06-28 16:02:02 vstinner set pull_requests: + pull_request2531
2017-06-28 15:43:05 inada.naoki set pull_requests: + pull_request2530
2017-06-28 15:31:56 inada.naoki set messages: + msg297192
2017-02-20 23:28:55 gregory.p.smith set nosy: + gregory.p.smith
2017-02-18 05:06:25 inada.naoki set messages: + msg288057
2017-02-17 18:53:40 eric.araujo set nosy: + eric.araujo
2017-02-17 16:42:40 inada.naoki set messages: + msg288020
2017-02-17 15:30:39 lemburg set messages: + msg288012
2017-02-17 12:46:11 inada.naoki set messages: + msg288001
2017-02-17 12:28:46 inada.naoki set messages: + msg288000
2017-02-17 12:14:05 christian.heimes set messages: + msg287999
2017-02-17 12:06:23 vstinner set messages: + msg287997
2017-02-17 11:45:59 lemburg set nosy: + lemburg
    messages: + msg287990
2017-02-17 11:16:31 inada.naoki set messages: + msg287988
2017-02-17 10:33:35 christian.heimes set messages: + msg287985
2017-02-17 10:32:02 christian.heimes set nosy: + christian.heimes
2017-02-17 10:23:08 vstinner set messages: + msg287984
2017-02-17 10:20:36 vstinner set nosy: + vstinner
    messages: + msg287983
2017-02-17 09:57:03 inada.naoki create