classification
Title: site.py imports relatively large `sysconfig` module.
Type: performance
Stage:
Components: Library (Lib)
Versions: Python 3.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: christian.heimes, gregory.p.smith, haypo, inada.naoki, lemburg, merwok
Priority: normal
Keywords:

Created on 2017-02-17 09:57 by inada.naoki, last changed 2017-02-20 23:28 by gregory.p.smith.

Pull Requests
URL Status Linked
PR 136 open inada.naoki, 2017-02-17 09:57
Messages (13)
msg287981 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-17 09:57
site.py uses the sysconfig module (and, with it, sysconfigdata and _osx_support) to locate the user site-packages directory.

But the sysconfig module is not lightweight, and it is very rarely used.
In fact, only tests and distutils use sysconfig in the stdlib.

It takes about 7% of startup time, only to find the user-site path.

I tried to port a minimal subset of sysconfig into site.py (GH-136).
But 'PYTHONFRAMEWORK' is only in sysconfigdata, so I couldn't get rid of the sysconfig dependency completely.

How can I solve this?

a) Drop "osx_framework_user" (`~/Library/Python/3.7/`) support completely.
b) Add "sys._osx_framework" attribute
c) Create minimal sysconfigdata only for site.py
d) anything else?
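
For illustration, option (b) could look roughly like the sketch below: the user-site path is built from a framework name supplied by the interpreter, so sysconfig is never imported. The function name `user_site` and its parameters are hypothetical, and the directory layouts are my reading of sysconfig's `posix_user`, `nt_user` and `osx_framework_user` schemes, so treat them as assumptions rather than as what GH-136 does.

```python
import os
import sys


def user_site(userbase, framework, version=sys.version_info,
              os_name=os.name, platform=sys.platform):
    """Build the user site-packages path without importing sysconfig.

    `framework` stands in for the PYTHONFRAMEWORK value (empty string when
    Python is not a macOS framework build). All names here are illustrative.
    """
    major, minor = version[0], version[1]
    if os_name == "nt":
        # Assumed nt_user layout: <userbase>\PythonXY\site-packages
        return f"{userbase}\\Python{major}{minor}\\site-packages"
    if platform == "darwin" and framework:
        # Assumed osx_framework_user layout: no version in the lib directory
        return f"{userbase}/lib/python/site-packages"
    # Assumed posix_user layout: <userbase>/lib/pythonX.Y/site-packages
    return f"{userbase}/lib/python{major}.{minor}/site-packages"


# The framework name would come from a cheap interpreter attribute.
# getattr() with a default keeps the sketch working where it is absent.
framework_name = getattr(sys, "_framework", "")
```

With this shape, the only build-time value site.py still needs is the framework name, which is why a single `sys` attribute would be enough.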
msg287983 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-02-17 10:20
Instead of using the slow sysconfig module and loading the big _sysconfigdata dictionary into memory, would it be possible to extract the minimum set of sysconfig values needed by the site module and put it in a builtin module? In site.py, I only found 4 variables:

    from sysconfig import get_config_var
    USER_BASE = get_config_var('userbase')

    from sysconfig import get_path
            USER_SITE = get_path('purelib', 'osx_framework_user')
    USER_SITE = get_path('purelib', '%s_user' % os.name)

            from sysconfig import get_config_var
            framework = get_config_var("PYTHONFRAMEWORK")

Because of the site module, the _sysconfigdata module dictionary is always loaded into memory, even for a dummy print("Hello World!").

I suggest starting to build a _site builtin module: a subset of site.py which would avoid sysconfig and reimplement things in C for best performance.

speed.python.org:
* python_startup: 14 ms
* python_startup_nosite: 8 ms

Importing site takes 6 ms: about 43% of 14 ms...

I'm interested to know if it would be possible to reduce these 6 ms by rewriting some parts of site.py in C.
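
A rough way to reproduce this comparison locally is sketched below. It is not the speed.python.org benchmark: it simply times a fresh interpreter running `pass`, with and without the `-S` option (which skips importing site), and keeps the best of a few runs. The helper name `startup_seconds` is made up for this example, and the numbers include process-spawn overhead.

```python
import subprocess
import sys
import time


def startup_seconds(extra_args=(), runs=5):
    """Best wall-clock time (seconds) to start the interpreter and run `pass`."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, *extra_args, "-c", "pass"], check=True)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    with_site = startup_seconds()
    no_site = startup_seconds(("-S",))
    # Share of startup attributable to site, e.g. (14 - 8) / 14 is about 0.43
    share = (with_site - no_site) / with_site
    print(f"with site: {with_site * 1000:.1f} ms")
    print(f"no site:   {no_site * 1000:.1f} ms")
    print(f"site share: {share:.0%}")
```

For stable numbers a dedicated benchmark tool is a better choice; this only gives a quick order of magnitude.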
msg287984 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-02-17 10:23
Serhiy collected interesting numbers, copy/paste of this message:
http://bugs.python.org/issue28637#msg280380

On my computer:

Importing empty module: 160 us
Creating empty class: 30 us
Creating empty function: 0.16 us
Creating empty Enum/IntEnum: 125/150 us
Creating Enum/IntEnum member: 25/27 us
Creating empty namedtuple: 600 us
Creating namedtuple member: 50 us
Importing the itertools module: 40 us
Importing the io module: 900 us
Importing the os module: 1600 us
Importing the functools module: 2100 us
Importing the re module (with all sre submodules): 3300 us
Python startup time: 43000 us
msg287985 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2017-02-17 10:33
What's your platform, Inada? Are you running macOS? I optimized site.py for Linux and BSD users a couple of years ago.
msg287988 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-17 11:16
Christian: I'm using macOS at the office and Linux at home.

sysconfig is imported even on Linux
https://github.com/python/cpython/blob/master/Lib/site.py#L247-L248
https://github.com/python/cpython/blob/master/Lib/site.py#L263-L271
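
One way to check this on a given build is to list `sys.modules` in a fresh interpreter, with and without `-S`. This is only a sketch: the helper name `modules_loaded_at_startup` is invented here, and the result depends on the Python version, since site.py may no longer import sysconfig on newer releases.

```python
import subprocess
import sys


def modules_loaded_at_startup(extra_args=()):
    """Return the set of module names present in sys.modules right after startup."""
    code = "import sys; print('\\n'.join(sorted(sys.modules)))"
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", code],
        check=True, capture_output=True, text=True,
    )
    return set(result.stdout.split())


if __name__ == "__main__":
    with_site = modules_loaded_at_startup()
    no_site = modules_loaded_at_startup(("-S",))
    # Modules that are only loaded because site was imported
    print(sorted(with_site - no_site))
    print("sysconfig loaded at startup:", "sysconfig" in with_site)
```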
msg287990 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2017-02-17 11:45
I don't think rewriting parts of site.py in C is a good idea. It's a rather maintenance-intensive module.

However, optimizing access is certainly something that's possible, e.g. by placing the few variables that are actually needed by site.py into a bootstrap module for sysconfig, which only contains the few variables needed by interpreter startup.

Alternatively, sysconfig data could be made available via a C lookup function; with the complete dictionary only being created on demand. get_config_var() already is such a lookup API which could be used as front-end.
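
To make the idea concrete, here is a pure-Python sketch of such a front-end: a few startup keys are answered from a small bootstrap mapping, and the complete dictionary is built only on the first request for anything else. The class name `LazyConfig`, the bootstrap contents and the loader are all hypothetical; the proposal above is for a C lookup function, which this does not implement.

```python
class LazyConfig:
    """Answer a few startup keys cheaply; load the full config only on demand."""

    def __init__(self, bootstrap, load_full):
        self._bootstrap = dict(bootstrap)  # small, always in memory
        self._load_full = load_full        # callable returning the full dict
        self._full = None                  # built lazily, at most once

    def get_config_var(self, name):
        if name in self._bootstrap:
            return self._bootstrap[name]
        if self._full is None:
            self._full = self._load_full()
        return self._full.get(name)


def _load_sysconfig_vars():
    # Imported here so that merely creating LazyConfig stays cheap.
    import sysconfig
    return sysconfig.get_config_vars()


# Placeholder bootstrap value; a real build would fill this in at build time.
config = LazyConfig({"PYTHONFRAMEWORK": ""}, _load_sysconfig_vars)
```

Looking up `PYTHONFRAMEWORK` through `config.get_config_var()` never touches sysconfig; any other key triggers one full load and is served from the cached dictionary afterwards.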
msg287997 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-02-17 12:06
Marc-Andre Lemburg added the comment:
> I don't think rewriting parts of site.py in C is a good idea. It's a rather maintenance-intensive module.
>
> However, optimizing access is certainly something that's possible, e.g. by placing the few variables that are actually needed by site.py into a bootstrap module for sysconfig, which only contains the few variables needed by interpreter startup.

Right, I don't propose to rewrite the 598 lines of site.py in C, but
only rewrite the parts which have a huge impact on the startup time.
It seems like the minimum part would be to write a _site module which
provides the 4 variables currently read from sysconfig.

I'm proposing to add a new private module because I don't want to
pollute site which already contains too many things.

I looked at site.py history: I don't see *major* changes in the last 2 years.
Only small enhancements, updates and fixes.

> Alternatively, sysconfig data could be made available via a C lookup function; with the complete dictionary only being created on demand. get_config_var() already is such a lookup API which could be used as front-end.

I don't think that it's worth partially reimplementing sysconfig in
C. This module is huge, complex, and platform dependent.

Well, I'm not sure about what is the best approach, but I'm sure that
we can do something to optimize site.py. 6 ms is a lot!

I never liked site.py. It seems like a huge workaround. I also dislike
having a different behaviour if site is imported or not.

That's why I asked Steve Dower to move the code that creates the
cpXXX alias for the mbcs codec from site.py to encodings/__init__.py:
see commit f5aba58480bb0dd45181f609487ac2ecfcc98673. I'm happy that
this code was removed from site.py!
msg287999 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2017-02-17 12:14
Instead of _site, would it make sense to include the four vars in sys, perhaps as named structure like sys.flags?
msg288000 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-17 12:28
FYI, here is profile of site:
https://gist.github.com/methane/1f1fe4385dad84f03eb429359f0f917b
msg288001 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-17 12:46
no-site:
python_startup_no_site: Median +- std dev: 9.13 ms +- 0.02 ms

default:
python_startup: Median +- std dev: 15.6 ms +- 0.0 ms

GH-136 + skip abs_paths():
python_startup: Median +- std dev: 14.2 ms +- 0.0 ms

profile of GH-136 + skip abs_paths():
https://gist.github.com/methane/26fc0a2382207655a6819a92f867620c

Most of the time is consumed by importlib.
msg288012 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2017-02-17 15:30
On 17.02.2017 13:06, STINNER Victor wrote:
>> Alternatively, sysconfig data could be made available via a C lookup function; with the complete dictionary only being created on demand. get_config_var() already is such a lookup API which could be used as front-end.
> 
> I don't think that it's worth partially reimplementing sysconfig in
> C. This module is huge, complex, and platform dependent.

Sorry, I was just referring to the data part of sysconfig,
not sysconfig itself.

Having a lookup function much like we have for unicodedata
makes things much more manageable, since you don't need to
generate a dictionary in memory for all the values in the
config data. Creating that dictionary takes a while (in terms
of ms).
msg288020 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-17 16:42
I created #29592 for abs_paths().  Let's focus on sysconfig in this issue.

PR 136 already ports the parts of sysconfig that site.py really needs.
'PYTHONFRAMEWORK' on macOS is the only variable we still need to import from sysconfig.

Would adding a `site.cfg`, similar to `pyvenv.cfg`, make sense?
msg288057 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-18 05:06
PR 136 now adds `sys._framework` and a 'PYTHONFRAMEWORK' macro in pyconfig.h.
History
Date User Action Args
2017-02-20 23:28:55 gregory.p.smith set nosy: + gregory.p.smith
2017-02-18 05:06:25 inada.naoki set messages: + msg288057
2017-02-17 18:53:40 merwok set nosy: + merwok
2017-02-17 16:42:40 inada.naoki set messages: + msg288020
2017-02-17 15:30:39 lemburg set messages: + msg288012
2017-02-17 12:46:11 inada.naoki set messages: + msg288001
2017-02-17 12:28:46 inada.naoki set messages: + msg288000
2017-02-17 12:14:05 christian.heimes set messages: + msg287999
2017-02-17 12:06:23 haypo set messages: + msg287997
2017-02-17 11:45:59 lemburg set nosy: + lemburg; messages: + msg287990
2017-02-17 11:16:31 inada.naoki set messages: + msg287988
2017-02-17 10:33:35 christian.heimes set messages: + msg287985
2017-02-17 10:32:02 christian.heimes set nosy: + christian.heimes
2017-02-17 10:23:08 haypo set messages: + msg287984
2017-02-17 10:20:36 haypo set nosy: + haypo; messages: + msg287983
2017-02-17 09:57:03 inada.naoki create