classification
Title: Most of Python's startup time is sysconfig
Type: performance
Stage: resolved
Components:
Versions: Python 3.3
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: Arfrever, barry, doko, eric.araujo, ezio.melotti, haypo, jcon, nadeem.vawda, ncoghlan, pitrou, python-dev, rosslagerwall, rpetrov, tarek, terry.reedy
Priority: normal
Keywords: patch

Created on 2011-10-11 03:02 by pitrou, last changed 2013-04-08 19:20 by python-dev. This issue is now closed.

Files
File name Uploaded Description
sysconfigdata.patch pitrou, 2011-10-11 14:53
Messages (24)
msg145328 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-11 03:02
sysconfig is imported and used by site.py.

$ time ./python -S -c ''

real	0m0.019s
user	0m0.013s
sys	0m0.005s

$ time ./python -S -c 'import sysconfig'

real	0m0.047s
user	0m0.046s
sys	0m0.002s

$ time ./python -S -c 'import sysconfig; sysconfig.get_path("purelib")'

real	0m0.053s
user	0m0.047s
sys	0m0.005s

$ time ./python -c ''

real	0m0.058s
user	0m0.054s
sys	0m0.003s
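
For anyone wanting to reproduce these measurements without the shell's `time`, here is a minimal sketch. It assumes the interpreter under test is the one running the script (`sys.executable`); the helper name `startup_time` and the repeat count are illustrative choices, not something from this issue.

```python
import subprocess
import sys
import time


def startup_time(args, repeats=3):
    """Return the best wall-clock time in seconds for launching the interpreter."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises if the child interpreter exits with an error.
        subprocess.run([sys.executable, *args], check=True)
        best = min(best, time.perf_counter() - start)
    return best


# The same four command lines as in the report above.
cases = {
    "-S -c ''": ["-S", "-c", ""],
    "-S -c 'import sysconfig'": ["-S", "-c", "import sysconfig"],
    "-S -c 'import sysconfig; get_path'": [
        "-S", "-c", "import sysconfig; sysconfig.get_path('purelib')",
    ],
    "-c ''": ["-c", ""],
}

timings = {label: startup_time(args) for label, args in cases.items()}
for label, seconds in timings.items():
    print(f"{label:40s} {seconds * 1000:8.1f} ms")
```

Taking the minimum over a few runs reduces noise from the OS scheduler and disk cache; absolute numbers will differ from the ones quoted here.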
msg145342 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-11 13:37
Actually, a big part of that is compiling some regexes in the tokenize module. Just relying on the re module's internal caching shaves off 20% of total startup time.

Before:

$ time ./python -S -c 'import tokenize'

real	0m0.034s
user	0m0.030s
sys	0m0.003s
$ time ./python -c ''

real	0m0.055s
user	0m0.050s
sys	0m0.005s

After:

$ time ./python -S -c 'import tokenize'

real	0m0.021s
user	0m0.019s
sys	0m0.001s
$ time ./python -c ''

real	0m0.044s
user	0m0.038s
sys	0m0.006s
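
A rough sketch of the idea, for illustration only: rather than compiling patterns at module level (which costs time at import), compile them at the point of use and let the re module's cache make repeated calls cheap. The pattern strings and the `first_token` helper below are hypothetical stand-ins, not the actual tokenize code.

```python
import re

# Hypothetical pattern sources; kept as plain strings so importing this
# module does not compile anything.
NUMBER = r"\d+(?:\.\d*)?"
NAME = r"\w+"


def _compile(expr):
    # re.compile() consults the re module's internal cache, so calling this
    # repeatedly with the same string does not recompile the pattern.
    return re.compile(expr, re.UNICODE)


def first_token(text):
    """Return the leading number or name in text, or None if there is none."""
    match = _compile(NUMBER + "|" + NAME).match(text)
    return match.group(0) if match else None


print(first_token("42 apples"))
print(first_token("spam eggs"))
```

The trade-off is that the first call pays the compilation cost instead of the import.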
msg145344 - (view) Author: Roundup Robot (python-dev) Date: 2011-10-11 13:49
New changeset df950158dc33 by Antoine Pitrou in branch 'default':
Issue #13150: The tokenize module doesn't compile large regular expressions at startup anymore.
http://hg.python.org/cpython/rev/df950158dc33
msg145346 - (view) Author: Tarek Ziadé (tarek) * (Python committer) Date: 2011-10-11 14:07
I am curious: wouldn't there be a way of keeping the compiled expressions in a static cache somewhere, so we would compile them just once and have both import time and runtime fast?
msg145347 - (view) Author: Roundup Robot (python-dev) Date: 2011-10-11 14:11
New changeset ed0bc92fed68 by Antoine Pitrou in branch 'default':
Use a dict for faster sysconfig startup (issue #13150)
http://hg.python.org/cpython/rev/ed0bc92fed68
msg145348 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-11 14:13
> I am curious: wouldn't there be a way of keeping the compiled expressions in
> a static cache somewhere, so we would compile them just once and have
> both import time and runtime fast?

Runtime shouldn't be affected. The re module has its own LRU caching.

That said, it seems regular expressions are pickleable:

b'\x80\x03cre\n_compile\nq\x00X\x00\x00\x00\x00q\x01K \x86q\x02Rq\x03.'
msg145349 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-11 14:13
Arg damn roundup e-mail gateway.
I wanted to paste:

>>> pickle.dumps(re.compile(''))
b'\x80\x03cre\n_compile\nq\x00X\x00\x00\x00\x00q\x01K \x86q\x02Rq\x03.'
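
A small sketch to unpack that output. As the bytes suggest, the pickle records a call to `re._compile` with the pattern string and flags rather than any compiled state, so unpickling compiles the pattern again. The example pattern is arbitrary, and the `copyreg.dispatch_table` lookup is my reading of how the re module registers its reducer.

```python
import copyreg
import pickle
import re

pattern = re.compile(r"\d+", re.UNICODE)

# Round-trip through pickle: the restored object is an equivalent pattern.
payload = pickle.dumps(pattern)
restored = pickle.loads(payload)
print(restored.pattern, restored.flags == pattern.flags)

# The reducer registered for pattern objects returns a callable plus its
# arguments; the arguments are just the source string and the flags.
func, args = copyreg.dispatch_table[type(pattern)](pattern)
print(func.__name__, args)
```

So pickling compiled patterns is convenient for passing them around, but it does not by itself avoid the compilation cost.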
msg145350 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-11 14:53
Pre-parsing and building a cached module of build-time variables (from Makefile and pyconfig.h) under POSIX also removes more than 15% of startup time. Patch attached.
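
To illustrate the general approach (this is not the attached patch): parse the variables once at build time, write them out as a plain Python module containing a dict literal, and at runtime simply import that module. The Makefile excerpt, the module name `_example_sysconfigdata`, the helper names, and the simplified parsing (no `$(VAR)` expansion) are all assumptions made for this sketch.

```python
import importlib.util
import pprint
import re
import tempfile
from pathlib import Path

# Hypothetical Makefile excerpt standing in for the real build output.
MAKEFILE_TEXT = """\
CC = gcc
OPT = -O2 -g
# a comment line, ignored by the parser
LIBDIR = /usr/local/lib
"""

_VAR_RE = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*)$")


def parse_makefile_text(text):
    """Collect simple NAME = value assignments (no variable expansion)."""
    variables = {}
    for line in text.splitlines():
        match = _VAR_RE.match(line.strip())
        if match:
            name, value = match.groups()
            variables[name] = value.strip()
    return variables


def write_data_module(variables, path):
    """Build-time step: dump the parsed variables as a dict literal."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("# Generated at build time; do not edit.\n")
        f.write("build_time_vars = ")
        pprint.pprint(variables, stream=f)


def load_data_module(path):
    """Runtime step: import the generated module; no parsing is needed."""
    spec = importlib.util.spec_from_file_location(
        "_example_sysconfigdata", str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.build_time_vars


with tempfile.TemporaryDirectory() as tmp:
    module_path = Path(tmp) / "_example_sysconfigdata.py"
    write_data_module(parse_makefile_text(MAKEFILE_TEXT), module_path)
    cached_vars = load_data_module(module_path)

print(cached_vars)
```

Because the generated file is ordinary Python, it also benefits from the usual bytecode caching on later imports.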
msg145358 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2011-10-11 18:44
#11454 is another case where pre-parsing and pickling the regular expressions in the email module may improve import time considerably.
msg145397 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-10-12 15:47
#9878 should also help with start-up time.
msg145398 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-10-12 15:50
Actually, #9878 should supersede this bug: it proposes to generate a C module to avoid parsing Makefile and pyconfig.h, and your patch proposes to generate a Python module with the same goal.
msg145402 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-12 16:01
> Actually, #9878 should supersede this bug: it proposes to generate a C 
> module to avoid parsing Makefile and pyconfig.h, and your patch
> proposes to generate a Python module with the same goal.

Well, #9878 doesn't have a patch, but perhaps Barry is willing to work on one. Also, if we have a pure Python solution, perhaps a C module isn't needed. The main advantage of the C solution, though, would be to avoid dubious parsing altogether, even at build time.
msg145423 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-10-12 19:52
Since #9878 proposes an *alternate* solution to *part* of the sysconfig problem, I disagree with 'supersede'. A Python solution would be more useful for other implementations if enough of the sysconfig info is not CPython specific.

A CPython design feature is that it parses and compiles Python code just once per run, and imported modules just once until the code changes (or might have). For functions, everything possible is put into a behind-the-scenes code object. So even inner functions are parsed and compiled just once.

The problem with sysconfig, it appears, is that it lacks the equivalent design feature and instead does the equivalent of re-parsing and re-compiling inner functions with each outer function call.
msg145458 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-10-13 15:40
> Since #9878 proposes an *alternate* solution to *part* of the
> sysconfig problem, I disagree with 'supersede'.
It’s also an older issue.

> A Python solution would be more useful for other implementations
> if enough of the sysconfig info is not CPython specific.
That’s the point: the info currently parsed at runtime by sysconfig is specific to CPython (Makefile and pyconfig.h), so adding a CPython-specific C module was thought the way to go.
msg145462 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-13 15:50
> > A Python solution would be more useful for other implementations
> > if enough of the sysconfig info is not CPython specific.
> That’s the point: the info currently parsed at runtime by sysconfig is
> specific to CPython (Makefile and pyconfig.h), so adding a
> CPython-specific C module was thought the way to go.

A module doesn't have to be written in C to be CPython-specific.
msg145822 - (view) Author: Roundup Robot (python-dev) Date: 2011-10-18 15:59
New changeset 70160b53117f by Antoine Pitrou in branch 'default':
Issue #13150: sysconfig no longer parses the Makefile and config.h files
http://hg.python.org/cpython/rev/70160b53117f
msg145823 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-18 16:00
Done! If someone wants to give life to the C approach, they are welcome :)
msg145838 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-10-18 16:35
BTW, distutils2 backports the sysconfig module and cfg file from 3.3, so now the two versions will diverge.
msg145870 - (view) Author: Roumen Petrov (rpetrov) * Date: 2011-10-18 21:39
Thanks for the solution, thanks for the commit.
Goodbye, cross-compilation!
Any attempt to improve the Python build system to support cross-builds, multilib builds, or builds outside the source tree with different options is useless.
msg145983 - (view) Author: Roundup Robot (python-dev) Date: 2011-10-19 22:40
New changeset 677e625e2ef1 by Victor Stinner in branch 'default':
Issue #13150: Add a comment in _sysconfigdata to explain the origin of this file
http://hg.python.org/cpython/rev/677e625e2ef1
msg165323 - (view) Author: Matthias Klose (doko) * (Python committer) Date: 2012-07-12 16:58
The current ability to cross-build Python now relies on being able to run the build Python with the host library, using the _sysconfigdata.py from the host.

If somebody decides to implement _sysconfigdata as a C extension, please ensure that this information can still be passed to the build Python.
msg184916 - (view) Author: Roundup Robot (python-dev) Date: 2013-03-21 22:02
New changeset 66e30c4870bb by doko in branch '2.7':
- Issue #13150: sysconfig no longer parses the Makefile and config.h files
http://hg.python.org/cpython/rev/66e30c4870bb
msg184967 - (view) Author: Roundup Robot (python-dev) Date: 2013-03-22 14:37
New changeset d174cb3f5b9e by Benjamin Peterson in branch '2.7':
backout 66e30c4870bb for breaking OSX (#13150)
http://hg.python.org/cpython/rev/d174cb3f5b9e
msg186332 - (view) Author: Roundup Robot (python-dev) Date: 2013-04-08 19:20
New changeset be3b4aa2ad28 by doko in branch '2.7':
- Issue #13150, #17512: sysconfig no longer parses the Makefile and config.h
http://hg.python.org/cpython/rev/be3b4aa2ad28
History
Date User Action Args
2013-04-08 19:20:25 python-dev set messages: + msg186332
2013-03-22 14:37:59 python-dev set messages: + msg184967
2013-03-21 22:02:22 python-dev set messages: + msg184916
2012-07-12 16:58:07 doko set nosy: + doko; messages: + msg165323
2011-10-19 22:40:57 python-dev set messages: + msg145983
2011-10-18 21:39:19 rpetrov set nosy: + rpetrov; messages: + msg145870
2011-10-18 17:32:50 Arfrever set nosy: + Arfrever
2011-10-18 16:35:46 eric.araujo set messages: + msg145838
2011-10-18 16:00:25 pitrou set status: open -> closed; resolution: fixed; messages: + msg145823; stage: resolved
2011-10-18 15:59:33 python-dev set messages: + msg145822
2011-10-13 15:50:28 pitrou set messages: + msg145462
2011-10-13 15:40:52 eric.araujo set messages: + msg145458
2011-10-13 00:07:52 jcon set nosy: + jcon
2011-10-12 19:52:06 terry.reedy set messages: + msg145423
2011-10-12 16:01:11 pitrou set nosy: + barry; messages: + msg145402
2011-10-12 15:50:01 eric.araujo set messages: + msg145398
2011-10-12 15:48:38 haypo set nosy: + haypo
2011-10-12 15:47:32 eric.araujo set messages: + msg145397
2011-10-11 18:44:56 rosslagerwall set nosy: + rosslagerwall; messages: + msg145358
2011-10-11 14:53:08 pitrou set files: + sysconfigdata.patch; keywords: + patch; messages: + msg145350
2011-10-11 14:13:56 pitrou set messages: + msg145349
2011-10-11 14:13:31 pitrou set messages: + msg145348
2011-10-11 14:11:16 python-dev set messages: + msg145347
2011-10-11 14:07:34 tarek set messages: + msg145346
2011-10-11 13:49:42 python-dev set nosy: + python-dev; messages: + msg145344
2011-10-11 13:37:04 pitrou set messages: + msg145342
2011-10-11 09:32:26 nadeem.vawda set nosy: + nadeem.vawda
2011-10-11 03:04:20 ezio.melotti set nosy: + ezio.melotti
2011-10-11 03:02:57 pitrou create