
Most of Python's startup time is sysconfig #57359

Closed
pitrou opened this issue Oct 11, 2011 · 24 comments
Labels
performance Performance or resource usage


@pitrou
Member

pitrou commented Oct 11, 2011

BPO 13150
Nosy @warsaw, @terryjreedy, @doko42, @ncoghlan, @pitrou, @vstinner, @tarekziade, @ezio-melotti, @merwok
Files
  • sysconfigdata.patch
  Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2011-10-18.16:00:25.063>
    created_at = <Date 2011-10-11.03:02:57.474>
    labels = ['performance']
    title = "Most of Python's startup time is sysconfig"
    updated_at = <Date 2013-04-08.19:20:25.763>
    user = 'https://github.com/pitrou'

    bugs.python.org fields:

    activity = <Date 2013-04-08.19:20:25.763>
    actor = 'python-dev'
    assignee = 'none'
    closed = True
    closed_date = <Date 2011-10-18.16:00:25.063>
    closer = 'pitrou'
    components = []
    creation = <Date 2011-10-11.03:02:57.474>
    creator = 'pitrou'
    dependencies = []
    files = ['23377']
    hgrepos = []
    issue_num = 13150
    keywords = ['patch']
    message_count = 24.0
    messages = ['145328', '145342', '145344', '145346', '145347', '145348', '145349', '145350', '145358', '145397', '145398', '145402', '145423', '145458', '145462', '145822', '145823', '145838', '145870', '145983', '165323', '184916', '184967', '186332']
    nosy_count = 15.0
    nosy_names = ['barry', 'terry.reedy', 'doko', 'ncoghlan', 'pitrou', 'vstinner', 'nadeem.vawda', 'tarek', 'ezio.melotti', 'eric.araujo', 'rpetrov', 'Arfrever', 'rosslagerwall', 'python-dev', 'jcon']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue13150'
    versions = ['Python 3.3']

    @pitrou
    Member Author

    pitrou commented Oct 11, 2011

    sysconfig is imported and used by site.py.

    $ time ./python -S -c ''

    real 0m0.019s
    user 0m0.013s
    sys 0m0.005s

    $ time ./python -S -c 'import sysconfig'

    real 0m0.047s
    user 0m0.046s
    sys 0m0.002s

    $ time ./python -S -c 'import sysconfig; sysconfig.get_path("purelib")'

    real 0m0.053s
    user 0m0.047s
    sys 0m0.005s

    $ time ./python -c ''

    real 0m0.058s
    user 0m0.054s
    sys 0m0.003s
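    The timings above come from the shell's `time`. The sketch below is a rough, self-contained way to repeat the same comparison from Python using only the standard library. The helper name `best_of` and the `cases` table are hypothetical, and absolute numbers on a current interpreter will differ from the 2011 figures.

```python
import subprocess
import sys
import time


def best_of(args, runs=5):
    """Return the fastest wall-clock time (seconds) of running the interpreter with args."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, *args], check=True)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Same four invocations as in the report above (-S skips importing site).
cases = {
    "bare (-S)": ["-S", "-c", ""],
    "import sysconfig (-S)": ["-S", "-c", "import sysconfig"],
    "get_path (-S)": ["-S", "-c", "import sysconfig; sysconfig.get_path('purelib')"],
    "with site": ["-c", ""],
}

if __name__ == "__main__":
    for label, args in cases.items():
        print(f"{label:24} {best_of(args) * 1000:6.1f} ms")
```

    Taking the minimum of several runs is a common way to reduce noise from the scheduler and the disk cache. On Python 3.7 and later, `python -X importtime -c ''` prints a per-module breakdown of import time. That makes it easier to attribute the cost to a specific module such as `sysconfig` or `tokenize`.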

    @pitrou pitrou added the performance Performance or resource usage label Oct 11, 2011
    @pitrou
    Member Author

    pitrou commented Oct 11, 2011

    Actually, a big part of that is compiling some regexes in the tokenize module. Just relying on the re module's internal caching shaves off 20% of total startup time.

    Before:

    $ time ./python -S -c 'import tokenize'

    real 0m0.034s
    user 0m0.030s
    sys 0m0.003s
    $ time ./python -c ''

    real 0m0.055s
    user 0m0.050s
    sys 0m0.005s

    After:

    $ time ./python -S -c 'import tokenize'

    real 0m0.021s
    user 0m0.019s
    sys 0m0.001s
    $ time ./python -c ''

    real 0m0.044s
    user 0m0.038s
    sys 0m0.006s
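    The change described above compiles patterns on demand and relies on the `re` module's internal cache, instead of compiling them at import time. It can be sketched as follows. This is not the actual `tokenize` patch: `_PATTERN_SOURCES` and `match_token` are hypothetical names, and the patterns are toy examples.

```python
import re

# Hypothetical pattern table: keep the *source strings* at import time instead
# of compiled pattern objects, so importing the module does no regex work.
_PATTERN_SOURCES = {
    "name": r"\w+",
    "number": r"\d+(?:\.\d*)?",
    "string": r"'[^'\n]*'",
}


def _compile(expr):
    # re.compile() looks in the re module's internal cache first, so only the
    # first call for a given (pattern, flags) pair pays the compilation cost.
    return re.compile(expr, re.UNICODE)


def match_token(kind, text, pos=0):
    """Match a token of the given kind at text[pos:], compiling lazily."""
    return _compile(_PATTERN_SOURCES[kind]).match(text, pos)


if __name__ == "__main__":
    print(match_token("number", "x = 3.14", 4).group())
    print(match_token("name", "spam eggs").group())
```

    The cache is keyed on the pattern and its flags. After the first call, later calls skip compilation, so runtime cost stays low as long as the number of distinct patterns fits within the cache's bounded size.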

    @python-dev
    Mannequin

    python-dev mannequin commented Oct 11, 2011

    New changeset df950158dc33 by Antoine Pitrou in branch 'default':
    Issue bpo-13150: The tokenize module doesn't compile large regular expressions at startup anymore.
    http://hg.python.org/cpython/rev/df950158dc33

    @tarekziade
    Mannequin

    tarekziade mannequin commented Oct 11, 2011

    I am curious: wouldn't there be a way of keeping the compiled expressions in a static cache somewhere, so we would compile them just once and have both import time and runtime fast?

    @python-dev
    Mannequin

    python-dev mannequin commented Oct 11, 2011

    New changeset ed0bc92fed68 by Antoine Pitrou in branch 'default':
    Use a dict for faster sysconfig startup (issue bpo-13150)
    http://hg.python.org/cpython/rev/ed0bc92fed68

    @pitrou
    Member Author

    pitrou commented Oct 11, 2011

    > I am curious: wouldn't there be a way of keeping the compiled expressions in
    > a static cache somewhere, so we would compile them just once and have
    > both import time and runtime fast?

    Runtime shouldn't be affected. The re module has its own LRU caching.

    That said, it seems regular expressions are pickleable:

    b'\x80\x03cre\n_compile\nq\x00X\x00\x00\x00\x00q\x01K \x86q\x02Rq\x03.'

    @pitrou
    Member Author

    pitrou commented Oct 11, 2011

    Argh, damn Roundup e-mail gateway.
    I wanted to paste:

    >>> pickle.dumps(re.compile(''))
    b'\x80\x03cre\n_compile\nq\x00X\x00\x00\x00\x00q\x01K \x86q\x02Rq\x03.'
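    The pickle above starts with `cre\n_compile`, a reference to `re._compile`. That reference is followed by the pattern (`''`) and its flags: `K ` encodes 32, the value of `re.UNICODE`. So a pickled pattern stores only its source and flags. Unpickling calls `re._compile` again, which recompiles unless the pattern is already in the `re` cache.

    The short sketch below checks this on a current interpreter. It suggests that pickling compiled regexes would not, by itself, avoid the compilation cost at startup.

```python
import pickle
import re

pattern = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
data = pickle.dumps(pattern)

# The pickle holds a reference to re._compile plus the (pattern, flags)
# arguments; no compiled program is stored in it.
assert b"_compile" in data

# Unpickling therefore calls re._compile again: it recompiles the pattern
# (or returns a cached copy) rather than restoring precompiled state.
restored = pickle.loads(data)
print(restored.pattern == pattern.pattern, restored.flags == pattern.flags)  # prints: True True
```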

    @pitrou
    Member Author

    pitrou commented Oct 11, 2011

    Pre-parsing and building a cached module of build-time variables (from Makefile and pyconfig.h) under POSIX also removes more than 15% of startup time. Patch attached.
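    The pre-parsing idea splits the work into a build step and a startup step:

    - At build time, parse the `#define`/`#undef` lines once and write the result out as a Python dict literal.
    - At startup, simply import the generated module.

    This is a simplified illustration, not the attached patch. The regular expressions are assumptions modeled loosely on `pyconfig.h` conventions, and a real build would also have to handle the Makefile. The names `parse_config_header`, `write_vars_module`, `load_vars_module` and `build_time_vars` are hypothetical here.

```python
import importlib.util
import pathlib
import pprint
import re
import tempfile

# Simplified patterns for pyconfig.h-style lines (an assumption, not the
# exact expressions sysconfig uses).
_DEFINE = re.compile(r"#define ([A-Z][A-Za-z0-9_]+) (.*)")
_UNDEF = re.compile(r"/[*] #undef ([A-Z][A-Za-z0-9_]+) [*]/")


def parse_config_header(text):
    """Build step: return a dict of NAME -> value parsed from header text."""
    build_vars = {}
    for line in text.splitlines():
        if m := _DEFINE.match(line):
            name, value = m.groups()
            try:
                build_vars[name] = int(value)
            except ValueError:
                build_vars[name] = value  # keep non-integer values as strings
        elif m := _UNDEF.match(line):
            build_vars[m.group(1)] = 0  # undefined macros are recorded as 0
    return build_vars


def write_vars_module(build_vars, path):
    """Build step: dump the parsed values as a plain Python dict literal."""
    source = "# generated at build time; do not edit\n"
    source += "build_time_vars = " + pprint.pformat(build_vars) + "\n"
    path.write_text(source)


def load_vars_module(path):
    """Startup step: importing the generated module involves no text parsing."""
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.build_time_vars


if __name__ == "__main__":
    header = "#define HAVE_FORK 1\n/* #undef HAVE_FOO */\n#define SIZEOF_LONG 8\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = pathlib.Path(tmp) / "generated_vars.py"
        write_vars_module(parse_config_header(header), path)
        print(load_vars_module(path))
```

    Loading the generated file costs one module import, which can also benefit from cached bytecode. Parsing the header and Makefile at every startup, by contrast, runs regular expressions over every line each time.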

    @rosslagerwall
    Mannequin

    rosslagerwall mannequin commented Oct 11, 2011

    bpo-11454 is another case where pre-parsing and pickling the regular expressions in the email module may improve import time considerably.

    @merwok
    Member

    merwok commented Oct 12, 2011

    bpo-9878 should also help with start-up time.

    @merwok
    Member

    merwok commented Oct 12, 2011

    Actually, bpo-9878 should supersede this bug: it proposes to generate a C module to avoid parsing Makefile and pyconfig.h, and your patch proposes to generate a Python module with the same goal.

    @pitrou
    Member Author

    pitrou commented Oct 12, 2011

    > Actually, bpo-9878 should supersede this bug: it proposes to generate a C
    > module to avoid parsing Makefile and pyconfig.h, and your patch
    > proposes to generate a Python module with the same goal.

    Well, bpo-9878 doesn't have a patch, but perhaps Barry is willing to work on one. Also, if we have a pure Python solution, perhaps a C module isn't needed. The main advantage of the C solution, though, would be to avoid dubious parsing altogether, even at build time.

    @terryjreedy
    Member

    Since bpo-9878 proposes an *alternate* solution to *part* of the sysconfig problem, I disagree with 'supersede'. A Python solution would be more useful for other implementations if enough of the sysconfig info is not CPython specific.

    A CPython design feature is that it parses and compiles Python code just once per run, and imported modules just once until the code changes (or might have). For functions, everything possible is put into a behind-the-scenes code object. So even inner functions are parsed and compiled just once.

    The problem with sysconfig, it appears, is that it lacks the equivalent design feature but instead does the equivalent of re-parsing and re-compiling inner functions with each outer function call.

    @merwok
    Member

    merwok commented Oct 13, 2011

    > Since bpo-9878 proposes an *alternate* solution to *part* of the
    > sysconfig problem, I disagree with 'supersede'.
    It’s also an older issue.

    > A Python solution would be more useful for other implementations
    > if enough of the sysconfig info is not CPython specific.
    That’s the point: the info currently parsed at runtime by sysconfig is specific to CPython (Makefile and pyconfig.h), so adding a CPython-specific C module was thought the way to go.

    @pitrou
    Member Author

    pitrou commented Oct 13, 2011

    > A Python solution would be more useful for other implementations
    > if enough of the sysconfig info is not CPython specific.
    > That’s the point: the info currently parsed at runtime by sysconfig is
    > specific to CPython (Makefile and pyconfig.h), so adding a
    > CPython-specific C module was thought the way to go.

    A module doesn't have to be written in C to be CPython-specific.

    @python-dev
    Mannequin

    python-dev mannequin commented Oct 18, 2011

    New changeset 70160b53117f by Antoine Pitrou in branch 'default':
    Issue bpo-13150: sysconfig no longer parses the Makefile and config.h files
    http://hg.python.org/cpython/rev/70160b53117f
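    After this change, code that needs build-time configuration still goes through the public `sysconfig` functions; only the source of the data changes. The sketch below uses the documented calls `get_config_vars`, `get_config_var`, `get_path` and `get_python_version`. The helper `describe_build` is hypothetical.

```python
import sysconfig


def describe_build():
    """Collect a few build-time values through sysconfig's public API."""
    # Since Python 3.3, POSIX builds load these values from a generated data
    # module instead of parsing Makefile and pyconfig.h at each startup.
    config = sysconfig.get_config_vars()
    return {
        "purelib": sysconfig.get_path("purelib"),
        "python_version": sysconfig.get_python_version(),
        "CC": sysconfig.get_config_var("CC"),  # may be None, e.g. on Windows
        "num_vars": len(config),
    }


if __name__ == "__main__":
    for key, value in describe_build().items():
        print(f"{key:15} {value}")
```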

    @pitrou
    Member Author

    pitrou commented Oct 18, 2011

    Done! If someone wants to give life to the C approach, they are welcome :)

    @pitrou pitrou closed this as completed Oct 18, 2011
    @merwok
    Member

    merwok commented Oct 18, 2011

    BTW, distutils2 backports the sysconfig module and cfg file from 3.3, so now the two versions will diverge.

    @rpetrov
    Mannequin

    rpetrov mannequin commented Oct 18, 2011

    Thanks for the solution, thanks for the commit.
    Good bye cross compilation!
    Any attempt to improve the Python build system to support cross-builds, multilib builds, and builds outside the source tree with different options is useless.

    @python-dev
    Mannequin

    python-dev mannequin commented Oct 19, 2011

    New changeset 677e625e2ef1 by Victor Stinner in branch 'default':
    Issue bpo-13150: Add a comment in _sysconfigdata to explain the origin of this file
    http://hg.python.org/cpython/rev/677e625e2ef1

    @doko42
    Member

    doko42 commented Jul 12, 2012

    The current ability to cross-build Python now relies on being able to run the build Python with the host library, using the _sysconfigdata.py from the host.

    If somebody decides to implement _sysconfigdata as a C extension, please ensure that this information can still be passed to the build Python.

    @python-dev
    Mannequin

    python-dev mannequin commented Mar 21, 2013

    New changeset 66e30c4870bb by doko in branch '2.7':

    @python-dev
    Mannequin

    python-dev mannequin commented Mar 22, 2013

    New changeset d174cb3f5b9e by Benjamin Peterson in branch '2.7':
    backout 66e30c4870bb for breaking OSX (bpo-13150)
    http://hg.python.org/cpython/rev/d174cb3f5b9e

    @python-dev
    Mannequin

    python-dev mannequin commented Apr 8, 2013

    New changeset be3b4aa2ad28 by doko in branch '2.7':

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022