Author wolma
Recipients ethan.furman, gvanrossum, ned.deily, python-dev, vstinner, wolma
Date 2016-11-08.12:08:23
Message-id <1478606904.2.0.389274801192.issue28637@psf.upfronthosting.co.za>
In-reply-to
Content
STINNER Victor added the comment:
>BUT when Python is started from a virtual environment (created by the
>"venv" module), the re module is important by default.
>
>haypo@speed-python$ venv/bin/python3 -c 'import sys; print("re" in sys.modules)'
>True

Exciting, I just verified that this is true. Running python3 from a venv really seems to be the only situation in which the re module gets imported during startup (at least, this one branch in site.py is the only one that uses it).

If adding a single enum import to re causes such a big startup time difference, I wonder how much more could be gained for the venv case by not importing re at all!

Turns out that the complete code block in site.py that is used by venvs and that was partially shown by @haypo is:

CONFIG_LINE = r'^(?P<key>(\w|[-_])+)\s*=\s*(?P<value>.*)\s*$'

def venv(known_paths):
    global PREFIXES, ENABLE_USER_SITE

    env = os.environ
    if sys.platform == 'darwin' and '__PYVENV_LAUNCHER__' in env:
        executable = os.environ['__PYVENV_LAUNCHER__']
    else:
        executable = sys.executable
    exe_dir, _ = os.path.split(os.path.abspath(executable))
    site_prefix = os.path.dirname(exe_dir)
    sys._home = None
    conf_basename = 'pyvenv.cfg'
    candidate_confs = [
        conffile for conffile in (
            os.path.join(exe_dir, conf_basename),
            os.path.join(site_prefix, conf_basename)
            )
        if os.path.isfile(conffile)
        ]

    if candidate_confs:
        import re
        config_line = re.compile(CONFIG_LINE)
        virtual_conf = candidate_confs[0]
        system_site = "true"
        # Issue 25185: Use UTF-8, as that's what the venv module uses when
        # writing the file.
        with open(virtual_conf, encoding='utf-8') as f:
            for line in f:
                line = line.strip()
                m = config_line.match(line)
                if m:
                    d = m.groupdict()
                    key, value = d['key'].lower(), d['value']
                    if key == 'include-system-site-packages':
                        system_site = value.lower()
                    elif key == 'home':
                        sys._home = value

        sys.prefix = sys.exec_prefix = site_prefix

        # Doing this here ensures venv takes precedence over user-site
        addsitepackages(known_paths, [sys.prefix])

        # addsitepackages will process site_prefix again if its in PREFIXES,
        # but that's ok; known_paths will prevent anything being added twice
        if system_site == "true":
            PREFIXES.insert(0, sys.prefix)
        else:
            PREFIXES = [sys.prefix]
            ENABLE_USER_SITE = False

    return known_paths

So all the re module is good for here is parsing simple config file records with key/value pairs separated by '='. Shouldn't it be straightforward to implement that logic directly inside that block, without requiring a giant import?
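To make the idea concrete, here is a minimal sketch of what a regex-free version of the parsing loop could look like. The helper name parse_venv_config and its return value are my own assumptions for illustration only; nothing like it exists in site.py, where the logic would more likely stay inline in venv(). It uses only str.partition and str.strip:

```python
def parse_venv_config(lines):
    """Hypothetical helper (not part of site.py): return the values of
    the two pyvenv.cfg keys that site.venv() cares about."""
    system_site = "true"   # same default as in the current code
    home = None            # would be assigned to sys._home
    for line in lines:
        # Split on the first '=' only. Lines without an '=' are skipped,
        # just as they are when the regex fails to match.
        key, sep, value = line.partition('=')
        if not sep:
            continue
        key = key.strip().lower()
        value = value.strip()
        if key == 'include-system-site-packages':
            system_site = value.lower()
        elif key == 'home':
            home = value
    return system_site, home


# Example with made-up file contents:
sample = [
    "home = /usr/bin\n",
    "include-system-site-packages = false\n",
    "version = 3.6.0\n",
]
print(parse_venv_config(sample))  # ('false', '/usr/bin')
```

Note that this is slightly more lenient than CONFIG_LINE, which only accepts keys made of word characters and hyphens. Since only two fixed key names are ever compared, I would assume that difference does not matter in practice, but it should be double-checked.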

This should still be easily doable for 3.6. It seems as if it would solve the whole issue, and it would probably speed up the performance tests much more than any reverted changesets could.

What do you think?