classification
Title: locale can be imported at startup but relies on too many library modules
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, barry, benjamin.peterson, brett.cannon, christian.heimes, eric.araujo, eric.snow, flox, haypo, lemburg, mark.dickinson, ncoghlan, orsenthil, pitrou, python-dev, rhettinger
Priority: normal Keywords: patch

Created on 2010-08-09 19:09 by pitrou, last changed 2013-10-12 16:05 by pitrou. This issue is now closed.

Files
File name Uploaded Description Edit
bootlocale.patch pitrou, 2010-08-09 19:56
bootlocale2.patch pitrou, 2013-10-09 21:29 review
bootlocale3.patch haypo, 2013-10-11 21:58 review
Messages (29)
msg113457 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-08-09 19:09
Consider the two following commands, from an SVN working copy:

$ ./python -c 'import heapq; print(heapq.heapify)'
<built-in function heapify>

$ cat | ./python -c 'import heapq; print(heapq.heapify)'
<function heapify at 0x7f5d456025a0>

In the second case, the "from _heapq import *" in heapq fails silently. The reason was a bit difficult to find, but it goes as following:
- when run from a non-pty, initializing the standard streams imports Lib/locale.py
- Lib/locale.py import collections which import heapq
- heapq tries to import _heapq but, at this point, the build dir (such as "build/lib.linux-x86_64-3.2/") hasn't been added to sys.path, since site.py does it: the import fails
- heapq silences the ImportError, and this early loaded copy of the module is then made available to all subsequent consumers of heapq, without C accelerators

The issue might seem rather exotic, but it is a more general symptom of the problem that importing Lib/locale.py when initializing std{in,out,err} brings too many import dependencies.

A possible solution is to define a subset of locale.py that is used for bootstrap and rely on as little modules as possible.
msg113459 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-08-09 19:41
Attached patch fixes the issue by creating a separate "_bootlocale" module, used at bootstrap, which doesn't import collections, functools and friends.
msg113462 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-08-09 19:56
Updated patch also replaces imports of locale in site.py (since it can be imported early on startup).
msg113466 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-08-09 20:27
Chances are that the patch will also fix the test_heapq failures on some buildbots (see e.g. http://www.python.org/dev/buildbot/3.x/builders/x86%20Tiger%203.x/builds/759/steps/test/logs/stdio ).
msg113476 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-08-09 21:10
Antoine Pitrou wrote:
> 
> Antoine Pitrou <pitrou@free.fr> added the comment:
> 
> Attached patch fixes the issue by creating a separate "_bootlocale" module, used at bootstrap, which doesn't import collections, functools and friends.

I don't think that's the right way forward.

It's much easier to either remove the need to import those
extra modules or defer their import to actual use within
a function.

Both collections and functools are used in two distinct places.
The re module use could also be deferred to actual use.

It would be worthwhile adding a note to the top of the module
mentioning the fact that the module is loaded early on by Python
and to warn about use of non-builtin module imports.
msg113481 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-08-09 21:30
> I don't think that's the right way forward.
> 
> It's much easier to either remove the need to import those
> extra modules or defer their import to actual use within
> a function.

The latter has various issues, such as being overly tedious and
potentially risk (because of the import lock).
Furthermore, functools.wraps is used as a decorator at the toplevel of
locale.py, which makes it currently unremovable.

The former is easier said than done, and we may end up writing poor
man's reimplementations of some stdlib functions.

> It would be worthwhile adding a note to the top of the module
> mentioning the fact that the module is loaded early on by Python
> and to warn about use of non-builtin module imports.

Use of non-builtin stdlib modules is legitimate for many functions which
aren't used in the startup process anyway. The rationale behind the
patch is precisely to define a subset that is needed at startup and
shouldn't rely on stdlib facilities. The rest of the locale module is
allowed to use whatever stdlib facilities it desires, which is always
much more practical.

(note, this is already done elsewhere, for example _abcoll.py or
_weakrefset.py)
msg113488 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-08-09 22:04
Antoine Pitrou wrote:
> 
> Antoine Pitrou <pitrou@free.fr> added the comment:
> 
>> I don't think that's the right way forward.
>>
>> It's much easier to either remove the need to import those
>> extra modules or defer their import to actual use within
>> a function.
> 
> The latter has various issues, such as being overly tedious and
> potentially risk (because of the import lock).

Really ? collections is only used in format_string and can easily
be imported there. Note that putting imports into functions is a technique
to remove problems with import locks - it doesn't create any.

> Furthermore, functools.wraps is used as a decorator at the toplevel of
> locale.py, which makes it currently unremovable.

If you look at the code, that decorator is not needed at all. Furthermore
the whole complication was just introduced to make some tests easier
to write.

> The former is easier said than done, and we may end up writing poor
> man's reimplementations of some stdlib functions.

Please be more specific. The modules that have an impact
on import time are collections, functools and re.

>> It would be worthwhile adding a note to the top of the module
>> mentioning the fact that the module is loaded early on by Python
>> and to warn about use of non-builtin module imports.
> 
> Use of non-builtin stdlib modules is legitimate for many functions which
> aren't used in the startup process anyway. The rationale behind the
> patch is precisely to define a subset that is needed at startup and
> shouldn't rely on stdlib facilities. The rest of the locale module is
> allowed to use whatever stdlib facilities it desires, which is always
> much more practical.

I said "warn about the use of non-builtin modules", not disallow
their use. AFAIK, the purpose of the exercise is to reduce the
number of non-builtin modules being loaded early on during
Python startup.

> (note, this is already done elsewhere, for example _abcoll.py or
> _weakrefset.py)

Those are special cases since they are needed for ABCs. locale
is not special enough to warrant such a disruption in module
organization and it's not needed either (see above).
msg113492 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-08-09 22:19
> I said "warn about the use of non-builtin modules", not disallow
> their use. AFAIK, the purpose of the exercise is to reduce the
> number of non-builtin modules being loaded early on during
> Python startup.

The purpose of the exercise is to avoid, as much as possible, the
numerous issues that have already been encountered because of the
startup procedure relying on locale in various conditions for
initialization of the standard streams.

You are proposing that we commit a one-time fix to fix the current
situation. Recent history shows that similar problems will arise again,
and waste useful developer time for committing urgency fixes in order to
fix the build (not to mention that the symptoms are sometimes difficult
to diagnose, such as in this very issue). I am proposing a broader
change which prevents, as much as possible, similar problems from
reappearing and will relieve us from a dispensable burden.

> > (note, this is already done elsewhere, for example _abcoll.py or
> > _weakrefset.py)
> 
> Those are special cases since they are needed for ABCs. locale
> is not special enough to warrant such a disruption in module
> organization and it's not needed either (see above).

"special enough" is in the eye of the beholder.
msg113499 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2010-08-09 23:44
Just to comment on the import lock issue when importing from within a function, it is a problem. If an import triggers the launching of a thread which calls these functions that have an import inside of them, deadlock will occur. This has been reported as an issue before so people do it (unfortunately).

Importing within functions is something to be avoided when possible.
msg113510 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-08-10 09:26
> heapq tries to import _heapq but, at this point, the build dir 
> (such as "build/lib.linux-x86_64-3.2/") hasn't been added to sys.path

The problem only exists for developers, not for an installation copy of Python?

Another approach is to call required site code earlier (eg. rewrite it in C and execute it before loading the locale module).
msg113511 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-08-10 09:32
> > heapq tries to import _heapq but, at this point, the build dir 
> > (such as "build/lib.linux-x86_64-3.2/") hasn't been added to sys.path
> 
> The problem only exists for developers, not for an installation copy
> of Python?

This particular problem indeed (for developers and for buildbots - see
the weird test_heapq failures on some OS X buildbots).

> Another approach is to call required site code earlier (eg. rewrite it
> in C and execute it before loading the locale module).

Indeed, but since it calls sysconfig.get_platform(), I'm not sure how
much code would need to be rewritten in C.
msg113516 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-08-10 09:44
> Indeed, but since it calls sysconfig.get_platform(), I'm not sure how
> much code would need to be rewritten in C.

Oh, the function is prefixed by the following comment:

# XXX This should not be part of site.py, since it is needed even when
# using the -S option for Python.  See http://www.python.org/sf/586680
def addbuilddir():

Issue #586680 was closed as wontfix.

--

Oh yes, sysconfig.get_platform() is complex :-/

Brett wrote "If we think once can reliably add the directory based purely on whether it starts with "build/lib.", and ..." (msg73411).
msg113591 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-08-11 08:07
Antoine Pitrou wrote:
> 
> Antoine Pitrou <pitrou@free.fr> added the comment:
> 
>> I said "warn about the use of non-builtin modules", not disallow
>> their use. AFAIK, the purpose of the exercise is to reduce the
>> number of non-builtin modules being loaded early on during
>> Python startup.
> 
> The purpose of the exercise is to avoid, as much as possible, the
> numerous issues that have already been encountered because of the
> startup procedure relying on locale in various conditions for
> initialization of the standard streams.
> 
> You are proposing that we commit a one-time fix to fix the current
> situation. Recent history shows that similar problems will arise again,
> and waste useful developer time for committing urgency fixes in order to
> fix the build (not to mention that the symptoms are sometimes difficult
> to diagnose, such as in this very issue). I am proposing a broader
> change which prevents, as much as possible, similar problems from
> reappearing and will relieve us from a dispensable burden.

No, what I'm proposing is to make "import locale" safe during
boot time. By separating out some functions into a separate
module which is then supposed to be used by the boot process,
you don't really solve the problem. Other code in the boot process
may very well still import the main locale module and thus
cause the same problems you were trying to solve by separating
out the problematic code - even if this is just code that gets
run via sitecustomize.py or other such hooks.

The changes I'm suggesting will be beneficial to all
standard uses of the module without any such workarounds
based on conventions.
msg113596 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-08-11 09:15
> No, what I'm proposing is to make "import locale" safe during
> boot time. By separating out some functions into a separate
> module which is then supposed to be used by the boot process,
> you don't really solve the problem.

I do, and my experimentations show it.

Believe me, variations of this problem have been bothering us often
enough in recent times that I know your solution won't work for long.

"Trying to be careful with imports in a large stdlib module" doesn't cut
it, because we always need new imports when we make changes. The point
of _bootlocale (you might not like the name, in which case you can
suggest an alternative) is that it is restricted and static: we only
need getpreferredencoding() and its dependencies, and this code isn't
likely to change a lot.

(you might ask why this problem hasn't affected 2.x, and that's because
in 2.x standard streams are much simpler, built-in objects; in
particular, they don't need to choose an encoding for character
decoding; their initialization doesn't require executing stdlib code)

> Other code in the boot process
> may very well still import the main locale module

It doesn't. 
By "boot process" I really mean something very specific. It is all which
runs until site.py gets executed (if it isn't skipped). There isn't a
whole lot of code there. Mostly, it's initialization of standard
streams, where two stdlib functions can be invoked: os.device_encoding()
and locale.getpreferredencoding() (depending on the circumstances).

When sitecustomize.py gets run, everything is already set up and there's
no problem importing whatever module you want.

> The changes I'm suggesting will be beneficial to all
> standard uses of the module without any such workarounds
> based on conventions.

Standard uses of the module aren't problematic at all, and importing
functools or collections in that context is harmless (they will probably
be imported by other stdlib modules anyway).
msg113878 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2010-08-14 07:49
As we move more and more infrastructure into Python code, we're going to see this pattern (i.e. a bootstrap module underlying the real module) more and more often (e.g. I seem to recall Brett has to do something similar when playing with the pure Python __import__ implementation).

I don't have an issue with it - it's a solid, standard solution to a recurring problem with otherwise circular dependencies.

The only changes I would suggest to Antoine's patch are to explicitly scope "interpreter startup" in the _bootlocale docstring (to make it clear that sitecustomize.py should use the ordinary locale module) and to mention the nature of _bootlocale in a comment just before the "from _bootlocale import *" line in locale.py.
msg113880 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-08-14 09:40
Antoine fixed #9589 by rewriting site.py code in C and calling it more much earlier: r83988.

This commit fixes the initial problem of this issue:

$ ./python -c 'import heapq; print(heapq.heapify)'
<built-in function heapify>
$ cat | ./python -c 'import heapq; print(heapq.heapify)'
<built-in function heapify>

Can we close this issue, or do you consider that it is still very important to not load too much modules at startup?
msg113912 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-08-14 17:06
> Can we close this issue, or do you consider that it is still very
> important to not load too much modules at startup?

The patch eases the bootstrap constraints by creating a
bootstrap-friendly subset of locale.py. It is indeed much less pressing
now that the initial bug has been fixed.

It also depends whether it is important to have a lightweight
executable. If it isn't, I guess we could compile all extension modules
statically.
msg113936 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-08-14 23:00
Nick Coghlan wrote:
> 
> Nick Coghlan <ncoghlan@gmail.com> added the comment:
> 
> As we move more and more infrastructure into Python code, we're going to see this pattern (i.e. a bootstrap module underlying the real module) more and more often (e.g. I seem to recall Brett has to do something similar when playing with the pure Python __import__ implementation).
> 
> I don't have an issue with it - it's a solid, standard solution to a recurring problem with otherwise circular dependencies.
> 
> The only changes I would suggest to Antoine's patch are to explicitly scope "interpreter startup" in the _bootlocale docstring (to make it clear that sitecustomize.py should use the ordinary locale module) and to mention the nature of _bootlocale in a comment just before the "from _bootlocale import *" line in locale.py.

I don't object to such strategies in general, but in this case,
a change of this size is neither required nor helpful, so remain
a firm -1 on the patch.

The locale module already uses a C helper module. By now adding
another indirection via a boot-time only Python extract, you complicate
the setup - and what's worse: it doesn't solve the problem, since some
other module you import at boot time may still import locale directly
and for the same reason the collection module got used in locale:
programmers being unaware of the implications this has.
msg160940 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-05-16 22:48
> initializing the standard streams imports Lib/locale.py

It looks like only locale.getpreferredencoding() is needed to initialize standard streams. Another option is to add a locale encoding codec. I already proposed the idea in #13619 but it was rejected.
msg199346 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-10-09 21:29
Updated patch for 3.4.
msg199364 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-10-10 06:29
+1 This seems like a reasonable solution.
msg199367 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-10-10 07:58
The io module doesn't need to set temporarly the LC_CTYPE locale (which is a good thing because the change is process-wide!). If we ignore systems where CODESET is not available, the _bootlocale can be simplified to a few lines:

if sys.platform.startswith("win"):
    def getpreferredencoding():
        import _locale
        return _locale._getdefaultlocale()[1]
else:
    def getpreferredencoding():
        result = nl_langinfo(CODESET)
        if not result and sys.platform == 'darwin':
             result = 'UTF-8'
        return result

This code can probably be implemented in C, directly in the _locale module. Would it be acceptable to modify the io module to replace locale.getpreferredencoding(False) with _locale.getpreferredencoding(False)?

Does anyone know if Python does still support systems where CODESET is not available? Which OS does not support CODESET? Would it be acceptable to fallback to locale.py if CODESET is not available?
msg199507 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-10-11 21:30
The locale module uses only collections.abc.Mapping. The import of the entire collections module can be avoided if collections.abc is renamed to _abcoll again. functools.wrap() can be replaced with:

localeconv.__doc__ = _localeconv.__doc__

The other attributes are either equal (e.g. __name__) or do not exist on builtin functions (__dict__).
msg199510 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-10-11 21:58
> ...  the _bootlocale can be simplified to a few lines: ...

Here is the patch implementing my proposition: bootlocale3.patch.
msg199511 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-10-11 22:13
"Does anyone know if Python does still support systems where CODESET is not available? Which OS does not support CODESET?"

I checked my VMs with Python, nl_langinfo(CODESET) works on:

- Linux (Fedora 18, kernel 3.9)
- OpenBSD 5.2
- OpenIndiana 148 (SunOS 5.11)
- FreeBSD 9.1

I also tested my patch on Windows 7: _bootlocale.getpreferredencoding() works as expected (it returns "cp1252").
msg199512 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-10-11 22:14
You could raise an error and wait until somebody files a complain. :)
msg199513 - (view) Author: Roundup Robot (python-dev) Date: 2013-10-11 22:15
New changeset fbbf8b160e8d by Antoine Pitrou in branch 'default':
Issue #9548: Add a minimal "_bootlocale" module that is imported by the _io module instead of the full locale module.
http://hg.python.org/cpython/rev/fbbf8b160e8d
msg199575 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-10-12 15:25
Just a quick favour to ask people: please post benchmark numbers of startup_nosite and normal_startup with your patches, otherwise we are taking stabs in the dark that the code complexity being suggested is worth it.
msg199581 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-10-12 16:05
Here normal_startup and startup_nosite wouldn't show a difference (under Linux anyway) because the locale module is only imported for non-interactive streams, AFAICT.
History
Date User Action Args
2013-10-12 16:05:51pitrousetstatus: open -> closed
resolution: fixed
messages: + msg199581

stage: patch review -> resolved
2013-10-12 15:25:07brett.cannonsetmessages: + msg199575
2013-10-11 22:15:25python-devsetnosy: + python-dev
messages: + msg199513
2013-10-11 22:14:58christian.heimessetmessages: + msg199512
2013-10-11 22:13:20hayposetmessages: + msg199511
2013-10-11 21:58:45hayposetfiles: + bootlocale3.patch

messages: + msg199510
2013-10-11 21:30:24christian.heimessetmessages: + msg199507
2013-10-10 07:58:25hayposetmessages: + msg199367
2013-10-10 06:29:42rhettingersetmessages: + msg199364
2013-10-10 00:00:59eric.snowsetnosy: + eric.snow
2013-10-09 21:29:38pitrousetfiles: + bootlocale2.patch
2013-10-09 21:29:24pitrousetnosy: + christian.heimes, benjamin.peterson

messages: + msg199346
versions: + Python 3.4, - Python 3.3
2013-10-09 17:17:50barrysetnosy: + barry
2012-10-02 18:39:42pitrouunlinkissue16101 dependencies
2012-10-02 13:08:14brett.cannonlinkissue16101 dependencies
2012-07-22 10:12:16floxsetnosy: + flox
2012-05-17 17:03:39Arfreversetnosy: + Arfrever
2012-05-16 22:48:20hayposetmessages: + msg160940
2012-05-16 16:16:42pitrousetstage: patch review
versions: + Python 3.3, - Python 3.2
2010-08-14 23:10:55eric.araujosetnosy: + eric.araujo
2010-08-14 23:00:37lemburgsetmessages: + msg113936
2010-08-14 17:06:03pitrousetmessages: + msg113912
2010-08-14 09:40:43hayposetmessages: + msg113880
2010-08-14 07:49:32ncoghlansetmessages: + msg113878
2010-08-13 16:22:39pitrousetnosy: + ncoghlan
2010-08-11 09:16:01pitrousetmessages: + msg113596
2010-08-11 08:07:39lemburgsetmessages: + msg113591
2010-08-10 09:44:49hayposetmessages: + msg113516
2010-08-10 09:32:46pitrousetmessages: + msg113511
2010-08-10 09:26:58hayposetmessages: + msg113510
2010-08-09 23:44:39brett.cannonsetmessages: + msg113499
2010-08-09 22:19:41pitrousetmessages: + msg113492
2010-08-09 22:04:44lemburgsetmessages: + msg113488
2010-08-09 21:30:22pitrousetmessages: + msg113481
2010-08-09 21:10:46lemburgsetmessages: + msg113476
title: locale can be imported at startup but relies on too many library modules -> locale can be imported at startup but relies on too many library modules
2010-08-09 21:03:11brett.cannonsetnosy: + brett.cannon
2010-08-09 20:27:28pitrousetmessages: + msg113466
2010-08-09 19:57:37pitrousetfiles: - bootlocale.patch
2010-08-09 19:56:18pitrousetfiles: + bootlocale.patch

messages: + msg113462
2010-08-09 19:42:21mark.dickinsonsetnosy: + mark.dickinson
2010-08-09 19:41:29pitrousetfiles: + bootlocale.patch
keywords: + patch
messages: + msg113459
2010-08-09 19:33:19pitrousetnosy: + lemburg, rhettinger, orsenthil
2010-08-09 19:09:29pitroucreate