Issue9548
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2010-08-09 19:09 by pitrou, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
bootlocale.patch | pitrou, 2010-08-09 19:56 | |||
bootlocale2.patch | pitrou, 2013-10-09 21:29 | review | ||
bootlocale3.patch | vstinner, 2013-10-11 21:58 | review |
Messages (29) | |||
---|---|---|---|
msg113457 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2010-08-09 19:09 | |
Consider the two following commands, from an SVN working copy: $ ./python -c 'import heapq; print(heapq.heapify)' <built-in function heapify> $ cat | ./python -c 'import heapq; print(heapq.heapify)' <function heapify at 0x7f5d456025a0> In the second case, the "from _heapq import *" in heapq fails silently. The reason was a bit difficult to find, but it goes as following: - when run from a non-pty, initializing the standard streams imports Lib/locale.py - Lib/locale.py import collections which import heapq - heapq tries to import _heapq but, at this point, the build dir (such as "build/lib.linux-x86_64-3.2/") hasn't been added to sys.path, since site.py does it: the import fails - heapq silences the ImportError, and this early loaded copy of the module is then made available to all subsequent consumers of heapq, without C accelerators The issue might seem rather exotic, but it is a more general symptom of the problem that importing Lib/locale.py when initializing std{in,out,err} brings too many import dependencies. A possible solution is to define a subset of locale.py that is used for bootstrap and rely on as little modules as possible. |
|||
msg113459 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2010-08-09 19:41 | |
Attached patch fixes the issue by creating a separate "_bootlocale" module, used at bootstrap, which doesn't import collections, functools and friends. |
|||
msg113462 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2010-08-09 19:56 | |
Updated patch also replaces imports of locale in site.py (since it can be imported early on startup). |
|||
msg113466 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2010-08-09 20:27 | |
Chances are that the patch will also fix the test_heapq failures on some buildbots (see e.g. http://www.python.org/dev/buildbot/3.x/builders/x86%20Tiger%203.x/builds/759/steps/test/logs/stdio ). |
|||
msg113476 - (view) | Author: Marc-Andre Lemburg (lemburg) * | Date: 2010-08-09 21:10 | |
Antoine Pitrou wrote: > > Antoine Pitrou <pitrou@free.fr> added the comment: > > Attached patch fixes the issue by creating a separate "_bootlocale" module, used at bootstrap, which doesn't import collections, functools and friends. I don't think that's the right way forward. It's much easier to either remove the need to import those extra modules or defer their import to actual use within a function. Both collections and functools are used in two distinct places. The re module use could also be deferred to actual use. It would be worthwhile adding a note to the top of the module mentioning the fact that the module is loaded early on by Python and to warn about use of non-builtin module imports. |
|||
msg113481 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2010-08-09 21:30 | |
> I don't think that's the right way forward. > > It's much easier to either remove the need to import those > extra modules or defer their import to actual use within > a function. The latter has various issues, such as being overly tedious and potentially risk (because of the import lock). Furthermore, functools.wraps is used as a decorator at the toplevel of locale.py, which makes it currently unremovable. The former is easier said than done, and we may end up writing poor man's reimplementations of some stdlib functions. > It would be worthwhile adding a note to the top of the module > mentioning the fact that the module is loaded early on by Python > and to warn about use of non-builtin module imports. Use of non-builtin stdlib modules is legitimate for many functions which aren't used in the startup process anyway. The rationale behind the patch is precisely to define a subset that is needed at startup and shouldn't rely on stdlib facilities. The rest of the locale module is allowed to use whatever stdlib facilities it desires, which is always much more practical. (note, this is already done elsewhere, for example _abcoll.py or _weakrefset.py) |
|||
msg113488 - (view) | Author: Marc-Andre Lemburg (lemburg) * | Date: 2010-08-09 22:04 | |
Antoine Pitrou wrote: > > Antoine Pitrou <pitrou@free.fr> added the comment: > >> I don't think that's the right way forward. >> >> It's much easier to either remove the need to import those >> extra modules or defer their import to actual use within >> a function. > > The latter has various issues, such as being overly tedious and > potentially risk (because of the import lock). Really ? collections is only used in format_string and can easily be imported there. Note that putting imports into functions is a technique to remove problems with import locks - it doesn't create any. > Furthermore, functools.wraps is used as a decorator at the toplevel of > locale.py, which makes it currently unremovable. If you look at the code, that decorator is not needed at all. Furthermore the whole complication was just introduced to make some tests easier to write. > The former is easier said than done, and we may end up writing poor > man's reimplementations of some stdlib functions. Please be more specific. The modules that have an impact on import time are collections, functools and re. >> It would be worthwhile adding a note to the top of the module >> mentioning the fact that the module is loaded early on by Python >> and to warn about use of non-builtin module imports. > > Use of non-builtin stdlib modules is legitimate for many functions which > aren't used in the startup process anyway. The rationale behind the > patch is precisely to define a subset that is needed at startup and > shouldn't rely on stdlib facilities. The rest of the locale module is > allowed to use whatever stdlib facilities it desires, which is always > much more practical. I said "warn about the use of non-builtin modules", not disallow their use. AFAIK, the purpose of the exercise is to reduce the number of non-builtin modules being loaded early on during Python startup. > (note, this is already done elsewhere, for example _abcoll.py or > _weakrefset.py) Those are special cases since they are needed for ABCs. locale is not special enough to warrant such a disruption in module organization and it's not needed either (see above). |
|||
msg113492 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2010-08-09 22:19 | |
> I said "warn about the use of non-builtin modules", not disallow > their use. AFAIK, the purpose of the exercise is to reduce the > number of non-builtin modules being loaded early on during > Python startup. The purpose of the exercise is to avoid, as much as possible, the numerous issues that have already been encountered because of the startup procedure relying on locale in various conditions for initialization of the standard streams. You are proposing that we commit a one-time fix to fix the current situation. Recent history shows that similar problems will arise again, and waste useful developer time for committing urgency fixes in order to fix the build (not to mention that the symptoms are sometimes difficult to diagnose, such as in this very issue). I am proposing a broader change which prevents, as much as possible, similar problems from reappearing and will relieve us from a dispensable burden. > > (note, this is already done elsewhere, for example _abcoll.py or > > _weakrefset.py) > > Those are special cases since they are needed for ABCs. locale > is not special enough to warrant such a disruption in module > organization and it's not needed either (see above). "special enough" is in the eye of the beholder. |
|||
msg113499 - (view) | Author: Brett Cannon (brett.cannon) * | Date: 2010-08-09 23:44 | |
Just to comment on the import lock issue when importing from within a function, it is a problem. If an import triggers the launching of a thread which calls these functions that have an import inside of them, deadlock will occur. This has been reported as an issue before so people do it (unfortunately). Importing within functions is something to be avoided when possible. |
|||
msg113510 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-08-10 09:26 | |
> heapq tries to import _heapq but, at this point, the build dir > (such as "build/lib.linux-x86_64-3.2/") hasn't been added to sys.path The problem only exists for developers, not for an installation copy of Python? Another approach is to call required site code earlier (eg. rewrite it in C and execute it before loading the locale module). |
|||
msg113511 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2010-08-10 09:32 | |
> > heapq tries to import _heapq but, at this point, the build dir > > (such as "build/lib.linux-x86_64-3.2/") hasn't been added to sys.path > > The problem only exists for developers, not for an installation copy > of Python? This particular problem indeed (for developers and for buildbots - see the weird test_heapq failures on some OS X buildbots). > Another approach is to call required site code earlier (eg. rewrite it > in C and execute it before loading the locale module). Indeed, but since it calls sysconfig.get_platform(), I'm not sure how much code would need to be rewritten in C. |
|||
msg113516 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-08-10 09:44 | |
> Indeed, but since it calls sysconfig.get_platform(), I'm not sure how > much code would need to be rewritten in C. Oh, the function is prefixed by the following comment: # XXX This should not be part of site.py, since it is needed even when # using the -S option for Python. See http://www.python.org/sf/586680 def addbuilddir(): Issue #586680 was closed as wontfix. -- Oh yes, sysconfig.get_platform() is complex :-/ Brett wrote "If we think once can reliably add the directory based purely on whether it starts with "build/lib.", and ..." (msg73411). |
|||
msg113591 - (view) | Author: Marc-Andre Lemburg (lemburg) * | Date: 2010-08-11 08:07 | |
Antoine Pitrou wrote: > > Antoine Pitrou <pitrou@free.fr> added the comment: > >> I said "warn about the use of non-builtin modules", not disallow >> their use. AFAIK, the purpose of the exercise is to reduce the >> number of non-builtin modules being loaded early on during >> Python startup. > > The purpose of the exercise is to avoid, as much as possible, the > numerous issues that have already been encountered because of the > startup procedure relying on locale in various conditions for > initialization of the standard streams. > > You are proposing that we commit a one-time fix to fix the current > situation. Recent history shows that similar problems will arise again, > and waste useful developer time for committing urgency fixes in order to > fix the build (not to mention that the symptoms are sometimes difficult > to diagnose, such as in this very issue). I am proposing a broader > change which prevents, as much as possible, similar problems from > reappearing and will relieve us from a dispensable burden. No, what I'm proposing is to make "import locale" safe during boot time. By separating out some functions into a separate module which is then supposed to be used by the boot process, you don't really solve the problem. Other code in the boot process may very well still import the main locale module and thus cause the same problems you were trying to solve by separating out the problematic code - even if this is just code that gets run via sitecustomize.py or other such hooks. The changes I'm suggesting will be beneficial to all standard uses of the module without any such workarounds based on conventions. |
|||
msg113596 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2010-08-11 09:15 | |
> No, what I'm proposing is to make "import locale" safe during > boot time. By separating out some functions into a separate > module which is then supposed to be used by the boot process, > you don't really solve the problem. I do, and my experimentations show it. Believe me, variations of this problem have been bothering us often enough in recent times that I know your solution won't work for long. "Trying to be careful with imports in a large stdlib module" doesn't cut it, because we always need new imports when we make changes. The point of _bootlocale (you might not like the name, in which case you can suggest an alternative) is that it is restricted and static: we only need getpreferredencoding() and its dependencies, and this code isn't likely to change a lot. (you might ask why this problem hasn't affected 2.x, and that's because in 2.x standard streams are much simpler, built-in objects; in particular, they don't need to choose an encoding for character decoding; their initialization doesn't require executing stdlib code) > Other code in the boot process > may very well still import the main locale module It doesn't. By "boot process" I really mean something very specific. It is all which runs until site.py gets executed (if it isn't skipped). There isn't a whole lot of code there. Mostly, it's initialization of standard streams, where two stdlib functions can be invoked: os.device_encoding() and locale.getpreferredencoding() (depending on the circumstances). When sitecustomize.py gets run, everything is already set up and there's no problem importing whatever module you want. > The changes I'm suggesting will be beneficial to all > standard uses of the module without any such workarounds > based on conventions. Standard uses of the module aren't problematic at all, and importing functools or collections in that context is harmless (they will probably be imported by other stdlib modules anyway). |
|||
msg113878 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2010-08-14 07:49 | |
As we move more and more infrastructure into Python code, we're going to see this pattern (i.e. a bootstrap module underlying the real module) more and more often (e.g. I seem to recall Brett has to do something similar when playing with the pure Python __import__ implementation). I don't have an issue with it - it's a solid, standard solution to a recurring problem with otherwise circular dependencies. The only changes I would suggest to Antoine's patch are to explicitly scope "interpreter startup" in the _bootlocale docstring (to make it clear that sitecustomize.py should use the ordinary locale module) and to mention the nature of _bootlocale in a comment just before the "from _bootlocale import *" line in locale.py. |
|||
msg113880 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-08-14 09:40 | |
Antoine fixed #9589 by rewriting site.py code in C and calling it more much earlier: r83988. This commit fixes the initial problem of this issue: $ ./python -c 'import heapq; print(heapq.heapify)' <built-in function heapify> $ cat | ./python -c 'import heapq; print(heapq.heapify)' <built-in function heapify> Can we close this issue, or do you consider that it is still very important to not load too much modules at startup? |
|||
msg113912 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2010-08-14 17:06 | |
> Can we close this issue, or do you consider that it is still very > important to not load too much modules at startup? The patch eases the bootstrap constraints by creating a bootstrap-friendly subset of locale.py. It is indeed much less pressing now that the initial bug has been fixed. It also depends whether it is important to have a lightweight executable. If it isn't, I guess we could compile all extension modules statically. |
|||
msg113936 - (view) | Author: Marc-Andre Lemburg (lemburg) * | Date: 2010-08-14 23:00 | |
Nick Coghlan wrote: > > Nick Coghlan <ncoghlan@gmail.com> added the comment: > > As we move more and more infrastructure into Python code, we're going to see this pattern (i.e. a bootstrap module underlying the real module) more and more often (e.g. I seem to recall Brett has to do something similar when playing with the pure Python __import__ implementation). > > I don't have an issue with it - it's a solid, standard solution to a recurring problem with otherwise circular dependencies. > > The only changes I would suggest to Antoine's patch are to explicitly scope "interpreter startup" in the _bootlocale docstring (to make it clear that sitecustomize.py should use the ordinary locale module) and to mention the nature of _bootlocale in a comment just before the "from _bootlocale import *" line in locale.py. I don't object to such strategies in general, but in this case, a change of this size is neither required nor helpful, so remain a firm -1 on the patch. The locale module already uses a C helper module. By now adding another indirection via a boot-time only Python extract, you complicate the setup - and what's worse: it doesn't solve the problem, since some other module you import at boot time may still import locale directly and for the same reason the collection module got used in locale: programmers being unaware of the implications this has. |
|||
msg160940 - (view) | Author: STINNER Victor (vstinner) * | Date: 2012-05-16 22:48 | |
> initializing the standard streams imports Lib/locale.py It looks like only locale.getpreferredencoding() is needed to initialize standard streams. Another option is to add a locale encoding codec. I already proposed the idea in #13619 but it was rejected. |
|||
msg199346 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-10-09 21:29 | |
Updated patch for 3.4. |
|||
msg199364 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2013-10-10 06:29 | |
+1 This seems like a reasonable solution. |
|||
msg199367 - (view) | Author: STINNER Victor (vstinner) * | Date: 2013-10-10 07:58 | |
The io module doesn't need to set temporarly the LC_CTYPE locale (which is a good thing because the change is process-wide!). If we ignore systems where CODESET is not available, the _bootlocale can be simplified to a few lines: if sys.platform.startswith("win"): def getpreferredencoding(): import _locale return _locale._getdefaultlocale()[1] else: def getpreferredencoding(): result = nl_langinfo(CODESET) if not result and sys.platform == 'darwin': result = 'UTF-8' return result This code can probably be implemented in C, directly in the _locale module. Would it be acceptable to modify the io module to replace locale.getpreferredencoding(False) with _locale.getpreferredencoding(False)? Does anyone know if Python does still support systems where CODESET is not available? Which OS does not support CODESET? Would it be acceptable to fallback to locale.py if CODESET is not available? |
|||
msg199507 - (view) | Author: Christian Heimes (christian.heimes) * | Date: 2013-10-11 21:30 | |
The locale module uses only collections.abc.Mapping. The import of the entire collections module can be avoided if collections.abc is renamed to _abcoll again. functools.wrap() can be replaced with: localeconv.__doc__ = _localeconv.__doc__ The other attributes are either equal (e.g. __name__) or do not exist on builtin functions (__dict__). |
|||
msg199510 - (view) | Author: STINNER Victor (vstinner) * | Date: 2013-10-11 21:58 | |
> ... the _bootlocale can be simplified to a few lines: ... Here is the patch implementing my proposition: bootlocale3.patch. |
|||
msg199511 - (view) | Author: STINNER Victor (vstinner) * | Date: 2013-10-11 22:13 | |
"Does anyone know if Python does still support systems where CODESET is not available? Which OS does not support CODESET?" I checked my VMs with Python, nl_langinfo(CODESET) works on: - Linux (Fedora 18, kernel 3.9) - OpenBSD 5.2 - OpenIndiana 148 (SunOS 5.11) - FreeBSD 9.1 I also tested my patch on Windows 7: _bootlocale.getpreferredencoding() works as expected (it returns "cp1252"). |
|||
msg199512 - (view) | Author: Christian Heimes (christian.heimes) * | Date: 2013-10-11 22:14 | |
You could raise an error and wait until somebody files a complain. :) |
|||
msg199513 - (view) | Author: Roundup Robot (python-dev) | Date: 2013-10-11 22:15 | |
New changeset fbbf8b160e8d by Antoine Pitrou in branch 'default': Issue #9548: Add a minimal "_bootlocale" module that is imported by the _io module instead of the full locale module. http://hg.python.org/cpython/rev/fbbf8b160e8d |
|||
msg199575 - (view) | Author: Brett Cannon (brett.cannon) * | Date: 2013-10-12 15:25 | |
Just a quick favour to ask people: please post benchmark numbers of startup_nosite and normal_startup with your patches, otherwise we are taking stabs in the dark that the code complexity being suggested is worth it. |
|||
msg199581 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-10-12 16:05 | |
Here normal_startup and startup_nosite wouldn't show a difference (under Linux anyway) because the locale module is only imported for non-interactive streams, AFAICT. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:05 | admin | set | github: 53757 |
2013-10-12 16:05:51 | pitrou | set | status: open -> closed resolution: fixed messages: + msg199581 stage: patch review -> resolved |
2013-10-12 15:25:07 | brett.cannon | set | messages: + msg199575 |
2013-10-11 22:15:25 | python-dev | set | nosy:
+ python-dev messages: + msg199513 |
2013-10-11 22:14:58 | christian.heimes | set | messages: + msg199512 |
2013-10-11 22:13:20 | vstinner | set | messages: + msg199511 |
2013-10-11 21:58:45 | vstinner | set | files:
+ bootlocale3.patch messages: + msg199510 |
2013-10-11 21:30:24 | christian.heimes | set | messages: + msg199507 |
2013-10-10 07:58:25 | vstinner | set | messages: + msg199367 |
2013-10-10 06:29:42 | rhettinger | set | messages: + msg199364 |
2013-10-10 00:00:59 | eric.snow | set | nosy:
+ eric.snow |
2013-10-09 21:29:38 | pitrou | set | files: + bootlocale2.patch |
2013-10-09 21:29:24 | pitrou | set | nosy:
+ christian.heimes, benjamin.peterson messages: + msg199346 versions: + Python 3.4, - Python 3.3 |
2013-10-09 17:17:50 | barry | set | nosy:
+ barry |
2012-10-02 18:39:42 | pitrou | unlink | issue16101 dependencies |
2012-10-02 13:08:14 | brett.cannon | link | issue16101 dependencies |
2012-07-22 10:12:16 | flox | set | nosy:
+ flox |
2012-05-17 17:03:39 | Arfrever | set | nosy:
+ Arfrever |
2012-05-16 22:48:20 | vstinner | set | messages: + msg160940 |
2012-05-16 16:16:42 | pitrou | set | stage: patch review versions: + Python 3.3, - Python 3.2 |
2010-08-14 23:10:55 | eric.araujo | set | nosy:
+ eric.araujo |
2010-08-14 23:00:37 | lemburg | set | messages: + msg113936 |
2010-08-14 17:06:03 | pitrou | set | messages: + msg113912 |
2010-08-14 09:40:43 | vstinner | set | messages: + msg113880 |
2010-08-14 07:49:32 | ncoghlan | set | messages: + msg113878 |
2010-08-13 16:22:39 | pitrou | set | nosy:
+ ncoghlan |
2010-08-11 09:16:01 | pitrou | set | messages: + msg113596 |
2010-08-11 08:07:39 | lemburg | set | messages: + msg113591 |
2010-08-10 09:44:49 | vstinner | set | messages: + msg113516 |
2010-08-10 09:32:46 | pitrou | set | messages: + msg113511 |
2010-08-10 09:26:58 | vstinner | set | messages: + msg113510 |
2010-08-09 23:44:39 | brett.cannon | set | messages: + msg113499 |
2010-08-09 22:19:41 | pitrou | set | messages: + msg113492 |
2010-08-09 22:04:44 | lemburg | set | messages: + msg113488 |
2010-08-09 21:30:22 | pitrou | set | messages: + msg113481 |
2010-08-09 21:10:46 | lemburg | set | messages:
+ msg113476 title: locale can be imported at startup but relies on too many library modules -> locale can be imported at startup but relies on too many library modules |
2010-08-09 21:03:11 | brett.cannon | set | nosy:
+ brett.cannon |
2010-08-09 20:27:28 | pitrou | set | messages: + msg113466 |
2010-08-09 19:57:37 | pitrou | set | files: - bootlocale.patch |
2010-08-09 19:56:18 | pitrou | set | files:
+ bootlocale.patch messages: + msg113462 |
2010-08-09 19:42:21 | mark.dickinson | set | nosy:
+ mark.dickinson |
2010-08-09 19:41:29 | pitrou | set | files:
+ bootlocale.patch keywords: + patch messages: + msg113459 |
2010-08-09 19:33:19 | pitrou | set | nosy:
+ lemburg, rhettinger, orsenthil |
2010-08-09 19:09:29 | pitrou | create |