Issue39430
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2020-01-23 11:15 by Maciej Gol, last changed 2022-04-11 14:59 by admin.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
test.py | Maciej Gol, 2020-01-23 11:19 |
Pull Requests | |||
---|---|---|---|
URL | Status | Linked | Edit |
PR 18161 | merged | serhiy.storchaka, 2020-01-24 09:06 | |
PR 18171 | merged | miss-islington, 2020-01-24 17:56 | |
PR 18172 | merged | miss-islington, 2020-01-24 17:56 |
Messages (18) | |||
---|---|---|---|
msg360551 - (view) | Author: Maciej Gol (Maciej Gol) | Date: 2020-01-23 11:15 | |
Hey guys, We have a component that archives and unarchives multiple files in separate threads that started to misbehave recently. We have noticed a bunch of `AttributeError: module 'lzma' has no attribute 'LZMAFile'` errors, which are unexpected because our python is not compiled with LZMA support. What is unfortunate, is that given the traceback: Traceback (most recent call last): File "test.py", line 18, in <module> list(pool.map(test_lzma, range(100))) File "/opt/lang/python37/lib/python3.7/concurrent/futures/_base.py", line 598, in result_iterator yield fs.pop().result() File "/opt/lang/python37/lib/python3.7/concurrent/futures/_base.py", line 428, in result return self.__get_result() File "/opt/lang/python37/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result raise self._exception File "/opt/lang/python37/lib/python3.7/concurrent/futures/thread.py", line 57, in run result = self.fn(*self.args, **self.kwargs) File "test.py", line 14, in test_lzma tarfile.open(fileobj=buf, mode="r") File "/opt/lang/python37/lib/python3.7/tarfile.py", line 1573, in open return func(name, "r", fileobj, **kwargs) File "/opt/lang/python37/lib/python3.7/tarfile.py", line 1699, in xzopen fileobj = lzma.LZMAFile(fileobj or name, mode, preset=preset) AttributeError: module 'lzma' has no attribute 'LZMAFile' the last line of the traceback is right AFTER this block (tarfile.py:1694): try: import lzma except ImportError: raise CompressionError("lzma module is not available") Importing lzma in ipython fails properly: In [2]: import lzma --------------------------------------------------------------------------- ModuleNotFoundError Traceback (most recent call last) <ipython-input-2-c0255b54beb9> in <module> ----> 1 import lzma /opt/lang/python37/lib/python3.7/lzma.py in <module> 25 import io 26 import os ---> 27 from _lzma import * 28 from _lzma import _encode_filter_properties, _decode_filter_properties 29 import _compression ModuleNotFoundError: No module named '_lzma' When trying to debug the problem, we have noticed it's not deterministic. In order to reproduce it, we have created a test python that repeatedly writes an archive to BytesIO and then reads from it. Using it with 5 threads and 100 calls, gives very good chances of reproducing the issue. For us it was almost every time. Race condition occurs both on Python 3.7.3 and 3.7.6. Test script used to reproduce it attached. I know that the test script writes uncompressed archives and during opening tries to guess the compression. But I guess this is a legitimate scenario and should not matter in this case. |
|||
msg360552 - (view) | Author: Maciej Gol (Maciej Gol) | Date: 2020-01-23 11:19 | |
Uploading fixed file (the former had a typo) |
|||
msg360598 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2020-01-24 09:14 | |
PR 18161 fixes race condition by using "from ... import ..." which waits until the module be completely initialized if the specified names are not set. |
|||
msg360618 - (view) | Author: Maciej Gol (Maciej Gol) | Date: 2020-01-24 13:33 | |
> PR 18161 fixes race condition by using "from ... import ..." > which waits until the module be completely initialized if the specified > names are not set. Correct me if I'm wrong, but I think the behavior of 'import lzma' in this case (vulnerable to race conditions) is not as intended? Shouldn't we also fix the 'import' statement itself? In general, I understand that due to how dynamic Python is, it might not be possible to load every single name at import time using `import` and using `from x import y` brings more determinism (because we have a safeguard now). But, looking at the stacktrace from ipython the problem lies in a sequence of import statements, not dynamic python coding. Shouldn't the importing mechanism be more deterministic in such case? For sure, it should not return an empty module (this is the case when race condition occurs). I think a race condition caused by simply using `import` statements (not `from x import y`) is a big caveat in the statement itself and how python imports work. Haven't checked if the cause is isolated to how tarfile works, or works in general, though :-( |
|||
msg360619 - (view) | Author: Maciej Gol (Maciej Gol) | Date: 2020-01-24 13:34 | |
By the way, thanks a lot for the fix <3 |
|||
msg360620 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2020-01-24 14:08 | |
It is intended to support circular imports. Let foo.py contains "import bar" and bar.py contains "import foo". When you execute "import foo", the import machinery first creates an empty module foo, adds it to sys.modules, reads foo.py and executes it in the namespace of module foo. When the interpreter encounters "import bar" in foo.py, the import machinery creates an empty module bar, adds it to sys.modules, reads bar.py and executes it in the namespace of module bar. When the interpreter encounters "import foo" in bar.py, the import machinery takes the module foo from sys.modules. So you break an infinite cycle and can import modules with cyclic dependencies. You can argue that cyclic import does not look as a good practice, but actually it is pretty common case when you import a submodule in a package. If foo/__init__.py contains "from .bar import Bar", the foo module must be imported before you import foo.bar, but is not completely initialized at that time yet. |
|||
msg360629 - (view) | Author: Maciej Gol (Maciej Gol) | Date: 2020-01-24 16:50 | |
This is a HUGE eye opener! Didn't know of that 'import' vs 'from x import y' difference. Thanks a lot! Is it documented somewhere ? pt., 24 sty 2020, 15:08 użytkownik Serhiy Storchaka <report@bugs.python.org> napisał: > > Serhiy Storchaka <storchaka+cpython@gmail.com> added the comment: > > It is intended to support circular imports. Let foo.py contains "import > bar" and bar.py contains "import foo". When you execute "import foo", the > import machinery first creates an empty module foo, adds it to sys.modules, > reads foo.py and executes it in the namespace of module foo. When the > interpreter encounters "import bar" in foo.py, the import machinery creates > an empty module bar, adds it to sys.modules, reads bar.py and executes it > in the namespace of module bar. When the interpreter encounters "import > foo" in bar.py, the import machinery takes the module foo from sys.modules. > So you break an infinite cycle and can import modules with cyclic > dependencies. > > You can argue that cyclic import does not look as a good practice, but > actually it is pretty common case when you import a submodule in a package. > If foo/__init__.py contains "from .bar import Bar", the foo module must be > imported before you import foo.bar, but is not completely initialized at > that time yet. > > ---------- > > _______________________________________ > Python tracker <report@bugs.python.org> > <https://bugs.python.org/issue39430> > _______________________________________ > |
|||
msg360634 - (view) | Author: Brett Cannon (brett.cannon) * | Date: 2020-01-24 17:50 | |
It's covered in the language reference where import semantics are explained. It isn't explicitly called out in higher-level docs that I'm aware of because it only really comes up in edge cases like importing in a thread which we actively discourage unless there's a compelling reason not to. |
|||
msg360635 - (view) | Author: Ethan Furman (ethan.furman) * | Date: 2020-01-24 17:51 | |
Thanks, Serhiy! I had no idea of those changes to import. This does raise the question, though, of whether accessing an empty module's attributes should either: - wait for the module to finish initializing; or - raise an exception immediately with better information ? It seems like this is a trap waiting for anybody using threaded code and on-demand imports. |
|||
msg360636 - (view) | Author: Ethan Furman (ethan.furman) * | Date: 2020-01-24 17:56 | |
New changeset 9017e0bd5e124ae6d2ed94b9e9cacb2e86270980 by Ethan Furman (Serhiy Storchaka) in branch 'master': bpo-39430: Fix race condition in lazy imports in tarfile. (GH-18161) https://github.com/python/cpython/commit/9017e0bd5e124ae6d2ed94b9e9cacb2e86270980 |
|||
msg360637 - (view) | Author: Ethan Furman (ethan.furman) * | Date: 2020-01-24 18:06 | |
Brett: ------ > It's covered in the language reference where import semantics are > explained. It isn't explicitly called out in higher-level docs that > I'm aware of because it only really comes up in edge cases like > importing in a thread which we actively discourage unless there's a > compelling reason not to. When was this change made? Assuming it is, in fact, a change. It seems like most of the stdlib could be called in a thread, so the practice of lazily importing a module wholesale might need to be audited and proactively fixed. Maybe that could be a newcomer's easy(ish) project? |
|||
msg360649 - (view) | Author: miss-islington (miss-islington) | Date: 2020-01-24 20:10 | |
New changeset ea4a61fec842c94107eef46e5030b89a086f94bb by Miss Islington (bot) in branch '3.8': bpo-39430: Fix race condition in lazy imports in tarfile. (GH-18161) https://github.com/python/cpython/commit/ea4a61fec842c94107eef46e5030b89a086f94bb |
|||
msg360650 - (view) | Author: miss-islington (miss-islington) | Date: 2020-01-24 20:10 | |
New changeset 1a274359283d3d1f4f60dd527843f72e0368caf3 by Miss Islington (bot) in branch '3.7': bpo-39430: Fix race condition in lazy imports in tarfile. (GH-18161) https://github.com/python/cpython/commit/1a274359283d3d1f4f60dd527843f72e0368caf3 |
|||
msg360653 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2020-01-24 20:25 | |
I suppose it fixed the bug. But I cannot confirm because I cannot reproduce the original bug. |
|||
msg360708 - (view) | Author: Xavier de Gaye (xdegaye) * | Date: 2020-01-25 20:20 | |
In msg360620 Serhiy wrote: > When the interpreter encounters "import foo" in bar.py, the import machinery takes the module foo from sys.modules. > So you break an infinite cycle and can import modules with cyclic dependencies. The following pdb session that is run when executing the foo.py script, shows that this is not quite accurate. When the interpreter encounters "import foo" in bar.py, the import machinery instead starts executing foo.py once again and detects that at line 9 of foo.py the 'import bar' statement does not have to be executed since the import machinery is already in the process of importing this module. So it is at that point that the infinite cycle is broken. Later on line 12 in foo.py the import machinery detects that the AttributeError is raised because of a circular import and warns that "partially initialized module 'bar' has no attribute 'BAR' (most likely due to a circular import)". $ cat -n foo.py 1 import pdb 2 debug = 1 3 4 # Prevent starting a new pdb session when foo is imported by bar. 5 if debug and not hasattr(pdb, 'is_pdb_started'): 6 pdb.Pdb(skip=['importlib*']).set_trace() 7 pdb.is_pdb_started = True 8 9 import bar 10 FOO = 'foo' 11 try: 12 print(bar.BAR) 13 except AttributeError as e: 14 # The exception is raised with an explicit reason when foo is imported by 15 # bar due to partially initialized module 'bar'. 16 print(e) $ cat -n bar.py 1 import foo 2 BAR = 'bar' 3 print(foo.FOO) $ python foo.py > /tmp/foo.py(7)<module>() -> pdb.is_pdb_started = True (Pdb) step > /tmp/foo.py(9)<module>() -> import bar (Pdb) step --Call-- > /tmp/bar.py(1)<module>() -> import foo (Pdb) step > /tmp/bar.py(1)<module>() -> import foo (Pdb) step --Call-- > /tmp/foo.py(1)<module>() -> import pdb (Pdb) step > /tmp/foo.py(1)<module>() -> import pdb (Pdb) step > /tmp/foo.py(2)<module>() -> debug = 1 (Pdb) step > /tmp/foo.py(5)<module>() -> if debug and not hasattr(pdb, 'is_pdb_started'): (Pdb) step > /tmp/foo.py(9)<module>() -> import bar (Pdb) step > /tmp/foo.py(10)<module>() -> FOO = 'foo' (Pdb) step > /tmp/foo.py(11)<module>() -> try: (Pdb) step > /tmp/foo.py(12)<module>() -> print(bar.BAR) (Pdb) step AttributeError: partially initialized module 'bar' has no attribute 'BAR' (most likely due to a circular import) > /tmp/foo.py(12)<module>() -> print(bar.BAR) (Pdb) step > /tmp/foo.py(13)<module>() -> except AttributeError as e: (Pdb) step > /tmp/foo.py(16)<module>() -> print(e) (Pdb) step partially initialized module 'bar' has no attribute 'BAR' (most likely due to a circular import) --Return-- > /tmp/foo.py(16)<module>()->None -> print(e) (Pdb) step > /tmp/bar.py(2)<module>() -> BAR = 'bar' (Pdb) step > /tmp/bar.py(3)<module>() -> print(foo.FOO) (Pdb) step foo --Return-- > /tmp/bar.py(3)<module>()->None -> print(foo.FOO) (Pdb) step > /tmp/foo.py(10)<module>() -> FOO = 'foo' (Pdb) step > /tmp/foo.py(11)<module>() -> try: (Pdb) step > /tmp/foo.py(12)<module>() -> print(bar.BAR) (Pdb) step bar --Return-- > /tmp/foo.py(12)<module>()->None -> print(bar.BAR) (Pdb) step $ |
|||
msg360714 - (view) | Author: Xavier de Gaye (xdegaye) * | Date: 2020-01-25 23:24 | |
This fixed the bug. To reproduce the bug, substitute the "from _lzma import *" statement with "raise ImportError" in Lib/lzma.py to simulate that the _lzma module is missing and run the test.py script written and uploaded by Maciej. |
|||
msg360793 - (view) | Author: Brett Cannon (brett.cannon) * | Date: 2020-01-27 20:08 | |
> When was this change made? It has always been this way, so no change here. We have actually improved the situation over the years with attempts at better locking in importlib itself. |
|||
msg360805 - (view) | Author: Ethan Furman (ethan.furman) * | Date: 2020-01-27 21:29 | |
Cool. I appreciate all the work in this area, thank you! |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:25 | admin | set | github: 83611 |
2020-01-27 21:29:15 | ethan.furman | set | messages: + msg360805 |
2020-01-27 20:08:26 | brett.cannon | set | messages: + msg360793 |
2020-01-25 23:24:38 | xdegaye | set | messages: + msg360714 |
2020-01-25 20:21:00 | xdegaye | set | nosy:
+ xdegaye messages: + msg360708 |
2020-01-24 20:25:25 | serhiy.storchaka | set | messages: + msg360653 |
2020-01-24 20:10:56 | miss-islington | set | messages: + msg360650 |
2020-01-24 20:10:50 | miss-islington | set | nosy:
+ miss-islington messages: + msg360649 |
2020-01-24 18:06:10 | ethan.furman | set | messages: + msg360637 |
2020-01-24 17:56:19 | miss-islington | set | pull_requests: + pull_request17557 |
2020-01-24 17:56:12 | ethan.furman | set | messages: + msg360636 |
2020-01-24 17:56:11 | miss-islington | set | pull_requests: + pull_request17556 |
2020-01-24 17:51:45 | ethan.furman | set | messages: + msg360635 |
2020-01-24 17:50:14 | brett.cannon | set | messages: + msg360634 |
2020-01-24 16:50:23 | Maciej Gol | set | messages: + msg360629 |
2020-01-24 14:08:24 | serhiy.storchaka | set | messages: + msg360620 |
2020-01-24 13:34:09 | Maciej Gol | set | messages: + msg360619 |
2020-01-24 13:33:43 | Maciej Gol | set | messages: + msg360618 |
2020-01-24 10:11:36 | xtreak | set | nosy:
+ ethan.furman |
2020-01-24 09:14:45 | serhiy.storchaka | set | nosy:
+ brett.cannon, ncoghlan, lars.gustaebel, eric.snow |
2020-01-24 09:14:16 | serhiy.storchaka | set | versions:
+ Python 3.8, Python 3.9 nosy: + serhiy.storchaka messages: + msg360598 components: + Library (Lib) type: crash -> behavior |
2020-01-24 09:06:33 | serhiy.storchaka | set | keywords:
+ patch stage: patch review pull_requests: + pull_request17548 |
2020-01-23 11:19:33 | Maciej Gol | set | files: - test.py |
2020-01-23 11:19:25 | Maciej Gol | set | files:
+ test.py messages: + msg360552 |
2020-01-23 11:15:38 | Maciej Gol | create |