msg375881 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-08-25 10:16 |
Building Mercurial with Python 3.9.0rc1 fails with the error:
SystemError: <built-in function compile> returned NULL without setting an error
The problem comes from the PyAST_Check() function. This function calls get_global_ast_state() which gets the state of the _ast module. If the module is not imported yet, it is imported.
The problem is that Mercurial monkey-patches the __import__() builtin function to implement lazy imports. get_global_ast_state() calls PyImport_Import() which returns a Mercurial _LazyModule(__name__='_ast', ...) object. Calling get_ast_state() (PyModule_GetState()) on it is unsafe since it's not the _ast extension module, but another module (which has no state, PyModule_GetState() returns NULL).
https://bugzilla.redhat.com/show_bug.cgi?id=1871992#c1
--
The _ast extension module was modified multiple times recently:
* September 2019: The extension was converted to PEP 384 (stable ABI): bpo-38113, commit ac46eb4ad6662cf6d771b20d8963658b2186c48c
* July 2020: The extension was converted to PEP 489 (multiphase init): bpo-41194, commit b1cc6ba73a51d5cc3aeb113b5e7378fb50a0e20a
* (and bugfixes: see bpo-41194 and bpo-41204)
I did the PEP 489 change to fix a regression caused by the first change (PEP 384), two bugs in fact:
* bpo-41194 Python 3.9.0b3 crash on compile() in PyAST_Check() when the _ast module is loaded more than once
* bpo-41261: 3.9-dev SEGV in object_recursive_isinstance in ast.literal_eval
|
msg375883 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-08-25 10:19 |
One option is to revert all AST changes of bpo-38113 (and following changes), to move back to the state before bpo-38113, until all issues are addressed.
|
msg375884 - (view) |
Author: Pablo Galindo Salgado (pablogsal) * |
Date: 2020-08-25 10:29 |
Given how close are we to a release for 3.9 we should try to maximize stability.
|
msg375948 - (view) |
Author: Petr Viktorin (petr.viktorin) * |
Date: 2020-08-26 18:29 |
Regarding ac46eb4ad6662cf6d771b20d8963658b2186c48c:
Module states come and go with the modules that contain them; if a "get_global_ast_state" or "astmodulestate_global" needs to be accessed from outside the module, it shouldn't be module state :/
----
So, the main issue here is that the AST types are not used only by the _ast module, but by the interpreter itself: the compile() builtin and Py_CompileStringObject.
I see two ways of fixing this properly:
1. All the classes _ast provides should be built-in, like, say, `function`. (Currently, that means they should be static types; later they could be per-interpreter.)
The _ast module should merely expose them from Python, like the `types` module exposes the function type.
This would mean that calling Py_CompileStringObject with PyCF_ONLY_AST will be independent of the _ast module.
2. The mod2obj/obj2mod functions, called by e.g. compile(..., PyCF_ONLY_AST), should:
* import the _ast module
* call a Python-accessible function, e.g. _ast._mod2obj
This would mean replacing the _ast module (in sys.modules or through an import hook, which you can do from Python code) will affect what AST types will be used throughout the interpreter.
|
msg375950 - (view) |
Author: Pablo Galindo Salgado (pablogsal) * |
Date: 2020-08-26 18:38 |
>The mod2obj/obj2mod functions, called by e.g. compile(..., PyCF_ONLY_AST), should:
* import the _ast module
* call a Python-accessible function, e.g. _ast._mod2obj
Would that impact performance considerably?
|
msg375951 - (view) |
Author: Pablo Galindo Salgado (pablogsal) * |
Date: 2020-08-26 18:41 |
Also, option 1 is virtually equivalent to the state of the _ast module prior to the recent changes except that the symbols are in a shared object instead of the binary or libpython. The advantage here is not moving them out of the shared object, is making them having static storage.
|
msg375952 - (view) |
Author: Pablo Galindo Salgado (pablogsal) * |
Date: 2020-08-26 18:43 |
Also, adding them into a module that needs access through Python had a bootstrap problem:
Basically:
initializing import system -> initialize codec -> compile -> ast init -> init ast module -> 💥
|
msg375955 - (view) |
Author: Petr Viktorin (petr.viktorin) * |
Date: 2020-08-26 18:57 |
> Also, option 1 is virtually equivalent to the state of the _ast module prior to the recent changes except that the symbols are in a shared object instead of the binary or libpython. The advantage here is not moving them out of the shared object, is making them having static storage.
What I'm not yet clear on: when is that shared object is initialized and destroyed?
> Would that impact performance considerably?
> Also, adding them into a module that needs access through Python had a bootstrap problem
Only for PyCF_ONLY_AST, which is, AFAIK, not used by the interpreter itself.
|
msg375956 - (view) |
Author: Pablo Galindo Salgado (pablogsal) * |
Date: 2020-08-26 19:06 |
> What I'm not yet clear on: when is that shared object is initialized and destroyed?
I am assuming that you mean at the Python level and not at the linker level. Then:
* Initialized: when dlopen on import.
* Destroyed: never. The interpreter does not dlclose the shared objects.
|
msg375957 - (view) |
Author: Pablo Galindo Salgado (pablogsal) * |
Date: 2020-08-26 19:08 |
> Initialized: when dlopen on import.
With this I mean import-> dlopen -> dlsym for init function -> call init function
|
msg376256 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-09-02 16:46 |
At commit ac46eb4ad6662cf6d771b20d8963658b2186c48c ("bpo-38113: Update the Python-ast.c generator to PEP384 (gh-15957)"):
---------------
static struct PyModuleDef _astmodule = {
PyModuleDef_HEAD_INIT,
"_ast",
NULL,
sizeof(astmodulestate),
NULL,
NULL,
astmodule_traverse,
astmodule_clear,
astmodule_free,
};
#define astmodulestate_global ((astmodulestate *)PyModule_GetState(PyState_FindModule(&_astmodule)))
PyMODINIT_FUNC
PyInit__ast(void)
{
PyObject *m;
if (!init_types()) return NULL;
m = PyState_FindModule(&_astmodule);
...
return m;
}
---------------
=> I investigated bpo-41194 crash again. It seems like the root issue is more than PyInit__ast() returns a *borrowed* reference, rather than a strong reference.
=> The _ast module can only be loaded once, astmodule_free() and astmodule_clear() are not called at Python exit.
At commit 91e1bc18bd467a13bceb62e16fbc435b33381c82 ("bpo-41194: The _ast module cannot be loaded more than once (GH-21290)"):
---------------
static astmodulestate global_ast_state;
static astmodulestate *
get_ast_state(PyObject *Py_UNUSED(module))
{
return &global_ast_state;
}
#define get_global_ast_state() (&global_ast_state)
static struct PyModuleDef _astmodule = {
PyModuleDef_HEAD_INIT,
.m_name = "_ast",
.m_size = -1,
.m_traverse = astmodule_traverse,
.m_clear = astmodule_clear,
.m_free = astmodule_free,
};
PyMODINIT_FUNC
PyInit__ast(void)
{
PyObject *m = PyModule_Create(&_astmodule);
if (!m) {
return NULL;
}
astmodulestate *state = get_ast_state(m);
...
}
---------------
* bpo-41194 => ok, PyInit__ast() is fine
* bpo-41631 => ok: PyAST_Check() doesn't import the module, it uses a global state.
* bpo-41261 => ERROR: This implementation allows to create two instances of the _ast module, but also to unload one instance (call astmodule_free), because _astmodule.m_size = -1.
At commit b1cc6ba73a51d5cc3aeb113b5e7378fb50a0e20a ("bpo-41194: Convert _ast extension to PEP 489 (GH-21293)"):
---------------
static astmodulestate*
get_global_ast_state(void)
{
...
PyObject *module = PyImport_GetModule(name);
if (module == NULL) {
if (PyErr_Occurred()) {
return NULL;
}
module = PyImport_Import(name);
if (module == NULL) {
return NULL;
}
}
...
astmodulestate *state = get_ast_state(module);
Py_DECREF(module);
return state;
}
static struct PyModuleDef _astmodule = {
PyModuleDef_HEAD_INIT,
.m_name = "_ast",
.m_size = sizeof(astmodulestate),
.m_slots = astmodule_slots,
.m_traverse = astmodule_traverse,
.m_clear = astmodule_clear,
.m_free = astmodule_free,
};
PyMODINIT_FUNC
PyInit__ast(void)
{
return PyModuleDef_Init(&_astmodule);
}
---------------
* bpo-41631 => ERROR: PyAST_Check() imports "_ast" module
|
msg376564 - (view) |
Author: Petr Viktorin (petr.viktorin) * |
Date: 2020-09-08 11:36 |
I also looked into Mercurial. They have a large list of modules [0] that don't work with the lazy-loading scheme. I don't think adding one more should be a big problem for them; but we'll reach out to them and confirm that.
[0] https://www.mercurial-scm.org/repo/hg/file/tip/hgdemandimport/__init__.py#l26
|
msg376565 - (view) |
Author: Petr Viktorin (petr.viktorin) * |
Date: 2020-09-08 11:37 |
We need this bug solved for 3.9.0 rc2. Łukasz, you're the one to make the call about the approach; how can we make your job easier?
My view is:
- Victor's PR 21961 is going in the right direction. But, as far as I know (correct me if I'm wrong), Victor unfortunately has less time for CPython than usual, for personal reasons. So, the PR is unexpectedly stalled. I don't think there's enough time to fix and test it for RC 2.
- My PR 21973 essentially just provides a better error message (and might prevent segfaults in more convoluted cases -- essentially ones where the _ast module is replaced maliciously). We can merge it, backport to 3.9; then iterate on Victor's PR to have a better solution in 3.10.
But Pablo an Victor should chime in as well.
|
msg376573 - (view) |
Author: Manuel Jacob (mjacob) * |
Date: 2020-09-08 14:05 |
I couldn’t reproduce the problem with a freshly compiled Python 3.9.0rc1 on Arch Linux. Was this ever reproduced on a non-Red Hat system?
|
msg376574 - (view) |
Author: Petr Viktorin (petr.viktorin) * |
Date: 2020-09-08 14:17 |
That is interesting.
The original post here doesn't mention that the problem occurs in Mercurial's "make all", specifically when building documentation, in the command:
/usr/bin/python3.9 gendoc.py "hgrc.5" > hgrc.5.txt.tmp
Could you confirm `make all` works OK with python3.9 on Arch?
|
msg376575 - (view) |
Author: Manuel Jacob (mjacob) * |
Date: 2020-09-08 14:29 |
I was running "make all" and I also ran the documentation generator command without an error.
However, I tried it again and now it failed the same way as reported. With a debug build, I get "Python/Python-ast.c:231: get_ast_state: Assertion `state != NULL' failed.".
Sorry for the noise!
|
msg376686 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-09-10 13:55 |
About subinterpreters. In Python 3.8, _ast.AST type is a static type:
static PyTypeObject AST_type = {...};
In Python 3.9, it's now a heap type:
static PyType_Spec AST_type_spec = {...};
state->AST_type = PyType_FromSpec(&AST_type_spec);
In Python 3.8, the same _ast.AST type was shared by all interpreters. With my PR 21961, the _ast.AST type is a heap type but it is also shared by all interpreters.
Compared to Python 3.8, PR 21961 has no regression related to subinterpreters. It's not better nor worse. The minor benefit is that _ast.AST is cleared at Python exit (see _PyAST_Fini()).
While I would be nice to make _ast.AST per interpreter, it would be an *enhancement*. I consider that it can wait for Python 3.10.
|
msg376708 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-09-11 09:30 |
> Compared to Python 3.8, PR 21961 has no regression related to subinterpreters.
Oh. I forgot that static types cannot be modified (in Python, but it's possible in C). So my PR still changed the behavior compared to 3.8:
---
import _testcapi
import _ast
res = _testcapi.run_in_subinterp("import _ast; _ast.AST.x = 1")
if res != 0:
raise Exception("bug")
print(_ast.AST.x)
---
On Python 3.8, this code snippet fails with:
TypeError: can't set attributes of built-in/extension type '_ast.AST'
On master, it is possible to modify or add an _ast.AST attribute. Since PR 21961 moves back to a global strange, Petr is correct that a subinterpreter can now modify the state of another subinterpreter the _ast.AST type.
So I modified my PR 21961 to revert partially the change which converted AST_type type from a static type to a heap type.
My PR 21961 converts AST_type back to a static type, to avoid these problems. Sadly, it makes _ast module incompatible with PEP 384, but the priority is to fix this 3rd regression. We can reconsider converting PEP 384 back to a heap type later, but we will have to be careful with not reintroducing all these bugs.
|
msg376874 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-09-14 10:01 |
I marked bpo-41766 as a duplicate of this issue.
|
msg376946 - (view) |
Author: Łukasz Langa (lukasz.langa) * |
Date: 2020-09-15 16:03 |
New changeset e5fbe0cbd4be99ced5f000ad382208ad2a561c90 by Victor Stinner in branch 'master':
bpo-41631: _ast module uses again a global state (#21961)
https://github.com/python/cpython/commit/e5fbe0cbd4be99ced5f000ad382208ad2a561c90
|
msg376950 - (view) |
Author: Łukasz Langa (lukasz.langa) * |
Date: 2020-09-15 18:33 |
New changeset 55e0836849c14fb474e1ba7f37851e07660eea3c by Pablo Galindo in branch '3.9':
[3.9] bpo-41631: _ast module uses again a global state (GH-21961) (GH-22258)
https://github.com/python/cpython/commit/55e0836849c14fb474e1ba7f37851e07660eea3c
|
msg376955 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-09-15 21:32 |
I close the issue. I tested manually: the fix works as expected.
I reproduced https://bugzilla.redhat.com/show_bug.cgi?id=1871992#c1 bug (I had to comment the workaround of the mercurial package in Fedora) on 3.9:
make[1]: Entering directory '/home/vstinner/mercurial/mercurial-5.4/doc'
/home/vstinner/python/3.9/python gendoc.py "hgrc.5" > hgrc.5.txt.tmp
python: Python/Python-ast.c:231: get_ast_state: Assertion `state != NULL' failed.
I confirm that the fix in 3.9 works as expected:
> New changeset 55e0836849c14fb474e1ba7f37851e07660eea3c by Pablo Galindo in branch '3.9'
=> No crash with this fix.
I also reproduced the bug in the master branch (without the fix), and I also confirm that the fix in master does fix the crash.
|
msg376956 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2020-09-15 21:47 |
I also tested bpo-41261 reproducer and https://bugs.python.org/issue41194#msg372863 reproducer on 3.9 and master branches: Python doesn't crash.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:59:35 | admin | set | github: 85797 |
2020-10-29 11:52:17 | vstinner | set | pull_requests:
+ pull_request21941 |
2020-09-20 22:15:30 | pablogsal | set | pull_requests:
+ pull_request21375 |
2020-09-15 21:47:57 | vstinner | set | messages:
+ msg376956 |
2020-09-15 21:32:12 | vstinner | set | status: open -> closed priority: release blocker -> messages:
+ msg376955
resolution: fixed stage: patch review -> resolved |
2020-09-15 18:33:06 | lukasz.langa | set | messages:
+ msg376950 |
2020-09-15 18:07:55 | pablogsal | set | pull_requests:
+ pull_request21313 |
2020-09-15 16:03:40 | lukasz.langa | set | messages:
+ msg376946 |
2020-09-14 10:01:53 | vstinner | set | messages:
+ msg376874 |
2020-09-14 10:01:49 | vstinner | link | issue41766 superseder |
2020-09-11 09:30:03 | vstinner | set | messages:
+ msg376708 |
2020-09-10 13:55:18 | vstinner | set | messages:
+ msg376686 |
2020-09-08 14:29:43 | mjacob | set | messages:
+ msg376575 |
2020-09-08 14:17:48 | petr.viktorin | set | messages:
+ msg376574 |
2020-09-08 14:05:58 | mjacob | set | nosy:
+ mjacob messages:
+ msg376573
|
2020-09-08 11:37:06 | petr.viktorin | set | messages:
+ msg376565 |
2020-09-08 11:36:42 | petr.viktorin | set | messages:
+ msg376564 |
2020-09-02 16:46:35 | vstinner | set | messages:
+ msg376256 |
2020-08-26 23:58:02 | petr.viktorin | set | pull_requests:
+ pull_request21082 |
2020-08-26 19:08:16 | pablogsal | set | messages:
+ msg375957 |
2020-08-26 19:06:46 | pablogsal | set | messages:
+ msg375956 |
2020-08-26 18:57:35 | petr.viktorin | set | messages:
+ msg375955 |
2020-08-26 18:43:03 | pablogsal | set | messages:
+ msg375952 |
2020-08-26 18:41:44 | pablogsal | set | messages:
+ msg375951 |
2020-08-26 18:38:17 | pablogsal | set | messages:
+ msg375950 |
2020-08-26 18:29:02 | petr.viktorin | set | nosy:
+ eric.snow, petr.viktorin, dino.viehland messages:
+ msg375948
|
2020-08-26 09:26:17 | vstinner | set | keywords:
+ patch stage: patch review pull_requests:
+ pull_request21071 |
2020-08-26 07:53:51 | xtreak | set | nosy:
+ lukasz.langa
|
2020-08-25 16:23:35 | shihai1991 | set | nosy:
+ shihai1991
|
2020-08-25 10:29:02 | pablogsal | set | messages:
+ msg375884 |
2020-08-25 10:28:28 | pablogsal | set | nosy:
+ pablogsal
|
2020-08-25 10:19:56 | vstinner | set | messages:
+ msg375883 |
2020-08-25 10:17:49 | BTaskaya | set | nosy:
+ BTaskaya
|
2020-08-25 10:17:37 | vstinner | set | nosy:
+ corona10
|
2020-08-25 10:16:01 | vstinner | create | |