Issue39069
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2019-12-16 18:12 by vstinner, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Pull Requests | |||
---|---|---|---|
URL | Status | Linked | Edit |
PR 17629 | closed | pablogsal, 2019-12-16 18:56 |
Messages (18) | |||
---|---|---|---|
msg358496 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-12-16 18:12 | |
Pablo Galingo Salgado recently moved Tools/parser/unparse.py to a ast.unparse() function. Pablo made a change using functools. The additional "import functools" made "import ast" slower and so Pablo reverted his change: * https://github.com/python/cpython/pull/17376 * https://bugs.python.org/issue38870 The question of using contextlib comes back in another ast.unparse change to get @contextlib.contextmanager: * https://github.com/python/cpython/pull/17377#discussion_r350239415 On the same PR, I also proposed to use import enum: * https://github.com/python/cpython/pull/17377#discussion_r357921289 There are different options to not impact "import ast" performance: * Move ast.unparse() code into a new module * Write to code so imports can be done lazily "from ast import unparse" or "ast.unparse()" can be kept using a private _ast_unparse module and add a __getattr__() function to Lib/ast.py to lazily bind _ast_unparse.unparse() to ast.unparse(). Other options: * Avoid cool enum and functools modules, and use simpler Python code (that makes me sad, but it's ok) * Accept to make "import ast" slower * Remove ast.unparse(): I don't think that anyone wants this, ast.unparse() was approved on python-dev. |
|||
msg358497 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-12-16 18:15 | |
> On the same PR, I also proposed to use import enum: "import enum" is not the biggest enum. Creating an enum type is expensive, and so we might to do it lazily. For example, "import re" was made betweeen 20 an 30% slower than re constants were converted to enums. |
|||
msg358498 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-12-16 18:16 | |
Can someone try to run a benchmark to see the actual overhead of adding functools and/or enum imports, and creating a enum type on "import ast"? |
|||
msg358499 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2019-12-16 18:17 | |
I will be ok with (no particular order): - Avoid cool enum and functools modules, and use simpler - Accept to make "import ast" slower - Lazy import I think is very important to keep 'ast.unparse' symmetric with 'ast.parse': is consistent, helps with discovery and they are very much related. Whatever option we choose we should try to maintain this IMO. |
|||
msg358501 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-12-16 18:21 | |
> Lazy import IMHO moving unparse code to a new _ast_unparse module, and use ast.__getattr__() to lazily import it and bind it as the ast.unparse() function is the best option, since __getattr__() alone is enough to implement the laziness and it should be simple, something like: def __getattr__(name): if name == 'unparse': import _ast_unparse globals()['unpase'] = _ast_unparse.unparse return _ast_unparse.unparse else: raise AttributeError I never used a module __getattr__(). So _ast_unparse could use any super slow but cool Python feature, without having to use dirty hacks to make the code lazy. |
|||
msg358505 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2019-12-16 18:27 | |
Ok, I think I agree, will make a PR for that. |
|||
msg358506 - (view) | Author: Batuhan Taskaya (BTaskaya) * | Date: 2019-12-16 18:29 | |
$ ./python -m pyperf timeit "import ast" Before: Mean +- std dev: 326 ns +- 13 ns After: Mean +- std dev: 330 ns +- 19 ns (applied my first patch with both contextlib and IntEnum) Pablo Galindo Salgado <report@bugs.python.org>, 16 Ara 2019 Pzt, 21:27 tarihinde şunu yazdı: > > Pablo Galindo Salgado <pablogsal@gmail.com> added the comment: > > Ok, I think I agree, will make a PR for that. > > ---------- > assignee: -> pablogsal > > _______________________________________ > Python tracker <report@bugs.python.org> > <https://bugs.python.org/issue39069> > _______________________________________ > |
|||
msg358507 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2019-12-16 18:35 | |
That will likely be very dependent on the filesystem. For example, in one of my servers without SSDs: Before: Mean +- std dev: 412 ns +- 11 ns After: Mean +- std dev: 472 ns +- 11 ns |
|||
msg358509 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2019-12-16 18:37 | |
Also, make sure that the module is not cached, for example: ./python -m pyperf timeit 'import sys; sys.modules.pop("ast",None);sys.modules.pop("enum",None)' 'import ast' |
|||
msg358512 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2019-12-16 18:47 | |
One thing to notice: Factoring this into a submodule may be a bit annoying as it will have circular imports as the unparse submodule depends directly on stuff from ast and ast will import unparse. Is possible to make it work if we import it at the end but is a bit ugly. |
|||
msg358513 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2019-12-16 18:57 | |
Opened https://github.com/python/cpython/pull/17629 in case we decide to go this route. |
|||
msg358517 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2019-12-16 20:00 | |
Why is the import speed of ast important enough to uglify this code? Who is using ast.py that isn't also importing lots of other stuff? For example, black imports lots of other stuff (multiprocessing, asyncio, pathlib, typing, ..., even 3rd party attr, click and toml). |
|||
msg358518 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2019-12-16 20:16 | |
If I remember correctly, the people that were concerned mentioned the usage of 'ast.literal_eval' in some simple command line applications were the import may be noticeable but I will be totally ok to just accept the slower import of ast. |
|||
msg358519 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2019-12-16 20:50 | |
If anything, ast.literal_eval() should be moved. |
|||
msg358520 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2019-12-16 20:57 | |
Victor, are you OK if we close this issue and just use the desired imports directly in the PRs for issue 38870? |
|||
msg358521 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-12-16 21:31 | |
Pablo: > Victor, are you OK if we close this issue and just use the desired imports directly in the PRs for issue 38870? Yes. -- *I* don't care of "import ast" performance, since I don't expect it to be commonly used by command line applications. I don't know the story of the revert, but it seems like you don't bother much anymore. If tomorrow using "functools" and "enum" becomes a performance bottleneck (for import time), maybe we should try to optimize imports and optimize these modules instead? I opened this issue because I would like to use these modules in ast. I'm unhappy with the current proposed implementation: --- class _NoDelimit: def __enter__(self): pass def __exit__(self, *args): pass class _Delimit: """A context manager for preparing the source for expressions. It adds start of the delimiter and enters, after exit it adds delimiter end.""" def __init__(self, unparser, delimiter): self.unparser = unparser self.delimiter = delimiter def __enter__(self): self.unparser.write(self.delimiter[0]) def __exit__(self, exc_type, exc_value, traceback): self.unparser.write(self.delimiter[-1]) def delimit_if(self, condition, delimiter): if condition: return self._Delimit(self, delimiter) else: return self._NoDelimit() --- Having to define two classes just to call the write() method seems overkill to me. Same remark for the pseudo enum in PR 17377: --- PRECEDENCE_LEVELS = { "PR_TUPLE": 0, "PR_YIELD": 1, "PR_TEST": 2, "PR_OR": 3, "PR_AND": 4, "PR_NOT": 5, "PR_CMP": 6, "PR_EXPR": 7, "PR_BOR": 7, "PR_BXOR": 8, "PR_BAND": 9, "PR_SHIFT": 10, "PR_ARITH": 11, "PR_TERM": 12, "PR_FACTOR": 13, "PR_POWER": 14, "PR_AWAIT": 15, "PR_ATOM": 16, } --- It's too easy to introduce a duplicated constant with such construction. The enum module is designed to define unique constants. |
|||
msg358522 - (view) | Author: Batuhan Taskaya (BTaskaya) * | Date: 2019-12-16 21:36 | |
Thanks for having consensus on this. I'll refactor PR 17612 and open anothor one for reverting PR 17376 On Tue, Dec 17, 2019, 12:31 AM STINNER Victor <report@bugs.python.org> wrote: > > STINNER Victor <vstinner@python.org> added the comment: > > Pablo: > > Victor, are you OK if we close this issue and just use the desired > imports directly in the PRs for issue 38870? > > Yes. > > -- > > *I* don't care of "import ast" performance, since I don't expect it to be > commonly used by command line applications. I don't know the story of the > revert, but it seems like you don't bother much anymore. > > If tomorrow using "functools" and "enum" becomes a performance bottleneck > (for import time), maybe we should try to optimize imports and optimize > these modules instead? > > I opened this issue because I would like to use these modules in ast. I'm > unhappy with the current proposed implementation: > --- > > class _NoDelimit: > def __enter__(self): pass > def __exit__(self, *args): pass > > class _Delimit: > """A context manager for preparing the source for expressions. It > adds > start of the delimiter and enters, after exit it adds delimiter > end.""" > def __init__(self, unparser, delimiter): > self.unparser = unparser > self.delimiter = delimiter > def __enter__(self): > self.unparser.write(self.delimiter[0]) > def __exit__(self, exc_type, exc_value, traceback): > self.unparser.write(self.delimiter[-1]) > > def delimit_if(self, condition, delimiter): > if condition: > return self._Delimit(self, delimiter) > else: > return self._NoDelimit() > --- > > Having to define two classes just to call the write() method seems > overkill to me. > > Same remark for the pseudo enum in PR 17377: > --- > PRECEDENCE_LEVELS = { > "PR_TUPLE": 0, > "PR_YIELD": 1, > "PR_TEST": 2, > "PR_OR": 3, > "PR_AND": 4, > "PR_NOT": 5, > "PR_CMP": 6, > "PR_EXPR": 7, > "PR_BOR": 7, > "PR_BXOR": 8, > "PR_BAND": 9, > "PR_SHIFT": 10, > "PR_ARITH": 11, > "PR_TERM": 12, > "PR_FACTOR": 13, > "PR_POWER": 14, > "PR_AWAIT": 15, > "PR_ATOM": 16, > } > --- > > It's too easy to introduce a duplicated constant with such construction. > The enum module is designed to define unique constants. > > ---------- > > _______________________________________ > Python tracker <report@bugs.python.org> > <https://bugs.python.org/issue39069> > _______________________________________ > |
|||
msg358523 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2019-12-16 21:58 | |
Closing this now that we have consensus. :) Thanks, everyone for your input! |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:24 | admin | set | github: 83250 |
2019-12-16 21:58:21 | pablogsal | set | status: open -> closed resolution: fixed messages: + msg358523 stage: resolved |
2019-12-16 21:36:22 | BTaskaya | set | messages: + msg358522 |
2019-12-16 21:31:24 | vstinner | set | messages: + msg358521 |
2019-12-16 20:57:55 | pablogsal | set | messages: + msg358520 |
2019-12-16 20:50:23 | gvanrossum | set | messages: + msg358519 |
2019-12-16 20:16:20 | pablogsal | set | messages: + msg358518 |
2019-12-16 20:00:32 | gvanrossum | set | nosy:
+ gvanrossum messages: + msg358517 |
2019-12-16 18:57:21 | pablogsal | set | messages:
+ msg358513 stage: patch review -> (no value) |
2019-12-16 18:56:56 | pablogsal | set | keywords:
+ patch stage: patch review pull_requests: + pull_request17098 |
2019-12-16 18:47:14 | pablogsal | set | messages: + msg358512 |
2019-12-16 18:37:07 | pablogsal | set | messages: + msg358509 |
2019-12-16 18:35:24 | pablogsal | set | messages: + msg358507 |
2019-12-16 18:29:37 | BTaskaya | set | messages: + msg358506 |
2019-12-16 18:27:29 | pablogsal | set | assignee: pablogsal messages: + msg358505 |
2019-12-16 18:26:20 | pablogsal | set | nosy:
+ BTaskaya |
2019-12-16 18:21:37 | vstinner | set | messages: + msg358501 |
2019-12-16 18:17:01 | pablogsal | set | nosy:
- BTaskaya messages: + msg358499 |
2019-12-16 18:16:37 | vstinner | set | nosy:
+ BTaskaya |
2019-12-16 18:16:04 | vstinner | set | messages: + msg358498 |
2019-12-16 18:15:01 | vstinner | set | messages: + msg358497 |
2019-12-16 18:12:54 | vstinner | create |