classification
Title: Move ast.unparse() function to a different module
Type: Stage: resolved
Components: Library (Lib) Versions: Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: pablogsal Nosy List: BTaskaya, gvanrossum, pablogsal, vstinner
Priority: normal Keywords: patch

Created on 2019-12-16 18:12 by vstinner, last changed 2019-12-16 21:58 by pablogsal. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 17629 closed pablogsal, 2019-12-16 18:56
Messages (18)
msg358496 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-12-16 18:12
Pablo Galingo Salgado recently moved Tools/parser/unparse.py to a ast.unparse() function.

Pablo made a change using functools. The additional "import functools" made "import ast" slower and so Pablo reverted his change:

* https://github.com/python/cpython/pull/17376
* https://bugs.python.org/issue38870

The question of using contextlib comes back in another ast.unparse change to get @contextlib.contextmanager:

* https://github.com/python/cpython/pull/17377#discussion_r350239415

On the same PR, I also proposed to use import enum:

* https://github.com/python/cpython/pull/17377#discussion_r357921289

There are different options to not impact "import ast" performance:

* Move ast.unparse() code into a new module
* Write to code so imports can be done lazily

"from ast import unparse" or "ast.unparse()" can be kept using a private _ast_unparse module and add a __getattr__() function to Lib/ast.py to lazily bind _ast_unparse.unparse() to ast.unparse().

Other options:

* Avoid cool enum and functools modules, and use simpler Python code (that makes me sad, but it's ok)
* Accept to make "import ast" slower
* Remove ast.unparse(): I don't think that anyone wants this, ast.unparse() was approved on python-dev.
msg358497 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-12-16 18:15
> On the same PR, I also proposed to use import enum:

"import enum" is not the biggest enum. Creating an enum type is expensive, and so we might to do it lazily. For example, "import re" was made betweeen 20 an 30% slower than re constants were converted to enums.
msg358498 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-12-16 18:16
Can someone try to run a benchmark to see the actual overhead of adding functools and/or enum imports, and creating a enum type on "import ast"?
msg358499 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-12-16 18:17
I will be ok with (no particular order):

- Avoid cool enum and functools modules, and use simpler
- Accept to make "import ast" slower
- Lazy import

I think is very important to keep 'ast.unparse' symmetric with 'ast.parse': is consistent, helps with discovery and they are very much related. Whatever option we choose we should try to maintain this IMO.
msg358501 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-12-16 18:21
> Lazy import

IMHO moving unparse code to a new _ast_unparse module, and use ast.__getattr__() to lazily import it and bind it as the ast.unparse() function is the best option, since __getattr__() alone is enough to implement the laziness and it should be simple, something like:

def __getattr__(name):
  if name == 'unparse':
    import _ast_unparse
    globals()['unpase'] = _ast_unparse.unparse
    return _ast_unparse.unparse
  else:
    raise AttributeError

I never used a module __getattr__().

So _ast_unparse could use any super slow but cool Python feature, without having to use dirty hacks to make the code lazy.
msg358505 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-12-16 18:27
Ok, I think I agree, will make a PR for that.
msg358506 - (view) Author: Batuhan Taskaya (BTaskaya) * (Python committer) Date: 2019-12-16 18:29
$ ./python -m pyperf timeit "import ast"
Before: Mean +- std dev: 326 ns +- 13 ns
After: Mean +- std dev: 330 ns +- 19 ns

(applied my first patch with both contextlib and IntEnum)

Pablo Galindo Salgado <report@bugs.python.org>, 16 Ara 2019 Pzt, 21:27
tarihinde şunu yazdı:

>
> Pablo Galindo Salgado <pablogsal@gmail.com> added the comment:
>
> Ok, I think I agree, will make a PR for that.
>
> ----------
> assignee:  -> pablogsal
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <https://bugs.python.org/issue39069>
> _______________________________________
>
msg358507 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-12-16 18:35
That will likely be very dependent on the filesystem. For example, in one of my servers without SSDs:

Before: Mean +- std dev: 412 ns +- 11 ns
After: Mean +- std dev: 472 ns +- 11 ns
msg358509 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-12-16 18:37
Also, make sure that the module is not cached, for example:

./python -m pyperf timeit 'import sys; sys.modules.pop("ast",None);sys.modules.pop("enum",None)' 'import ast'
msg358512 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-12-16 18:47
One thing to notice:

Factoring this into a submodule may be a bit annoying as it will have circular imports as the unparse submodule depends directly on stuff from ast and ast will import unparse. Is possible to make it work if we import it at the end but is a bit ugly.
msg358513 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-12-16 18:57
Opened https://github.com/python/cpython/pull/17629 in case we decide to go this route.
msg358517 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2019-12-16 20:00
Why is the import speed of ast important enough to uglify this code? Who is using ast.py that isn't also importing lots of other stuff? For example, black imports lots of other stuff (multiprocessing, asyncio, pathlib, typing, ..., even 3rd party attr, click and toml).
msg358518 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-12-16 20:16
If I remember correctly, the people that were concerned mentioned the usage of 'ast.literal_eval' in some simple command line applications were the import may be noticeable but I will be totally ok to just accept the slower import of ast.
msg358519 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2019-12-16 20:50
If anything, ast.literal_eval() should be moved.
msg358520 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-12-16 20:57
Victor, are you OK if we close this issue and just use the desired imports directly in the PRs for issue 38870?
msg358521 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-12-16 21:31
Pablo:
> Victor, are you OK if we close this issue and just use the desired imports directly in the PRs for issue 38870?

Yes.

--

*I* don't care of "import ast" performance, since I don't expect it to be commonly used by command line applications. I don't know the story of the revert, but it seems like you don't bother much anymore.

If tomorrow using "functools" and "enum" becomes a performance bottleneck (for import time), maybe we should try to optimize imports and optimize these modules instead?

I opened this issue because I would like to use these modules in ast. I'm unhappy with the current proposed implementation:
---

    class _NoDelimit:
        def __enter__(self): pass
        def __exit__(self, *args): pass

    class _Delimit:
        """A context manager for preparing the source for expressions. It adds
        start of the delimiter  and enters, after exit it adds delimiter end."""
        def __init__(self, unparser, delimiter):
            self.unparser = unparser
            self.delimiter = delimiter
        def __enter__(self):
            self.unparser.write(self.delimiter[0])
        def __exit__(self, exc_type, exc_value, traceback):
            self.unparser.write(self.delimiter[-1])

    def delimit_if(self, condition, delimiter):
        if condition:
            return self._Delimit(self, delimiter)
        else:
            return self._NoDelimit()
---

Having to define two classes just to call the write() method seems overkill to me.

Same remark for the pseudo enum in PR 17377:
---
    PRECEDENCE_LEVELS = {
        "PR_TUPLE": 0,
        "PR_YIELD": 1,
        "PR_TEST": 2,
        "PR_OR": 3,
        "PR_AND": 4,
        "PR_NOT": 5,
        "PR_CMP": 6,
        "PR_EXPR": 7,
        "PR_BOR": 7,
        "PR_BXOR": 8,
        "PR_BAND": 9,
        "PR_SHIFT": 10,
        "PR_ARITH": 11,
        "PR_TERM": 12,
        "PR_FACTOR": 13,
        "PR_POWER": 14,
        "PR_AWAIT": 15,
        "PR_ATOM": 16,
    }
---

It's too easy to introduce a duplicated constant with such construction. The enum module is designed to define unique constants.
msg358522 - (view) Author: Batuhan Taskaya (BTaskaya) * (Python committer) Date: 2019-12-16 21:36
Thanks for having consensus on this. I'll refactor PR 17612 and open
anothor one for reverting PR 17376

On Tue, Dec 17, 2019, 12:31 AM STINNER Victor <report@bugs.python.org>
wrote:

>
> STINNER Victor <vstinner@python.org> added the comment:
>
> Pablo:
> > Victor, are you OK if we close this issue and just use the desired
> imports directly in the PRs for issue 38870?
>
> Yes.
>
> --
>
> *I* don't care of "import ast" performance, since I don't expect it to be
> commonly used by command line applications. I don't know the story of the
> revert, but it seems like you don't bother much anymore.
>
> If tomorrow using "functools" and "enum" becomes a performance bottleneck
> (for import time), maybe we should try to optimize imports and optimize
> these modules instead?
>
> I opened this issue because I would like to use these modules in ast. I'm
> unhappy with the current proposed implementation:
> ---
>
>     class _NoDelimit:
>         def __enter__(self): pass
>         def __exit__(self, *args): pass
>
>     class _Delimit:
>         """A context manager for preparing the source for expressions. It
> adds
>         start of the delimiter  and enters, after exit it adds delimiter
> end."""
>         def __init__(self, unparser, delimiter):
>             self.unparser = unparser
>             self.delimiter = delimiter
>         def __enter__(self):
>             self.unparser.write(self.delimiter[0])
>         def __exit__(self, exc_type, exc_value, traceback):
>             self.unparser.write(self.delimiter[-1])
>
>     def delimit_if(self, condition, delimiter):
>         if condition:
>             return self._Delimit(self, delimiter)
>         else:
>             return self._NoDelimit()
> ---
>
> Having to define two classes just to call the write() method seems
> overkill to me.
>
> Same remark for the pseudo enum in PR 17377:
> ---
>     PRECEDENCE_LEVELS = {
>         "PR_TUPLE": 0,
>         "PR_YIELD": 1,
>         "PR_TEST": 2,
>         "PR_OR": 3,
>         "PR_AND": 4,
>         "PR_NOT": 5,
>         "PR_CMP": 6,
>         "PR_EXPR": 7,
>         "PR_BOR": 7,
>         "PR_BXOR": 8,
>         "PR_BAND": 9,
>         "PR_SHIFT": 10,
>         "PR_ARITH": 11,
>         "PR_TERM": 12,
>         "PR_FACTOR": 13,
>         "PR_POWER": 14,
>         "PR_AWAIT": 15,
>         "PR_ATOM": 16,
>     }
> ---
>
> It's too easy to introduce a duplicated constant with such construction.
> The enum module is designed to define unique constants.
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <https://bugs.python.org/issue39069>
> _______________________________________
>
msg358523 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-12-16 21:58
Closing this now that we have consensus. :)

Thanks, everyone for your input!
History
Date User Action Args
2019-12-16 21:58:21pablogsalsetstatus: open -> closed
resolution: fixed
messages: + msg358523

stage: resolved
2019-12-16 21:36:22BTaskayasetmessages: + msg358522
2019-12-16 21:31:24vstinnersetmessages: + msg358521
2019-12-16 20:57:55pablogsalsetmessages: + msg358520
2019-12-16 20:50:23gvanrossumsetmessages: + msg358519
2019-12-16 20:16:20pablogsalsetmessages: + msg358518
2019-12-16 20:00:32gvanrossumsetnosy: + gvanrossum
messages: + msg358517
2019-12-16 18:57:21pablogsalsetmessages: + msg358513
stage: patch review -> (no value)
2019-12-16 18:56:56pablogsalsetkeywords: + patch
stage: patch review
pull_requests: + pull_request17098
2019-12-16 18:47:14pablogsalsetmessages: + msg358512
2019-12-16 18:37:07pablogsalsetmessages: + msg358509
2019-12-16 18:35:24pablogsalsetmessages: + msg358507
2019-12-16 18:29:37BTaskayasetmessages: + msg358506
2019-12-16 18:27:29pablogsalsetassignee: pablogsal
messages: + msg358505
2019-12-16 18:26:20pablogsalsetnosy: + BTaskaya
2019-12-16 18:21:37vstinnersetmessages: + msg358501
2019-12-16 18:17:01pablogsalsetnosy: - BTaskaya
messages: + msg358499
2019-12-16 18:16:37vstinnersetnosy: + BTaskaya
2019-12-16 18:16:04vstinnersetmessages: + msg358498
2019-12-16 18:15:01vstinnersetmessages: + msg358497
2019-12-16 18:12:54vstinnercreate