This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Reimplement the keyword module in C
Type: performance Stage: patch review
Components: Extension Modules Versions: Python 3.4
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: brett.cannon, christian.heimes, georg.brandl, pitrou, vstinner
Priority: normal Keywords: patch

Created on 2013-10-12 00:33 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
keyword.patch vstinner, 2013-10-12 00:33 review
keyword_grammar.patch christian.heimes, 2013-10-12 11:33 review
Messages (17)
msg199528 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-10-12 00:33
To speedup Python startup, I propose to reimplement the keyword module in C. Attached patch implements it.

* Remove Lib/keyword.py
* Add Tools/scripts/keyword.py: script to build Modules/keyword.h
* The new module is Modules/keyword.c

I chose to change the keyword.kwlist type from list to tuple. The documentation only says they that kwlist is a "Sequence":
http://docs.python.org/dev/library/keyword.html#keyword.kwlist

The keyword is now used by the collections module in the implementation of namedtuple.

The keyword module is trivial: it only contains a sequence of strings and a method checking if a string is part of this sequence. It would be simpler to add this to an existing module. I suppose that a new module was added for technical reasons.
msg199530 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-10-12 01:25
I fail to see a difference with my patch in the startup time, but I see that the number of syscalls is reduced by 10 (558 => 548).
msg199531 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-10-12 01:29
The patch adds 448 bytes to the stripped python executable file.
msg199542 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-10-12 09:20
IMO this should be rejected. Failure to improve startup time + more complicated maintenance.
msg199547 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-10-12 11:33
Here is a simpler patch that directly uses the grammar definition to create a list of keywords. It completely removes the necessity of a script.
msg199548 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-10-12 11:37
-1 again. We shouldn't gratuitously convert Python code to C code.
msg199549 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2013-10-12 11:43
Well, combined with the fact that it gets rid of a manual regeneration step (that is easy to forget, since adding a keyword is not done very often) I think it's a net gain.

The same could be done with the "token" module BTW.
msg199550 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-10-12 11:43
> Well, combined with the fact that it gets rid of a manual regeneration step (that is easy to forget, since adding a keyword is not done very often) I think it's a net gain.

If it needs to be automated it can be added to the Makefile...
msg199556 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-10-12 12:28
> If it needs to be automated it can be added to the Makefile...

I tested keyword_grammar.patch on a fresh Python source code (make distclean; ./configure --with-pydebug; make): I can compile Python. I don't understand what should be automated? This patch doesn't need to build a dependency.

keyword_grammar.patch needs probably something for Visual Studio (PC/config.c and PCbuild/pythoncore.vcxproj?).
msg199577 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-10-12 15:28
Is there any change in any benchmark?
msg201868 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-01 01:19
Original:

$ ./python -S -m timeit -s 'import sys' 'import keyword; keyword=None; del sys.modules["keyword"]'
10000 loops, best of 3: 149 usec per loop

Python patched with  keyword_grammar.patch:

$ ./python -S -m timeit -s 'import sys' 'import keyword; keyword=None; del sys.modules["keyword"]'
10000 loops, best of 3: 20 usec per loop


The gain is 129 microseconds (import 7.4x faster). Python starts in between 8,850 and 13,800 microseconds on my PC:


$ ./python -S -m timeit -s 'import subprocess; args=[sys.executable, "-S", "-c", "pass"]' 'subprocess.call(args)'
100 loops, best of 3: 8.85 msec per loop

$ ./python -S -m timeit -s 'import subprocess; args=[sys.executable, "-c", "pass"]' 'subprocess.call(args)'
100 loops, best of 3: 13.8 msec per loop
msg201869 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-01 01:21
"IMO this should be rejected. Failure to improve startup time + more complicated maintenance."

What do you mean by "complicated maintenance"? The C module is short and doesn't use complex functions, I don't expect it to change.
msg206021 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-12-13 02:53
So what? Do you want faster Python startup?
msg206029 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-12-13 09:03
I still like the idea, too. It's more elegant, easier to maintain and gives a speedup, too.
msg206081 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-12-13 13:47
Two things. One, no one has shown a startup improvement using the official startup benchmarks (either normal or nosite). That needs to be established to show an actual benefit.

Second, PEP 399 requires the pure Python version stick around, else you need to request an exception to drop keyword.py from python-dev.

I think you have a chance of getting the exception if you can show an actual startup improvement.
msg206082 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-12-13 13:50
> Second, PEP 399 requires the pure Python version stick around, else you need to request an exception to drop keyword.py from python-dev.

I don't understand the purpose of providing a Python implementation of the keyword module. If you look at the current implementation, I don't know that anything other than CPython can use it. The code uses Python/graminit.c to regenerate the module.

But I don't want to fight if you don't care of minor speedup, so I'm closing the issue.
msg206084 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-12-13 13:53
You're right, the regeneration might be specific to CPython (although maybe it should work from Grammar/Grammar instead?), but that's just creating the Python file which we check into the stdlib, so any VM can use it, they just can't recreate it (which is fine since every other VM works from a CPython release).
History
Date User Action Args
2022-04-11 14:57:51adminsetgithub: 63429
2013-12-13 13:53:10brett.cannonsetmessages: + msg206084
2013-12-13 13:50:06vstinnersetstatus: open -> closed
resolution: rejected
messages: + msg206082
2013-12-13 13:47:25brett.cannonsetmessages: + msg206081
2013-12-13 09:03:13christian.heimessetmessages: + msg206029
2013-12-13 02:53:03vstinnersetmessages: + msg206021
2013-11-01 01:21:29vstinnersetmessages: + msg201869
2013-11-01 01:19:39vstinnersetmessages: + msg201868
2013-10-12 15:28:14brett.cannonsetnosy: + brett.cannon
messages: + msg199577
2013-10-12 12:28:43vstinnersetmessages: + msg199556
2013-10-12 11:43:47pitrousetmessages: + msg199550
2013-10-12 11:43:10georg.brandlsetnosy: + georg.brandl
messages: + msg199549
2013-10-12 11:37:02pitrousetmessages: + msg199548
2013-10-12 11:33:02christian.heimessetfiles: + keyword_grammar.patch

messages: + msg199547
components: + Extension Modules
stage: patch review
2013-10-12 09:20:45pitrousetmessages: + msg199542
2013-10-12 01:29:24vstinnersetmessages: + msg199531
2013-10-12 01:25:17vstinnersetmessages: + msg199530
2013-10-12 00:33:11vstinnercreate