classification
Title: Wordcode, part 2
Type: enhancement Stage: patch review
Components: Interpreter Core Versions: Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Demur Rumed, Mark.Shannon, berker.peksag, brett.cannon, eric.fahlgren, haypo, python-dev, rhettinger, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2016-05-26 13:13 by serhiy.storchaka, last changed 2016-09-12 14:55 by haypo.

Files
File name Uploaded Description Edit
wordcode2.patch serhiy.storchaka, 2016-05-26 13:13 review
word-jump-offsets.patch serhiy.storchaka, 2016-05-28 15:46 review
wordcode-refactor.patch serhiy.storchaka, 2016-06-08 19:42 review
wordcode-refactor2.patch serhiy.storchaka, 2016-06-10 18:54 review
Messages (21)
msg266431 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-05-26 13:13
This is the second stage of converting to wordcode (issue26647).

Proposed patch makes bytecodecode consisting of code units (16-bit words) instead of bytes. It includes following changes:

* Changes meaning of jump offsets. They counts not bytes, but code units. This extends the range addressed by short commands (from 256 bytes to 256 words) and simplifies ceval.c.

* Changes f_lasti, tb_lasti etc to count code units instead of bytes.

* Changes disassembler to show addresses in code units, not bytes.

* Refactores the code.

These changes break compatibility (already broken by switching to 16-bit bytecode). The first one breaks compatibility with compiled bytecode and needs incrementing the magic number.
msg266435 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-05-26 13:29
> Changes f_lasti, tb_lasti etc to count code units instead of bytes.

I asked Demur to not break f_lasti. I don't understand if this change
breaks applications using f_lasti or not.

For example, asyncio/coroutines.py uses:

            if caller.f_code.co_code[caller.f_lasti] != _YIELD_FROM:
                value = value[0]

Does this code still work with your change?

Maybe this code is already broken by wordcode, but it doesn't really
matter since it should only be used on the exact version 3.4.0. The
code works around a bug in Python 3.4.0, fixed in Python 3.4.1 (issue
#21209).

Other known users of f_lasti are development tools like debuggers
(pdb), profilers, code coverage, etc. We should check these tools.
msg266446 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-05-26 18:38
So is avoiding changing f_lasti just to minimize breakage of tools? Aren't they going to have to update to support the wordcode changes anyway?
msg266447 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-05-26 18:55
The patch contains the change of Lib/asyncio/coroutines.py. This is the only change in Python code besides the dis module.

I can keep f_lasti to be twice the number of instructions, but this will complicate the patch. The simplest way perhaps is to convert this read-only attribute to the property that multiplies internal f_lasti by 2.
msg266489 - (view) Author: Demur Rumed (Demur Rumed) * Date: 2016-05-27 12:06
https://github.com/search?q=f_lasti&type=Code

Popular use of f_lasti is checking it for -1, checking the instruction at the byte offset of f_lasti, checking the argument with code[f_lasti+1] (Some bad code checking f_lasti+3 which'll break with 3.6)

abarnert discussed how bytecode should be typed to Python code. Ideally it'd be typed as a "(instruction, arg)" tuple. He considered creating a "words" type similar to "bytes" but with 16 bit values. It's a bit niche to introduce a builtin for. So if the co_code object is remaining a bytes object then it seems intuitive to keep f_lasti as a bytes offset. Clashes with jump offsets no longer being a bytes offset even in Python code tho

In reality most of the results on github all seem to be copying a few distinct uses. So maybe backwards compatibiltiy isn't so important

Other search https://searchcode.com/?q=f_lasti&loc=0&loc2=10000&src=3&src=7&src=1&lan=19 doesn't produce many results either
msg266490 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-05-27 12:29
> if the co_code object is remaining a bytes object then it seems intuitive to keep f_lasti as a bytes offset

Right.

> In reality most of the results on github all seem to be copying a few distinct uses. So maybe backwards compatibiltiy isn't so important

Backwards compatibiltiy is important.
msg266556 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-05-28 14:53
Here is a patch that implements only the first change -- makes jump offsets be in 16-bit units, not bytes. This is minimal change, it doesn't include refactoring.
msg266604 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-05-29 17:33
I don't see how this is a simplification.   The additional /2 and *2 on the affected lines makes the code a little harder to reason about and it loses some of the cleanness achieved by the last patch.   To me, it also increases conceptual complexity because INSTR_OFFSET() no longer gives the byte address adjustment.
msg266611 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-05-29 18:40
word-jump-offsets.patch doesn't simplify the code, but rather complicates it, because this is minimal patch that doesn't include the refactoring. The main benefit of this patch is that it extends the range addressed by short jump instruction. The other benefit is that the target of jump instruction is now always point at instruction boundary.

wordcode2.patch includes refactoring and other changes:

* Changes jump offsets. This extends addressed range and can make co_code more compact.
* Changes co_lnotab. This can make co_lnotab more compact.
* Changes f_lasti.
* Changes the disassembler.
* Refactors the C code (makes it codeunit-oriented instead of byte-oriented). Simplifies the code complicated by other changes.

I can provide these steps by separate patches (word-jump-offsets.patch is the first of them), but every separate patch can temporary complicate the code. Three of these changes (except the disassembler changing and refactoring) break the compatibility of pyc-files and need incrementing the magic number.
msg267880 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-06-08 19:42
Here is a patch that just refactors the code. It renames OPCODE and OPCODE to _Py_OPCODE and _Py_OPCODE and moves them to code.h for reusing in other files. Introduces _Py_CODEUNIT as an alias to unsigned short (if it is 16-bit, otherwise a compile error is raised). Makes compiler and peepholer to operate with _Py_CODEUNIT units instead of bytes. Replaces or scale magic numbers with sizeof(_Py_CODEUNIT). Adds fill_nops() for filling specified region with NOPs. Decreases memory consumption for peepholer (doesn't allocate memory for unused odd addresses).
msg268132 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-06-10 17:43
I really don't think any of this should be done at all.  The current code is clean and fast (and to my eyes at least is very readable).
msg268142 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-06-10 18:49
This is not just about cleaning (to my eyes current code is not very readable, and I read it many times, perhaps more times than any other core developer in last months). There are other benefits. Changing jump offsets allows to get rid of EXTENDED_ARGs for the part of jump opcodes. Changing lnotab makes it more compact and allows the peepholer to optimize the code that it rejects now. Refactoring includes the change that decreases memory consumption of the peepholer (from 4 bytes per bytecode byte to 2 bytes per bytecode byte). Changing jump offsets together with changing f_lasti removes redundant multiplications and divisions by 2. Separate changes can complicate some parts of code, but next changes removes this complication. Only all changes together achieve maximal cleanness.

I think that converting to wordcode is not complete without these changes. I approved the wordcode patch only having in mind following changes. It is more painless to make all changes in one Python release than break compatibility during few releases.
msg268143 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-06-10 18:54
update patch replaces yet few magic constants.
msg268145 - (view) Author: Demur Rumed (Demur Rumed) * Date: 2016-06-10 19:06
The patches LGTM & seem to be implementation of follow up ideas outlined in the first portion. It'd be good to verify that benchmarks are relatively unaffected
msg275771 - (view) Author: Roundup Robot (python-dev) Date: 2016-09-11 10:48
New changeset dd046963bd42 by Serhiy Storchaka in branch 'default':
Issue #27129: Replaced wordcode related magic constants with macros.
https://hg.python.org/cpython/rev/dd046963bd42
msg275773 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2016-09-11 11:49
From http://buildbot.python.org/all/builders/s390x%20Debian%203.x/builds/1811/steps/compile/logs/stdio (I also saw the same compile error on another Linux boxes)

_freeze_importlib: Python/peephole.c:524: PyCode_Optimize: Assertion `((codestr[i]) >> 8) == 100' failed.
Makefile:735: recipe for target 'Python/importlib.h' failed
make: *** [Python/importlib.h] Aborted
msg275780 - (view) Author: Roundup Robot (python-dev) Date: 2016-09-11 12:19
New changeset b49a938eaa31 by Serhiy Storchaka in branch 'default':
Fixed refactoring bug in dd046963bd42 (issue27129).
https://hg.python.org/cpython/rev/b49a938eaa31
msg275781 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-11 12:20
Thanks Berker!
msg276044 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-09-12 13:44
This issue can now be closed, no?
msg276049 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-12 14:00
No, only the simpler and safer part was committed. I divided the original patch on four parts, but since the code was significantly evolved, they no longer applied clearly.
msg276055 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-09-12 14:55
Serhiy Storchaka added the comment:
> No, only the simpler and safer part was committed. I divided the original patch on four parts, but since the code was significantly evolved, they no longer applied clearly.

Oh ok. I saw that a change was pushed, I didn't read the history of the issue.

I should take a look after the beta1 release.
History
Date User Action Args
2016-11-24 22:20:31haypounlinkissue26647 dependencies
2016-09-12 14:55:45hayposetmessages: + msg276055
2016-09-12 14:00:22serhiy.storchakasetmessages: + msg276049
2016-09-12 13:44:00hayposetmessages: + msg276044
2016-09-11 12:20:42serhiy.storchakasetmessages: + msg275781
2016-09-11 12:19:39python-devsetmessages: + msg275780
2016-09-11 11:49:30berker.peksagsetnosy: + berker.peksag
messages: + msg275773
2016-09-11 10:48:51python-devsetnosy: + python-dev
messages: + msg275771
2016-06-10 19:06:44Demur Rumedsetmessages: + msg268145
2016-06-10 18:54:37serhiy.storchakasetfiles: + wordcode-refactor2.patch

messages: + msg268143
2016-06-10 18:49:32serhiy.storchakasetmessages: + msg268142
2016-06-10 17:43:39rhettingersetmessages: + msg268132
2016-06-08 19:42:24serhiy.storchakasetfiles: + wordcode-refactor.patch

messages: + msg267880
2016-06-04 21:05:25serhiy.storchakasetnosy: + Mark.Shannon
2016-06-04 17:55:43eric.fahlgrensetnosy: + eric.fahlgren
2016-05-29 18:40:47serhiy.storchakasetmessages: + msg266611
2016-05-29 18:25:53ebarrysetmessages: - msg266607
2016-05-29 18:25:41ebarrysetmessages: - msg266608
2016-05-29 18:24:18Demur Rumedsetfiles: - forbegin.patch
2016-05-29 18:24:07Demur Rumedsetmessages: + msg266608
2016-05-29 18:23:14Demur Rumedsetfiles: + forbegin.patch

messages: + msg266607
2016-05-29 17:33:51rhettingersetnosy: + rhettinger
messages: + msg266604
2016-05-28 15:46:10serhiy.storchakasetfiles: + word-jump-offsets.patch
2016-05-28 15:45:53serhiy.storchakasetfiles: - word-jump-offsets.patch
2016-05-28 14:53:26serhiy.storchakasetfiles: + word-jump-offsets.patch

messages: + msg266556
2016-05-27 12:29:50hayposetmessages: + msg266490
2016-05-27 12:06:00Demur Rumedsetnosy: + Demur Rumed
messages: + msg266489
2016-05-26 18:55:33serhiy.storchakasetmessages: + msg266447
2016-05-26 18:38:31brett.cannonsetmessages: + msg266446
2016-05-26 18:07:14brett.cannonsetnosy: + brett.cannon
2016-05-26 13:29:29hayposetmessages: + msg266435
2016-05-26 13:14:27serhiy.storchakalinkissue26647 dependencies
2016-05-26 13:13:07serhiy.storchakacreate