This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: dis module documentation gives no indication of the dangers of bytecode inspection
Type:
Stage: resolved
Components: Documentation
Versions: Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: brett.cannon
Nosy List: brett.cannon, eric.araujo, exarkun, georg.brandl, terry.reedy
Priority: normal
Keywords:

Created on 2010-02-01 13:25 by exarkun, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (8)
msg98661 - Author: Jean-Paul Calderone (exarkun) * (Python committer) Date: 2010-02-01 13:25
From python-dev:

On Fri, Jan 29, 2010 at 15:04,  <exarkun@twistedmatrix.com> wrote:
> On 10:47 pm, tjreedy@udel.edu wrote:
>>
>> On 1/29/2010 4:19 PM, Collin Winter wrote:
>>>
>>> On Fri, Jan 29, 2010 at 7:22 AM, Nick Coghlan<ncoghlan@gmail.com> wrote:
>>
>>> Agreed. We originally switched Unladen Swallow to wordcode in our
>>> 2009Q1 release, and saw a performance improvement from this across the
>>> board. We switched back to bytecode for the JIT compiler to make
>>> upstream merger easier. The Unladen Swallow benchmark suite should
>>> provide a thorough assessment of the impact of the wordcode ->
>>> bytecode switch. This would be complementary to a JIT compiler, rather
>>> than a replacement for it.
>>>
>>> I would note that the switch will introduce incompatibilities with
>>> libraries like Twisted. IIRC, Twisted has a traceback prettifier that
>>> removes its trampoline functions from the traceback, parsing CPython's
>>> bytecode in the process. If running under CPython, it assumes that the
>>> bytecode is as it expects. We broke this in Unladen's wordcode switch.
>>> I think parsing bytecode is a bad idea, but any switch to wordcode
>>> should be advertised widely.
>>
>> Several years ago, there was serious consideration of switching to a
>> register-based vm, which would have been even more of a change. Since I
>> learned 1.4, Guido has consistently insisted that the CPython vm is not part
>> of the language definition and, as far as I know, he has rejected any
>> bytecode hackery in the stdlib. While he is not one to, say, randomly permute
>> the codes just to frustrate such hacks, I believe he has always considered
>> vm details private and subject to change, and any usage thereof 'at one's own
>> risk'.
>
> Language to such effect might be a useful addition to this page (amongst
> others, perhaps):
>
>  http://docs.python.org/library/dis.html
>
> which very clearly and helpfully lays out quite a number of APIs which can
> be used to get pretty deep into the bytecode.  If all of this is subject to
> being discarded at the first sign that doing so might be beneficial for some
> reason, don't keep it a secret that people need to join python-dev to learn.
>

Can you file a bug and assign it to me?

-Brett
msg98906 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-02-05 20:44
The doc begins
"30.12. dis — Disassembler for Python bytecode
The dis module supports the analysis of Python bytecode by disassembling it. Since there is no Python assembler, this module defines the Python assembly language. The Python bytecode which this module takes as an input is defined in the file Include/opcode.h and used by the compiler and the interpreter."

This goes back to when python.exe (CPython) was the only implementation. "Python bytecode" is no longer appropriate. It should be changed to CPython bytecode. My suggestion for a possible update:

"30.12. dis — Disassembler for CPython bytecode
CPython currently compiles Python source code to a custom bytecode that is defined by the CPython source file Include/opcode.h and explained below. While such implementation details are subject to change in any CPython x.y version, the dis module supports the analysis of current bytecode by disassembling it to a format similar to assembly language."

Calling it an actual assembly language, as the current doc does, implies to me that there is/should be an assembler (which Guido has said there should not be).

"30.12.1. Python Bytecode Instructions
The Python compiler ..."

Python -> CPython

In the glossary:
"bytecode 
Python source code is compiled into bytecode, the internal representation of a Python program in the interpreter."
=> something like
"bytecode
CPython currently compiles Python source code to an internal bytecode representation that it uses to execute the program. Some other implementations do something similar."

These suggestions touch on the larger issue of differentiating and disentangling language doc from CPython implementation doc. I support this even though I have never used any of the other implementations.
msg109126 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-07-02 20:11
Brett, should this be reassigned to docs@python? I gave a suggested text change months ago. The need for a change like this was mentioned again today on pydev in the thread "Can Python implementations reject semantically invalid expressions?". Since the current deficiency has been noted repeatedly, I think the priority should be at least normal.

Unless someone suggests something even better, I think my proposed replacement should be accepted, formatted, and applied. I think it is definitely better than the current text.
msg109133 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2010-07-02 21:20
Sorry, Terry, I didn't even notice the corrections in the issue since they were inlined in a comment instead of as an attached file. I will have a look right now.
msg109143 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2010-07-02 22:04
Fixed in r82456. I decided to make a warning directive so that it's really obvious that people should not consider the dis module and bytecode as stable.

Once Python 2.7.0final is out the door I will backport the patch.
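A minimal sketch of what the warning means for code that inspects bytecode anyway. It assumes Python 3 (`dis.get_instructions()` appeared in 3.4 and `sys.implementation` in 3.3, both after this issue), and the helper name `global_names` is hypothetical. It matches instructions by opcode name rather than by raw byte values or offsets, and refuses to run on implementations other than CPython. Opcode names are themselves CPython implementation details that can change in any x.y release, so this only narrows the breakage; it does not make bytecode inspection stable.

```python
import dis
import sys


def global_names(func):
    """Return the global names that func's bytecode loads, in order.

    Hypothetical helper. Instructions are matched by opname instead of
    raw opcode numbers or byte offsets, which differ between CPython
    versions; the opnames themselves are still implementation details.
    """
    # Other implementations need not use CPython bytecode at all.
    if sys.implementation.name != "cpython":
        raise RuntimeError("bytecode inspection assumes CPython")
    return [
        instr.argval  # the resolved global name, not the raw oparg
        for instr in dis.get_instructions(func)
        if instr.opname == "LOAD_GLOBAL"
    ]


def sample():
    return len(sorted([3, 1, 2]))


print(global_names(sample))
```

On a CPython 3.x interpreter this should print `['len', 'sorted']`; the same call on another implementation raises `RuntimeError` instead of silently misreading the code object.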
msg109277 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-07-05 00:01
I believe Brett meant to leave this open until he finished it by backporting to 2.7 (which is now reopened for patches). Otherwise, it might get forgotten.

Side note: this is the issue I referred to in #9132 re the 'patch' keyword.
msg109278 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2010-07-05 00:28
On Sun, Jul 4, 2010 at 17:01, Terry J. Reedy <report@bugs.python.org> wrote:

> I believe Brett meant to leave this open until he finished it by backporting to 2.7 (which is now reopened for patches). Otherwise, it might get forgotten.

Exactly right, Terry.
msg111030 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2010-07-21 09:52
r83012 for 3.1
r83013 for 2.7
History
Date                 User          Action  Args
2022-04-11 14:56:57  admin         set     github: 52077
2010-07-21 09:52:29  brett.cannon  set     status: open -> closed
                                           messages: + msg111030
2010-07-19 02:44:39  belopolsky    set     keywords: - needs review
2010-07-05 00:28:05  brett.cannon  set     messages: + msg109278
2010-07-05 00:01:28  terry.reedy   set     status: closed -> open
                                           messages: + msg109277
2010-07-04 14:13:07  eric.araujo   set     status: open -> closed
2010-07-02 22:04:37  brett.cannon  set     resolution: fixed
                                           stage: patch review -> resolved
                                           messages: + msg109143
                                           versions: + Python 2.7
2010-07-02 21:20:18  brett.cannon  set     keywords: + needs review
                                           messages: + msg109133
                                           stage: needs patch -> patch review
2010-07-02 20:11:32  terry.reedy   set     priority: low -> normal
                                           messages: + msg109126
2010-02-06 15:23:33  eric.araujo   set     nosy: + eric.araujo
2010-02-05 20:44:27  terry.reedy   set     nosy: + terry.reedy
                                           messages: + msg98906
2010-02-03 08:20:11  brett.cannon  set     priority: low
                                           stage: needs patch
2010-02-01 13:28:10  pitrou        set     assignee: georg.brandl -> brett.cannon
                                           nosy: + brett.cannon
2010-02-01 13:25:14  exarkun       create