This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: dis.findlabels ignores EXTENDED_ARG
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.6, Python 3.5, Python 2.7
process
Status: closed Resolution: out of date
Dependencies: Superseder: modulefinder should reuse the dis module
View: 26881
Assigned To: Nosy List: Barun Parruck, eric.fahlgren, llllllllll, ncoghlan, serhiy.storchaka, vstinner, yselivanov
Priority: normal Keywords: patch

Created on 2016-02-27 03:54 by eric.fahlgren, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
preliminarypatch.diff Barun Parruck, 2016-02-27 04:36 review
wrongtestfindlabels.py Barun Parruck, 2016-02-27 08:18 Mistake in this one
testfindlabels2.py Barun Parruck, 2016-02-27 10:26
testfindlabels.py eric.fahlgren, 2016-02-27 14:02 Single test case that iterates over all opcodes
testfindlabels.py eric.fahlgren, 2016-02-28 18:20 Generalized algorithm for creating bytecode from args
bad.diff eric.fahlgren, 2016-02-28 21:02 Had an obvious bug. review
dis_with_code_scanner.diff eric.fahlgren, 2016-03-02 23:44 review
Messages (24)
msg260913 - (view) Author: Eric Fahlgren (eric.fahlgren) * Date: 2016-02-27 03:54
When trying out dis.dis on some synthetically long functions, I noted that spurious branch targets were being generated in the output.  First one is at address 8:

157           0 LOAD_CONST               1 (1)
              3 DUP_TOP
              4 STORE_FAST               0 (z)
              7 DUP_TOP
        >>    8 STORE_FAST               1 (a)
             11 DUP_TOP

I dug into findlabels and notices that it pays no attention to EXTENDED_ARG.  The fix is pretty simple, basically copy pasta from dis._get_instructions_bytes, at line 369, in the 3.5.1 release code add all the "extended_arg" bits:

    extended_arg = 0
    while i < n:
        op = code[i]
        i = i+1
        if op >= HAVE_ARGUMENT:
            arg = code[i] + code[i+1]*256 + extended_arg
            extended_arg = 0
            i = i+2
            if op == EXTENDED_ARG:
                 extended_arg = arg*65536
            label = -1
msg260914 - (view) Author: Barun Parruck (Barun Parruck) * Date: 2016-02-27 04:01
I'll take a look at it! :) Would you like unit tests as well?
msg260915 - (view) Author: Barun Parruck (Barun Parruck) * Date: 2016-02-27 04:12
Hi, to check if I've done things right, which functions did you try out dis.dis on?
msg260916 - (view) Author: Eric Fahlgren (eric.fahlgren) * Date: 2016-02-27 04:17
My test case:

def long():
    z = a = b = c = d = e = f = g = h = 1
    while x:
        x = x if x and x or not x else x
        above line repeated 2999 more times

import dis
print(dis.findlabels(long.__code__.co_code)[:10])

Buggy output:
[35510, 35509, 62, 69, 78, 81, 93, 100, 109, 112]

Correct output:
[101046, 101045, 62, 69, 78, 81, 93, 100, 109, 112]
msg260917 - (view) Author: Eric Fahlgren (eric.fahlgren) * Date: 2016-02-27 04:20
Our paths crossed, I don't know exactly how you'd add a test case for this, maybe construct the monster function in a string, eval the string, the use the synthesized function in dis.findlabels?
msg260918 - (view) Author: Eric Fahlgren (eric.fahlgren) * Date: 2016-02-27 04:34
Well, now that I'm thinking about it, you could synthesize a bytecode stream trivially and have a much better test.  This is completely off the top of my head, so take it is guaranteed to (probably) not work as written, but it should get you started:

from opcodes import *
import dis
bytecode = (
    chr(EXTENDED_ARG) + chr(1) + chr(0) + 
    chr(JUMP_IF_TRUE_OR_POP) + chr(0) + chr(0)
)
print(dis.findlabels(bytecode))
msg260919 - (view) Author: Barun Parruck (Barun Parruck) * Date: 2016-02-27 04:34
Allright, so I'm uploading a preliminary patch, please look through it to see if I understood what you meant me to do about dis.findlabels.

The tests seem to mostly pass, except oddly enough, urlstdlib2, which is probably system-specific?

I get the correct output that you specify for that particular test function.

I'll see about building a unit test.
msg260920 - (view) Author: Barun Parruck (Barun Parruck) * Date: 2016-02-27 04:36
The patch I forgot to attach. Ha ha.
msg260921 - (view) Author: Barun Parruck (Barun Parruck) * Date: 2016-02-27 05:36
Hi, I'm a little confused as to how to write a test using bytecode streams...probably due to my lack of clarity as to what exactly dis.disassemble does. Is there any way you could give me a bit more information? :)
msg260925 - (view) Author: Eric Fahlgren (eric.fahlgren) * Date: 2016-02-27 06:48
The findlabels function takes a bytecode array of type bytes, usually the
actual code from a function.  My original test case uses the full Python
compiler from source code to a CodeType object to create the bytecodes
(plus all that other stuff that makes up a function), then extracts just
the interesting part and passes that into findlabels.

The good part is that you can pretend you're the compiler by just putting
the correct bytes into a array and feed it into the various dis functions.
The EXTENDED_ARG operator plays with the operand of the succeeding
instruction, everything else either doesn't have an argument or has two
bytes.

Here's a real test case, I don't know how you write unit tests for the
stdlib, but you can compare the output of the findlabels call with a known
value, and that should get you pretty close.

from opcode import *
code = bytes(
    chr(opmap["JUMP_FORWARD"]) + chr(0) + chr(0) +
    chr(EXTENDED_ARG) + chr(1) + chr(0) +
    chr(opmap["JUMP_FORWARD"]) + chr(0) + chr(0) +
    chr(opmap["RETURN_VALUE"]),
    encoding="latin-1"
)
import dis
dis.dis(code)
print(dis.findlabels(code))
if dis.findlabels(code) == [0x0000+3, 0x00010000+9]:
    print("Test passed")

Take a look in the stdlib opcode.py and find the various "JUMP" operators,
those are the guys we care about for this.  Try out a bunch of cases by
augmenting the above definition of "code" and you'll soon get a feel for
what's going on.

As real, executable bytecode the above is of course non-sensical, but for a
test, it's great because you can predict exactly what should be produced.
​
msg260929 - (view) Author: Barun Parruck (Barun Parruck) * Date: 2016-02-27 07:44
Hi, yes, that works much better, and I've definitely understood!

Now, about the tests, how large would you like them to be? For instance :

code = bytes (
	chr(opmap["JUMP_FORWARD"]) + chr(0) + chr(255) + 
	chr(EXTENDED_ARG) + chr(1) + chr(1) + 
	chr(opmap["JUMP_FORWARD"]) + chr(0) + chr(0) +
	chr(EXTENDED_ARG) + chr(1) + chr(0) +
	chr(opmap["JUMP_ABSOLUTE"]) + chr(0) + chr(1) +
	chr(opmap["RETURN_VALUE"]),
	encoding="latin-1"
	)

Would return [65283, 16842761, 65792]

And I'll make sure I test all the jump instances (I'm shuddering to imagine what a real computer would do if fed with those byte codes)

Do you want the tests to be large? (array size > 10^2/2 as a result), or is the example above fine, if I add three or four more example testing each jump instance?
msg260931 - (view) Author: Barun Parruck (Barun Parruck) * Date: 2016-02-27 08:18
Four test cases have been included in the unittests, using the module unittest. One is the original one you gave me, the others are some that I played around with it, and thought would be useful to include.

I would love some code reviews at this point!
msg260939 - (view) Author: Barun Parruck (Barun Parruck) * Date: 2016-02-27 10:26
Sorry, the previous file had a bug.
msg260940 - (view) Author: Eric Fahlgren (eric.fahlgren) * Date: 2016-02-27 14:02
Lookin' good so far.  How about we try it on all the opcodes that have arguments?

See attached example for which I can see two obvious improvements:
1) It could be improved by taking apart that "args" list and using it to synthesize "sample_code" rather than having to hand duplicate the values in two places, albeit with different byte order.
2) Likewise, my hard-coded "offsets" table is pretty awful. :)

Also, is there already a test for the dis module in which you could just add this as a case?
msg260977 - (view) Author: Barun Parruck (Barun Parruck) * Date: 2016-02-28 13:46
Hi, sorry, been pretty busy, and will be for a few days, so my replies may not be instant!

Having looked through the dis test module, to be honest, I can't even find a call to dis.findlabels...odd. Maybe I've got it wrong, and I wouldn't mind somebody else taking a look! But either way, a quick overview shows me that it basically picks up functions, figures out the disassembled function, and compares expected output, in a similar way that we're working, actually. 

I'll look through your example, which looks pretty good! It works as expected on my repo (fails and passes appropriately), and I'll see about improving it, but for now, it seems to do its job.
msg260979 - (view) Author: Eric Fahlgren (eric.fahlgren) * Date: 2016-02-28 16:11
Oh, don't worry about time between responses, we all have lives (well, some of us anyhow :) ).

After looking at Lib/test/test_dis.py, I think it's just an oversight that there is no specific findlabels test (it's tested implicitly by ``dis.dis`` after all), and it would be good thing for you to add one.  The obvious place seems like a new method at the bottom of the ``DisTests`` class, but we'll have to rely on (probably) Yury to make that call.
msg260981 - (view) Author: Eric Fahlgren (eric.fahlgren) * Date: 2016-02-28 18:20
I just remembered that code can have more than one (up to three?) EXTENDED_ARG operators before the real opcode, so I added that generalization to build code around the args list...
msg260987 - (view) Author: Eric Fahlgren (eric.fahlgren) * Date: 2016-02-28 21:02
Two things:

1) Verified this has always been a problem and still is in the development branch, so I added 2.7 and 3.6 to the versions list.

2) Couldn't tolerate the duplicate code handling the extended args operator, so in the interests of DRY, I moved the code scanner to a generator function (see attached dis_with_code_scanner.diff).

The patch is definitely not required to fix this bug, but it does isolate the original problem area to just one piece of code.

Could we get a get a quick review from a core dev saying either "go with Barun's less invasive preliminarypatch.diff" or "go with Eric's greater-churn dis_with_code_scanner.diff patch?"

(We still need to flesh out the testing a bit, so don't call us, we'll call you. :) )
msg260988 - (view) Author: Eric Fahlgren (eric.fahlgren) * Date: 2016-02-28 23:26
Oops, wrong/bad patch, delete line 310 "arg = None" in _get_instruction_bytes...
msg261112 - (view) Author: Barun Parruck (Barun Parruck) * Date: 2016-03-02 12:58
So...firstly hi, and sorry for disappearing like that!

You want tests for the dis.findlabels library? What sort of tests are you looking for? Similiarly sized bytecode streams like the ones we worked with, or did you have something else in mind?

B
msg261142 - (view) Author: Eric Fahlgren (eric.fahlgren) * Date: 2016-03-02 23:44
Barun, take a look at the latest version of the testfindlabels.py, see what you think.  If it works for you, maybe move the test function into Lib/test/test_dis.py as part of the standard dis module tests.  Still need to look at the code that's being tested and find out what cases could cause problems and then augment the test to make sure it covers those.
msg261590 - (view) Author: Barun Parruck (Barun Parruck) * Date: 2016-03-11 18:15
Hi!

Okay, sorry for disappearing like that(this is becoming rather a habit isn't it!), but with Gsoc applications, I've been insanely busy trying to get patches and bugs accepted for my first Gsoc ever. I do apologize for leaving you hanging, it was not my intention!

I'll try my best to give you an update by the end of this week and hopefully be able to keep working on this!
msg267419 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-06-05 15:30
This bug was fixed in issue26881 with similar patch. Sorry, I didn't know about this issue. Your patches look good. In any case thank you for your effort.
msg267477 - (view) Author: Eric Fahlgren (eric.fahlgren) * Date: 2016-06-05 21:15
Thanks, Serhiy.  I sort of figured that it would get fixed with the
wordcode rework, and was going to verify that it was when 3.6 settled down.

On Sun, Jun 5, 2016 at 8:30 AM, Serhiy Storchaka <report@bugs.python.org>
wrote:

>
> Serhiy Storchaka added the comment:
>
> This bug was fixed in issue26881 with similar patch. Sorry, I didn't know
> about this issue. Your patches look good. In any case thank you for your
> effort.
>
> ----------
> nosy: +serhiy.storchaka
> resolution:  -> out of date
> stage:  -> resolved
> status: open -> closed
> superseder:  -> modulefinder should reuse the dis module
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue26448>
> _______________________________________
>
History
Date User Action Args
2022-04-11 14:58:28adminsetgithub: 70635
2016-06-05 21:15:47eric.fahlgrensetmessages: + msg267477
2016-06-05 15:30:34serhiy.storchakasetstatus: open -> closed

superseder: modulefinder should reuse the dis module

nosy: + serhiy.storchaka
messages: + msg267419
resolution: out of date
stage: resolved
2016-03-11 18:15:06Barun Parrucksetmessages: + msg261590
2016-03-02 23:44:42eric.fahlgrensetfiles: + dis_with_code_scanner.diff

messages: + msg261142
2016-03-02 20:22:34vstinnersetnosy: + vstinner
2016-03-02 12:58:04Barun Parrucksetmessages: + msg261112
2016-02-28 23:26:39eric.fahlgrensetmessages: + msg260988
2016-02-28 21:02:23eric.fahlgrensetfiles: + bad.diff

messages: + msg260987
versions: + Python 2.7, Python 3.6
2016-02-28 18:20:55eric.fahlgrensetfiles: + testfindlabels.py

messages: + msg260981
2016-02-28 16:11:42eric.fahlgrensetmessages: + msg260979
2016-02-28 13:46:10Barun Parrucksetmessages: + msg260977
2016-02-28 06:36:30ned.deilysetnosy: + ncoghlan, yselivanov
2016-02-27 14:02:38eric.fahlgrensetfiles: + testfindlabels.py

messages: + msg260940
2016-02-27 10:26:50Barun Parrucksetfiles: + testfindlabels2.py

messages: + msg260939
2016-02-27 08:18:27Barun Parrucksetfiles: + wrongtestfindlabels.py

messages: + msg260931
2016-02-27 07:44:58Barun Parrucksetmessages: + msg260929
2016-02-27 06:48:44eric.fahlgrensetmessages: + msg260925
2016-02-27 05:36:45Barun Parrucksetmessages: + msg260921
2016-02-27 04:36:09Barun Parrucksetfiles: + preliminarypatch.diff
keywords: + patch
messages: + msg260920
2016-02-27 04:34:29Barun Parrucksetmessages: + msg260919
2016-02-27 04:34:10eric.fahlgrensetmessages: + msg260918
2016-02-27 04:20:31eric.fahlgrensetmessages: + msg260917
2016-02-27 04:17:51eric.fahlgrensetmessages: + msg260916
2016-02-27 04:12:30Barun Parrucksetmessages: + msg260915
2016-02-27 04:01:44Barun Parrucksetnosy: + Barun Parruck
messages: + msg260914
2016-02-27 03:59:31llllllllllsetnosy: + llllllllll
2016-02-27 03:54:10eric.fahlgrencreate