bytecode generation is not deterministic #59573
Comments
Consider this small example (you might have to run the sample program multiple times to see the effect):

$ cat dis-closure.py
import dis

def adder(a, b):
    def add():
        return a + b
    return add

print(dis.dis(adder(1, 2).__code__))

$ ./python.exe dis-closure.py
  5           0 LOAD_DEREF               0 (a)
              3 LOAD_DEREF               1 (b)
              6 BINARY_ADD
              7 RETURN_VALUE
None

$ ./python.exe dis-closure.py
  5           0 LOAD_DEREF               1 (a)
              3 LOAD_DEREF               0 (b)
              6 BINARY_ADD
              7 RETURN_VALUE
None

The order of 'co_cellvars' and 'co_freevars' can be different from compile to compile. This is due to the fact that these variable sets are managed with hashes, so hash randomization changes their iteration order. I am not sure if these are the only areas that cause bytecode generation to be non-deterministic. |
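The effect is visible without touching the compiler at all: sets of strings iterate in a hash-dependent order, and PYTHONHASHSEED changes string hashes between interpreter runs. A minimal sketch of this from the outside (the name set and helper below are my own illustration, not CPython internals):

```python
# Sketch of the root cause: a set of names iterates in a hash-dependent
# order, and PYTHONHASHSEED changes string hashes per interpreter run.
import os
import subprocess
import sys

NAMES = "{'alpha', 'beta', 'gamma', 'delta'}"

def order_under_seed(seed, normalize):
    """Run a fresh interpreter with a fixed hash seed and report the
    iteration order of a small set of strings."""
    expr = f"sorted({NAMES})" if normalize else f"list({NAMES})"
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.run([sys.executable, "-c", f"print({expr})"],
                         capture_output=True, text=True, env=env, check=True)
    return out.stdout.strip()

# Raw iteration order may change from seed to seed; sorting the names
# first (the same idea as sorting cellvars/freevars) gives one stable order.
raw = {order_under_seed(s, normalize=False) for s in range(4)}
fixed = {order_under_seed(s, normalize=True) for s in range(4)}
print(len(fixed))  # always 1
```

The `raw` set may or may not collapse to one ordering depending on the seeds tried, which is exactly why the bug only shows up intermittently.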
I might come to regret asking this, but so what? Is this actually causing you issues, or are you literally just finding "this behavior surprising" and that's it? I mean we could simply sort the tuples, but I don't know what kind of performance hit that would cause. |
On Mon, Jul 16, 2012 at 8:11 AM, Brett Cannon <report@bugs.python.org> wrote:
I noticed it in the context of working on bpo-15352. There are still frozen modules checked into the tree that get regenerated from time to time. Since the bytecode generation isn't deterministic, there can be some noisy diffs when they are regenerated. That is the only concrete issue I currently am aware of. This issue may not be worth fixing beyond that. |
At least we know the hash randomisation is working :) Spurious changes when freezing modules seems like a legitimate reason to fix it - the import benchmarks would probably give the compiler enough of a workout to highlight if the sorting is excessively time consuming. |
Ditto. I think predictability of bytecode generation is useful, e.g. for make-like tools that examine content, or for unit testing. |
OK, so it sounds like we need to do the equivalent of sorting those tuples when generating the bytecode. That suggests we probably need to tweak Python/compile.c to make it deterministic. |
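In pure Python terms, the core idea of such a fix would look something like the sketch below: derive slot indices from the sorted names rather than from whatever order the set happens to yield them in. The helper name is made up for illustration; the real change lives in Python/compile.c.

```python
def assign_indices(names):
    """Map each variable name to a slot index.

    Sorting first means the mapping depends only on the names themselves,
    not on the hash-dependent order in which a set or dict yields them.
    """
    return {name: index for index, name in enumerate(sorted(names))}

# Identical name sets produce identical mappings on every run and seed.
slots = assign_indices({"b", "a"})
```

The cost is one sort per code object at compile time, over what is usually a handful of names.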
I haven't done any benchmarking (yet), but here is a patch implementing the basic sorting approach. |
New patch that addresses feedback from Brett's code review (thanks!). |
OK, I ran the Python benchmark suite. I don't have much experience with the benchmark suite, so let me know if I did something wrong or need to do more. I ran the following:

$ python perf.py -b 2n3 ../cpython/python.exe ../cpython/python-with-sorting.exe

which produces:

Report on Darwin quicksilver 11.4.0 Darwin Kernel Version 11.4.0: Mon Apr 9 19:32:15 PDT 2012; root:xnu-1699.26.8~1/RELEASE_X86_64 x86_64 i386

### call_method_slots ###
### call_simple ###
### fastpickle ###
### iterative_count ###
### json_load ###
### regex_compile ###
### richards ###
### silent_logging ###

The following not significant results are hidden, use -v to show them: |
You ran the benchmarks correctly, but I was more worried about compilation time than execution time (changing which index is for what really shouldn't make a difference in performance, and the benchmarks show that). If you want to test using the importlib benchmarks, run python -m importlib.test.benchmark. |
Oh, cool -- I didn't know about the importlib benchmarks. Thanks Brett. Here are the results (OS X 10.7.4, Dual 1.7GHz Core i5, 4 GB RAM):

$ ./python.exe -m importlib.test.benchmark
Measuring imports/second over 1 second, best out of 3
Entire benchmark run should take about 33 seconds
Using <function __import__ at 0x1063bdd40> as __import__
sys.modules [ 41831 41812 40914 ] best is 41,831

$ ./python-with-sorting.exe -m importlib.test.benchmark
Measuring imports/second over 1 second, best out of 3
Entire benchmark run should take about 33 seconds
Using <function __import__ at 0x104b01d40> as __import__
sys.modules [ 42369 42203 42348 ] best is 42,369

Pretty similar results either way. |
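For readers unfamiliar with the measurement style, the numbers above are an operations-per-second figure taken as the best of three timed runs. A rough sketch of that shape of harness (my own toy version, not the actual importlib benchmark; compile() stands in for "one import"):

```python
import time

def best_of(runs, seconds, work):
    """Return operations/second over a fixed time window, best of `runs`."""
    best = 0
    for _ in range(runs):
        count = 0
        deadline = time.perf_counter() + seconds
        while time.perf_counter() < deadline:
            work()
            count += 1
        best = max(best, int(count / seconds))
    return best

# Time a cheap compile() call as stand-in work for a single import.
rate = best_of(3, 0.1, lambda: compile("a + b", "<bench>", "eval"))
```

Taking the best of several runs filters out transient slowdowns from the rest of the system, which is why a single-run figure would be noisier.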
Yep, so I consider the performance worry dealt with. |
Agreed, so definite +1 from me. |
So I think that just leaves one other person to verify the patch works as expected and then commit it. I can hopefully later this week, but no sooner than Friday, so if someone can get to it faster that would be great. |
I can commit today unless you think we need someone else to independently verify the patch. |
BTW, I think this should be committed on 2.7 and 3.2 as well, since the same issue can happen when passing -R. Any objections? |
Nah, you are a committer and thus qualify as the second review. =) As for backporting, it only affects regeneration of bytecode, so I don't see any major harm in it (might surprise some people, but there is a legitimate reason for fixing this). |
New changeset 7eb0180c1734 by Meador Inge in branch '2.7': New changeset 79d54fba49b3 by Meador Inge in branch '3.2': New changeset 578066372485 by Meador Inge in branch 'default': |
Committed. Thanks for the reviews y'all. |
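As a quick sanity check on any build with the fix applied, the original reproducer can be driven under several hash seeds and should now report the same co_freevars tuple every time. A small sketch of that check (the helper name is mine):

```python
# Run the issue's reproducer under multiple hash seeds and confirm
# that co_freevars comes out identical each time.
import os
import subprocess
import sys

SRC = """\
def adder(a, b):
    def add():
        return a + b
    return add
print(adder(1, 2).__code__.co_freevars)
"""

def freevars_under_seed(seed):
    """Run the reproducer in a fresh interpreter with a fixed hash seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.run([sys.executable, "-c", SRC], capture_output=True,
                         text=True, env=env, check=True)
    return out.stdout.strip()

results = {freevars_under_seed(seed) for seed in range(5)}
print(results)  # a single entry: ('a', 'b')
```

Before the fix, the set could contain both ('a', 'b') and ('b', 'a'); after it, only the sorted tuple appears.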
Meador, you have forgotten to fix the error "PyList_GET_SIZE(src)". src is not a list; it should be "PyList_GET_SIZE(sorted_keys)". |
D'oh! Thanks for catching that Serhiy! Fixing now. Sorry about that. |
New changeset 47e074f6ca26 by Meador Inge in branch '2.7': New changeset 1c46f2ede4cb by Meador Inge in branch '3.2': New changeset 6502de91610d by Meador Inge in branch 'default': |