msg135015 - (view) |
Author: Eugene Toder (eltoder) * |
Date: 2011-05-03 02:11 |
A comment in the definition of PyCodeObject in Include/code.h says:
/* The rest doesn't count for hash or comparisons */
which, I think, makes a lot of sense. The implementation doesn't follow this comment, though. code_hash actually includes co_name and code_richcompare includes co_name and co_firstlineno. This makes hash and comparison inconsistent with each other and with the comment.
|
msg135321 - (view) |
Author: Éric Araujo (eric.araujo) * |
Date: 2011-05-06 16:41 |
What fix would you propose?
|
msg135395 - (view) |
Author: Eugene Toder (eltoder) * |
Date: 2011-05-07 01:07 |
I would propose changing implementation to match the comment. At a minimum, remove co_firstlineno comparison. As the last resort, at least change the comment.
|
msg135541 - (view) |
Author: Eugene Toder (eltoder) * |
Date: 2011-05-08 21:13 |
It appears that
* co_name was added to hash and cmp in this check-in by Guido:
http://hg.python.org/cpython-fullhistory/diff/525b2358721e/Python/compile.c
I think the reason was to preserve function name when defining multiple functions with the same code in one function or module. (By default function gets it's name from code, but only one code object will be preserved, since all constants in a function are stored in a dict during compilation).
* co_firstlineno was added to cmp (but not hash) in this check-in by Brett:
http://hg.python.org/cpython-fullhistory/rev/8127a55a57cb
In an attempt to fix this bug: http://www.mail-archive.com/python-bugs-list@python.org/msg02440.html
It doesn't actually fix the bug and makes hash inconsistent with cmp. I'm not convinced that the bug is valid -- why would you care if identical lambdas share or not share the code?
Both changes seem come from a tension between code object's original intention to compare "functionally equivalent" codes equal and side-effects of constants de-duplication in a function (loss of function name, broken breakpoints and line numbers in a debugger).
I can think of 2 ways to address this. Change hash/cmp back to ignore co_name and co_firstlineno and then:
1) Never dedup codes, or only dedup codes with the same co_firstlineno (can compare co_name too, but I can't think of a way to create 2 named funcs on the same line). This is pretty much what the current code does.
2) Separate "debug information" (co_filename, co_name, co_firstlineno, co_lnotab) from code object into a separate object. Construct a function from both objects. Allow de-duplication of both. This will always preserve all information in a function, but allows to share code object between identical functions. This is a much more intrusive change, though, e.g. frame will need a reference to debug information.
|
msg135542 - (view) |
Author: Eugene Toder (eltoder) * |
Date: 2011-05-08 21:23 |
Btw, disabling dedup for codes won't be the first exception -- we already avoid coalescing -0.0 with 0.0 for float and complex, even though they compare equal.
|
msg180100 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-01-16 18:57 |
> It doesn't actually fix the bug and makes hash inconsistent with cmp.
The only constraint is that a == b imply hash(a) == hash(b). But the converse doesn't have to be true, i.e. if it perfectly possible to have hash(a) == hash(b) and a != b (pretty much by definition of a hash function, actually).
What Brett's change means is that the hash function isn't as fine-grained as it could be, but I'm not sure it's very important for code objects.
Of course, that Brett's commit doesn't fix the original bug (if confirmed) is a more annoying issue :-)
|
msg180101 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-01-16 18:58 |
Woops, I saw Brett unnosied himself, sorry for nosying him by mistake.
|
msg180106 - (view) |
Author: Brett Cannon (brett.cannon) * |
Date: 2013-01-16 20:20 |
Making as pending for someone to prove my fix is bad for the problem (since it passes an explicit test), else I will close it in the near-ish future.
|
msg180117 - (view) |
Author: Eugene Toder (eltoder) * |
Date: 2013-01-17 03:41 |
My comment will make more sense if you follow the links that I provided.
Brett's check-in (http://hg.python.org/cpython-fullhistory/rev/8127a55a57cb) says that it fixes bug #1190011 (http://www.mail-archive.com/python-bugs-list@python.org/msg02440.html). The bug claims that:
>> def f(): return ((lambda x=1: x), (lambda x=2: x))
>> f1, f2 = f()
>> id(f1.func_code) != id(f2.func_code)
The above does not hold true. It should according to
test_compile (reported in HEAD as bug #1048870).
Certainly comparing co_firstlineno can't fix this, as both lambdas are defined at the same line. There are some more comments by Nick Coghlan on that bug, but I can't find anything clearly stating what the problem is believed to be and what the resolution is.
I understand the reason for Brett's fix and why fixing 1190011 literally is not required. The issue was strange debugger/profiler/etc behavior with identical lambdas defined on _different_ lines. Since python tools typically have line resolution (e.g. debugger doesn't know column position of the current statement), merging lambdas defined on the same line is mostly invisible, and does not cause problems.
So to clarify my opening comment -- this is mostly a documentation issue. The only documentation for the behavior of code cmp is the comment in Include/code.h, which says:
/* The rest doesn't count for hash or comparisons */
Both hash and compare break this in different ways. At a minimum the comment should be fixed. If the current behavior is considered right, this can be it.
My other point is that we can have code objects with different co_firstlineno compare equal and still not have them deduplicated by the compiler (thus avoid problems with debugger). This can be done similar to how float -0.0 is equal to 0.0, but still not deduplicated. Separating these concerns can allow both most logical compare behavior and good support for debugging.
Re hash legally returning same values for non-equal objects -- sure, only a==b implies hash(a)==hash(b), and not the other way round. But this doesn't mean that we should just return 42 from all hash functions. Not including a field in hash that you include compare needs a very good reason. If there's such a reason in this case, please let me know.
|
msg180159 - (view) |
Author: Brett Cannon (brett.cannon) * |
Date: 2013-01-17 22:38 |
I have no issue with the current behaviour, so it sounds like the source comment just needs updating. Care to suggest some wording, Eugene?
|
msg180249 - (view) |
Author: Eugene Toder (eltoder) * |
Date: 2013-01-19 18:35 |
If you add co_firstlineno into code_hash you can say something like
/* The rest isn't used in hash and comparisons, except
co_name and co_firstlineno, which are preserved for
tracebacks and debuggers. */
(Otherwise you'd need to explain why co_firstlineno is not in hash.)
BTW, since co_firstlineno is used including co_name is redundant -- there's no way to have 2 named code objects on the same line, AFAIK.
|
msg216132 - (view) |
Author: Caelyn McAulay (math_foo) |
Date: 2014-04-14 17:40 |
Here is a patch to add the requested documentation to code.h - I expanded it to specify (as per the conversation in this issue) that co_name is used in both hash and comparisons while co_firstlineno is used only in comparisons.
I do not attempt explain the inconsistency; but let the comment reflect reality.
|
msg216151 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2014-04-14 18:20 |
New changeset 0e35cef1d984 by Andrew Kuchling in branch 'default':
#11983: update comment to describe which fields are used and why.
http://hg.python.org/cpython/rev/0e35cef1d984
|
msg216152 - (view) |
Author: A.M. Kuchling (akuchling) * |
Date: 2014-04-14 18:21 |
Thanks for the patch!
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:16 | admin | set | github: 56192 |
2014-04-14 18:21:26 | akuchling | set | status: open -> closed
assignee: akuchling versions:
+ Python 3.5, - Python 3.3 nosy:
+ akuchling
messages:
+ msg216152 resolution: fixed stage: resolved |
2014-04-14 18:20:15 | python-dev | set | nosy:
+ python-dev messages:
+ msg216151
|
2014-04-14 17:40:58 | math_foo | set | files:
+ comment11983.patch
nosy:
+ math_foo messages:
+ msg216132
keywords:
+ patch |
2013-01-19 18:35:27 | eltoder | set | messages:
+ msg180249 |
2013-01-17 22:38:43 | brett.cannon | set | messages:
+ msg180159 |
2013-01-17 03:41:45 | eltoder | set | status: pending -> open
messages:
+ msg180117 |
2013-01-16 20:20:30 | brett.cannon | set | status: open -> pending nosy:
+ brett.cannon messages:
+ msg180106
|
2013-01-16 18:58:01 | pitrou | set | nosy:
- brett.cannon messages:
+ msg180101
|
2013-01-16 18:57:27 | pitrou | set | nosy:
+ pitrou, brett.cannon messages:
+ msg180100
|
2013-01-16 18:51:04 | brett.cannon | set | nosy:
- brett.cannon
|
2011-05-08 21:23:36 | eltoder | set | messages:
+ msg135542 |
2011-05-08 21:13:12 | eltoder | set | nosy:
+ brett.cannon, ncoghlan, Jeremy.Hylton messages:
+ msg135541
|
2011-05-07 01:07:18 | eltoder | set | messages:
+ msg135395 |
2011-05-06 16:41:51 | eric.araujo | set | nosy:
+ eric.araujo messages:
+ msg135321
|
2011-05-03 05:17:30 | rhettinger | set | nosy:
+ rhettinger
|
2011-05-03 02:11:21 | eltoder | create | |