msg78059 - (view) |
Author: Hagen Fürstenau (hagen) |
Date: 2008-12-19 15:21 |
As reported by Dmitry Vasiliev on python-dev, a range object suddenly
becomes hash()able after an attribute access, e.g. by dir().
If I understand correctly, then the reason is that PyRange_Type doesn't
set tp_hash and PyType_Ready is not normally called on the type, but
only e.g. in PyObject_GenericGetAttr.
I don't see any use for range objects being hashable, as they don't even
have meaningful equality defined on them. So I'd recommend making them
unhashable. The attached patch does this and adds a test.
|
msg78060 - (view) |
Author: Guilherme Polo (gpolo) * |
Date: 2008-12-19 15:51 |
You don't need to cast PyObject_HashNotImplemented to hashfunc
|
msg78061 - (view) |
Author: Hagen Fürstenau (hagen) |
Date: 2008-12-19 16:02 |
Why does every other place seem to do the cast? Historical reasons?
|
msg78062 - (view) |
Author: Guilherme Polo (gpolo) * |
Date: 2008-12-19 16:09 |
On Fri, Dec 19, 2008 at 2:02 PM, Hagen Fürstenau <report@bugs.python.org> wrote:
>
> Hagen Fürstenau <hfuerstenau@gmx.net> added the comment:
>
> Why does every other place seem to do the cast? Historical reasons?
>
No, if you look at the functions being casted you will notice them do
not take a pointer to PyObject as the first argument, if you are
talking about tp_hash specifically.
|
msg78063 - (view) |
Author: Hagen Fürstenau (hagen) |
Date: 2008-12-19 16:14 |
I'm talking about places like these:
[hagenf@chage py3k]$ grep -R "(hashfunc)PyObject_HashNotImplemented"
Objects/*.c Modules/*.c
Objects/dictobject.c: (hashfunc)PyObject_HashNotImplemented, /*
tp_hash */
Objects/listobject.c: (hashfunc)PyObject_HashNotImplemented, /*
tp_hash */
Objects/setobject.c: (hashfunc)PyObject_HashNotImplemented, /*
tp_hash */
Modules/_collectionsmodule.c: (hashfunc)PyObject_HashNotImplemented,
/* tp_hash */
|
msg78064 - (view) |
Author: Guilherme Polo (gpolo) * |
Date: 2008-12-19 16:27 |
On Fri, Dec 19, 2008 at 2:14 PM, Hagen Fürstenau <report@bugs.python.org> wrote:
>
> Hagen Fürstenau <hfuerstenau@gmx.net> added the comment:
>
> I'm talking about places like these:
>
> [hagenf@chage py3k]$ grep -R "(hashfunc)PyObject_HashNotImplemented"
> Objects/*.c Modules/*.c
> Objects/dictobject.c: (hashfunc)PyObject_HashNotImplemented, /*
> tp_hash */
> Objects/listobject.c: (hashfunc)PyObject_HashNotImplemented, /*
> tp_hash */
> Objects/setobject.c: (hashfunc)PyObject_HashNotImplemented, /*
> tp_hash */
> Modules/_collectionsmodule.c: (hashfunc)PyObject_HashNotImplemented,
> /* tp_hash */
I have checked some of the examples you gave and noticed they were
re-added (or changed) recently, but all these casts could be removed.
|
msg78065 - (view) |
Author: Hagen Fürstenau (hagen) |
Date: 2008-12-19 16:50 |
Here's an updated patch without the cast and a separate patch for
removing the other casts.
|
msg78069 - (view) |
Author: Alyssa Coghlan (ncoghlan) * |
Date: 2008-12-19 21:24 |
The origin of the unnecessary hashfunc casts is just me following some
of the more specific examples of filling in the tp_hash slot too closely
without checking if the cast was still needed.
I'll apply and backport Hagen's patches to 3.0 soon (as well as fixing
some other non-hashable types such as slice() to use
PyHash_NotImplemented), but first I want to understand why range()
exhibits this behaviour, while other classes with a superficially
similar implementation (such as bytearray) do not.
|
msg78071 - (view) |
Author: Alyssa Coghlan (ncoghlan) * |
Date: 2008-12-19 22:01 |
OK, the discrepancy with bytearray turns out to be fairly
straightforward: bytearray overrides the comparison operations, so
inheritance of the default object.__hash__ is automatically blocked.
range() objects don't support comparison, so they inherit __hash__ when
PyType_Ready is called.
Which then begs the question of why range() instances are unhashable
only until something happens to invoke the tp_getattro slot on the type
object... and it turns out that PyType_Ready isn't called on the type
during interpreter startup. Instead, it only happens lazily when one of
the operations that needs the tp_dict to be filled in calls PyType_Ready
(the default dir() retrieves __dict__ from the type object, and this
attribute access causes PyType_Ready to be called).
Only at this point is the slot inheritance on the range() type
calculated correctly.
|
msg78073 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * |
Date: 2008-12-19 22:43 |
Patch looks good to me.
|
msg78086 - (view) |
Author: Alyssa Coghlan (ncoghlan) * |
Date: 2008-12-20 06:51 |
It has been pointed out to me that xrange() objects are hashable in 2.x,
and since these range objects are immutable, I don't believe there is
any driving reason for them not to be hashable.
At that point the question becomes one of why xrange() is being
initialised correctly in 2.x while something is going wrong with the
tp_hash slot initialisation for range() in 3.x that doesn't get fixed
until PyType_Ready is called to populate tp_dict.
|
msg78087 - (view) |
Author: Alyssa Coghlan (ncoghlan) * |
Date: 2008-12-20 06:54 |
Bumping to release blocker for 3.0.1 - until I understand this better,
I'm not sure if the xrange->range hashing behaviour change between 2.x
and 3.x is a type specific problem or a general issue in the
implementation of the new approach to tp_hash inheritance for Py3k.
|
msg78088 - (view) |
Author: Alyssa Coghlan (ncoghlan) * |
Date: 2008-12-20 07:14 |
Ah, I think I figured it out - in 2.x, PyObject_Hash itself includes the
fallback to _PyHash_Pointer if none of tp_hash, tp_compare or the
tp_richcompare slots have been implemented on the type.
So long as a type is only trying to inherit object.__hash__ (as is the
case with xrange), then this fallback will do the right thing if
PyType_Ready hasn't been called yet.
In 3.0, on the other hand, PyObject_Hash has no fallback - if tp_hash
isn't filled in, the type isn't considered hashable. This means that for
a type to properly inherit hashability in Py3k, PyType_Ready *must* be
called on it.
Probably the best thing to do is to add xrange and range to the list of
types initialised in _Py_ReadyTypes in 2.x and 3.x respectively.
|
msg78089 - (view) |
Author: Hagen Fürstenau (hagen) |
Date: 2008-12-20 08:15 |
> I don't believe there is any driving reason for them not to be hashable.
On the other hand, what is the use case for hashing objects whose
equality is defined as object identity?
Python 3.0 (r30:67503, Dec 4 2008, 06:47:38)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> range(10).__hash__
<method-wrapper '__hash__' of range object at 0x7f2d61dbd210>
>>> {range(10), range(10)}
{range(0, 10), range(0, 10)}
I can see only confusion arising from that.
|
msg78108 - (view) |
Author: Alyssa Coghlan (ncoghlan) * |
Date: 2008-12-20 14:19 |
Hagen Fürstenau wrote:
> Hagen Fürstenau <hfuerstenau@gmx.net> added the comment:
>
>> I don't believe there is any driving reason for them not to be hashable.
>
> On the other hand, what is the use case for hashing objects whose
> equality is defined as object identity?
Fast membership testing in sets to track which objects you have and
haven't seen, mapping from objects to arbitrary metadata about those
objects without having to explicitly redirect through id(), that kind of
thing. Generally speaking, the idea is that objects should be hashable
if their concept of "equality" cannot change over the life time of the
object. Identity based equality meets that criteria, which is why
objects (including range/xrange) are hashable by default.
|
msg78401 - (view) |
Author: Alyssa Coghlan (ncoghlan) * |
Date: 2008-12-28 12:28 |
enumerate can be added to the list of builtin types which isn't
initialised correctly, as can the callable+sentinel iterator return from
the 2-argument version of iter() and the default sequence iterator
returned by iter() when given a type with both __len__ and __getitem__
methods, but no __iter__ method.
|
msg78402 - (view) |
Author: Alyssa Coghlan (ncoghlan) * |
Date: 2008-12-28 12:46 |
Copied from python-dev post:
Perhaps the path of least resistance is to change PyObject_Hash to be
yet another place where PyType_Ready will be called implicitly if it
hasn't been called already?
That approach would get us back to the Python 2.x status quo where
calling PyType_Ready was only absolutely essential if you wanted to
correctly inherit a slot from a type other than object itself.
|
msg78423 - (view) |
Author: Eric V. Smith (eric.smith) * |
Date: 2008-12-28 17:33 |
> Perhaps the path of least resistance is to change PyObject_Hash to be
> yet another place where PyType_Ready will be called implicitly if it
> hasn't been called already?
I think that's the best thing to do. It would bring PyObject_Hash in
line with PyObject_Format, for example.
If we pick some other solution (instead of modifying PyObject_Hash),
then we should find all of the lazy calls to PyType_Ready (that exist to
solve this problem) and remove them. IOW, it should be handled the same
everywhere.
|
msg78496 - (view) |
Author: Alyssa Coghlan (ncoghlan) * |
Date: 2008-12-30 01:39 |
Fixed using a lazy call to PyType_Ready in PyObject_Hash:
2.7: r68051
2.6: r68052
Forward-port to Py3k still to come.
|
msg78511 - (view) |
Author: Alyssa Coghlan (ncoghlan) * |
Date: 2008-12-30 07:41 |
Forward port to 3.x:
3.1: r68058
3.0: r68060
|
|
Date |
User |
Action |
Args |
2022-04-11 14:56:43 | admin | set | github: 48951 |
2009-01-29 11:41:40 | jcea | set | nosy:
+ jcea |
2008-12-30 07:41:30 | ncoghlan | set | status: open -> closed resolution: fixed messages:
+ msg78511 |
2008-12-30 01:39:50 | ncoghlan | set | messages:
+ msg78496 |
2008-12-28 17:33:18 | eric.smith | set | messages:
+ msg78423 |
2008-12-28 12:46:57 | ncoghlan | set | messages:
+ msg78402 |
2008-12-28 12:28:28 | ncoghlan | set | messages:
+ msg78401 |
2008-12-20 14:19:40 | ncoghlan | set | messages:
+ msg78108 |
2008-12-20 08:15:38 | hagen | set | messages:
+ msg78089 |
2008-12-20 07:14:38 | ncoghlan | set | messages:
+ msg78088 |
2008-12-20 06:54:35 | ncoghlan | set | priority: release blocker resolution: accepted -> (no value) messages:
+ msg78087 versions:
- Python 3.1 |
2008-12-20 06:51:27 | ncoghlan | set | messages:
+ msg78086 |
2008-12-20 02:07:58 | rhettinger | set | messages:
- msg78080 |
2008-12-20 02:07:08 | rhettinger | set | nosy:
+ rhettinger messages:
+ msg78080 |
2008-12-19 22:43:16 | amaury.forgeotdarc | set | resolution: accepted messages:
+ msg78073 nosy:
+ amaury.forgeotdarc |
2008-12-19 22:01:32 | ncoghlan | set | messages:
+ msg78071 |
2008-12-19 21:24:06 | ncoghlan | set | assignee: ncoghlan messages:
+ msg78069 nosy:
+ ncoghlan |
2008-12-19 18:09:37 | hagen | set | files:
- rangehash2.patch |
2008-12-19 18:09:31 | hagen | set | files:
+ rangehash3.patch |
2008-12-19 18:00:12 | eric.smith | set | nosy:
+ eric.smith |
2008-12-19 16:50:40 | hagen | set | files:
- rangehash.patch |
2008-12-19 16:50:36 | hagen | set | files:
+ remove_casts.patch |
2008-12-19 16:50:09 | hagen | set | files:
+ rangehash2.patch messages:
+ msg78065 |
2008-12-19 16:27:17 | gpolo | set | messages:
+ msg78064 |
2008-12-19 16:14:09 | hagen | set | messages:
+ msg78063 |
2008-12-19 16:09:48 | gpolo | set | messages:
+ msg78062 |
2008-12-19 16:02:23 | hagen | set | messages:
+ msg78061 |
2008-12-19 15:51:16 | gpolo | set | nosy:
+ gpolo messages:
+ msg78060 |
2008-12-19 15:22:16 | hagen | set | files:
+ rangehash.patch keywords:
+ patch |
2008-12-19 15:21:10 | hagen | create | |