classification
Title: range objects becomes hashable after attribute access
Type: behavior Stage:
Components: Interpreter Core Versions: Python 3.0
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: ncoghlan Nosy List: amaury.forgeotdarc, eric.smith, gpolo, hagen, jcea, ncoghlan, rhettinger
Priority: release blocker Keywords: patch

Created on 2008-12-19 15:21 by hagen, last changed 2009-01-29 11:41 by jcea. This issue is now closed.

Files
File name Uploaded Description Edit
remove_casts.patch hagen, 2008-12-19 16:50 remove unnecessary casts in other places
rangehash3.patch hagen, 2008-12-19 18:09
Messages (20)
msg78059 - (view) Author: Hagen Fürstenau (hagen) Date: 2008-12-19 15:21
As reported by Dmitry Vasiliev on python-dev, a range object suddenly
becomes hash()able after an attribute access, e.g. by dir().

If I understand correctly, then the reason is that PyRange_Type doesn't
set tp_hash and PyType_Ready is not normally called on the type, but
only e.g. in PyObject_GenericGetAttr.

I don't see any use for range objects being hashable, as they don't even
have meaningful equality defined on them. So I'd recommend making them
unhashable. The attached patch does this and adds a test.
msg78060 - (view) Author: Guilherme Polo (gpolo) * (Python committer) Date: 2008-12-19 15:51
You don't need to cast PyObject_HashNotImplemented to hashfunc
msg78061 - (view) Author: Hagen Fürstenau (hagen) Date: 2008-12-19 16:02
Why does every other place seem to do the cast? Historical reasons?
msg78062 - (view) Author: Guilherme Polo (gpolo) * (Python committer) Date: 2008-12-19 16:09
On Fri, Dec 19, 2008 at 2:02 PM, Hagen Fürstenau <report@bugs.python.org> wrote:
>
> Hagen Fürstenau <hfuerstenau@gmx.net> added the comment:
>
> Why does every other place seem to do the cast? Historical reasons?
>

No, if you look at the functions being casted you will notice them do
not take a pointer to PyObject as the first argument, if you are
talking about tp_hash specifically.
msg78063 - (view) Author: Hagen Fürstenau (hagen) Date: 2008-12-19 16:14
I'm talking about places like these:

[hagenf@chage py3k]$ grep -R "(hashfunc)PyObject_HashNotImplemented"
Objects/*.c Modules/*.c
Objects/dictobject.c:   (hashfunc)PyObject_HashNotImplemented,  /*
tp_hash */
Objects/listobject.c:   (hashfunc)PyObject_HashNotImplemented,  /*
tp_hash */
Objects/setobject.c:    (hashfunc)PyObject_HashNotImplemented,  /*
tp_hash */
Modules/_collectionsmodule.c:   (hashfunc)PyObject_HashNotImplemented, 
/* tp_hash */
msg78064 - (view) Author: Guilherme Polo (gpolo) * (Python committer) Date: 2008-12-19 16:27
On Fri, Dec 19, 2008 at 2:14 PM, Hagen Fürstenau <report@bugs.python.org> wrote:
>
> Hagen Fürstenau <hfuerstenau@gmx.net> added the comment:
>
> I'm talking about places like these:
>
> [hagenf@chage py3k]$ grep -R "(hashfunc)PyObject_HashNotImplemented"
> Objects/*.c Modules/*.c
> Objects/dictobject.c:   (hashfunc)PyObject_HashNotImplemented,  /*
> tp_hash */
> Objects/listobject.c:   (hashfunc)PyObject_HashNotImplemented,  /*
> tp_hash */
> Objects/setobject.c:    (hashfunc)PyObject_HashNotImplemented,  /*
> tp_hash */
> Modules/_collectionsmodule.c:   (hashfunc)PyObject_HashNotImplemented,
> /* tp_hash */

I have checked some of the examples you gave and noticed they were
re-added (or changed) recently, but all these casts could be removed.
msg78065 - (view) Author: Hagen Fürstenau (hagen) Date: 2008-12-19 16:50
Here's an updated patch without the cast and a separate patch for
removing the other casts.
msg78069 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-12-19 21:24
The origin of the unnecessary hashfunc casts is just me following some
of the more specific examples of filling in the tp_hash slot too closely
without checking if the cast was still needed.

I'll apply and backport Hagen's patches to 3.0 soon (as well as fixing
some other non-hashable types such as slice() to use
PyHash_NotImplemented), but first I want to understand why range()
exhibits this behaviour, while other classes with a superficially
similar implementation (such as bytearray) do not.
msg78071 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-12-19 22:01
OK, the discrepancy with bytearray turns out to be fairly
straightforward: bytearray overrides the comparison operations, so
inheritance of the default object.__hash__ is automatically blocked.
range() objects don't support comparison, so they inherit __hash__ when
PyType_Ready is called.

Which then begs the question of why range() instances are unhashable
only until something happens to invoke the tp_getattro slot on the type
object... and it turns out that PyType_Ready isn't called on the type
during interpreter startup. Instead, it only happens lazily when one of
the operations that needs the tp_dict to be filled in calls PyType_Ready
(the default dir() retrieves __dict__ from the type object, and this
attribute access causes PyType_Ready to be called).

Only at this point is the slot inheritance on the range() type
calculated correctly.
msg78073 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-12-19 22:43
Patch looks good to me.
msg78086 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-12-20 06:51
It has been pointed out to me that xrange() objects are hashable in 2.x,
and since these range objects are immutable, I don't believe there is
any driving reason for them not to be hashable.

At that point the question becomes one of why xrange() is being
initialised correctly in 2.x while something is going wrong with the
tp_hash slot initialisation for range() in 3.x that doesn't get fixed
until PyType_Ready is called to populate tp_dict.
msg78087 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-12-20 06:54
Bumping to release blocker for 3.0.1 - until I understand this better,
I'm not sure if the xrange->range hashing behaviour change between 2.x
and 3.x is a type specific problem or a general issue in the
implementation of the new approach to tp_hash inheritance for Py3k.
msg78088 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-12-20 07:14
Ah, I think I figured it out - in 2.x, PyObject_Hash itself includes the
fallback to _PyHash_Pointer if none of tp_hash, tp_compare or the
tp_richcompare slots have been implemented on the type.

So long as a type is only trying to inherit object.__hash__ (as is the
case with xrange), then this fallback will do the right thing if
PyType_Ready hasn't been called yet.

In 3.0, on the other hand, PyObject_Hash has no fallback - if tp_hash
isn't filled in, the type isn't considered hashable. This means that for
a type to properly inherit hashability in Py3k, PyType_Ready *must* be
called on it.

Probably the best thing to do is to add xrange and range to the list of
types initialised in _Py_ReadyTypes in 2.x and 3.x respectively.
msg78089 - (view) Author: Hagen Fürstenau (hagen) Date: 2008-12-20 08:15
> I don't believe there is any driving reason for them not to be hashable.

On the other hand, what is the use case for hashing objects whose
equality is defined as object identity?

Python 3.0 (r30:67503, Dec  4 2008, 06:47:38)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> range(10).__hash__
<method-wrapper '__hash__' of range object at 0x7f2d61dbd210>
>>> {range(10), range(10)}
{range(0, 10), range(0, 10)}

I can see only confusion arising from that.
msg78108 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-12-20 14:19
Hagen Fürstenau wrote:
> Hagen Fürstenau <hfuerstenau@gmx.net> added the comment:
> 
>> I don't believe there is any driving reason for them not to be hashable.
> 
> On the other hand, what is the use case for hashing objects whose
> equality is defined as object identity?

Fast membership testing in sets to track which objects you have and
haven't seen, mapping from objects to arbitrary metadata about those
objects without having to explicitly redirect through id(), that kind of
thing. Generally speaking, the idea is that objects should be hashable
if their concept of "equality" cannot change over the life time of the
object. Identity based equality meets that criteria, which is why
objects (including range/xrange) are hashable by default.
msg78401 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-12-28 12:28
enumerate can be added to the list of builtin types which isn't
initialised correctly, as can the callable+sentinel iterator return from
the 2-argument version of iter() and the default sequence iterator
returned by iter() when given a type with both __len__ and __getitem__
methods, but no __iter__ method.
msg78402 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-12-28 12:46
Copied from python-dev post:

Perhaps the path of least resistance is to change PyObject_Hash to be
yet another place where PyType_Ready will be called implicitly if it
hasn't been called already?

That approach would get us back to the Python 2.x status quo where
calling PyType_Ready was only absolutely essential if you wanted to
correctly inherit a slot from a type other than object itself.
msg78423 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2008-12-28 17:33
> Perhaps the path of least resistance is to change PyObject_Hash to be
> yet another place where PyType_Ready will be called implicitly if it
> hasn't been called already?

I think that's the best thing to do. It would bring PyObject_Hash in
line with PyObject_Format, for example.

If we pick some other solution (instead of modifying PyObject_Hash),
then we should find all of the lazy calls to PyType_Ready (that exist to
solve this problem) and remove them. IOW, it should be handled the same
everywhere.
msg78496 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-12-30 01:39
Fixed using a lazy call to PyType_Ready in PyObject_Hash:

2.7: r68051
2.6: r68052

Forward-port to Py3k still to come.
msg78511 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-12-30 07:41
Forward port to 3.x:

3.1: r68058
3.0: r68060
History
Date User Action Args
2009-01-29 11:41:40jceasetnosy: + jcea
2008-12-30 07:41:30ncoghlansetstatus: open -> closed
resolution: fixed
messages: + msg78511
2008-12-30 01:39:50ncoghlansetmessages: + msg78496
2008-12-28 17:33:18eric.smithsetmessages: + msg78423
2008-12-28 12:46:57ncoghlansetmessages: + msg78402
2008-12-28 12:28:28ncoghlansetmessages: + msg78401
2008-12-20 14:19:40ncoghlansetmessages: + msg78108
2008-12-20 08:15:38hagensetmessages: + msg78089
2008-12-20 07:14:38ncoghlansetmessages: + msg78088
2008-12-20 06:54:35ncoghlansetpriority: release blocker
resolution: accepted -> (no value)
messages: + msg78087
versions: - Python 3.1
2008-12-20 06:51:27ncoghlansetmessages: + msg78086
2008-12-20 02:07:58rhettingersetmessages: - msg78080
2008-12-20 02:07:08rhettingersetnosy: + rhettinger
messages: + msg78080
2008-12-19 22:43:16amaury.forgeotdarcsetresolution: accepted
messages: + msg78073
nosy: + amaury.forgeotdarc
2008-12-19 22:01:32ncoghlansetmessages: + msg78071
2008-12-19 21:24:06ncoghlansetassignee: ncoghlan
messages: + msg78069
nosy: + ncoghlan
2008-12-19 18:09:37hagensetfiles: - rangehash2.patch
2008-12-19 18:09:31hagensetfiles: + rangehash3.patch
2008-12-19 18:00:12eric.smithsetnosy: + eric.smith
2008-12-19 16:50:40hagensetfiles: - rangehash.patch
2008-12-19 16:50:36hagensetfiles: + remove_casts.patch
2008-12-19 16:50:09hagensetfiles: + rangehash2.patch
messages: + msg78065
2008-12-19 16:27:17gpolosetmessages: + msg78064
2008-12-19 16:14:09hagensetmessages: + msg78063
2008-12-19 16:09:48gpolosetmessages: + msg78062
2008-12-19 16:02:23hagensetmessages: + msg78061
2008-12-19 15:51:16gpolosetnosy: + gpolo
messages: + msg78060
2008-12-19 15:22:16hagensetfiles: + rangehash.patch
keywords: + patch
2008-12-19 15:21:10hagencreate