Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

range objects becomes hashable after attribute access #48951

Closed
hagen mannequin opened this issue Dec 19, 2008 · 20 comments
Closed

range objects becomes hashable after attribute access #48951

hagen mannequin opened this issue Dec 19, 2008 · 20 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) release-blocker type-bug An unexpected behavior, bug, or error

Comments

@hagen
Copy link
Mannequin

hagen mannequin commented Dec 19, 2008

BPO 4701
Nosy @rhettinger, @jcea, @amauryfa, @ncoghlan, @ericvsmith
Files
  • remove_casts.patch: remove unnecessary casts in other places
  • rangehash3.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ncoghlan'
    closed_at = <Date 2008-12-30.07:41:30.710>
    created_at = <Date 2008-12-19.15:21:10.687>
    labels = ['interpreter-core', 'type-bug', 'release-blocker']
    title = 'range objects becomes hashable after attribute access'
    updated_at = <Date 2009-01-29.11:41:40.630>
    user = 'https://bugs.python.org/hagen'

    bugs.python.org fields:

    activity = <Date 2009-01-29.11:41:40.630>
    actor = 'jcea'
    assignee = 'ncoghlan'
    closed = True
    closed_date = <Date 2008-12-30.07:41:30.710>
    closer = 'ncoghlan'
    components = ['Interpreter Core']
    creation = <Date 2008-12-19.15:21:10.687>
    creator = 'hagen'
    dependencies = []
    files = ['12402', '12403']
    hgrepos = []
    issue_num = 4701
    keywords = ['patch']
    message_count = 20.0
    messages = ['78059', '78060', '78061', '78062', '78063', '78064', '78065', '78069', '78071', '78073', '78086', '78087', '78088', '78089', '78108', '78401', '78402', '78423', '78496', '78511']
    nosy_count = 7.0
    nosy_names = ['rhettinger', 'jcea', 'amaury.forgeotdarc', 'ncoghlan', 'eric.smith', 'gpolo', 'hagen']
    pr_nums = []
    priority = 'release blocker'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue4701'
    versions = ['Python 3.0']

    @hagen
    Copy link
    Mannequin Author

    hagen mannequin commented Dec 19, 2008

    As reported by Dmitry Vasiliev on python-dev, a range object suddenly
    becomes hash()able after an attribute access, e.g. by dir().

    If I understand correctly, then the reason is that PyRange_Type doesn't
    set tp_hash and PyType_Ready is not normally called on the type, but
    only e.g. in PyObject_GenericGetAttr.

    I don't see any use for range objects being hashable, as they don't even
    have meaningful equality defined on them. So I'd recommend making them
    unhashable. The attached patch does this and adds a test.

    @hagen hagen mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Dec 19, 2008
    @gpolo
    Copy link
    Mannequin

    gpolo mannequin commented Dec 19, 2008

    You don't need to cast PyObject_HashNotImplemented to hashfunc

    @hagen
    Copy link
    Mannequin Author

    hagen mannequin commented Dec 19, 2008

    Why does every other place seem to do the cast? Historical reasons?

    @gpolo
    Copy link
    Mannequin

    gpolo mannequin commented Dec 19, 2008

    On Fri, Dec 19, 2008 at 2:02 PM, Hagen Fürstenau <report@bugs.python.org> wrote:

    Hagen Fürstenau <hfuerstenau@gmx.net> added the comment:

    Why does every other place seem to do the cast? Historical reasons?

    No, if you look at the functions being casted you will notice them do
    not take a pointer to PyObject as the first argument, if you are
    talking about tp_hash specifically.

    @hagen
    Copy link
    Mannequin Author

    hagen mannequin commented Dec 19, 2008

    I'm talking about places like these:

    [hagenf@chage py3k]$ grep -R "(hashfunc)PyObject_HashNotImplemented"
    Objects/.c Modules/.c
    Objects/dictobject.c: (hashfunc)PyObject_HashNotImplemented, /*
    tp_hash */
    Objects/listobject.c: (hashfunc)PyObject_HashNotImplemented, /*
    tp_hash */
    Objects/setobject.c: (hashfunc)PyObject_HashNotImplemented, /*
    tp_hash */
    Modules/_collectionsmodule.c: (hashfunc)PyObject_HashNotImplemented,
    /* tp_hash */

    @gpolo
    Copy link
    Mannequin

    gpolo mannequin commented Dec 19, 2008

    On Fri, Dec 19, 2008 at 2:14 PM, Hagen Fürstenau <report@bugs.python.org> wrote:

    Hagen Fürstenau <hfuerstenau@gmx.net> added the comment:

    I'm talking about places like these:

    [hagenf@chage py3k]$ grep -R "(hashfunc)PyObject_HashNotImplemented"
    Objects/.c Modules/.c
    Objects/dictobject.c: (hashfunc)PyObject_HashNotImplemented, /*
    tp_hash */
    Objects/listobject.c: (hashfunc)PyObject_HashNotImplemented, /*
    tp_hash */
    Objects/setobject.c: (hashfunc)PyObject_HashNotImplemented, /*
    tp_hash */
    Modules/_collectionsmodule.c: (hashfunc)PyObject_HashNotImplemented,
    /* tp_hash */

    I have checked some of the examples you gave and noticed they were
    re-added (or changed) recently, but all these casts could be removed.

    @hagen
    Copy link
    Mannequin Author

    hagen mannequin commented Dec 19, 2008

    Here's an updated patch without the cast and a separate patch for
    removing the other casts.

    @ncoghlan
    Copy link
    Contributor

    The origin of the unnecessary hashfunc casts is just me following some
    of the more specific examples of filling in the tp_hash slot too closely
    without checking if the cast was still needed.

    I'll apply and backport Hagen's patches to 3.0 soon (as well as fixing
    some other non-hashable types such as slice() to use
    PyHash_NotImplemented), but first I want to understand why range()
    exhibits this behaviour, while other classes with a superficially
    similar implementation (such as bytearray) do not.

    @ncoghlan ncoghlan self-assigned this Dec 19, 2008
    @ncoghlan
    Copy link
    Contributor

    OK, the discrepancy with bytearray turns out to be fairly
    straightforward: bytearray overrides the comparison operations, so
    inheritance of the default object.__hash__ is automatically blocked.
    range() objects don't support comparison, so they inherit __hash__ when
    PyType_Ready is called.

    Which then begs the question of why range() instances are unhashable
    only until something happens to invoke the tp_getattro slot on the type
    object... and it turns out that PyType_Ready isn't called on the type
    during interpreter startup. Instead, it only happens lazily when one of
    the operations that needs the tp_dict to be filled in calls PyType_Ready
    (the default dir() retrieves __dict__ from the type object, and this
    attribute access causes PyType_Ready to be called).

    Only at this point is the slot inheritance on the range() type
    calculated correctly.

    @amauryfa
    Copy link
    Member

    Patch looks good to me.

    @ncoghlan
    Copy link
    Contributor

    It has been pointed out to me that xrange() objects are hashable in 2.x,
    and since these range objects are immutable, I don't believe there is
    any driving reason for them not to be hashable.

    At that point the question becomes one of why xrange() is being
    initialised correctly in 2.x while something is going wrong with the
    tp_hash slot initialisation for range() in 3.x that doesn't get fixed
    until PyType_Ready is called to populate tp_dict.

    @ncoghlan
    Copy link
    Contributor

    Bumping to release blocker for 3.0.1 - until I understand this better,
    I'm not sure if the xrange->range hashing behaviour change between 2.x
    and 3.x is a type specific problem or a general issue in the
    implementation of the new approach to tp_hash inheritance for Py3k.

    @ncoghlan
    Copy link
    Contributor

    Ah, I think I figured it out - in 2.x, PyObject_Hash itself includes the
    fallback to _PyHash_Pointer if none of tp_hash, tp_compare or the
    tp_richcompare slots have been implemented on the type.

    So long as a type is only trying to inherit object.__hash__ (as is the
    case with xrange), then this fallback will do the right thing if
    PyType_Ready hasn't been called yet.

    In 3.0, on the other hand, PyObject_Hash has no fallback - if tp_hash
    isn't filled in, the type isn't considered hashable. This means that for
    a type to properly inherit hashability in Py3k, PyType_Ready *must* be
    called on it.

    Probably the best thing to do is to add xrange and range to the list of
    types initialised in _Py_ReadyTypes in 2.x and 3.x respectively.

    @hagen
    Copy link
    Mannequin Author

    hagen mannequin commented Dec 20, 2008

    I don't believe there is any driving reason for them not to be hashable.

    On the other hand, what is the use case for hashing objects whose
    equality is defined as object identity?

    Python 3.0 (r30:67503, Dec  4 2008, 06:47:38)
    [GCC 4.3.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> range(10).__hash__
    <method-wrapper '__hash__' of range object at 0x7f2d61dbd210>
    >>> {range(10), range(10)}
    {range(0, 10), range(0, 10)}

    I can see only confusion arising from that.

    @ncoghlan
    Copy link
    Contributor

    Hagen Fürstenau wrote:

    Hagen Fürstenau <hfuerstenau@gmx.net> added the comment:

    > I don't believe there is any driving reason for them not to be hashable.

    On the other hand, what is the use case for hashing objects whose
    equality is defined as object identity?

    Fast membership testing in sets to track which objects you have and
    haven't seen, mapping from objects to arbitrary metadata about those
    objects without having to explicitly redirect through id(), that kind of
    thing. Generally speaking, the idea is that objects should be hashable
    if their concept of "equality" cannot change over the life time of the
    object. Identity based equality meets that criteria, which is why
    objects (including range/xrange) are hashable by default.

    @ncoghlan
    Copy link
    Contributor

    enumerate can be added to the list of builtin types which isn't
    initialised correctly, as can the callable+sentinel iterator return from
    the 2-argument version of iter() and the default sequence iterator
    returned by iter() when given a type with both __len__ and __getitem__
    methods, but no __iter__ method.

    @ncoghlan
    Copy link
    Contributor

    Copied from python-dev post:

    Perhaps the path of least resistance is to change PyObject_Hash to be
    yet another place where PyType_Ready will be called implicitly if it
    hasn't been called already?

    That approach would get us back to the Python 2.x status quo where
    calling PyType_Ready was only absolutely essential if you wanted to
    correctly inherit a slot from a type other than object itself.

    @ericvsmith
    Copy link
    Member

    Perhaps the path of least resistance is to change PyObject_Hash to be
    yet another place where PyType_Ready will be called implicitly if it
    hasn't been called already?

    I think that's the best thing to do. It would bring PyObject_Hash in
    line with PyObject_Format, for example.

    If we pick some other solution (instead of modifying PyObject_Hash),
    then we should find all of the lazy calls to PyType_Ready (that exist to
    solve this problem) and remove them. IOW, it should be handled the same
    everywhere.

    @ncoghlan
    Copy link
    Contributor

    Fixed using a lazy call to PyType_Ready in PyObject_Hash:

    2.7: r68051
    2.6: r68052

    Forward-port to Py3k still to come.

    @ncoghlan
    Copy link
    Contributor

    Forward port to 3.x:

    3.1: r68058
    3.0: r68060

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) release-blocker type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants