Make hash values the same width as a pointer (or Py_ssize_t) #53987

pitrou · 2010-09-04T22:11:30Z

BPO	9778
Nosy	@malemburg, @tim-one, @loewis, @smontanaro, @birkenfeld, @rhettinger, @mdickinson, @abalkin, @pitrou, @benjaminp, @skrah
Files	hash_t.diff hash_t.diff deunsignify.diff py_ssize_t_hash_patch.diff Py_uhash_t_patch.diff hashw64.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2010-10-23.19:09:16.110>
created_at = <Date 2010-09-04.22:11:29.637>
labels = ['interpreter-core', 'type-feature']
title = 'Make hash values the same width as a pointer (or Py_ssize_t)'
updated_at = <Date 2010-10-23.19:09:16.109>
user = 'https://github.com/pitrou'

bugs.python.org fields:

activity = <Date 2010-10-23.19:09:16.109>
actor = 'loewis'
assignee = 'none'
closed = True
closed_date = <Date 2010-10-23.19:09:16.110>
closer = 'loewis'
components = ['Interpreter Core']
creation = <Date 2010-09-04.22:11:29.637>
creator = 'pitrou'
dependencies = []
files = ['19257', '19258', '19260', '19317', '19338', '19343']
hgrepos = []
issue_num = 9778
keywords = ['patch']
message_count = 52.0
messages = ['115617', '115618', '115619', '115626', '115639', '115656', '115657', '115695', '118789', '118820', '118823', '118953', '118954', '118955', '118959', '118968', '118972', '118980', '118982', '118983', '118987', '118994', '118995', '119000', '119001', '119011', '119015', '119017', '119019', '119020', '119025', '119029', '119030', '119034', '119043', '119044', '119058', '119077', '119079', '119080', '119124', '119265', '119411', '119415', '119442', '119443', '119446', '119453', '119456', '119464', '119465', '119468']
nosy_count = 14.0
nosy_names = ['lemburg', 'tim.peters', 'loewis', 'skip.montanaro', 'georg.brandl', 'rhettinger', 'jimjjewett', 'mark.dickinson', 'belopolsky', 'pitrou', 'casevh', 'ked-tao', 'benjamin.peterson', 'skrah']
pr_nums = []
priority = 'normal'
resolution = 'accepted'
stage = None
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue9778'
versions = ['Python 3.2', 'Python 3.3']

pitrou · 2010-09-04T22:11:29Z

Currently, Python produces hash values with fit in a C "long". This is fine at first sight, but in the context of dict and set implementations, it means that

holding hashes and indices in the same field of a structure requires some care (see bpo-1646068)
on platforms where a long is smaller than a Py_ssize_t (e.g. Win64), very big hash tables could suffer from lots of artificial collisions (the hash table being bigger than the range of possible hash values)
when a long is smaller than Py_ssize_t, we don't save any size anyway, since having some pointers follow a C "long" in a structure implies some padding to keep all fields naturally aligned

A future-proof option would be to change all hash values to be of Py_ssize_t values rather than C longs. Either directly, or by defining a new dedicated alias Py_hash_t. This would also impact the ABI, I suppose.

pitrou · 2010-09-04T22:12:40Z

As an example of padding behaviour (under Win64 with 32-bit longs and 64-bit pointers):

Python 3.2a1+ (py3k, Sep 4 2010, 22:50:10) [MSC v.1500 64 bit (AMD64)] on win32

Type "help", "copyright", "credits" or "license" for more information.
>>> import struct
>>> struct.calcsize("l")
4
>>> struct.calcsize("lP")
16
>>> struct.calcsize("lPP")
24

rhettinger · 2010-09-04T22:14:40Z

+1 for PyObject_Hash returning a Py_ssize_t.

Nothing good can come from having a hash table larger than the range of a hash value. Tests may pass but performance would degrade catastrophically.

loewis · 2010-09-05T06:46:08Z

This would also impact the ABI, I suppose.

Correct. So it either needs to happen before 3.2, or wait until 4.0,
or the introduction of "wide" hashes needs to be done in a compatible
manner, likely requiring two parallel hashing infrastructures.

pitrou · 2010-09-05T11:12:53Z

Correct. So it either needs to happen before 3.2, or wait until 4.0,

Shouldn't there be a provision for ABI versioning?
Or do you suggest bumping to the next major number (4.0, 5.0...) be done on the basis of ABI changes?

loewis · 2010-09-05T16:59:07Z

Am 05.09.2010 13:12, schrieb Antoine Pitrou:

Antoine Pitrou <pitrou@free.fr> added the comment:

> Correct. So it either needs to happen before 3.2, or wait until 4.0,

Shouldn't there be a provision for ABI versioning?

There is certainly support for ABI versioning; the ABI version defined
in the PEP is called "python3.dll".

Or do you suggest bumping to the next major
number (4.0, 5.0...) be done on the basis of ABI changes?

No, vice versa. The PEP promises that the ABI won't change until Python
4. For any change that might break the ABI, either a
backwards-compatible solution needs to be found, or the change be
deferred to Python 4.

birkenfeld · 2010-09-05T17:03:21Z

I would suggest making this change before the ABI freeze starts, then.

mdickinson · 2010-09-06T10:15:23Z

... change all hash values to be of Py_ssize_t values rather than C longs ...

Yes, please! (Provided this change can go in before 3.2.) For the numeric types at least, this should be a straightforward adjustment.

And while we're at it, we could clean up some of the undefined behaviour from integer overflow that's in the current hash functions (tuple.__hash__, for example).

birkenfeld · 2010-10-15T16:04:20Z

Ping?

smontanaro · 2010-10-15T18:58:41Z

Not that anybody needs my input on this, but...

Given the range of people advocating for this change, this looks to me
like it should be a release blocker for 3.2. Raymond's comment about
performance seems especially important, and since the world seems to
be moving toward 64-bit operating systems (certainly should happen in
a big way during the lifetime of Python 3) it seems worthwhile to hold
up further 3.2 releases until this is solved.

pitrou · 2010-10-15T19:12:26Z

Given the range of people advocating for this change, this looks to me
like it should be a release blocker for 3.2. Raymond's comment about
performance seems especially important, and since the world seems to
be moving toward 64-bit operating systems (certainly should happen in
a big way during the lifetime of Python 3) it seems worthwhile to hold
up further 3.2 releases until this is solved.

I think this is a bit exagerated. The performance issues will only
appear if you have huge dicts and sets.
The issue Raymond raised is the potential impossibility of making the
change /after/ we settle on a stable ABI. The question is whether the
ABI will be enforced starting from 3.2, or from a later date.

loewis · 2010-10-17T17:40:18Z

The issue Raymond raised is the potential impossibility of making the
change /after/ we settle on a stable ABI. The question is whether the
ABI will be enforced starting from 3.2, or from a later date.

I'd like to repeat that it will not be impossible to fix this after the
ABI is in place.

pitrou · 2010-10-17T17:50:43Z

Le dimanche 17 octobre 2010 à 17:40 +0000, Martin v. Löwis a écrit :

Martin v. Löwis <martin@v.loewis.de> added the comment:

> The issue Raymond raised is the potential impossibility of making the
> change /after/ we settle on a stable ABI. The question is whether the
> ABI will be enforced starting from 3.2, or from a later date.

I'd like to repeat that it will not be impossible to fix this after the
ABI is in place.

At the cost of very annoying complications, though.

birkenfeld · 2010-10-17T17:53:01Z

Can't we just do it now, and be done with it regardless of the stable ABI?

loewis · 2010-10-17T18:07:12Z

Can't we just do it now, and be done with it regardless of the stable ABI?

Sure. Somebody needs to implement it (and consider what consequences
this has on third-party modules - I'm uncertain).

casevh · 2010-10-17T19:35:36Z

I'm the maintainer of a third-party library (gmpy) that would be impacted by this and I'm definately in favor of this change. With ever increasing amounts of memory becoming standard in computers, more users will encounter performance issues with large dictionaries. I see 3.2 as the version of Python 3.x that gains wide-spread use. Regardless of the timing of a stable ABI, it's popularity will make it more difficult change the hash size in the future.

loewis · 2010-10-17T19:51:54Z

I'm the maintainer of a third-party library (gmpy) that would be
impacted by this and I'm definitely in favor of this change.

Assume Python would make such a change, and users would build released
gmpy versions for such a Python release. What would happen (in terms
of observable errors)?

benjaminp · 2010-10-17T20:25:30Z

Here's a patch. Please review.

loewis · 2010-10-17T20:42:02Z

A few comments:

in complex_hash, why did you drop the cast?
in object.rst, add versionchanged. It should also explain that Py_hash_t is a signed integer with the same width as size_t.

Otherwise, it looks fine.

benjaminp · 2010-10-17T20:55:12Z

Dropping the cast was inadvertent. I've now added versionchanged and committed it in r85664.

smontanaro · 2010-10-17T22:23:49Z

I added a placeholder to the What's New document. Since it will affect extension module authors it should be mentioned at a higher level than just Misc/NEWS.

casevh · 2010-10-18T02:43:26Z

The patch does not address that "unsigned long" is still used to calculate the hash values. This breaks numeric hashing and leads to incorrect values.

Python 3.2a3+ (py3k, Oct 17 2010, 19:03:38) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.hash_info
sys.hash_info(width=64, modulus=-1, inf=314159, nan=0, imag=1000003)
>>>

Would "size_t" be the appropriate type to use?

benjaminp · 2010-10-18T03:22:36Z

Good point. Here's another patch:

malemburg · 2010-10-18T10:18:44Z

Wouldn't it be better to use Py_hash_t instead of size_t for calculating the hash values in those hash functions ?

malemburg · 2010-10-18T10:21:01Z

And related to my previous comment: shouldn't Py_hash_t map to size_t instead of Py_ssize_t ?

casevh · 2010-10-18T13:39:04Z

Some quick comments on the latest patch.

I don't think you can remove the type cast used when comparing the hash value against -1 and -2. IIRC, GCC considers that undefined behavior.
In sysmodule.c, we need to use PyLong_FromSsize_t when storing _PyHASH_MODULUS.
In pyport.h, we need a better way to define _PyHASH_MODULUS. The existing "((1UL << _PyHASH_BITS) - 1)" fails when unsigned long is smaller than Py_ssize_t. "((1ULL << _PyHASH_BITS) - 1)" works on 64-bit Windows but is not a good solution for systems that don't have an unsigned long long type. I haven't thought of a better solution for one.

abalkin · 2010-10-18T13:56:02Z

On Mon, Oct 18, 2010 at 8:45 AM, Benjamin Peterson
<report@bugs.python.org> wrote:
..

No, negative values have to be allowed.

Why? As far as I can tell, negative values are only used as sentinels
and we can use say (size_t)-1 instead of -1L. Are there cases where
hash values' ordering is important?

pitrou · 2010-10-18T13:59:44Z

Le lundi 18 octobre 2010 à 13:56 +0000, Alexander Belopolsky a écrit :

Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:

On Mon, Oct 18, 2010 at 8:45 AM, Benjamin Peterson
<report@bugs.python.org> wrote:
..
> No, negative values have to be allowed.
>

Why? As far as I can tell, negative values are only used as sentinels
and we can use say (size_t)-1 instead of -1L.

You can, except that changing the sentinel value will probably break a
lot of code out there (and there's no benefit AFAICT).

abalkin · 2010-10-18T14:09:11Z

On Mon, Oct 18, 2010 at 9:59 AM, Antoine Pitrou <report@bugs.python.org> wrote:
..

> Why? As far as I can tell, negative values are only used as sentinels
> and we can use say (size_t)-1 instead of -1L.

You can, except that changing the sentinel value will probably break a
lot of code out there (and there's no benefit AFAICT).

AFAICT, a change from (Py_ssize_t)-1 to (size_t)-1 is less likely to
break code than a change from -1L to (Py_ssize_t)-1. (Assuming a
sizeof(long) != sizeof(void*) platform.) The benefit, though is that
hash computations can be performed natively on the hash values without
casting to an unrelated type. If Py_hash_t needs to be signed, I
would like to see a typedef Py_uhash_t size_t next to that for
Py_hash_t and Py_uhash_t used in hash computations rather than
explicit size_t.

malemburg · 2010-10-18T14:47:35Z

Benjamin Peterson wrote:

Benjamin Peterson <benjamin@python.org> added the comment:
> Marc-Andre Lemburg <mal@egenix.com> added the comment:
>
> And related to my previous comment: shouldn't Py_hash_t map to size_t instead of Py_ssize_t ?

No, negative values have to be allowed.

Ah, right... we use -1 as error indicator.

casevh · 2010-10-18T15:19:20Z

Also, note that hash(-12) is -12.

pitrou · 2010-10-18T15:27:27Z

AFAICT, a change from (Py_ssize_t)-1 to (size_t)-1 is less likely to
break code than a change from -1L to (Py_ssize_t)-1. (Assuming a
sizeof(long) != sizeof(void*) platform.)

That's true.

The benefit, though is that
hash computations can be performed natively on the hash values without
casting to an unrelated type.

I don't understand what you mean by "native" and "unrelated". Signed
integers are not less native than unsigned ones.

abalkin · 2010-10-18T16:09:17Z

On Mon, Oct 18, 2010 at 11:27 AM, Antoine Pitrou <report@bugs.python.org> wrote:
..

> The benefit, though is that
> hash computations can be performed natively on the hash values without
> casting to an unrelated type.

I don't understand what you mean by "native" and "unrelated". Signed
integers are not less native than unsigned ones.

Sorry, I could have been clearer indeed. Consider the following code:

static Py_hash_t
long_hash(PyLongObject *v)
{
    unsigned long x;
...
    x = x * sign;
    if (x == (unsigned long)-1)
        x = (unsigned long)-2;
    return (Py_hash_t)x;
}

Wouldn't it be cleaner if x was the same type as hash? Note that
unsigned long is now wrong. What is needed is "unsigned integer type
of the same size as Py_hash_t." If Py_hash_t has to stay signed, I
think we should at least not rely of sizeof(Py_hash_t) to always
remain the same as sizeof(size_t).

casevh · 2010-10-18T17:25:29Z

Sorry, I could have been clearer indeed. Consider the following code:

static Py_hash_t
long_hash(PyLongObject *v)
{
unsigned long x;
...
x = x * sign;
if (x == (unsigned long)-1)
x = (unsigned long)-2;
return (Py_hash_t)x;
}

Wouldn't it be cleaner if x was the same type as hash? Note that
unsigned long is now wrong. What is needed is "unsigned integer type
of the same size as Py_hash_t." If Py_hash_t has to stay signed, I
think we should at least not rely of sizeof(Py_hash_t) to always
remain the same as sizeof(size_t).

The calculation of long_hash assumes an unsigned temporary type to get correct results for the bit shifting and masking. The calculation is done on the absolute value of the long and then the sign is applied. We either needed to (1) add an unsigned Py_hash_t type or (2) just use size_t and Py_ssize_t.

abalkin · 2010-10-18T17:31:49Z

On Mon, Oct 18, 2010 at 1:25 PM, Case Van Horsen <report@bugs.python.org> wrote:
..

We either needed to (1) add an unsigned Py_hash_t type or (2) just use size_t and Py_ssize_t.

Option (2) may actually be preferable because dict and set
implementations rely on Py_hash_t being compatible with Py_ssize_t.
See bpo-1646068.

rhettinger · 2010-10-18T18:19:02Z

My expectation is that Py_hash_t is the same as Py_ssize_t. The goal is to make hashes match the range of possible table sizes.

loewis · 2010-10-18T20:44:26Z

Am 18.10.2010 17:27, schrieb Antoine Pitrou:

Antoine Pitrou <pitrou@free.fr> added the comment:

> AFAICT, a change from (Py_ssize_t)-1 to (size_t)-1 is less likely to
> break code than a change from -1L to (Py_ssize_t)-1. (Assuming a
> sizeof(long) != sizeof(void*) platform.)

That's true.

I don't think it is. Code that is changed to use the new return type
will not break in a change from long to Py_ssize_t, since all values
will sign-expand correctly, in all cases; the same is true if the new
return type was size_t. So for code that gets adjusted in its return
value, no breakage in either case.

For code that *doesn't* get adjusted, I'm still uncertain what will
happen. However, ISTM that if there is a cast to the "correct" function
pointer type, you get an incorrect detection of the guard value in
either case: e.g. on AMD64, the return value is in RAX, yet the hash
function may only fill out EAX. I'm not sure what values the upper
32 bits will have, but I doubt it gets sign-expanded - which it should
regardless of whether the expected value is (size_t)-1 or
(Py_ssize_t)-1. So you get breakage in either case.

Please correct me if I'm wrong.

loewis · 2010-10-18T20:46:25Z

Wouldn't it be cleaner if x was the same type as hash? Note that
unsigned long is now wrong. What is needed is "unsigned integer type
of the same size as Py_hash_t." If Py_hash_t has to stay signed, I
think we should at least not rely of sizeof(Py_hash_t) to always
remain the same as sizeof(size_t).

But this is an absolute requirement, a guarantee that we provide
forever, and the whole point of this patch. Py_hash_t *will* be
a signed version of size_t, just as Py_ssize_t. Not by chance, but
by careful, inherent design.

loewis · 2010-10-18T20:48:42Z

My expectation is that Py_hash_t is the same as Py_ssize_t. The goal
is to make hashes match the range of possible table sizes.

Please read the patch:

typedef Py_ssize_t Py_hash_t;

mdickinson · 2010-10-19T07:15:04Z

The calculation of long_hash assumes an unsigned temporary type to get
correct results for the bit shifting and masking.

Yes, exactly.

The calculation is
done on the absolute value of the long and then the sign is applied. We
either needed to (1) add an unsigned Py_hash_t type or (2) just use
size_t and Py_ssize_t.

I like (2); the use of Py_hash_t suggests to me that the type used for the hash is configurable independently of Py_ssize_t, which isn't true.

Also, with Py_hash_t it's no longer clear that printing a hash value (e.g., using PyErr_Format and friends) should use the '%zd' modifier.

casevh · 2010-10-21T07:10:07Z

I've attached a patch that fixes hashing for numerical types, sys.hash_info is now correct, fixes typeobject.c/wrap_hashfunc and tupleobject.c/tuplehash to use Py_ssize_t instead of long, and uses Py_ssize_t instead of Py_hash_t.

I think it is clearer to use Py_ssize_t instead of Py_hash_t. I found two occurances where PyLong_FromLong needed to be replaced by PyLong_FromSsize_t and I think bugs like that would be easier to catch if Py_ssize_t is used.

casevh · 2010-10-23T02:09:58Z

I maintain gmpy and it needs to calculate hash values for integers, floats, and rationals. I converted my hash calculations to use Py_ssize_t in a 64-bit Windows enviroment. All my tests pass when I build Python with my previous patch.

In hindsight, I think I made a mistake in my previous patch by eliminating Py_hash_t and using Py_ssize_t/size_t. I ended up defining Py_hash_t and Py_uhash_t in gmpy to simplify the code and to easily support older versions of Python.

I will work on a patch that defines Py_hash_t and Py_uhash_t and upload it later this evening.

casevh · 2010-10-23T03:35:35Z

I've uploaded a patch against the current svn trunk that:

Defines a Py_uhash_t as equivalent to size_t.
Correctly defines _PyHASH_MODULUS on Win64.
Replaces several PyLong_FromLong with PyLong_FromSsize_t.
Change typeobject/wrap_hashfunc to use Py_hash_t instead of long.
Change tupleobject/tuplehash to use Py_hast_t instead of long.
Change long/double/complex hash functions to use Py_uhash_t instead of unsigned long.

benjaminp · 2010-10-23T16:21:01Z

Thank you. Applied in r85803.

pitrou · 2010-10-23T16:23:31Z

Case's patch fixes test_builtin and test_complex failures on Windows 7 64-bit. But there's still a failure in test_dictviews:

======================================================================
FAIL: test_items_set_operations (test.test_dictviews.DictSetTest)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "Z:\__svn__\lib\test\test_dictviews.py", line 153, in test_items_set_operations
    {('a', 1), ('b', 2)})
AssertionError: Items in the first set but not the second:
('a', 1)
('b', 2)

Which boils down to the following issue:

>>> d1 = {'a': 1, 'b': 2}
>>> d1.items()
dict_items([('a', 1), ('b', 2)])
>>> set(d1.items())
{('a', 1), ('b', 2)}
>>> d1.items() | set(d1.items())
{('a', 1), ('a', 1), ('b', 2), ('b', 2)}

There are also a bunch of possibly related failures in test_weakset:

======================================================================
FAIL: test_inplace_on_self (test.test_weakset.TestWeakSet)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "Y:\py3k\__svn__\lib\test\test_weakset.py", line 293, in test_inplace_on_
self
    self.assertEqual(t, self.s)
AssertionError: <_weakrefset.WeakSet object at 0x000000000283CDC0> != <_weakrefs
et.WeakSet object at 0x000000000283B2C8>

======================================================================
FAIL: test_or (test.test_weakset.TestWeakSet)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "Y:\py3k\__svn__\lib\test\test_weakset.py", line 72, in test_or
    self.assertEqual(self.s | set(self.items2), i)
AssertionError: <_weakrefset.WeakSet object at 0x000000000285B400> != <_weakrefs
et.WeakSet object at 0x000000000285B260>

======================================================================
FAIL: test_symmetric_difference (test.test_weakset.TestWeakSet)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "Y:\py3k\__svn__\lib\test\test_weakset.py", line 110, in test_symmetric_d
ifference
    self.assertEqual(c in i, (c in self.d) ^ (c in self.items2))
AssertionError: False != True

======================================================================
FAIL: test_union (test.test_weakset.TestWeakSet)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "Y:\py3k\__svn__\lib\test\test_weakset.py", line 61, in test_union
    self.assertEqual(c in u, c in self.d or c in self.items2)
AssertionError: False != True

======================================================================
FAIL: test_xor (test.test_weakset.TestWeakSet)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "Y:\py3k\__svn__\lib\test\test_weakset.py", line 117, in test_xor
    self.assertEqual(self.s ^ set(self.items2), i)
AssertionError: <_weakrefset.WeakSet object at 0x000000000292CA18> != <_weakrefs
et.WeakSet object at 0x000000000292C878>

Another bunch of test_pyclbr failures:

======================================================================
FAIL: test_decorators (test.test_pyclbr.PyclbrTest)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "Y:\py3k\__svn__\lib\test\test_pyclbr.py", line 152, in test_decorators
    self.checkModule('test.pyclbr_input', ignore=['om'])
  File "Y:\py3k\__svn__\lib\test\test_pyclbr.py", line 101, in checkModule
    self.assertListEq(real_bases, pyclbr_bases, ignore)
  File "Y:\py3k\__svn__\lib\test\test_pyclbr.py", line 28, in assertListEq
    self.fail("%r missing" % missing.pop())
AssertionError: 'object' missing

======================================================================
FAIL: test_easy (test.test_pyclbr.PyclbrTest)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "Y:\py3k\__svn__\lib\test\test_pyclbr.py", line 142, in test_easy
    self.checkModule('pyclbr')
  File "Y:\py3k\__svn__\lib\test\test_pyclbr.py", line 101, in checkModule
    self.assertListEq(real_bases, pyclbr_bases, ignore)
  File "Y:\py3k\__svn__\lib\test\test_pyclbr.py", line 28, in assertListEq
    self.fail("%r missing" % missing.pop())
AssertionError: 'object' missing

======================================================================
FAIL: test_others (test.test_pyclbr.PyclbrTest)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "Y:\py3k\__svn__\lib\test\test_pyclbr.py", line 158, in test_others
    cm('random', ignore=('Random',))  # from _random import Random as CoreGenera
tor
  File "Y:\py3k\__svn__\lib\test\test_pyclbr.py", line 101, in checkModule
    self.assertListEq(real_bases, pyclbr_bases, ignore)
  File "Y:\py3k\__svn__\lib\test\test_pyclbr.py", line 28, in assertListEq
    self.fail("%r missing" % missing.pop())
AssertionError: 'Random' missing

And a test_sys failure which is probably easy to fix:

File "Z:\svn\lib\test\test_sys.py", line 831, in test_objecttypes
check(s, basicsize)
File "Z:\svn\lib\test\test_sys.py", line 601, in check_sizeof
self.assertEqual(result, size, msg)
AssertionError: wrong size for <class 'str'>: got 74, expected 66

All these tests pass in 32-bit mode.
Case, you can run the regression test suite with:
python.exe -m test.regrtest
and invidual tests with e.g.:
python.exe -m test.regrtest -v test_dictviews

pitrou · 2010-10-23T16:40:34Z

This patch seems to fix all aforementioned failures.

loewis · 2010-10-23T17:34:11Z

These changes also look all reasonable to me.

pitrou · 2010-10-23T17:38:54Z

Ok, I've committed them in r85808.

casevh · 2010-10-23T18:41:37Z

On Win64, I get two unexpected failures:

I terminated test_capi after a Windows exception box popped up.

[ 37/349] test_capi
test test_capi failed -- Traceback (most recent call last):
  File "C:\svn\py3k\lib\test\test_capi.py", line 50, in test_no_FatalError_infinite_loop
    b'Fatal Python error:'
AssertionError: b"Fatal Python error: PyThreadState_Get: no current thread\r\n\r\nThis application has requested the Runtime to term
inate it in an unusual way.\nPlease contact the application's support team for more information." != b'Fatal Python error: PyThreadS
tate_Get: no current thread'

[ 67/349] test_concurrent_futures
test test_concurrent_futures failed -- Traceback (most recent call last):
  File "C:\svn\py3k\lib\test\test_concurrent_futures.py", line 442, in test_timeout
    future1]), finished)
AssertionError: Items in the second set but not the first:
<Future at 0x5510fd0 state=finished raised AssertionError>

The dictviews test passes successfully.

pitrou · 2010-10-23T18:52:34Z

The test_capi problem is not 64-bit-specific (see bpo-9116).

As for test_concurrent_futures, I also have a failure (not the same one) in both 32-bit and 64-bit builds here. I'm gonna open a separate issue.

loewis · 2010-10-23T19:09:16Z

I recommend to declare this issue closed, and keep it closed unless somebody wants to propose to revert the change (widening the hash type) completely. Any remaining issues that people want to attribute to this change (correctly or incorrectly) should be reported as separate issues.

pitrou added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Sep 4, 2010

loewis mannequin changed the title ~~Make hash values the same width as a pointer (or Py_ssize_t)~~ Make hash values the same width as a pointer (or Py_ssize_t) Sep 5, 2010

benjaminp closed this as completed Oct 17, 2010

pitrou reopened this Oct 23, 2010

loewis mannequin closed this as completed Oct 23, 2010

ezio-melotti transferred this issue from another repository Apr 10, 2022

vstinner mentioned this issue Nov 15, 2023

gh-111545: Add PyHash_Double() function #112095

Closed

Make hash values the same width as a pointer (or Py_ssize_t) #53987

Make hash values the same width as a pointer (or Py_ssize_t) #53987

Comments

pitrou commented Sep 4, 2010

pitrou commented Sep 4, 2010

pitrou commented Sep 4, 2010

rhettinger commented Sep 4, 2010

loewis mannequin commented Sep 5, 2010

pitrou commented Sep 5, 2010

loewis mannequin commented Sep 5, 2010

birkenfeld commented Sep 5, 2010

mdickinson commented Sep 6, 2010

birkenfeld commented Oct 15, 2010

smontanaro commented Oct 15, 2010

pitrou commented Oct 15, 2010

loewis mannequin commented Oct 17, 2010

pitrou commented Oct 17, 2010

birkenfeld commented Oct 17, 2010

loewis mannequin commented Oct 17, 2010

casevh mannequin commented Oct 17, 2010

loewis mannequin commented Oct 17, 2010

benjaminp commented Oct 17, 2010

loewis mannequin commented Oct 17, 2010

benjaminp commented Oct 17, 2010

smontanaro commented Oct 17, 2010

casevh mannequin commented Oct 18, 2010

benjaminp commented Oct 18, 2010

malemburg commented Oct 18, 2010

malemburg commented Oct 18, 2010

casevh mannequin commented Oct 18, 2010

abalkin commented Oct 18, 2010

pitrou commented Oct 18, 2010

abalkin commented Oct 18, 2010

malemburg commented Oct 18, 2010

casevh mannequin commented Oct 18, 2010

pitrou commented Oct 18, 2010

abalkin commented Oct 18, 2010

casevh mannequin commented Oct 18, 2010

abalkin commented Oct 18, 2010

rhettinger commented Oct 18, 2010

loewis mannequin commented Oct 18, 2010

loewis mannequin commented Oct 18, 2010

loewis mannequin commented Oct 18, 2010

mdickinson commented Oct 19, 2010

casevh mannequin commented Oct 21, 2010

casevh mannequin commented Oct 23, 2010

casevh mannequin commented Oct 23, 2010

benjaminp commented Oct 23, 2010

pitrou commented Oct 23, 2010

pitrou commented Oct 23, 2010

loewis mannequin commented Oct 23, 2010

pitrou commented Oct 23, 2010

casevh mannequin commented Oct 23, 2010

pitrou commented Oct 23, 2010

loewis mannequin commented Oct 23, 2010