classification
Title: Make hash values the same width as a pointer (or Py_ssize_t)
Type: enhancement Stage:
Components: Interpreter Core Versions: Python 3.3, Python 3.2
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: Nosy List: belopolsky, benjamin.peterson, casevh, georg.brandl, jimjjewett, ked-tao, lemburg, loewis, mark.dickinson, pitrou, rhettinger, skip.montanaro, skrah, tim.peters
Priority: normal Keywords: patch

Created on 2010-09-04 22:11 by pitrou, last changed 2010-10-23 19:09 by loewis. This issue is now closed.

Files
File name Uploaded Description Edit
hash_t.diff benjamin.peterson, 2010-10-17 20:25
hash_t.diff benjamin.peterson, 2010-10-17 20:26
deunsignify.diff benjamin.peterson, 2010-10-18 03:22
py_ssize_t_hash_patch.diff casevh, 2010-10-21 07:10
Py_uhash_t_patch.diff casevh, 2010-10-23 03:35
hashw64.patch pitrou, 2010-10-23 16:40
Messages (52)
msg115617 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-09-04 22:11
Currently, Python produces hash values with fit in a C "long". This is fine at first sight, but in the context of dict and set implementations, it means that
1) holding hashes and indices in the same field of a structure requires some care (see issue1646068)
2) on platforms where a long is smaller than a Py_ssize_t (e.g. Win64), very big hash tables could suffer from lots of artificial collisions (the hash table being bigger than the range of possible hash values)
3) when a long is smaller than Py_ssize_t, we don't save any size anyway, since having some pointers follow a C "long" in a structure implies some padding to keep all fields naturally aligned

A future-proof option would be to change all hash values to be of Py_ssize_t values rather than C longs. Either directly, or by defining a new dedicated alias Py_hash_t. This would also impact the ABI, I suppose.
msg115618 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-09-04 22:12
As an example of padding behaviour (under Win64 with 32-bit longs and 64-bit pointers):

Python 3.2a1+ (py3k, Sep  4 2010, 22:50:10) [MSC v.1500 64 bit (AMD64)] on win32

Type "help", "copyright", "credits" or "license" for more information.
>>> import struct
>>> struct.calcsize("l")
4
>>> struct.calcsize("lP")
16
>>> struct.calcsize("lPP")
24
msg115619 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-09-04 22:14
+1 for PyObject_Hash returning a Py_ssize_t.

Nothing good can come from having a hash table larger than the range of a hash value.  Tests may pass but performance would degrade catastrophically.
msg115626 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-09-05 06:46
> This would also impact the ABI, I suppose.

Correct. So it either needs to happen before 3.2, or wait until 4.0,
or the introduction of "wide" hashes needs to be done in a compatible
manner, likely requiring two parallel hashing infrastructures.
msg115639 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-09-05 11:12
> Correct. So it either needs to happen before 3.2, or wait until 4.0,

Shouldn't there be a provision for ABI versioning?
Or do you suggest bumping to the next major number (4.0, 5.0...) be done on the basis of ABI changes?
msg115656 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-09-05 16:59
Am 05.09.2010 13:12, schrieb Antoine Pitrou:
> 
> Antoine Pitrou <pitrou@free.fr> added the comment:
> 
>> Correct. So it either needs to happen before 3.2, or wait until 4.0,
> 
> Shouldn't there be a provision for ABI versioning?

There is certainly support for ABI versioning; the ABI version defined
in the PEP is called "python3.dll".

> Or do you suggest bumping to the next major 
> number (4.0, 5.0...) be done on the basis of ABI changes?

No, vice versa. The PEP promises that the ABI won't change until Python
4. For any change that might break the ABI, either a
backwards-compatible solution needs to be found, or the change be
deferred to Python 4.
msg115657 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2010-09-05 17:03
I would suggest making this change before the ABI freeze starts, then.
msg115695 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-09-06 10:15
> ... change all hash values to be of Py_ssize_t values rather than C longs ...

Yes, please!  (Provided this change can go in before 3.2.)  For the numeric types at least, this should be a straightforward adjustment.

And while we're at it, we could clean up some of the undefined behaviour from integer overflow that's in the current hash functions (tuple.__hash__, for example).
msg118789 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2010-10-15 16:04
Ping?
msg118820 - (view) Author: Skip Montanaro (skip.montanaro) * (Python committer) Date: 2010-10-15 18:58
Not that anybody needs my input on this, but...

Given the range of people advocating for this change, this looks to me
like it should be a release blocker for 3.2.  Raymond's comment about
performance seems especially important, and since the world seems to
be moving toward 64-bit operating systems (certainly should happen in
a big way during the lifetime of Python 3) it seems worthwhile to hold
up further 3.2 releases until this is solved.
msg118823 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-10-15 19:12
> Given the range of people advocating for this change, this looks to me
> like it should be a release blocker for 3.2.  Raymond's comment about
> performance seems especially important, and since the world seems to
> be moving toward 64-bit operating systems (certainly should happen in
> a big way during the lifetime of Python 3) it seems worthwhile to hold
> up further 3.2 releases until this is solved.

I think this is a bit exagerated. The performance issues will only
appear if you have huge dicts and sets.
The issue Raymond raised is the potential impossibility of making the
change /after/ we settle on a stable ABI. The question is whether the
ABI will be enforced starting from 3.2, or from a later date.
msg118953 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-10-17 17:40
> The issue Raymond raised is the potential impossibility of making the
> change /after/ we settle on a stable ABI. The question is whether the
> ABI will be enforced starting from 3.2, or from a later date.

I'd like to repeat that it will not be impossible to fix this after the
ABI is in place.
msg118954 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-10-17 17:50
Le dimanche 17 octobre 2010 à 17:40 +0000, Martin v. Löwis a écrit :
> Martin v. Löwis <martin@v.loewis.de> added the comment:
> 
> > The issue Raymond raised is the potential impossibility of making the
> > change /after/ we settle on a stable ABI. The question is whether the
> > ABI will be enforced starting from 3.2, or from a later date.
> 
> I'd like to repeat that it will not be impossible to fix this after the
> ABI is in place.

At the cost of very annoying complications, though.
msg118955 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2010-10-17 17:53
Can't we just do it now, and be done with it regardless of the stable ABI?
msg118959 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-10-17 18:07
> Can't we just do it now, and be done with it regardless of the stable ABI?

Sure. Somebody needs to implement it (and consider what consequences
this has on third-party modules - I'm uncertain).
msg118968 - (view) Author: Case Van Horsen (casevh) Date: 2010-10-17 19:35
I'm the maintainer of a third-party library (gmpy) that would be impacted by this and I'm definately in favor of this change. With ever increasing amounts of memory becoming standard in computers, more users will encounter performance issues with large dictionaries. I see 3.2 as the version of Python 3.x that gains wide-spread use. Regardless of the timing of a stable ABI, it's popularity will make it more difficult change the hash size in the future.
msg118972 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-10-17 19:51
> I'm the maintainer of a third-party library (gmpy) that would be
> impacted by this and I'm definitely in favor of this change.

Assume Python would make such a change, and users would build released
gmpy versions for such a Python release. What would happen (in terms
of observable errors)?
msg118980 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2010-10-17 20:25
Here's a patch. Please review.
msg118982 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-10-17 20:42
A few comments:
- in complex_hash, why did you drop the cast?
- in object.rst, add versionchanged. It should also explain that Py_hash_t is a signed integer with the same width as size_t.

Otherwise, it looks fine.
msg118983 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2010-10-17 20:55
Dropping the cast was inadvertent. I've now added versionchanged and committed it in r85664.
msg118987 - (view) Author: Skip Montanaro (skip.montanaro) * (Python committer) Date: 2010-10-17 22:23
I added a placeholder to the What's New document.  Since it will affect extension module authors it should be mentioned at a higher level than just Misc/NEWS.
msg118994 - (view) Author: Case Van Horsen (casevh) Date: 2010-10-18 02:43
The patch does not address that "unsigned long" is still used to calculate the hash values. This breaks numeric hashing and leads to incorrect values.

Python 3.2a3+ (py3k, Oct 17 2010, 19:03:38) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.hash_info
sys.hash_info(width=64, modulus=-1, inf=314159, nan=0, imag=1000003)
>>>

Would "size_t" be the appropriate type to use?
msg118995 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2010-10-18 03:22
Good point. Here's another patch:
msg119000 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-10-18 10:18
Wouldn't it be better to use Py_hash_t instead of size_t for calculating the hash values in those hash functions ?
msg119001 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-10-18 10:21
And related to my previous comment: shouldn't Py_hash_t map to size_t instead of Py_ssize_t ?
msg119011 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2010-10-18 12:45
2010/10/18 Marc-Andre Lemburg <report@bugs.python.org>:
>
> Marc-Andre Lemburg <mal@egenix.com> added the comment:
>
> And related to my previous comment: shouldn't Py_hash_t map to size_t instead of Py_ssize_t ?

No, negative values have to be allowed.
msg119015 - (view) Author: Case Van Horsen (casevh) Date: 2010-10-18 13:39
Some quick comments on the latest patch.

1) I don't think you can remove the type cast used when comparing the hash value against -1 and -2. IIRC, GCC considers that undefined behavior.

2) In sysmodule.c, we need to use PyLong_FromSsize_t when storing _PyHASH_MODULUS.

3) In pyport.h, we need a better way to define _PyHASH_MODULUS. The existing "((1UL << _PyHASH_BITS) - 1)" fails when unsigned long is smaller than Py_ssize_t. "((1ULL << _PyHASH_BITS) - 1)" works on 64-bit Windows but is not a good solution for systems that don't have an unsigned long long type. I haven't thought of a better solution for one.
msg119017 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-10-18 13:56
On Mon, Oct 18, 2010 at 8:45 AM, Benjamin Peterson
<report@bugs.python.org> wrote:
..
> No, negative values have to be allowed.
>

Why?  As far as I can tell, negative values are only used as sentinels
and we can use say (size_t)-1 instead of -1L.  Are there cases where
hash values' ordering is important?
msg119019 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-10-18 13:59
Le lundi 18 octobre 2010 à 13:56 +0000, Alexander Belopolsky a écrit :
> Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:
> 
> On Mon, Oct 18, 2010 at 8:45 AM, Benjamin Peterson
> <report@bugs.python.org> wrote:
> ..
> > No, negative values have to be allowed.
> >
> 
> Why?  As far as I can tell, negative values are only used as sentinels
> and we can use say (size_t)-1 instead of -1L.

You can, except that changing the sentinel value will probably break a
lot of code out there (and there's no benefit AFAICT).
msg119020 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-10-18 14:09
On Mon, Oct 18, 2010 at 9:59 AM, Antoine Pitrou <report@bugs.python.org> wrote:
..
>> Why?  As far as I can tell, negative values are only used as sentinels
>> and we can use say (size_t)-1 instead of -1L.
>
> You can, except that changing the sentinel value will probably break a
> lot of code out there (and there's no benefit AFAICT).

AFAICT, a change from (Py_ssize_t)-1 to (size_t)-1 is less likely to
break code than a change from -1L to  (Py_ssize_t)-1.  (Assuming a
sizeof(long) != sizeof(void*) platform.)  The benefit, though is that
hash computations can be performed natively on the hash values without
casting to an unrelated type.  If Py_hash_t needs to be signed, I
would like to see a typedef Py_uhash_t size_t next to that for
Py_hash_t and  Py_uhash_t used in hash computations rather than
explicit size_t.
msg119025 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-10-18 14:47
Benjamin Peterson wrote:
> 
> Benjamin Peterson <benjamin@python.org> added the comment:
>> Marc-Andre Lemburg <mal@egenix.com> added the comment:
>>
>> And related to my previous comment: shouldn't Py_hash_t map to size_t instead of Py_ssize_t ?
> 
> No, negative values have to be allowed.

Ah, right... we use -1 as error indicator.
msg119029 - (view) Author: Case Van Horsen (casevh) Date: 2010-10-18 15:19
Also, note that hash(-12) is -12.
msg119030 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-10-18 15:27
> AFAICT, a change from (Py_ssize_t)-1 to (size_t)-1 is less likely to
> break code than a change from -1L to  (Py_ssize_t)-1.  (Assuming a
> sizeof(long) != sizeof(void*) platform.)

That's true.

> The benefit, though is that
> hash computations can be performed natively on the hash values without
> casting to an unrelated type.

I don't understand what you mean by "native" and "unrelated". Signed
integers are not less native than unsigned ones.
msg119034 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-10-18 16:09
On Mon, Oct 18, 2010 at 11:27 AM, Antoine Pitrou <report@bugs.python.org> wrote:
..
>> The benefit, though is that
>> hash computations can be performed natively on the hash values without
>> casting to an unrelated type.
>
> I don't understand what you mean by "native" and "unrelated". Signed
> integers are not less native than unsigned ones.

Sorry, I could have been clearer indeed.   Consider the following code:

static Py_hash_t
long_hash(PyLongObject *v)
{
    unsigned long x;
...
    x = x * sign;
    if (x == (unsigned long)-1)
        x = (unsigned long)-2;
    return (Py_hash_t)x;
}

Wouldn't it be cleaner if x was the same type as hash?  Note that
unsigned long is now wrong.  What is needed is "unsigned integer type
of the same size as Py_hash_t."  If Py_hash_t has to stay signed, I
think we should at least not rely of sizeof(Py_hash_t) to always
remain the same as sizeof(size_t).
msg119043 - (view) Author: Case Van Horsen (casevh) Date: 2010-10-18 17:25
>Sorry, I could have been clearer indeed.   Consider the following code:
>
>static Py_hash_t
>long_hash(PyLongObject *v)
>{
>    unsigned long x;
>...
>    x = x * sign;
>    if (x == (unsigned long)-1)
>        x = (unsigned long)-2;
>    return (Py_hash_t)x;
>}

>Wouldn't it be cleaner if x was the same type as hash?  Note that
>unsigned long is now wrong.  What is needed is "unsigned integer type
>of the same size as Py_hash_t."  If Py_hash_t has to stay signed, I
>think we should at least not rely of sizeof(Py_hash_t) to always
>remain the same as sizeof(size_t).

The calculation of long_hash assumes an unsigned temporary type to get correct results for the bit shifting and masking. The calculation is done on the absolute value of the long and then the sign is applied. We either needed to (1) add an unsigned Py_hash_t type or (2) just use size_t and Py_ssize_t.
msg119044 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-10-18 17:31
On Mon, Oct 18, 2010 at 1:25 PM, Case Van Horsen <report@bugs.python.org> wrote:
..
> We either needed to (1) add an unsigned Py_hash_t type or (2) just use size_t and Py_ssize_t.
>

Option (2) may actually be preferable because dict and set
implementations rely on Py_hash_t being compatible with Py_ssize_t.
See issue 1646068.
msg119058 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-10-18 18:19
My expectation is that Py_hash_t is the same as Py_ssize_t.  The goal is to make hashes match the range of possible table sizes.
msg119077 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-10-18 20:44
Am 18.10.2010 17:27, schrieb Antoine Pitrou:
> 
> Antoine Pitrou <pitrou@free.fr> added the comment:
> 
>> AFAICT, a change from (Py_ssize_t)-1 to (size_t)-1 is less likely to
>> break code than a change from -1L to  (Py_ssize_t)-1.  (Assuming a
>> sizeof(long) != sizeof(void*) platform.)
> 
> That's true.

I don't think it is. Code that is changed to use the new return type
will not break in a change from long to Py_ssize_t, since all values
will sign-expand correctly, in all cases; the same is true if the new
return type was size_t. So for code that gets adjusted in its return
value, no breakage in either case.

For code that *doesn't* get adjusted, I'm still uncertain what will
happen. However, ISTM that if there is a cast to the "correct" function
pointer type, you get an incorrect detection of the guard value in
either case: e.g. on AMD64, the return value is in RAX, yet the hash
function may only fill out EAX. I'm not sure what values the upper
32 bits will have, but I doubt it gets sign-expanded - which it should
regardless of whether the expected value is (size_t)-1 or
(Py_ssize_t)-1. So you get breakage in either case.

Please correct me if I'm wrong.
msg119079 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-10-18 20:46
> Wouldn't it be cleaner if x was the same type as hash?  Note that
> unsigned long is now wrong.  What is needed is "unsigned integer type
> of the same size as Py_hash_t."  If Py_hash_t has to stay signed, I
> think we should at least not rely of sizeof(Py_hash_t) to always
> remain the same as sizeof(size_t).

But this is an absolute requirement, a guarantee that we provide
forever, and the whole point of this patch. Py_hash_t *will* be
a signed version of size_t, just as Py_ssize_t. Not by chance, but
by careful, inherent design.
msg119080 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-10-18 20:48
> My expectation is that Py_hash_t is the same as Py_ssize_t.  The goal
> is to make hashes match the range of possible table sizes.

Please read the patch:

typedef Py_ssize_t Py_hash_t;
msg119124 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-10-19 07:15
> The calculation of long_hash assumes an unsigned temporary type to get
> correct results for the bit shifting and masking.

Yes, exactly.

> The calculation is
> done on the absolute value of the long and then the sign is applied. We
> either needed to (1) add an unsigned Py_hash_t type or (2) just use
> size_t and Py_ssize_t.

I like (2);  the use of Py_hash_t suggests to me that the type used for the hash is configurable independently of Py_ssize_t, which isn't true.

Also, with Py_hash_t it's no longer clear that printing a hash value (e.g., using PyErr_Format and friends) should use the '%zd' modifier.
msg119265 - (view) Author: Case Van Horsen (casevh) Date: 2010-10-21 07:10
I've attached a patch that fixes hashing for numerical types, sys.hash_info is now correct, fixes typeobject.c/wrap_hashfunc and tupleobject.c/tuplehash to use Py_ssize_t instead of long, and uses Py_ssize_t instead of Py_hash_t.

I think it is clearer to use Py_ssize_t instead of Py_hash_t. I found two occurances where PyLong_FromLong needed to be replaced by PyLong_FromSsize_t and I think bugs like that would be easier to catch if Py_ssize_t is used.
msg119411 - (view) Author: Case Van Horsen (casevh) Date: 2010-10-23 02:09
I maintain gmpy and it needs to calculate hash values for integers, floats, and rationals. I converted my hash calculations to use Py_ssize_t in a 64-bit Windows enviroment. All my tests pass when I build Python with my previous patch.

In hindsight, I think I made a mistake in my previous patch by eliminating Py_hash_t and using Py_ssize_t/size_t. I ended up defining Py_hash_t and Py_uhash_t in gmpy to simplify the code and to easily support older versions of Python.

I will work on a patch that defines Py_hash_t and Py_uhash_t and upload it later this evening.
msg119415 - (view) Author: Case Van Horsen (casevh) Date: 2010-10-23 03:35
I've uploaded a patch against the current svn trunk that:

1) Defines a Py_uhash_t as equivalent to size_t.
2) Correctly defines _PyHASH_MODULUS on Win64.
3) Replaces several PyLong_FromLong with PyLong_FromSsize_t.
4) Change typeobject/wrap_hashfunc to use Py_hash_t instead of long.
5) Change tupleobject/tuplehash to use Py_hast_t instead of long.
6) Change long/double/complex hash functions to use Py_uhash_t instead of unsigned long.
msg119442 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2010-10-23 16:21
Thank you. Applied in r85803.
msg119443 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-10-23 16:23
Case's patch fixes test_builtin and test_complex failures on Windows 7 64-bit. But there's still a failure in test_dictviews:

======================================================================
FAIL: test_items_set_operations (test.test_dictviews.DictSetTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "Z:\__svn__\lib\test\test_dictviews.py", line 153, in test_items_set_operations
    {('a', 1), ('b', 2)})
AssertionError: Items in the first set but not the second:
('a', 1)
('b', 2)

----------------------------------------------------------------------

Which boils down to the following issue:

>>> d1 = {'a': 1, 'b': 2}
>>> d1.items()
dict_items([('a', 1), ('b', 2)])
>>> set(d1.items())
{('a', 1), ('b', 2)}
>>> d1.items() | set(d1.items())
{('a', 1), ('a', 1), ('b', 2), ('b', 2)}


There are also a bunch of possibly related failures in test_weakset:

======================================================================
FAIL: test_inplace_on_self (test.test_weakset.TestWeakSet)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "Y:\py3k\__svn__\lib\test\test_weakset.py", line 293, in test_inplace_on_
self
    self.assertEqual(t, self.s)
AssertionError: <_weakrefset.WeakSet object at 0x000000000283CDC0> != <_weakrefs
et.WeakSet object at 0x000000000283B2C8>

======================================================================
FAIL: test_or (test.test_weakset.TestWeakSet)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "Y:\py3k\__svn__\lib\test\test_weakset.py", line 72, in test_or
    self.assertEqual(self.s | set(self.items2), i)
AssertionError: <_weakrefset.WeakSet object at 0x000000000285B400> != <_weakrefs
et.WeakSet object at 0x000000000285B260>

======================================================================
FAIL: test_symmetric_difference (test.test_weakset.TestWeakSet)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "Y:\py3k\__svn__\lib\test\test_weakset.py", line 110, in test_symmetric_d
ifference
    self.assertEqual(c in i, (c in self.d) ^ (c in self.items2))
AssertionError: False != True

======================================================================
FAIL: test_union (test.test_weakset.TestWeakSet)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "Y:\py3k\__svn__\lib\test\test_weakset.py", line 61, in test_union
    self.assertEqual(c in u, c in self.d or c in self.items2)
AssertionError: False != True

======================================================================
FAIL: test_xor (test.test_weakset.TestWeakSet)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "Y:\py3k\__svn__\lib\test\test_weakset.py", line 117, in test_xor
    self.assertEqual(self.s ^ set(self.items2), i)
AssertionError: <_weakrefset.WeakSet object at 0x000000000292CA18> != <_weakrefs
et.WeakSet object at 0x000000000292C878>


Another bunch of test_pyclbr failures:

======================================================================
FAIL: test_decorators (test.test_pyclbr.PyclbrTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "Y:\py3k\__svn__\lib\test\test_pyclbr.py", line 152, in test_decorators
    self.checkModule('test.pyclbr_input', ignore=['om'])
  File "Y:\py3k\__svn__\lib\test\test_pyclbr.py", line 101, in checkModule
    self.assertListEq(real_bases, pyclbr_bases, ignore)
  File "Y:\py3k\__svn__\lib\test\test_pyclbr.py", line 28, in assertListEq
    self.fail("%r missing" % missing.pop())
AssertionError: 'object' missing

======================================================================
FAIL: test_easy (test.test_pyclbr.PyclbrTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "Y:\py3k\__svn__\lib\test\test_pyclbr.py", line 142, in test_easy
    self.checkModule('pyclbr')
  File "Y:\py3k\__svn__\lib\test\test_pyclbr.py", line 101, in checkModule
    self.assertListEq(real_bases, pyclbr_bases, ignore)
  File "Y:\py3k\__svn__\lib\test\test_pyclbr.py", line 28, in assertListEq
    self.fail("%r missing" % missing.pop())
AssertionError: 'object' missing

======================================================================
FAIL: test_others (test.test_pyclbr.PyclbrTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "Y:\py3k\__svn__\lib\test\test_pyclbr.py", line 158, in test_others
    cm('random', ignore=('Random',))  # from _random import Random as CoreGenera
tor
  File "Y:\py3k\__svn__\lib\test\test_pyclbr.py", line 101, in checkModule
    self.assertListEq(real_bases, pyclbr_bases, ignore)
  File "Y:\py3k\__svn__\lib\test\test_pyclbr.py", line 28, in assertListEq
    self.fail("%r missing" % missing.pop())
AssertionError: 'Random' missing


And a test_sys failure which is probably easy to fix:

  File "Z:\__svn__\lib\test\test_sys.py", line 831, in test_objecttypes
    check(s, basicsize)
  File "Z:\__svn__\lib\test\test_sys.py", line 601, in check_sizeof
    self.assertEqual(result, size, msg)
AssertionError: wrong size for <class 'str'>: got 74, expected 66


All these tests pass in 32-bit mode.
Case, you can run the regression test suite with:
  python.exe -m test.regrtest
and invidual tests with e.g.:
  python.exe -m test.regrtest -v test_dictviews
msg119446 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-10-23 16:40
This patch seems to fix all aforementioned failures.
msg119453 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-10-23 17:34
These changes also look all reasonable to me.
msg119456 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-10-23 17:38
Ok, I've committed them in r85808.
msg119464 - (view) Author: Case Van Horsen (casevh) Date: 2010-10-23 18:41
On Win64, I get two unexpected failures:

I terminated test_capi after a Windows exception box popped up.

[ 37/349] test_capi
test test_capi failed -- Traceback (most recent call last):
  File "C:\svn\py3k\lib\test\test_capi.py", line 50, in test_no_FatalError_infinite_loop
    b'Fatal Python error:'
AssertionError: b"Fatal Python error: PyThreadState_Get: no current thread\r\n\r\nThis application has requested the Runtime to term
inate it in an unusual way.\nPlease contact the application's support team for more information." != b'Fatal Python error: PyThreadS
tate_Get: no current thread'

[ 67/349] test_concurrent_futures
test test_concurrent_futures failed -- Traceback (most recent call last):
  File "C:\svn\py3k\lib\test\test_concurrent_futures.py", line 442, in test_timeout
    future1]), finished)
AssertionError: Items in the second set but not the first:
<Future at 0x5510fd0 state=finished raised AssertionError>

The dictviews test passes successfully.
msg119465 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-10-23 18:52
The test_capi problem is not 64-bit-specific (see issue9116).

As for test_concurrent_futures, I also have a failure (not the same one) in both 32-bit and 64-bit builds here. I'm gonna open a separate issue.
msg119468 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-10-23 19:09
I recommend to declare this issue closed, and keep it closed unless somebody wants to propose to revert the change (widening the hash type) completely. Any remaining issues that people want to attribute to this change (correctly or incorrectly) should be reported as separate issues.
History
Date User Action Args
2010-10-23 19:09:16loewissetstatus: open -> closed

messages: + msg119468
2010-10-23 18:52:34pitrousetmessages: + msg119465
2010-10-23 18:41:38casevhsetstatus: pending -> open

messages: + msg119464
2010-10-23 17:38:53pitrousetstatus: open -> pending

messages: + msg119456
2010-10-23 17:34:11loewissetmessages: + msg119453
2010-10-23 16:40:35pitrousetfiles: + hashw64.patch

messages: + msg119446
2010-10-23 16:32:44skrahsetnosy: + skrah
2010-10-23 16:23:41pitrousetstatus: closed -> open
2010-10-23 16:23:32pitrousetmessages: + msg119443
2010-10-23 16:21:00benjamin.petersonsetmessages: + msg119442
2010-10-23 03:35:37casevhsetfiles: + Py_uhash_t_patch.diff

messages: + msg119415
2010-10-23 02:09:58casevhsetmessages: + msg119411
2010-10-21 07:10:16casevhsetfiles: + py_ssize_t_hash_patch.diff

messages: + msg119265
2010-10-19 07:15:04mark.dickinsonsetmessages: + msg119124
2010-10-18 20:48:42loewissetmessages: + msg119080
2010-10-18 20:46:25loewissetmessages: + msg119079
2010-10-18 20:44:25loewissetmessages: + msg119077
2010-10-18 18:19:01rhettingersetmessages: + msg119058
2010-10-18 17:31:49belopolskysetmessages: + msg119044
2010-10-18 17:25:29casevhsetmessages: + msg119043
2010-10-18 16:09:16belopolskysetmessages: + msg119034
2010-10-18 15:27:27pitrousetmessages: + msg119030
2010-10-18 15:19:19casevhsetmessages: + msg119029
2010-10-18 14:47:34lemburgsetmessages: + msg119025
2010-10-18 14:11:54belopolskylinkissue1646068 superseder
2010-10-18 14:09:11belopolskysetmessages: + msg119020
2010-10-18 13:59:44pitrousetmessages: + msg119019
2010-10-18 13:56:01belopolskysetmessages: + msg119017
2010-10-18 13:39:03casevhsetmessages: + msg119015
2010-10-18 12:45:40benjamin.petersonsetmessages: + msg119011
title: Make hash values the same width as a pointer (or Py_ssize_t) -> Make hash values the same width as a pointer (or Py_ssize_t)
2010-10-18 10:21:01lemburgsetmessages: + msg119001
2010-10-18 10:18:44lemburgsetnosy: + lemburg
messages: + msg119000
2010-10-18 03:22:38benjamin.petersonsetfiles: + deunsignify.diff

messages: + msg118995
2010-10-18 02:43:26casevhsetmessages: + msg118994
2010-10-17 22:23:49skip.montanarosetmessages: + msg118987
2010-10-17 20:55:12benjamin.petersonsetstatus: open -> closed
resolution: accepted
messages: + msg118983
2010-10-17 20:42:02loewissetmessages: + msg118982
2010-10-17 20:33:26pitrousetnosy: tim.peters, loewis, skip.montanaro, georg.brandl, rhettinger, jimjjewett, mark.dickinson, belopolsky, pitrou, casevh, ked-tao, benjamin.peterson
2010-10-17 20:26:37benjamin.petersonsetfiles: + hash_t.diff
2010-10-17 20:25:37benjamin.petersonsetfiles: + hash_t.diff

nosy: + benjamin.peterson
messages: + msg118980

keywords: + patch
2010-10-17 19:51:53loewissetmessages: + msg118972
2010-10-17 19:35:35casevhsetmessages: + msg118968
2010-10-17 18:07:12loewissetmessages: + msg118959
2010-10-17 17:53:01georg.brandlsetmessages: + msg118955
2010-10-17 17:50:44pitrousetmessages: + msg118954
2010-10-17 17:40:18loewissetmessages: + msg118953
2010-10-15 19:12:26pitrousetmessages: + msg118823
2010-10-15 18:58:41skip.montanarosetnosy: + skip.montanaro
messages: + msg118820
2010-10-15 16:04:19georg.brandlsetmessages: + msg118789
2010-09-06 10:15:23mark.dickinsonsetmessages: + msg115695
2010-09-05 17:03:21georg.brandlsetmessages: + msg115657
2010-09-05 16:59:07loewissetmessages: + msg115656
title: Make hash values the same width as a pointer (or Py_ssize_t) -> Make hash values the same width as a pointer (or Py_ssize_t)
2010-09-05 16:37:19casevhsetnosy: + casevh
2010-09-05 11:12:52pitrousetmessages: + msg115639
2010-09-05 06:46:10loewissetmessages: + msg115626
title: Make hash values the same width as a pointer (or Py_ssize_t) -> Make hash values the same width as a pointer (or Py_ssize_t)
2010-09-04 22:14:39rhettingersetmessages: + msg115619
2010-09-04 22:12:40pitrousetmessages: + msg115618
2010-09-04 22:11:29pitroucreate