classification
Title: Test test_hash_distribution fails on RHEL 6.5 / ppc64
Type: behavior Stage:
Components: Interpreter Core, Tests Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: David.Edelsohn, christian.heimes, dmalcolm, haypo, python-dev, serhiy.storchaka, zaytsev
Priority: high Keywords: patch

Created on 2014-01-07 13:22 by zaytsev, last changed 2014-02-01 02:40 by haypo. This issue is now closed.

Files
File name Uploaded Description Edit
pyhash.c.patch zaytsev, 2014-01-07 14:19
le64toh.diff zaytsev, 2014-01-07 16:30
pyhash.preproc.c zaytsev, 2014-01-07 16:53
siphash_ppc64.patch haypo, 2014-01-07 17:25 review
Messages (23)
msg207533 - (view) Author: Yury V. Zaytsev (zaytsev) Date: 2014-01-07 13:22
Hi,

When I try the following:

    ./python -m test -v test_hash

on a self-compiled Python 3.4.0b2 on RHEL 6.5 / ppc64, it fails. Please let me know which additional information I can supply to diagnose the problem.

The complete traceback below:

== CPython 3.4.0b2 (default, Jan 6 2014, 15:23:43) [GCC 4.4.7 20120313 (Red Hat 4.4.7-4)]
==   Linux-2.6.32-431.el6.ppc64-ppc64-with-redhat-6.5-Santiago big-endian
==   hash algorithm: siphash24 64bit
==   /XXX/build/test_python_13082
Testing with flags: sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, quiet=0, hash_randomization=1, isolated=0)
[1/1] test_hash
test_empty_string (test.test_hash.BytesHashRandomizationTests) ... ok
test_fixed_hash (test.test_hash.BytesHashRandomizationTests) ... ok
test_long_fixed_hash (test.test_hash.BytesHashRandomizationTests) ... ok
test_null_hash (test.test_hash.BytesHashRandomizationTests) ... ok
test_randomized_hash (test.test_hash.BytesHashRandomizationTests) ... ok
test_randomized_hash (test.test_hash.DatetimeDateTests) ... ok
test_randomized_hash (test.test_hash.DatetimeDatetimeTests) ... ok
test_randomized_hash (test.test_hash.DatetimeTimeTests) ... ok
test_hashes (test.test_hash.HashBuiltinsTestCase) ... ok
test_hash_distribution (test.test_hash.HashDistributionTestCase) ... FAIL
test_coerced_floats (test.test_hash.HashEqualityTestCase) ... ok
test_coerced_integers (test.test_hash.HashEqualityTestCase) ... ok
test_numeric_literals (test.test_hash.HashEqualityTestCase) ... ok
test_unaligned_buffers (test.test_hash.HashEqualityTestCase) ... ok
test_default_hash (test.test_hash.HashInheritanceTestCase) ... ok
test_error_hash (test.test_hash.HashInheritanceTestCase) ... ok
test_fixed_hash (test.test_hash.HashInheritanceTestCase) ... ok
test_hashable (test.test_hash.HashInheritanceTestCase) ... ok
test_not_hashable (test.test_hash.HashInheritanceTestCase) ... ok
test_empty_string (test.test_hash.MemoryviewHashRandomizationTests) ... ok
test_fixed_hash (test.test_hash.MemoryviewHashRandomizationTests) ... ok
test_long_fixed_hash (test.test_hash.MemoryviewHashRandomizationTests) ... ok
test_null_hash (test.test_hash.MemoryviewHashRandomizationTests) ... ok
test_randomized_hash (test.test_hash.MemoryviewHashRandomizationTests) ... ok
test_empty_string (test.test_hash.StrHashRandomizationTests) ... ok
test_fixed_hash (test.test_hash.StrHashRandomizationTests) ... ok
test_long_fixed_hash (test.test_hash.StrHashRandomizationTests) ... ok
test_null_hash (test.test_hash.StrHashRandomizationTests) ... ok
test_randomized_hash (test.test_hash.StrHashRandomizationTests) ... ok
test_ucs2_string (test.test_hash.StrHashRandomizationTests) ... ok

======================================================================
FAIL: test_hash_distribution (test.test_hash.HashDistributionTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/XXX/Lib/test/test_hash.py", line 340, in test_hash_distribution
    self.assertGreater(len(s15), 8, prefix)
AssertionError: 1 not greater than 8 : abc

----------------------------------------------------------------------
Ran 30 tests in 0.694s

FAILED (failures=1)
1 test failed:
    test_hash

Thanks!
msg207534 - (view) Author: Yury V. Zaytsev (zaytsev) Date: 2014-01-07 13:28
As requested by Victor Stinner:

./python  -c 'import sys; print(sys.hash_info)'

sys.hash_info(width=64, modulus=2305843009213693951, inf=314159, nan=0, imag=1000003, algorithm='siphash24', hash_bits=64, seed_bits=128, cutoff=0)
msg207535 - (view) Author: Yury V. Zaytsev (zaytsev) Date: 2014-01-07 13:39
Sorry for accidentally rolling back your changes to the bug, Antoine!
msg207536 - (view) Author: Roundup Robot (python-dev) Date: 2014-01-07 13:42
New changeset 81f8b4744f1a by Victor Stinner in branch 'default':
Issue #20162: test_hash_distribution() uses subTest() to mention the prefix in
http://hg.python.org/cpython/rev/81f8b4744f1a
msg207537 - (view) Author: Yury V. Zaytsev (zaytsev) Date: 2014-01-07 13:49
======================================================================
FAIL: test_hash_distribution (test.test_hash.HashDistributionTestCase) (prefix='abc')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/XXX/Lib/test/test_hash.py", line 341, in test_hash_distribution
    self.assertGreater(len(s15), 8, prefix)
AssertionError: 1 not greater than 8 : abc

======================================================================
FAIL: test_hash_distribution (test.test_hash.HashDistributionTestCase) (prefix='abcdefghabc')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/XXX/Lib/test/test_hash.py", line 341, in test_hash_distribution
    self.assertGreater(len(s15), 8, prefix)
AssertionError: 1 not greater than 8 : abcdefghabc
msg207539 - (view) Author: Yury V. Zaytsev (zaytsev) Date: 2014-01-07 14:19
To check whether the problem is in the _le64toh() macro as suggested by Victor Stinner, I've tried the attached patch and the problem is gone.

As it turns out, there actually seem to be two problems:

First, HAVE_ENDIAN_H is properly defined, because the correct include file is found, but HAVE_LETOH64 is not defined, even though it exists in the system and is working. Therefore, the macro implementation is used instead.

However, the second problem is that apparently something is wrong about the macro, even though Victor says it looks fine on the paper, because when macro is replaced by a function from glibc like in my patch, it works.

The solution to the first problem, I guess, is to add a proper check to the configure script. I don't have a solution for the second problem.

For the record, the test output with the patch applied:

bash-4.1$ ./python -m test -v test_hash
== CPython 3.4.0b2 (default, Jan 7 2014, 15:03:44) [GCC 4.4.7 20120313 (Red Hat 4.4.7-4)]
==   Linux-2.6.32-431.el6.ppc64-ppc64-with-redhat-6.5-Santiago big-endian
==   hash algorithm: siphash24 64bit
==   /XXX/build/test_python_27880
Testing with flags: sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, quiet=0, hash_randomization=1, isolated=0)
[1/1] test_hash
test_empty_string (test.test_hash.BytesHashRandomizationTests) ... ok
test_fixed_hash (test.test_hash.BytesHashRandomizationTests) ... ok
test_long_fixed_hash (test.test_hash.BytesHashRandomizationTests) ... ok
test_null_hash (test.test_hash.BytesHashRandomizationTests) ... ok
test_randomized_hash (test.test_hash.BytesHashRandomizationTests) ... ok
test_randomized_hash (test.test_hash.DatetimeDateTests) ... ok
test_randomized_hash (test.test_hash.DatetimeDatetimeTests) ... ok
test_randomized_hash (test.test_hash.DatetimeTimeTests) ... ok
test_hashes (test.test_hash.HashBuiltinsTestCase) ... ok
test_hash_distribution (test.test_hash.HashDistributionTestCase) ... ok
test_coerced_floats (test.test_hash.HashEqualityTestCase) ... ok
test_coerced_integers (test.test_hash.HashEqualityTestCase) ... ok
test_numeric_literals (test.test_hash.HashEqualityTestCase) ... ok
test_unaligned_buffers (test.test_hash.HashEqualityTestCase) ... ok
test_default_hash (test.test_hash.HashInheritanceTestCase) ... ok
test_error_hash (test.test_hash.HashInheritanceTestCase) ... ok
test_fixed_hash (test.test_hash.HashInheritanceTestCase) ... ok
test_hashable (test.test_hash.HashInheritanceTestCase) ... ok
test_not_hashable (test.test_hash.HashInheritanceTestCase) ... ok
test_empty_string (test.test_hash.MemoryviewHashRandomizationTests) ... ok
test_fixed_hash (test.test_hash.MemoryviewHashRandomizationTests) ... ok
test_long_fixed_hash (test.test_hash.MemoryviewHashRandomizationTests) ... ok
test_null_hash (test.test_hash.MemoryviewHashRandomizationTests) ... ok
test_randomized_hash (test.test_hash.MemoryviewHashRandomizationTests) ... ok
test_empty_string (test.test_hash.StrHashRandomizationTests) ... ok
test_fixed_hash (test.test_hash.StrHashRandomizationTests) ... ok
test_long_fixed_hash (test.test_hash.StrHashRandomizationTests) ... ok
test_null_hash (test.test_hash.StrHashRandomizationTests) ... ok
test_randomized_hash (test.test_hash.StrHashRandomizationTests) ... ok
test_ucs2_string (test.test_hash.StrHashRandomizationTests) ... ok

----------------------------------------------------------------------
Ran 30 tests in 0.759s

OK
1 test OK.
msg207540 - (view) Author: Yury V. Zaytsev (zaytsev) Date: 2014-01-07 14:28
I was also asked to mention this:

https://github.com/majek/csiphash/blob/master/csiphash.c

as an alternative implementation of siphash and platform checks.
msg207554 - (view) Author: Yury V. Zaytsev (zaytsev) Date: 2014-01-07 16:30
After lots of fiddling, I can tell you what's wrong with the macro: apparently it's a compiler bug, visible at -O2 and disappearing at -O1. 

Assembly output is attached, unfortunately, I'm no ppc64 expert, so I can't immediately tell what exactly went wrong.

In any case, the first half of the bug (missing check for HAVE_LETOH64) which triggered the second half still seems valid.

I think it's not feasible to upgrade the compiler on RHEL 6.5 (unless someone with a subscription wants to have a go), but fixing the first part should avoid the compiler bug altogether.
msg207556 - (view) Author: Dave Malcolm (dmalcolm) (Python committer) Date: 2014-01-07 16:38
On Tue, 2014-01-07 at 16:30 +0000, Yury V. Zaytsev wrote:
> Yury V. Zaytsev added the comment:
> 
> After lots of fiddling, I can tell you what's wrong with the macro: apparently it's a compiler bug, visible at -O2 and disappearing at -O1. 

Can you reduce the suspected compiler bug to a minimal reproducer?
Please then run it through the preprocessor using gcc's -E option, and
then attach it to this bug.

What exact version of the compiler are you using, and with what flags?
msg207558 - (view) Author: Yury V. Zaytsev (zaytsev) Date: 2014-01-07 16:45
Hi David,

It's gcc from RHEL 6.5 gcc-4.4.7-4.el6.ppc64, the flags are "-DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes" vs. "-DNDEBUG -g -fwrapv -O1 -Wall -Wstrict-prototypes".

The problem is, I can't isolate it. We honestly tried it with Victor, but gcc doesn't want to miscompile a trivial example.

You can see in the attachment as far as I could get: I wrapped the macro in a function, so that it doesn't inline and compared the assembly of the good and bad object files.

Hope that helps!
msg207562 - (view) Author: Yury V. Zaytsev (zaytsev) Date: 2014-01-07 16:53
Look for _le64toh ;-)
msg207564 - (view) Author: Yury V. Zaytsev (zaytsev) Date: 2014-01-07 17:07
Digging more into it, I guess I know why we couldn't come up with a minimal reproducer for this problem. If I compile with -O2 instead of -O1, I get the following warning from gcc:

Python/pyhash.c:413: warning: dereferencing pointer 'pt.32' does break strict-aliasing rules

which points to the following line:

case 4: *((PY_UINT32_T*)&pt[0]) = *((PY_UINT32_T*)&m[0]); break;

If I re-compile with -O2, but -fno-strict-aliasing, then the result doesn't fail.

Not sure if siphash code can be changed to not require aliasing, though.
msg207566 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-01-07 17:25
> Python/pyhash.c:413: warning: dereferencing pointer 'pt.32' does break strict-aliasing rules

Attached patch siphash_ppc64.patch should fix this aliasing issue.
msg207568 - (view) Author: Yury V. Zaytsev (zaytsev) Date: 2014-01-07 17:32
Related issue where memcpy() was discussed: http://bugs.python.org/issue19183 .
msg207570 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-01-07 17:34
> Attached patch siphash_ppc64.patch should fix this aliasing issue.

It looks like memcpy was discussed in the initial issue #19183 for
performances. IMO the function must first produce the good result, and
then be fast :-)
msg207571 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2014-01-07 17:49
Oh heck ... I didn't run into this issue when I was testing siphash on all platforms. Could it be a compiler bug? I'd rather not change the code and deviate from the reference implementation. It's a performance critical part...
msg207574 - (view) Author: Yury V. Zaytsev (zaytsev) Date: 2014-01-07 18:05
Hi Christian,

Dave says "it's not a compiler bug; the code is slightly violating the C standard, and the compiler optimizes based on a strict reading of the rules".

If I compile with -O2 and higher (while -O3 is the default for Python, as far as I can understand), the compiler correctly warns me about possible issues; as you can see, the macro itself is not at fault, even.

It's just that in my particular (rare) mix of circumstances: being big endian, using the macro instead of the glibc function due to the first part of the bug, etc. it results in miscompilation, and in your case, it remains a warning, but the result still works.

I think the "clean" solutions to this problem are as follows:

1) Do not break strict aliasing (use memcpy() instead or something)
2) Disable strict aliasing based optimizations (-fno-strict-aliasing)

In addition, fixing the first part of the bug (wrongly using a macro even if le64toh is available), will mask the second part for me again, but I'm not sure if it will re-surface at some point later or not :)

Hope that helps!
msg207575 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-01-07 18:08
On platforms where unaligned access is not an issue, gcc produces for memcpy() an optimal binary code equivalent to simple assignment of 32-bit value. Only MSVC produces slow code, but Windows works only on platforms where unaligned access is not an issue. So we can use simple assignment when _MSC_VER is defined and memcpy() in other cases (it is what I was proposed in issue19183 for PY_UHASH_CPY).
msg207577 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-01-07 18:18
> Only MSVC produces slow code, but Windows works only on platforms where unaligned access is not an issue.

My patch uses Py_MEMCPY() which uses a loop to copy bytes on Visual
Studio for length smaller than 16 bytes (SipHash copies 4 bytes).
msg208326 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-01-17 08:17
Well, then it LGTM.
msg208330 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-01-17 09:16
@Christian: Are you ok with siphash_ppc64.patch? I'm going to push the fix.
msg209863 - (view) Author: Roundup Robot (python-dev) Date: 2014-02-01 02:39
New changeset caebb4f231da by Victor Stinner in branch 'default':
Issue #20162: Fix an alignment issue in the siphash24() hash function which
http://hg.python.org/cpython/rev/caebb4f231da
msg209864 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-02-01 02:40
I applied siphash_ppc64.patch. Thanks Yury V. Zaytsev for your report and your help to investigate this tricky bug.
History
Date User Action Args
2014-02-01 02:40:01hayposetstatus: open -> closed
resolution: fixed
messages: + msg209864
2014-02-01 02:39:19python-devsetmessages: + msg209863
2014-01-17 09:16:41hayposetmessages: + msg208330
2014-01-17 08:17:43serhiy.storchakasetmessages: + msg208326
2014-01-07 18:18:24hayposetmessages: + msg207577
2014-01-07 18:08:04serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg207575
2014-01-07 18:05:30zaytsevsetmessages: + msg207574
2014-01-07 17:49:24christian.heimessetmessages: + msg207571
2014-01-07 17:34:01hayposetmessages: + msg207570
2014-01-07 17:32:31zaytsevsetmessages: + msg207568
2014-01-07 17:25:47hayposetfiles: + siphash_ppc64.patch
nosy: + haypo
messages: + msg207566

2014-01-07 17:07:37zaytsevsetmessages: + msg207564
2014-01-07 16:53:19zaytsevsetfiles: + pyhash.preproc.c

messages: + msg207562
2014-01-07 16:45:28zaytsevsetmessages: + msg207558
2014-01-07 16:38:35dmalcolmsetmessages: + msg207556
2014-01-07 16:30:21zaytsevsetfiles: + le64toh.diff

messages: + msg207554
2014-01-07 14:28:01zaytsevsetmessages: + msg207540
2014-01-07 14:19:12zaytsevsetfiles: + pyhash.c.patch
keywords: + patch
messages: + msg207539
2014-01-07 13:49:56zaytsevsetmessages: + msg207537
2014-01-07 13:42:13python-devsetnosy: + python-dev
messages: + msg207536
2014-01-07 13:40:08zaytsevsetcomponents: + Tests
2014-01-07 13:39:38zaytsevsettype: crash -> behavior
messages: + msg207535
components: + Interpreter Core, - Tests
2014-01-07 13:28:03zaytsevsettype: behavior -> crash
messages: + msg207534
components: - Interpreter Core
2014-01-07 13:24:31pitrousetpriority: normal -> high
nosy: + christian.heimes, dmalcolm, David.Edelsohn
type: crash -> behavior
components: + Interpreter Core
2014-01-07 13:22:35zaytsevcreate