msg199078 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2013-10-06 14:58 |
The patch implements the current state of PEP 456 plus a configure option to select the hash algorithm. I have tested it only on 64bit Linux so far.
|
msg199113 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-10-06 19:23 |
Here is a simple benchmark (Linux, gcc 4.7.3):
$ ./python -m timeit -s "words=[w for line in open('LICENSE') for w in line.split()]; import collections" "c = collections.Counter(words); c.most_common(10)"
- 64-bit build, before: 313 usec per loop
- 64-bit build, after: 298 usec per loop
- 32-bit build, before: 328 usec per loop
- 32-bit build, after: 329 usec per loop
- x32 build, before: 291 usec per loop
- x32 build, after: 284 usec per loop
|
msg199114 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-10-06 19:28 |
Microbenchmarking hash computation (Linux, gcc 4.7.3):
* Short strings:
python -m timeit -s "b=b'x'*20" "hash(memoryview(b))"
- 64-bit build, before: 0.263 usec per loop
- 64-bit build, after: 0.263 usec per loop
- 32-bit build, before: 0.303 usec per loop
- 32-bit build, after: 0.358 usec per loop
* Long strings:
python -m timeit -s "b=b'x'*1000" "hash(memoryview(b))"
- 64-bit build, before: 1.56 usec per loop
- 64-bit build, after: 1.03 usec per loop
- 32-bit build, before: 1.61 usec per loop
- 32-bit build, after: 2.46 usec per loop
Overall, performance looks fine to me.
|
msg199137 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2013-10-07 10:57 |
Your benchmark is a bit unrealistic because it times the hash cache most of the time. Here is a better benchmark (but bytes-only):
$ ./python -m timeit -s "words=[w.encode('utf-8') for line in open('../LICENSE') for w in line.split()]; import collections" -- "c = collections.Counter(memoryview(w) for w in words); c.most_common(10)"
1000 loops, best of 3: 1.63 msec per loop
This increases the number of hash calculations from about 28k to over 8.4 million.
I also added a little statistic function to see how large typical strings are. For the artificial benchmark:
hash 1: 115185
hash 2: 1440956
hash 3: 1679976
hash 4: 873769
hash 5: 948124
hash 6: 651799
hash 7: 676707
hash 8: 545459
hash 9: 523615
hash 10: 421232
hash 11: 161641
hash 12: 140797
hash 13: 86826
hash 14: 41702
hash 15: 41570
hash 16: 332
hash 17: 211
hash 18: 4275
hash 19: 205
hash 20: 131
hash 21: 4197
hash 22: 70
hash 23: 35
hash 24: 44
hash 25: 4145
hash 26: 4137
hash 27: 4137
hash 28: 21
hash 29: 4124
hash 30: 8
hash 31: 5
hash 32: 1
hash other: 28866
hash total: 8404302
And here are the statistics from a full test run.
hash 1: 18935
hash 2: 596761
hash 3: 643973
hash 4: 645399
hash 5: 576231
hash 6: 742531
hash 7: 497214
hash 8: 330890
hash 9: 291301
hash 10: 93206
hash 11: 1417900
hash 12: 160802
hash 13: 58675
hash 14: 49324
hash 15: 48068
hash 16: 90634
hash 17: 24163
hash 18: 66079
hash 19: 23408
hash 20: 20695
hash 21: 16424
hash 22: 17236
hash 23: 59135
hash 24: 10368
hash 25: 6047
hash 26: 6784
hash 27: 5565
hash 28: 5931
hash 29: 3469
hash 30: 4220
hash 31: 2652
hash 32: 2911
hash other: 72042
hash total: 6608973
About 50% of the hashed elements are between 1 and 5 characters long. A realistic hash collision attack on 32-bit builds needs at least 7 or 8 characters. I see an opportunity for a micro-optimization! :)
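For reference, a minimal sketch of such a length-counting instrumentation (this is not the actual code from the patch; the names are illustrative):

#include <stdio.h>
#include <stddef.h>

#define HASH_STAT_MAX 32

/* counters[1..32] = inputs of that exact length, counters[33] = "other" */
static size_t hash_length_stats[HASH_STAT_MAX + 2];

static void
count_hash_length(size_t len)
{
    size_t slot = (len >= 1 && len <= HASH_STAT_MAX) ? len : HASH_STAT_MAX + 1;
    hash_length_stats[slot]++;
}

static void
print_hash_length_stats(void)
{
    size_t total = 0;
    for (size_t i = 1; i <= HASH_STAT_MAX + 1; i++)
        total += hash_length_stats[i];
    for (size_t i = 1; i <= HASH_STAT_MAX; i++)
        fprintf(stderr, "hash %zu: %zu\n", i, hash_length_stats[i]);
    fprintf(stderr, "hash other: %zu\n", hash_length_stats[HASH_STAT_MAX + 1]);
    fprintf(stderr, "hash total: %zu\n", total);
}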
|
msg199139 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-10-07 11:12 |
> Your benchmark is a bit unrealistic because it times the hash cache
> most of the time. Here is a better benchmark (but bytes-only):
>
> $ ./python -m timeit -s "words=[w.encode('utf-8') for line in
> open('../LICENSE') for w in line.split()]; import collections" -- "c
> = collections.Counter(memoryview(w) for w in words);
> c.most_common(10)"
> 1000 loops, best of 3: 1.63 msec per loop
Good point. Can you also post all benchmark results?
|
msg199142 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2013-10-07 11:57 |
unmodified Python:
1000 loops, best of 3: 307 usec per loop (unicode)
1000 loops, best of 3: 930 usec per loop (memoryview)
SipHash:
1000 loops, best of 3: 300 usec per loop (unicode)
1000 loops, best of 3: 906 usec per loop (memoryview)
|
msg199144 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2013-10-07 13:20 |
New changeset c960bed22bf6 by Christian Heimes in branch 'default':
Make Nick BDFG delegate
http://hg.python.org/peps/rev/c960bed22bf6
|
msg199145 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-10-07 14:54 |
I propose extracting all hash-related stuff from Include/object.h into a separate file, Include/pyhash.h, and perhaps moving Objects/hash.c to Python/pyhash.c.
|
msg199146 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-10-07 15:12 |
Since the hash algorithm is determined at compile time, the _Py_HashSecret_t structure and the _Py_HashSecret function are redundant. We only need to define the _Py_HashBytes function.
Currently the SipHash algorithm doesn't work with unaligned data.
|
msg199147 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2013-10-07 15:28 |
Sure it does. The test for unaligned hashing passes without an error or a segfault.
|
msg199148 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-10-07 15:32 |
And note that the quality of the FNV hash function is reduced (msg186403). We need to "shuffle" the result's bits.
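For illustration only (this is not CPython's actual FNV variant), one way to "shuffle" the bits is to append a final avalanche step, for example the well-known 64-bit finalizer from MurmurHash3:

#include <stdint.h>
#include <stddef.h>

static uint64_t
fnv_with_final_mix(const unsigned char *p, size_t len)
{
    uint64_t x = 14695981039346656037ULL;      /* FNV-1 offset basis */
    for (size_t i = 0; i < len; i++) {
        x = (x * 1099511628211ULL) ^ p[i];     /* multiply by FNV prime, xor byte */
    }
    /* final mixing so that low-byte differences also reach the high bits */
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    x *= 0xc4ceb9fe1a85ec53ULL;
    x ^= x >> 33;
    return x;
}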
|
msg199149 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-10-07 15:38 |
> The test for unaligned hashing passes without an error or a segfault.
On some platforms it can work without a segfault.
|
msg199884 - (view) |
Author: Nick Coghlan (ncoghlan) * |
Date: 2013-10-14 13:07 |
Added some structural comments to the patch. I'll defer to Serhiy when it comes to assessing the algorithm details :)
|
msg201193 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2013-10-24 22:40 |
I have created a clone for PEP 456 and applied your suggestions. I'm still looking for a nice API to handle the hash definition. Do you have some suggestions?
|
msg201405 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2013-10-27 01:52 |
Nick, can you do another review? All tests should pass on common boxes.
The latest code hides the struct with the hash function. I have added a configure block that detects platforms that don't support unaligned memory access. It works correctly on the SPARC Solaris 10 box.
I'm still looking for a 64-bit big-endian box and a 32-bit big-endian box that support unaligned memory access.
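For reference, such a configure check typically compiles and runs a small C probe along these lines (a sketch of the general technique, not necessarily the exact test in the patch); on strict-alignment CPUs the misaligned load crashes and the run-test fails:

#include <stdint.h>
#include <string.h>

int main(int argc, char **argv)
{
    char buf[32];
    (void)argv;
    memset(buf, 0, sizeof(buf));
    /* deliberately misaligned 64-bit load; argc keeps the compiler from
       folding the read away at compile time */
    volatile uint64_t *p = (volatile uint64_t *)(buf + 1 + (argc & 1) * 2);
    return (*p == 0) ? 0 : 1;
}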
|
msg201441 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-10-27 10:34 |
> I'm still looking for a 64bit big endian box
Have you tried the PPC64 PowerLinux box? It's in the stable buildbots for a reason :-)
|
msg201448 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2013-10-27 11:46 |
I can no longer find the configuration for the custom builders. It's still documented, but there is no field for "repo path".
http://buildbot.python.org/all/buildslaves/edelsohn-powerlinux-ppc64
|
msg201449 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-10-27 11:50 |
> I can no longer find the configuration for custom path. It's still documented but there is no field for "repo path".
http://buildbot.python.org/all/builders/PPC64%20PowerLinux%20custom
(usually, just replace "3.x" with "custom" in the URL)
|
msg201459 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2013-10-27 13:46 |
Nick, please review the latest patch. I have addressed Antoine's review in 257597d20fa8.diff. I'll update the PEP as soon as you are happy with the patch.
|
msg201465 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-10-27 15:37 |
I suggest moving only _Py_HashBytes() and the constants related to the string hash algorithm to Include/pyhash.h and Python/pyhash.c, and not touching PyObject_Hash(), _Py_HashDouble(), etc. Then anybody who wants to change the string hashing algorithm only needs to replace these two minimal files.
|
msg201466 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2013-10-27 15:46 |
PyObject_Hash() and PyObject_HashNotImplemented() should not have been moved to pyhash.h. But the other internal helper methods should be kept together.
|
msg201531 - (view) |
Author: Nick Coghlan (ncoghlan) * |
Date: 2013-10-28 12:33 |
On 27 Oct 2013 23:46, "Christian Heimes" <report@bugs.python.org> wrote:
>
> Christian Heimes added the comment:
>
> Nick, please review the latest patch. I have addressed Antoine's review
> in 257597d20fa8.diff. I'll update the PEP as soon as you are happy with
> the patch.

Comments from the others sound promising; I should be able to take a look
myself tomorrow night.
|
msg201548 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-10-28 15:18 |
Christian, why is PY_HASH_EXTERNAL here? Do you plan to use it in any official build? I think that in a custom build of Python the whole files pyhash.c and pyhash.h can be replaced.
Once you get rid of PY_HASH_EXTERNAL, you can also get rid of PyHash_FuncDef, PyHash_Func, etc.
Why are _Py_HashDouble() and _Py_HashPointer() moved to pyhash.c? They are hash-algorithm agnostic, and it is unlikely they will be redefined in a custom build.
You don't need the HAVE_ALIGNED_REQUIRED macro if you use PY_UHASH_CPY (or something similar for exactly 64 bits) in siphash24. On platforms where aligned access is required you would use a per-byte copy; otherwise you would use a fast 64-bit copy.
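A rough sketch of that idea (the macro and guard names here are illustrative, not necessarily those in the patch): siphash24 would read its input through a single copy macro, so only that macro has to know whether unaligned 64-bit loads are safe:

#include <stdint.h>
#include <string.h>

#ifdef PY_ALIGNED_ACCESS_REQUIRED               /* hypothetical guard */
/* portable fallback: compilers turn this memcpy into a few byte loads */
#define PY_UHASH_CPY(dst, src) memcpy(&(dst), (src), sizeof(uint64_t))
#else
/* fast path: a single (possibly unaligned) 64-bit load */
#define PY_UHASH_CPY(dst, src) ((dst) = *(const uint64_t *)(src))
#endif

/* inside the siphash24 block loop:
       uint64_t mi;
       PY_UHASH_CPY(mi, data + i * 8);
*/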
|
msg201549 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2013-10-28 15:42 |
On 28.10.2013 16:18, Serhiy Storchaka wrote:
> Christian, why is PY_HASH_EXTERNAL here? Do you plan to use it in any official build? I think that in a custom build of Python the whole files pyhash.c and pyhash.h can be replaced.
Because you can't simply replace the files. It also contains
_Py_HashBytes() and _PyHash_Fini(). With PY_HASH_EXTERNAL embedders can
simply define PY_HASH_ALGORITHM PY_HASH_EXTERNAL and provide the extern
struct inside a separate object file.
> Once you get rid of PY_HASH_EXTERNAL, you can also get rid of PyHash_FuncDef, PyHash_Func, etc.
I don't understand why you want me to get rid of the struct. What's your
argument against the struct? I like the PyHash_FuncDef because it groups
all information (func ptr, name, hash metadata) in a single structure.
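Roughly speaking, the descriptor looks like this (the field names here are only a sketch; the authoritative definition lives in Include/pyhash.h in the feature branch):

#include <Python.h>   /* Py_hash_t, Py_ssize_t */

typedef struct {
    Py_hash_t (*hash)(const void *src, Py_ssize_t len);  /* bytes-hashing function */
    const char *name;      /* e.g. "siphash24" or "fnv" */
    int hash_bits;         /* width of the produced hash value */
    int seed_bits;         /* how many bits of the hash secret it consumes */
} PyHash_FuncDef;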
> Why are _Py_HashDouble() and _Py_HashPointer() moved to pyhash.c? They are hash-algorithm agnostic, and it is unlikely they will be redefined in a custom build.
I have moved the functions to pyhash.c in order to keep all related
internal functions in one file. They do not belong in Objects/object.c.
> You don't need the HAVE_ALIGNED_REQUIRED macro if you use PY_UHASH_CPY (or something similar for exactly 64 bits) in siphash24. On platforms where aligned access is required you would use a per-byte copy; otherwise you would use a fast 64-bit copy.
I'm not going to make siphash24 compatible with platforms that require
aligned memory for integers. It's an unnecessary complication and
slow-down for all common platforms. The feature will simply not be
available on archaic architectures.
Seriously, nobody gives a ... about SPARC and MIPS. :) It's nice that
Python still works on these CPU architectures. But I neither want to
deviate from the SipHash24 implementation nor make the code slower on
all relevant platforms such as X86 and X86_64.
|
msg201550 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2013-10-28 15:55 |
Serhiy, I would like to land my patch before beta 1 hits the fan. We can always improve the code during beta. Right now I don't want to mess around with SipHash24 code. That includes non-64bit platforms as well as architectures that enforce aligned memory for integers.
|
msg201551 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2013-10-28 15:59 |
> Seriously, nobody gives a ... about SPARC and MIPS. :) It's nice that
> Python still works on these CPU architectures. But I neither want to
> deviate from the SipHash24 implementation nor make the code slower on
> all relevant platforms such as X86 and X86_64.
Well, unaligned memory access is usually slower on all architectures :-)
Also, I think some ARM architectures don't support unaligned access, so
it's not really a thing of the past...
|
msg201553 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2013-10-28 16:20 |
On 28.10.2013 16:59, Charles-François Natali wrote:
> Well, unaligned memory access is usually slower on all architectures :-)
> Also, I think some ARM architectures don't support unaligned access, so
> it's not really a thing of the past...
On modern computers it's either not slower or just a tiny bit slower.
http://lemire.me/blog/archives/2012/05/31/data-alignment-for-speed-myth-or-reality/
Python's str and bytes datatypes are always aligned properly. The
majority of bytearray and memoryview instances are aligned, too.
Unaligned memory access is a rare case for most applications. About 50%
of strings are shorter than 8 bytes (!) and 90% are shorter than 16
bytes. For Python's test suite the numbers are even smaller: ~45% are
<=5 bytes and ~90% are <=12 bytes.
You might see a 10% slowdown for very long and unaligned byte arrays on
some older CPUs. I think we can safely ignore the special case. Any
special case for unaligned memory will introduce additional overhead
that *will* slow down the common path.
Oh, and ARM has unaligned memory access:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0360f/CDFEJCBH.html
|
msg201554 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2013-10-28 17:00 |
I have added an optimization for hashing of small strings. It uses an inline version of DJBX33A for strings of length [1, 7) on 64-bit builds and [1, 5) on 32-bit builds.
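As a rough sketch of the idea (the exact cutoffs, seeding and post-processing in the feature branch may differ), the small-string path is a seeded DJBX33A loop:

#include <stdint.h>
#include <stddef.h>

static uint64_t
djbx33a_small(const unsigned char *p, size_t len, uint64_t seed)
{
    uint64_t hash = 5381 + seed;              /* classic DJB start value, seeded */
    while (len--) {
        hash = ((hash << 5) + hash) + *p++;   /* hash * 33 + next byte */
    }
    return hash;
}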
Nick, please use "create patch" before you do your review.
|
msg201555 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-10-28 17:15 |
> Because you can't simply replace the files.
Why not? That looks like the simplest option when you build a heavily customized CPython.
> It also contains _Py_HashBytes() and _PyHash_Fini().
_PyHash_Fini() should be moved out too. _Py_HashBytes() is the only function that should be customized.
> I don't understand why you want me to get rid of the struct. What's your
> argument against the struct? I like the PyHash_FuncDef because it groups
> all information (func ptr, name, hash metadata) in a single structure.
Because it is redundant and only complicates the code, both use and declaration.
> > Why are _Py_HashDouble() and _Py_HashPointer() moved to pyhash.c? They are
> > hash-algorithm agnostic, and it is unlikely they will be redefined in a
> > custom build.
> I have moved the functions to pyhash.c in order to keep all related
> internal functions in one file. They do not belong in Objects/object.c.
There are other hash-related functions (hashing integers, tuples). Only _Py_HashBytes() should be customized, and only it is worth moving to a separate file.
> > You don't need the HAVE_ALIGNED_REQUIRED macro if you use PY_UHASH_CPY (or
> > something similar for exactly 64 bits) in siphash24. On platforms where
> > aligned access is required you would use a per-byte copy; otherwise you
> > would use a fast 64-bit copy.
> I'm not going to make siphash24 compatible with platforms that require
> aligned memory for integers. It's an unnecessary complication and
> slow-down for all common platforms. The feature will simply not be
> available on archaic architectures.
The benefit is that the code will be simpler if you get rid of HAVE_ALIGNED_REQUIRED and the related code in ./configure. Only on such archaic architectures will the hash code be slower.
> Serhiy, I would like to land my patch before beta 1 hits the fan. We can
> always improve the code during beta. Right now I don't want to mess around
> with SipHash24 code. That includes non-64bit platforms as well as
> architectures that enforce aligned memory for integers.
Let's first land a simplified patch, and then you can add features such as PY_HASH_EXTERNAL and PyHash_FuncDef.
|
msg201561 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2013-10-28 17:52 |
>> Well, unaligned memory access is usually slower on all architectures :-)
>> Also, I think some ARM architectures don't support unaligned access, so
>> it's not really a thing of the past...
>
> On modern computers it's either not slower or just a tiny bit slower.
> http://lemire.me/blog/archives/2012/05/31/data-alignment-for-speed-myth-or-reality/
I have other benchmarks that show slowdowns of more than 40%:
http://www.alexonlinux.com/aligned-vs-unaligned-memory-access
Also, x86 has optimized unaligned memory accesses, but the world isn't
x86-only (once again, there's ARM, and AFAICT the performance hit can
be quite high).
Now, I perfectly understand that you don't want to mess with the
implementation, but just don't say that "unaligned access doesn't
matter, and is just a tiny bit slower".
IMO the compile-time check is enough.
|
msg201563 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2013-10-28 17:58 |
On 28.10.2013 18:15, Serhiy Storchaka wrote:
> _PyHash_Fini() should be moved out too. _Py_HashBytes() is the only function that should be customized.
You still haven't convinced me to scatter hash-related functions over
multiple C files. And it won't work with a static definition of the hash
function and PyHash_FuncDef. They MUST be in one object file.
> Because it is redundant and only complicates the code, both use and declaration.
No, I disagree with you. It makes the code less complicated because it
encapsulates all related data in one place. Please provide a patch that
shows the contrary.
> There are other hash-related functions (hashing integers, tuples). Only _Py_HashBytes() should be customized, and only it is worth moving to a separate file.
Python doesn't have a hash function for integers anymore. It has a
specialized function for PyLongObject. pyhash.c contains common helper
functions.
> The benefit is that the code will be simpler if you get rid of HAVE_ALIGNED_REQUIRED and the related code in ./configure. Only on such archaic architectures will the hash code be slower.
And it will clutter other code... Please provide a patch and benchmarks
for your proposal. I'll incorporate your patch if it has zero impact on
speed and doesn't make the code harder to understand.
> Let's first land a simplified patch, and then you can add features such as PY_HASH_EXTERNAL and PyHash_FuncDef.
No, I'm not going to remove code in order to re-add it later. If you
don't like my PEP then please provide patches.
|
msg201564 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2013-10-28 18:08 |
The code in your example uses volatile, which prevents lots of compiler optimizations. In my experience, compilers and CPUs do a better optimization job than humans until the human factor interferes with the compiler. Even a 40% penalty for unaligned access might still be cheaper than calling memcpy() for every block or processing the input byte by byte instead of uint64 by uint64...
I can't comment on ARM, and Barry's ARM box is dead at the moment. Distributors or users can select different, more ARM-friendly code, too. After all, the hash code is easily interchangeable. :)
|
msg201566 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-10-28 18:13 |
I think Christian is right here. Hashing unaligned memory areas will happen quite rarely. It should work, but it doesn't have to be as fast as the common case.
|
msg201582 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-10-28 21:11 |
Here is a simplified version of the patch.
19427e9cc500.diff:
22 files changed, 746 insertions(+), 211 deletions(-), 28 modifications(!)
19427e9cc500-simplified.diff:
21 files changed, 486 insertions(+), 67 deletions(-), 27 modifications(!)
|
msg201625 - (view) |
Author: Nick Coghlan (ncoghlan) * |
Date: 2013-10-29 12:04 |
Christian's general approach looks fine to me - consolidating the "kind" hashes (i.e. byte sequences, numbers and pointers) into one place independent of any particular type implementation makes sense to me, and the clear abstraction of "What is a hash function?" from Python's point of view is a good thing for embedding purposes.
If you get the PEP updated accordingly, we should be able to get that formally accepted in fairly short order. (I had some other suggestions in the review, but they aren't relevant to accepting the PEP).
Regarding the concerns about a potential performance impact for unaligned memory access, I think that can be better assessed *after* the simpler patch is merged and it's easier for people to get hold of the new hash implementation. However, the PEP should mention the concern, and note it as something we will be keeping a close eye on during the beta period.
|
msg201626 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-10-29 12:09 |
+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.
You should copy the license into Doc/license.rst.
|
msg201629 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-10-29 12:18 |
To support 32-bit Windows, the following code in PC/pyconfig.h can be modified to use __int64 or _W64; see the ssize_t definition further down in the same file.
#ifndef PY_UINT64_T
#if SIZEOF_LONG_LONG == 8
#define HAVE_UINT64_T 1
#define PY_UINT64_T unsigned PY_LONG_LONG
#endif
#endif
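For illustration, the modified block could look roughly like this (a sketch only, not the committed change; the actual fix is tracked in #19433):

#ifndef PY_UINT64_T
#if SIZEOF_LONG_LONG == 8
#define HAVE_UINT64_T 1
#define PY_UINT64_T unsigned PY_LONG_LONG
#elif defined(_MSC_VER)
/* MSVC provides the 64-bit __int64 type even on 32-bit Windows */
#define HAVE_UINT64_T 1
#define PY_UINT64_T unsigned __int64
#endif
#endif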
|
msg201636 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2013-10-29 14:10 |
Victor:
I have added the license to Doc/license.rst and created a new ticket, #19433, for PY_UINT64_T on Windows.
Nick:
The memory layout of the hash secret is now documented. I have renamed the members to reflect their purpose, too. http://hg.python.org/features/pep-456/file/f0a7e606c2d0/Include/pyhash.h#l32
I'll update my PEP shortly and address the memory layout of _Py_HashSecret_t, the small string hashing optimization and performance/memory alignment.
|
msg201677 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-10-29 21:22 |
About memcpy(): here is a sample file (fnv.c, attached). Compile it to assembler:
gcc -O2 -S -masm=intel fnv.c
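(fnv.c itself is attached to the issue and not reproduced here; an illustrative loop of the same general shape, a 32-bit multiply/xor hash over 4-byte blocks done once with memcpy() and once byte by byte, might look like this:)

#include <stdint.h>
#include <string.h>
#include <stddef.h>

uint32_t
fnv_blocks_memcpy(const unsigned char *p, size_t nblocks, uint32_t x)
{
    for (size_t i = 0; i < nblocks; i++) {
        uint32_t block;
        memcpy(&block, p, sizeof(block));   /* compiles to one 32-bit load */
        x = (x * 1000003U) ^ block;
        p += sizeof(block);
    }
    return x;
}

uint32_t
fnv_blocks_bytewise(const unsigned char *p, size_t nblocks, uint32_t x)
{
    for (size_t i = 0; i < nblocks; i++) {
        /* assemble the block one byte at a time (little-endian) */
        uint32_t block = (uint32_t)p[0] | ((uint32_t)p[1] << 8)
                       | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
        x = (x * 1000003U) ^ block;
        p += 4;
    }
    return x;
}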
With memcpy() main loop is compiled to:
.L3:
mov esi, DWORD PTR [ebx]
imul eax, eax, 1000003
add ebx, 4
xor eax, esi
sub ecx, 1
mov DWORD PTR [esp+24], esi
jne .L3
With per-byte copy it is compiled to:
.L3:
mov dl, BYTE PTR [ecx]
imul eax, eax, 1000003
sub ebp, 1
movzx ebx, BYTE PTR [ecx+1]
movzx edi, BYTE PTR [ecx+2]
movzx esi, BYTE PTR [ecx+3]
add ecx, 4
mov dh, bl
sal edi, 16
movzx edx, dx
sal esi, 24
or edx, edi
or edx, esi
xor eax, edx
cmp ebp, -1
jne .L3
|
msg201849 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2013-10-31 21:00 |
I had to add the conversion from little-endian to host endianness. The missing conversion was degrading hash value dispersion.
|
msg202799 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2013-11-13 23:27 |
Hi Nick,
I have updated the patch and the PEP text. The new version has small string hash optimization disabled. The other changes are mostly cleanup, reformatting and simplifications.
Can you please do a review so I can get the patch into 3.4 before beta1 is released?
|
msg202949 - (view) |
Author: Nick Coghlan (ncoghlan) * |
Date: 2013-11-15 13:05 |
I reviewed the latest PEP text at http://www.python.org/dev/peps/pep-0456/
I'm almost prepared to accept the current version of the implementation, but there's one technical decision to be clarified and a few placeholders in the PEP that need to be cleaned up prior to formal acceptance:
* The rationale for turning off the small string optimisation by default rather than setting the cutoff to 7 bytes isn't at all clear to me. A consistent 3-5% speed difference on the benchmark suite isn't trivial, and if we have the small string optimization off by default, why aren't we just deleting that code instead?
* A link to the benchmark suite at http://hg.python.org/benchmarks should be included at the appropriate places in the PEP
* The "Further things to consider" section needs to be moved to a paragraph under "Discussion" describing the current implementation (i.e. the hash equivalence is tolerated for simplicity and consistency)
* The "TBD" in the performance section needs to go. Reference should be made to the numbers in the small string optimisation section.
* The performance numbers need to be clear on what version of the feature branch was used to obtain them (preferably the one you plan to commit!).
|
msg203003 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2013-11-16 02:06 |
Here are benchmarks on two Linux machines. It looks like SipHash24 takes advantage of newer CPUs. I'm a bit puzzled about the results. Or maybe my super simple and naive analyzer doesn't give sensible results...
https://bitbucket.org/tiran/pep-456-benchmarks/
|
msg203019 - (view) |
Author: Nick Coghlan (ncoghlan) * |
Date: 2013-11-16 10:24 |
Thanks - are those numbers with the current feature branch, and hence no small string optimization?
To be completely clear, I'm happy to accept a performance penalty to fix the hash algorithm. I'd just like to know exactly how big a penalty I'm accepting, and whether taking advantage of the small string optimization makes it measurably smaller.
|
msg203024 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-11-16 11:01 |
For the record, it's better to use a geometric mean when aggregating benchmark results into a single score.
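For illustration, aggregating per-benchmark speedup ratios with a geometric mean is just exp(mean(log(ratio))); a minimal sketch:

#include <math.h>
#include <stddef.h>

/* Combine per-benchmark ratios (e.g. new_time / old_time) into one score. */
static double
geometric_mean(const double *ratios, size_t n)
{
    double log_sum = 0.0;
    for (size_t i = 0; i < n; i++)
        log_sum += log(ratios[i]);
    return exp(log_sum / (double)n);
}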
|
msg203048 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-11-16 14:54 |
So, amusingly, Christian's patch seems to be 4-5% faster than vanilla on many benchmarks here (Sandy Bridge Core i5, 64-bit, gcc 4.8.1). A couple of benchmarks are a couple % slower, but nothing severe.
This is without the small strings optimization.
On top of that, the small strings optimization seems to have varying effects, some positive, some negative, so it doesn't seem to be desirable (on this platform anyway).
|
msg203050 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-11-16 14:56 |
Benchmark report (without the small strings optimization):
http://bpaste.net/show/UohtA8dmSREbrtsJYfTI/
|
msg203066 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2013-11-16 17:12 |
The numbers compare CPython default tip with my feature branch; I pulled and merged all upstream changes into the feature branch yesterday. The results with "sso" in the file name are with the small string optimization.
Performance greatly depends on compiler and CPU. In general SipHash24 is faster on modern compilers and modern CPUs. MSVC generates slower code for SipHash24 but I haven't optimized the code for MSVC yet. Perhaps Serhiy is able to do his magic. :)
PS: I have uploaded more benchmarks. The directories contain the raw output and CSV files.
|
msg203457 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2013-11-20 01:51 |
The PEP should be ready now. I have addressed your input in http://hg.python.org/peps/rev/fbe779221a7a
|
msg203466 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2013-11-20 10:46 |
New changeset adb471b9cba1 by Christian Heimes in branch 'default':
ssue #19183: Implement PEP 456 'secure and interchangeable hash algorithm'.
http://hg.python.org/cpython/rev/adb471b9cba1
|
msg203468 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2013-11-20 11:00 |
New changeset 422ed27b62ce by Christian Heimes in branch 'default':
Issue #19183: test_gdb's test_dict was failing on some machines as the order or dict keys has changed again.
http://hg.python.org/cpython/rev/422ed27b62ce
|
msg203469 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2013-11-20 11:28 |
New changeset 11cb1c8faf11 by Victor Stinner in branch 'default':
Issue #19183: Fix repr() tests of test_gdb, hash() is now platform dependent
http://hg.python.org/cpython/rev/11cb1c8faf11
|
msg203471 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-11-20 11:48 |
test_gdb isn't the only test that relies on exact repr() values; test_functools does, too:
http://buildbot.python.org/all/builders/AMD64%20OpenIndiana%203.x/builds/6875/steps/test/logs/stdio
======================================================================
FAIL: test_repr (test.test_functools.TestPartialC)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Lib/test/test_functools.py", line 174, in test_repr
repr(f))
AssertionError: 'func[51 chars]88>, b=<object object at 0xffffdd7fff790440>, [36 chars]00>)' != 'func[51 chars]88>, a=<object object at 0xffffdd7fff790400>, [36 chars]40>)'
- functools.partial(<function capture at 0xffffdd7ffe64d788>, b=<object object at 0xffffdd7fff790440>, a=<object object at 0xffffdd7fff790400>)
+ functools.partial(<function capture at 0xffffdd7ffe64d788>, a=<object object at 0xffffdd7fff790400>, b=<object object at 0xffffdd7fff790440>)
======================================================================
FAIL: test_repr (test.test_functools.TestPartialCSubclass)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Lib/test/test_functools.py", line 174, in test_repr
repr(f))
AssertionError: 'Part[49 chars]88>, b=<object object at 0xffffdd7fff790540>, [36 chars]00>)' != 'Part[49 chars]88>, a=<object object at 0xffffdd7fff790500>, [36 chars]40>)'
- PartialSubclass(<function capture at 0xffffdd7ffe64d788>, b=<object object at 0xffffdd7fff790540>, a=<object object at 0xffffdd7fff790500>)
+ PartialSubclass(<function capture at 0xffffdd7ffe64d788>, a=<object object at 0xffffdd7fff790500>, b=<object object at 0xffffdd7fff790540>)
|
msg203472 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2013-11-20 11:49 |
New changeset 961d832d8734 by Christian Heimes in branch 'default':
Issue #19183: too many tests depend on the sort order of repr().
http://hg.python.org/cpython/rev/961d832d8734
|
msg203503 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2013-11-20 16:42 |
The problems have been resolved.
|
msg203595 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2013-11-21 09:29 |
New changeset eec4758e3a45 by Victor Stinner in branch 'default':
Issue #19183: Simplify test_gdb
http://hg.python.org/cpython/rev/eec4758e3a45
|
Date | User | Action | Args |
2022-04-11 14:57:51 | admin | set | github: 63382 |
2013-11-21 09:29:51 | python-dev | set | messages: + msg203595 |
2013-11-20 16:42:15 | christian.heimes | set | status: open -> closed resolution: fixed messages: + msg203503 |
2013-11-20 11:49:13 | python-dev | set | messages: + msg203472 |
2013-11-20 11:48:33 | vstinner | set | status: closed -> open resolution: fixed -> (no value) messages: + msg203471 |
2013-11-20 11:28:22 | python-dev | set | messages: + msg203469 |
2013-11-20 11:03:47 | christian.heimes | set | status: open -> closed assignee: ncoghlan -> christian.heimes resolution: fixed stage: patch review -> resolved |
2013-11-20 11:00:44 | python-dev | set | messages: + msg203468 |
2013-11-20 10:46:31 | python-dev | set | messages: + msg203466 |
2013-11-20 01:51:15 | christian.heimes | set | assignee: christian.heimes -> ncoghlan messages: + msg203457 |
2013-11-16 17:12:59 | christian.heimes | set | messages: + msg203066 |
2013-11-16 14:56:29 | pitrou | set | messages: + msg203050 |
2013-11-16 14:54:28 | pitrou | set | messages: + msg203048 |
2013-11-16 11:01:05 | pitrou | set | messages: + msg203024 |
2013-11-16 10:24:39 | ncoghlan | set | messages: + msg203019 |
2013-11-16 02:06:45 | christian.heimes | set | messages: + msg203003 |
2013-11-15 14:42:46 | ncoghlan | set | assignee: ncoghlan -> christian.heimes |
2013-11-15 13:05:50 | ncoghlan | set | messages: + msg202949 |
2013-11-13 23:27:44 | christian.heimes | set | messages: + msg202799 |
2013-11-13 23:24:33 | christian.heimes | set | files: + ac521cef665a.diff |
2013-10-31 21:00:00 | christian.heimes | set | messages: + msg201849 |
2013-10-31 20:29:14 | christian.heimes | set | files: + fb2f9c0bbca9.diff |
2013-10-29 22:41:39 | christian.heimes | set | files: + 4756e9ed0328.diff |
2013-10-29 21:22:12 | serhiy.storchaka | set | messages: + msg201677 |
2013-10-29 21:21:45 | serhiy.storchaka | set | messages: - msg201675 |
2013-10-29 21:20:05 | serhiy.storchaka | set | files: + fnv.c messages: + msg201675 |
2013-10-29 14:10:08 | christian.heimes | set | messages: + msg201636 |
2013-10-29 12:18:13 | vstinner | set | messages: + msg201629 |
2013-10-29 12:09:48 | vstinner | set | nosy: + vstinner messages: + msg201626 |
2013-10-29 12:04:10 | ncoghlan | set | messages: + msg201625 |
2013-10-28 23:05:46 | Arfrever | set | nosy: + Arfrever |
2013-10-28 21:11:30 | serhiy.storchaka | set | files: + 19427e9cc500-simplified.diff messages: + msg201582 |
2013-10-28 19:24:52 | pitrou | set | files: + b8d39bf9ca4a.diff |
2013-10-28 18:13:58 | pitrou | set | messages: + msg201566 |
2013-10-28 18:08:21 | christian.heimes | set | messages: + msg201564 |
2013-10-28 17:58:32 | christian.heimes | set | messages: + msg201563 |
2013-10-28 17:52:52 | neologix | set | messages: + msg201561 |
2013-10-28 17:15:37 | serhiy.storchaka | set | messages: + msg201555 |
2013-10-28 17:00:31 | christian.heimes | set | messages: + msg201554 |
2013-10-28 16:20:10 | christian.heimes | set | messages: + msg201553 |
2013-10-28 15:59:55 | neologix | set | nosy: + neologix messages: + msg201551 |
2013-10-28 15:55:34 | christian.heimes | set | messages: + msg201550 |
2013-10-28 15:42:29 | christian.heimes | set | messages: + msg201549 |
2013-10-28 15:18:09 | serhiy.storchaka | set | messages: + msg201548 |
2013-10-28 12:33:35 | ncoghlan | set | messages: + msg201531 |
2013-10-28 11:58:03 | serhiy.storchaka | link | issue16427 superseder |
2013-10-27 20:57:26 | christian.heimes | set | files: + 19427e9cc500.diff |
2013-10-27 15:46:47 | christian.heimes | set | messages: + msg201466 |
2013-10-27 15:37:55 | serhiy.storchaka | set | messages: + msg201465 |
2013-10-27 13:46:03 | christian.heimes | set | assignee: ncoghlan messages: + msg201459 |
2013-10-27 13:33:17 | christian.heimes | set | files: + 257597d20fa8.diff |
2013-10-27 11:50:05 | pitrou | set | messages: + msg201449 |
2013-10-27 11:46:43 | christian.heimes | set | messages: + msg201448 |
2013-10-27 10:34:39 | pitrou | set | messages: + msg201441 |
2013-10-27 01:52:54 | christian.heimes | set | messages: + msg201405 |
2013-10-27 01:47:28 | christian.heimes | set | files: + 31ce9488be1c.diff |
2013-10-27 01:47:01 | christian.heimes | set | files: - 38b3ad4287ef.diff |
2013-10-25 22:45:49 | christian.heimes | set | files: + 38b3ad4287ef.diff |
2013-10-24 22:40:37 | christian.heimes | set | hgrepos: + hgrepo212 messages: + msg201193 |
2013-10-14 13:07:22 | ncoghlan | set | messages: + msg199884 |
2013-10-07 15:38:54 | serhiy.storchaka | set | messages: + msg199149 |
2013-10-07 15:32:19 | serhiy.storchaka | set | messages: + msg199148 |
2013-10-07 15:28:00 | christian.heimes | set | messages: + msg199147 |
2013-10-07 15:12:26 | serhiy.storchaka | set | messages: + msg199146 |
2013-10-07 14:54:10 | serhiy.storchaka | set | nosy: + serhiy.storchaka messages: + msg199145 |
2013-10-07 13:20:53 | python-dev | set | nosy: + python-dev messages: + msg199144 |
2013-10-07 11:57:59 | christian.heimes | set | messages: + msg199142 |
2013-10-07 11:12:12 | pitrou | set | messages: + msg199139 |
2013-10-07 10:57:36 | christian.heimes | set | messages: + msg199137 |
2013-10-06 19:28:09 | pitrou | set | messages: + msg199114 |
2013-10-06 19:23:52 | pitrou | set | messages: + msg199113 |
2013-10-06 14:58:27 | christian.heimes | create | |