Message410451
I was about to say the only missing feature was docstrings, but then I realized I hadn't included releasing the GIL. I've added that and pushed an update just now. Fingers crossed there's nothing else I've missed. I think it's in reasonably good shape, and I'd like to propose it for inclusion in 3.11.
However, I'm not very experienced with setup.py or the Python C API, so I might not be the best judge of shape. Here are some highlights for reviewers, where I think the implementation (mostly the build) could be shaky:
- Platform detection. In setup.py I assume that the build target is the same as the current interpreter's platform. For example, if the current interpreter has sys.maxsize == 2**31 - 1, I assume we're compiling for 32-bit. This clearly doesn't allow for any sort of cross-compilation, and in general I need feedback about whether there's a more correct way to obtain the target platform. (See the first sketch after this list.)
- Compiling assembly files. On Unix this is easy: we can supply *.S files as `extra_objects`, and GCC/Clang will do the right thing. On Windows it isn't. Currently I shell out to vswhere.exe, ask it for the path to the latest ml64.exe assembler, and then shell out to that to build .obj files, which I then pass as `extra_objects`. (See the second sketch after this list.) This feels awfully manual, and there's a good chance I've missed some better-supported way to do it. I assume we don't want to check in the .obj files?
- Does Python support the GNU ABI on Windows? We have assembly files for this in vendor/, but I'm not currently building them.
- Compiling intrinsics files for 32-bit x86. Here I create a `ccompiler.new_compiler()` for each intrinsics file, so that I can set the appropriate flags for each one. (See the third sketch after this list.) This is relatively clean, but it means those files get rebuilt every single time, rather than participating in `setup.py build` caching. Maybe nobody cares about this, but again it makes me think there might be a better-supported way to do it.
- blake3module.c contains an awful lot of gotos to handle allocation failure cases. Is this still considered best practice? These failure paths are bug-prone, and I'm not sure how to test them.
- Existing hashlib implementations include an optimization where they avoid allocating an internal mutex until they see a long input and want to release the GIL. As a quirky side effect of this, they handle allocation failures for that mutex by falling back to the do-not-release-the-GIL codepath. That feels kind of complicated to me, and in my code I'm always allocating the mutex during initialization. This doesn't seem to make much of a performance difference when I measure it, but there might be use cases I haven't considered.
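To make the first bullet concrete, here's a minimal sketch of the detection logic, assuming the build target matches the running interpreter. The function names are illustrative, not the actual setup.py code:

import platform
import sys

def target_is_64_bit():
    # sys.maxsize is 2**31 - 1 on a 32-bit interpreter. Assumption: the
    # build target matches the running interpreter, which is exactly the
    # cross-compilation gap described above.
    return sys.maxsize > 2**32

def target_machine():
    # Likewise, platform.machine() reports the host machine, not the
    # target, so cross-compilation would fool this too.
    return platform.machine().lower()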
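For the second bullet, here's a rough sketch of the vswhere.exe/ml64.exe dance. The vswhere path, the glob for ml64.exe, and the assembler flags are all assumptions; the real setup.py differs in the details:

import glob
import os
import subprocess

VSWHERE = os.path.expandvars(
    r"%ProgramFiles(x86)%\Microsoft Visual Studio\Installer\vswhere.exe"
)

def assemble(asm_path, obj_path):
    # Ask vswhere for the newest Visual Studio installation.
    vs_root = subprocess.run(
        [VSWHERE, "-latest", "-property", "installationPath"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    # Find ml64.exe under the VS install. The lexicographic sort is a
    # crude stand-in for picking the latest toolchain version.
    pattern = os.path.join(
        vs_root, "VC", "Tools", "MSVC", "*", "bin", "Hostx64", "x64", "ml64.exe"
    )
    ml64 = sorted(glob.glob(pattern))[-1]
    subprocess.run([ml64, "/nologo", "/c", "/Fo", obj_path, asm_path], check=True)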
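And for the intrinsics bullet, a sketch of the per-file compiler approach. The vendor/ file names are illustrative, and the flags shown are GCC/Clang-style; MSVC would need its own:

from distutils import ccompiler

def compile_with_flags(source, flags):
    # A fresh compiler per source file, so each file gets its own flags.
    # Because this bypasses build_ext, it also bypasses its caching, which
    # is the rebuild-every-time problem described above.
    compiler = ccompiler.new_compiler()
    return compiler.compile([source], extra_preargs=flags)

extra_objects = []
extra_objects += compile_with_flags("vendor/blake3_sse2.c", ["-msse2"])
extra_objects += compile_with_flags("vendor/blake3_avx2.c", ["-mavx2"])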
Here are some other API details that might be worth bikeshedding:
- The `max_threads` parameter currently takes a special value, `blake3.AUTO`, to indicate that the implementation may use as many threads as it likes. (The C implementation doesn't include multithreading support, but it's API-compatible with the Rust implementation.) `blake3.AUTO` is currently a class attribute equal to -1. We may want to bikeshed this name or propose some other representation. (There's a usage sketch after this list.)
- BLAKE3 has three modes: regular hashing, keyed hashing, and key derivation. The keyed hashing mode takes a 32-byte key, and the key derivation mode takes a context string. Calling the 32-byte key `key` seems good. Calling the context string `context` seems less good: Larry has pointed out before that lots of random things are called `context`, and readers might not understand what they're seeing. I currently call it `derive_key_context` instead, but we could bikeshed this too. (Both parameters appear in the usage sketch below.)
- I check `itemsize` on input Py_buffers and raise an exception if it's anything other than 1. My test suite exercises this; see `test_int_array_fails`. However, some (all?) standard hashes don't do this check. For example:
>>> from hashlib import sha256
>>> import array
>>> a = array.array("i", [255])
>>> sha256(a).hexdigest()
'81ff65efc4487853bdb4625559e69ab44f19e0f5efbd6d5b2af5e3ab267c8e06'
>>> sha256(bytes([0xff, 0, 0, 0])).hexdigest()
'81ff65efc4487853bdb4625559e69ab44f19e0f5efbd6d5b2af5e3ab267c8e06'
Here we can see sha256() hashing an array of ints. On my machine, an int is 4 little-endian bytes, but of course this sort of thing isn't portable. The same array will produce a different SHA-256 output on a big-endian machine, or on a machine with a different int size. This seems undesirable, and I'm surprised that hashlib allows it. However, if there's some known compatibility reason why we have to allow it, I could remove this check.
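For what it's worth, the `itemsize` that this check looks at is easy to inspect from Python. The 4 below assumes a 4-byte int, as on my machine:

>>> import array
>>> memoryview(array.array("i", [255])).itemsize
4
>>> memoryview(b"\xff\x00\x00\x00").itemsize
1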
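And to make the two API bullets above concrete, here's a hypothetical usage sketch. The parameter names match the current implementation as described above, but everything here is subject to bikeshedding:

import blake3

# Regular hashing. blake3.AUTO lets the implementation pick a thread count.
h = blake3.blake3(b"some input", max_threads=blake3.AUTO)
print(h.hexdigest())

# Keyed hashing takes exactly 32 key bytes.
keyed = blake3.blake3(b"some input", key=b"\x00" * 32)

# Key derivation takes a context string; the input is the key material.
kdf = blake3.blake3(b"key material", derive_key_context="example.com 2022-01-13 session tokens")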