
Author 2d4d
Recipients 2d4d
Date 2021-01-16.21:33:20
Message-id <1610832800.98.0.200773545993.issue42942@roundup.psfhosted.org>
Content
Problem: hashlib only offers digest() and hexdigest(), but the fastest way to work with hashes is as integers.

The first thing loki does after getting the hashes is to convert them to int:
    md5, sha1, sha256 = generateHashes(fileData)
    md5_num = int(md5, 16)
    sha1_num = int(sha1, 16)
    sha256_num = int(sha256, 16)
https://github.com/Neo23x0/Loki/blob/master/loki.py

All of the ~50,000 hashes to compare against are also converted to int after being read from a file. Comparing ints is about twice as fast as comparing hexdigest strings, because an int uses only about half the memory.
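
The memory claim can be checked with a small sketch like the following. It assumes CPython, where sys.getsizeof() reports the object's own size; exact numbers vary by version and platform, so none are given here.

```python
import hashlib
import sys

# Same SHA-256 digest held two ways: as a hex string and as an int.
h = hashlib.sha256(b"example")
as_hex = h.hexdigest()
as_int = int(as_hex, 16)

# Object sizes as reported by the interpreter (implementation-specific).
hex_size = sys.getsizeof(as_hex)
int_size = sys.getsizeof(as_int)
print(f"hex str: {hex_size} bytes, int: {int_size} bytes")
```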

(The use case here is to compare these 50,000 hashes to the hashes of all the 200,000 files on a system that gets scanned for malicious files.)
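
A minimal sketch of that use case, not Loki's actual code: the helper names load_known_hashes and sha256_as_int and the sample data are made up for illustration, and the signature list would normally come from a file.

```python
import hashlib


def load_known_hashes(lines):
    """Parse hex digests (one per line) into a set of ints."""
    return {int(line.strip(), 16) for line in lines if line.strip()}


def sha256_as_int(data: bytes) -> int:
    """Hash data and convert the hex digest to int, as in the snippet above."""
    return int(hashlib.sha256(data).hexdigest(), 16)


# Hypothetical signature list; a real scanner would read ~50,000 of these.
known = load_known_hashes([
    hashlib.sha256(b"malicious sample").hexdigest(),
    hashlib.sha256(b"another sample").hexdigest(),
])

# Each scanned file costs one hash computation plus one set lookup.
is_known = sha256_as_int(b"malicious sample") in known
is_clean = sha256_as_int(b"harmless file") not in known
```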

Solution: Add decdigest() to hashlib, which returns the int version of the hash. This has two advantages:
1. It saves the time spent converting the hash to hex and back.
2. Having decdigest() in the documentation would encourage more programmers to work with hashes as ints rather than as slower strings (where performance matters).
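
Until something like this exists, the behaviour can be approximated in pure Python. The sketch below is a hypothetical helper, not part of hashlib: it builds the int directly from digest() with int.from_bytes(), which skips the hex round trip and gives the same value as int(hexdigest(), 16).

```python
import hashlib


def decdigest(hash_obj) -> int:
    """Hypothetical helper: return the digest of a hashlib object as an int.

    Big-endian byte order matches what int(hexdigest(), 16) produces.
    """
    return int.from_bytes(hash_obj.digest(), "big")


h = hashlib.sha256(b"example")
via_bytes = decdigest(h)
via_hex = int(h.hexdigest(), 16)
```

For the SHAKE algorithms, digest() needs a length argument, so a real decdigest() would presumably have to accept one as well.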

It should be just a few lines of code for each algorithm; I could do the PR. For example, this is the existing wrapper for the SHAKE-128 hexdigest():

static PyObject *
_sha3_shake_128_hexdigest(SHA3object *self, PyObject *arg)
{
    PyObject *return_value = NULL;
    unsigned long length;

    if (!_PyLong_UnsignedLong_Converter(arg, &length)) {
        goto exit;
    }
    return_value = _sha3_shake_128_hexdigest_impl(self, length);

exit:
    return return_value;
}

https://github.com/python/cpython/blob/63298930fb531ba2bb4f23bc3b915dbf1e17e9e1/Modules/_sha3/clinic/sha3module.c.h
History
Date User Action Args
2021-01-16 21:33:21 2d4d set recipients: + 2d4d
2021-01-16 21:33:20 2d4d set messageid: <1610832800.98.0.200773545993.issue42942@roundup.psfhosted.org>
2021-01-16 21:33:20 2d4d link issue42942 messages
2021-01-16 21:33:20 2d4d create