Author xtreak
Recipients djhoulihan, serhiy.storchaka, xtreak
Date 2018-12-22.07:29:08
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1545463748.16.0.98272194251.issue35559@roundup.psfhosted.org>
In-reply-to
Content
I came across this as a result of issue35557 and thought to make a new issue to keep the discussion separate. Currently the b16decode function uses a regex with re.search that can be compiled at the module level as a static variable to give up to 30% improvement when executed on Python 3.7. I am proposing a PR for this change since it looks safe to me.

$ python3 -m perf compare_to default.json optimized.json --table
+--------------------+---------+------------------------------+
| Benchmark          | default | optimized                    |
+====================+=========+==============================+
| b16decode          | 2.97 us | 2.03 us: 1.46x faster (-32%) |
+--------------------+---------+------------------------------+
| b16decode_casefold | 3.18 us | 2.19 us: 1.45x faster (-31%) |
+--------------------+---------+------------------------------+

Benchmark script : 

import perf
import re
import binascii
import base64

_B16DECODE_PAT = re.compile(b'[^0-9A-F]')

def b16decode_re_compiled_search(s, casefold=False):
    s = base64._bytes_from_decode_data(s)
    if casefold:
        s = s.upper()
    if _B16DECODE_PAT.search(s):
        raise binascii.Error('Non-base16 digit found')
    return binascii.unhexlify(s)

if __name__ == "__main__":
    hex_data = "806903d098eb50957b1b376385f233bb3a5d54f54191c8536aefee21fc9ba3ca"
    hex_data_upper = hex_data.upper()

    assert base64.b16decode(hex_data_upper) == b16decode_re_compiled_search(hex_data_upper)
    assert base64.b16decode(hex_data, casefold=True) == b16decode_re_compiled_search(hex_data, casefold=True)

    runner = perf.Runner()
    if True: # toggle to False for default.json
        runner.timeit(name="b16decode",
                      stmt="b16decode_re_compiled_search(hex_data_upper)",
                      setup="from __main__ import b16decode_re_compiled_search, hex_data, hex_data_upper")
        runner.timeit(name="b16decode_casefold",
                      stmt="b16decode_re_compiled_search(hex_data, casefold=True)",
                      setup="from __main__ import b16decode_re_compiled_search, hex_data, hex_data_upper")
    else:
        runner.timeit(name="b16decode",
                      stmt="base64.b16decode(hex_data_upper)",
                      setup="from __main__ import hex_data, hex_data_upper; import base64")
        runner.timeit(name="b16decode_casefold",
                      stmt="base64.b16decode(hex_data, casefold=True)",
                      setup="from __main__ import hex_data, hex_data_upper; import base64")
History
Date User Action Args
2018-12-22 07:29:11xtreaksetrecipients: + xtreak, serhiy.storchaka, djhoulihan
2018-12-22 07:29:08xtreaksetmessageid: <1545463748.16.0.98272194251.issue35559@roundup.psfhosted.org>
2018-12-22 07:29:08xtreaklinkissue35559 messages
2018-12-22 07:29:08xtreakcreate