Message332330
I came across this while looking at issue35557 and opened a new issue to keep the discussion separate. Currently, b16decode passes its regex pattern to re.search on every call. Compiling that regex once at module level gives roughly a 30% improvement (about 1.45x faster) on Python 3.7 in the benchmark below. I am proposing a PR for this change since it looks safe to me.
$ python3 -m perf compare_to default.json optimized.json --table
+--------------------+---------+------------------------------+
| Benchmark          | default | optimized                    |
+====================+=========+==============================+
| b16decode | 2.97 us | 2.03 us: 1.46x faster (-32%) |
+--------------------+---------+------------------------------+
| b16decode_casefold | 3.18 us | 2.19 us: 1.45x faster (-31%) |
+--------------------+---------+------------------------------+
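As a rough illustration of where the saving comes from: when re.search is given a pattern, the re module has to look up the compiled pattern in its internal cache on every call, while a pattern compiled once at module level skips that lookup. The sketch below isolates just that check using the stdlib timeit module instead of perf. The names _PAT, DATA, check_uncompiled and check_precompiled are made up for this example; only the pattern itself is taken from the issue. Absolute timings will vary by machine and Python version.

```python
import re
import timeit

# Hypothetical names for illustration; only the pattern comes from the issue.
_PAT = re.compile(b'[^0-9A-F]')
DATA = b'806903D098EB50957B1B376385F233BB' * 2


def check_uncompiled(s):
    # re.search must find the compiled pattern in re's cache on each call.
    return re.search(b'[^0-9A-F]', s) is None


def check_precompiled(s):
    # The pattern object is already available, so no cache lookup is needed.
    return _PAT.search(s) is None


if __name__ == "__main__":
    for fn in (check_uncompiled, check_precompiled):
        elapsed = timeit.timeit(lambda: fn(DATA), number=100_000)
        print(f"{fn.__name__}: {elapsed:.3f} s for 100,000 calls")
```

Both helpers return the same answer for any input, so the change only affects speed, not behaviour.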
Benchmark script:
import perf
import re
import binascii
import base64

_B16DECODE_PAT = re.compile(b'[^0-9A-F]')

def b16decode_re_compiled_search(s, casefold=False):
    s = base64._bytes_from_decode_data(s)
    if casefold:
        s = s.upper()
    if _B16DECODE_PAT.search(s):
        raise binascii.Error('Non-base16 digit found')
    return binascii.unhexlify(s)

if __name__ == "__main__":
    hex_data = "806903d098eb50957b1b376385f233bb3a5d54f54191c8536aefee21fc9ba3ca"
    hex_data_upper = hex_data.upper()

    assert base64.b16decode(hex_data_upper) == b16decode_re_compiled_search(hex_data_upper)
    assert base64.b16decode(hex_data, casefold=True) == b16decode_re_compiled_search(hex_data, casefold=True)

    runner = perf.Runner()
    if True: # toggle to False for default.json
        runner.timeit(name="b16decode",
                      stmt="b16decode_re_compiled_search(hex_data_upper)",
                      setup="from __main__ import b16decode_re_compiled_search, hex_data, hex_data_upper")
        runner.timeit(name="b16decode_casefold",
                      stmt="b16decode_re_compiled_search(hex_data, casefold=True)",
                      setup="from __main__ import b16decode_re_compiled_search, hex_data, hex_data_upper")
    else:
        runner.timeit(name="b16decode",
                      stmt="base64.b16decode(hex_data_upper)",
                      setup="from __main__ import hex_data, hex_data_upper; import base64")
        runner.timeit(name="b16decode_casefold",
                      stmt="base64.b16decode(hex_data, casefold=True)",
                      setup="from __main__ import hex_data, hex_data_upper; import base64")
Date | User | Action | Args
2018-12-22 07:29:11 | xtreak | set | recipients: + xtreak, serhiy.storchaka, djhoulihan
2018-12-22 07:29:08 | xtreak | set | messageid: <1545463748.16.0.98272194251.issue35559@roundup.psfhosted.org>
2018-12-22 07:29:08 | xtreak | link | issue35559 messages
2018-12-22 07:29:08 | xtreak | create |