
Author xtreak
Recipients djhoulihan, xtreak
Date 2018-12-22.06:31:51
Content
Thanks for the report. A couple of points:

* This changes the function's interface by removing a parameter, so it breaks compatibility with Python 2 and with earlier versions of Python 3. If the change is going to be accepted, removing a parameter from the signature has to go through a deprecation cycle.
* Please don't use time.time and a mean for benchmarks; the results can be misleading. Modules like timeit and perf (https://pypi.org/project/perf/) are more reliable.
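
As a rough illustration of the second point, here is a minimal sketch that uses the standard library's timeit instead of hand-rolled time.time loops. The helper name best_time_per_call and the repeat/number values are my own choices for illustration, not anything from the report; it takes the fastest of several runs rather than a mean, since the minimum is usually less affected by noise from other processes.

```python
import timeit

HEX_DATA = "806903d098eb50957b1b376385f233bb3a5d54f54191c8536aefee21fc9ba3ca"


def best_time_per_call(stmt, setup, number=10_000, repeat=5):
    """Return the fastest observed time per call, in seconds.

    Hypothetical helper: runs `stmt` `number` times per trial,
    does `repeat` trials, and keeps the best trial.
    """
    timings = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(timings) / number


if __name__ == "__main__":
    per_call = best_time_per_call(
        "base64.b16decode(hex_data, casefold=True)",
        setup=f"import base64; hex_data = {HEX_DATA!r}",
    )
    print(f"{per_call * 1e6:.2f} us per call")
```

The perf module goes further (calibration, multiple worker processes, standard deviation), which is why it is used for the numbers in this message.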

I looked for other inefficiencies and noticed that re.search is called with the pattern on every call. Perhaps re.compile can be used to store the compiled regex at module level, and the string can then be matched against that. This makes the function 25% faster without changing the interface. When casefold=False, the extra call that upper-cases the string is also avoided, which gives some further benefit.
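
To make the idea concrete, here is a minimal sketch of the module-level approach. It is not the actual patch: the names _NON_B16 and b16decode_sketch are hypothetical, and the str-to-bytes handling is simplified compared with what base64 does internally.

```python
import binascii
import re

# Compiled once at import time instead of being looked up on every call.
# (_NON_B16 is a hypothetical name used only for this sketch.)
_NON_B16 = re.compile(b"[^0-9A-F]")


def b16decode_sketch(data, casefold=False):
    """Decode base16 input, keeping the casefold parameter unchanged."""
    if isinstance(data, str):
        data = data.encode("ascii")
    if casefold:
        # Only pay for upper() when the caller asks for it.
        data = data.upper()
    if _NON_B16.search(data):
        raise binascii.Error("Non-base16 digit found")
    return binascii.unhexlify(data)
```

The signature stays the same, so callers and existing tests should not need to change.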

With re.search inside the function

$ python3.7 -m perf timeit -s 'import base64; hex_data="806903d098eb50957b1b376385f233bb3a5d54f54191c8536aefee21fc9ba3ca"' 'base64.b16decode(hex_data, casefold=True)'
.....................
Mean +- std dev: 3.08 us +- 0.22 us
$ python3.7 -m perf timeit -s 'import base64; hex_data="806903d098eb50957b1b376385f233bb3a5d54f54191c8536aefee21fc9ba3ca".upper()' 'base64.b16decode(hex_data)'
.....................
Mean +- std dev: 2.93 us +- 0.20 us

With the regex compiled to a variable at the module level

$ python3.7 -m perf timeit -s 'import base64; hex_data="806903d098eb50957b1b376385f233bb3a5d54f54191c8536aefee21fc9ba3ca"' 'base64.b16decode(hex_data, casefold=True)'
.....................
Mean +- std dev: 2.08 us +- 0.15 us
$ python3.7 -m perf timeit -s 'import base64; hex_data="806903d098eb50957b1b376385f233bb3a5d54f54191c8536aefee21fc9ba3ca".upper()' 'base64.b16decode(hex_data)'
.....................
Mean +- std dev: 1.98 us +- 0.17 us


Since this is a membership check against a fixed set of characters, I also tried using a set together with any to short-circuit, but it turns out to be slower:

$ python3.7 -m perf timeit -s 'import base64; hex_data="806903d098eb50957b1b376385f233bb3a5d54f54191c8536aefee21fc9ba3ca"' 'base64.b16decode(hex_data, casefold=True)'
.....................
Mean +- std dev: 8.21 us +- 0.66 us
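
For reference, the set-based variant looks roughly like the sketch below. This is a reconstruction under my own assumptions (the names _B16_ALPHABET and b16decode_set_sketch are hypothetical), not necessarily the exact code that was timed. A likely reason for the slowdown is that the generator runs one Python-level iteration per byte, while the regex scans the whole input in C.

```python
import binascii

# Fixed alphabet of valid upper-case base16 byte values.
# (Hypothetical name, used only for this sketch.)
_B16_ALPHABET = frozenset(b"0123456789ABCDEF")


def b16decode_set_sketch(data, casefold=False):
    """Decode base16 input, validating with a set lookup per byte."""
    if isinstance(data, str):
        data = data.encode("ascii")
    if casefold:
        data = data.upper()
    # any() stops at the first byte that is outside the alphabet.
    if any(byte not in _B16_ALPHABET for byte in data):
        raise binascii.Error("Non-base16 digit found")
    return binascii.unhexlify(data)
```

For valid input there is nothing to short-circuit on, so every byte is still checked.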


I am opening a PR to use a regex compiled at module level, since I see it as a net win of 25-30% that requires no interface or test case changes.