classification
Title: Optimize base64.b16decode to use compiled regex
Type: performance Stage: resolved
Components: Library (Lib) Versions: Python 3.8
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: xtreak Nosy List: barry, djhoulihan, pitrou, scoder, serhiy.storchaka, xtreak
Priority: normal Keywords: patch

Created on 2018-12-22 07:29 by xtreak, last changed 2018-12-27 05:32 by xtreak. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 11287 closed cheryl.sabella, 2018-12-22 21:22
Messages (10)
msg332330 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2018-12-22 07:29
I came across this as a result of issue35557 and thought to make a new issue to keep the discussion separate. Currently the b16decode function uses a regex with re.search that can be compiled at the module level as a static variable to give up to 30% improvement when executed on Python 3.7. I am proposing a PR for this change since it looks safe to me.

$ python3 -m perf compare_to default.json optimized.json --table
+--------------------+---------+------------------------------+
| Benchmark          | default | optimized                    |
+====================+=========+==============================+
| b16decode          | 2.97 us | 2.03 us: 1.46x faster (-32%) |
+--------------------+---------+------------------------------+
| b16decode_casefold | 3.18 us | 2.19 us: 1.45x faster (-31%) |
+--------------------+---------+------------------------------+

Benchmark script : 

import perf
import re
import binascii
import base64

_B16DECODE_PAT = re.compile(b'[^0-9A-F]')

def b16decode_re_compiled_search(s, casefold=False):
    s = base64._bytes_from_decode_data(s)
    if casefold:
        s = s.upper()
    if _B16DECODE_PAT.search(s):
        raise binascii.Error('Non-base16 digit found')
    return binascii.unhexlify(s)

if __name__ == "__main__":
    hex_data = "806903d098eb50957b1b376385f233bb3a5d54f54191c8536aefee21fc9ba3ca"
    hex_data_upper = hex_data.upper()

    assert base64.b16decode(hex_data_upper) == b16decode_re_compiled_search(hex_data_upper)
    assert base64.b16decode(hex_data, casefold=True) == b16decode_re_compiled_search(hex_data, casefold=True)

    runner = perf.Runner()
    if True: # toggle to False for default.json
        runner.timeit(name="b16decode",
                      stmt="b16decode_re_compiled_search(hex_data_upper)",
                      setup="from __main__ import b16decode_re_compiled_search, hex_data, hex_data_upper")
        runner.timeit(name="b16decode_casefold",
                      stmt="b16decode_re_compiled_search(hex_data, casefold=True)",
                      setup="from __main__ import b16decode_re_compiled_search, hex_data, hex_data_upper")
    else:
        runner.timeit(name="b16decode",
                      stmt="base64.b16decode(hex_data_upper)",
                      setup="from __main__ import hex_data, hex_data_upper; import base64")
        runner.timeit(name="b16decode_casefold",
                      stmt="base64.b16decode(hex_data, casefold=True)",
                      setup="from __main__ import hex_data, hex_data_upper; import base64")
msg332331 - (view) Author: Stefan Behnel (scoder) * Date: 2018-12-22 07:42
One regex related code pattern that I generally like is to assign bound methods to good names and use those. In this case, I would write

_has_non_base16_digits = re.compile(b'[^0-9A-F]').search
...
if _has_non_base16_digits(s):
    raise ...
msg332333 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-12-22 07:55
How this affects the import time (use -X importtime)?
msg332334 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2018-12-22 08:03
Thanks @scoder . I took the convention since most of the places I have seen capitalized variable ending with PAT but this looks more readable to me. I have made the changes in my PR.
msg332335 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2018-12-22 08:10
> How this affects the import time (use -X importtime)?

Is there reliable way to benchmark this?

On multiple runs with regex for python3 -X importtime -c 'import base64'

import time:       677 |      11151 | base64

On multiple runs without regex for python3 -X importtime -c 'import base64'

import time:       551 |       5726 | base64
msg332339 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-12-22 08:55
According to your results, you will save 0.94 us per call of b16decode, but loss 126 us at import time. You will get a benefit if b16decode is used more than 130 time.

Currently, if the performance is critical, the user can use binascii.unhexlify() directly or use bytes.fromhex(). base64.b16decode() is only needed if you need to ensure that the input strictly conforms to RFC 3548.

Other options, which will maximize the performance while keeping the validation is to add an option to binascii.unhexlify() for making it more strict. At the time of writing the base64 module its performance was not considered critical, and additional checks and preprocessing was implemented in Python.
msg332341 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2018-12-22 09:20
Thanks Serhiy, the other issue noted about performance improvement removing casefold and I thought re.search per call to be inefficient. My bad that I didn't consider the cost of moving the compilation to module level that affects import time and about using -X importtime. I agree that the cost is not worthy given that regex is used only inside b16decode. I will keep these factors in mind when I am doing similar sort of work and try to do a better analysis.
msg332372 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-12-22 21:39
I don't think performance of base16 encoding is important enough to increase import times.  I would recommend rejection.
msg332563 - (view) Author: Stefan Behnel (scoder) * Date: 2018-12-26 21:36
I agree with Antoine. After all, we are optimising a safety check here that runs in linear time. If people want speed, they should consider methods that do not do this check in the first place.
msg332574 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2018-12-27 05:32
Thanks for the feedback. I am closing this as rejected since it's not worth the cost of increasing import time and for performance reasons there are other options as Serhiy noted in msg332339.
History
Date User Action Args
2018-12-27 05:32:23xtreaksetstatus: open -> closed
resolution: rejected
messages: + msg332574

stage: patch review -> resolved
2018-12-26 21:36:07scodersetmessages: + msg332563
2018-12-22 21:39:32pitrousetmessages: + msg332372
2018-12-22 21:22:49cheryl.sabellasetpull_requests: + pull_request10517
2018-12-22 21:22:22cheryl.sabellasetpull_requests: - pull_request10516
2018-12-22 21:21:40cheryl.sabellasetkeywords: + patch
stage: patch review
pull_requests: + pull_request10516
2018-12-22 09:20:04xtreaksetmessages: + msg332341
2018-12-22 08:55:41serhiy.storchakasetnosy: + barry, pitrou
messages: + msg332339
2018-12-22 08:10:28xtreaksetmessages: + msg332335
2018-12-22 08:03:08xtreaksetmessages: + msg332334
2018-12-22 07:55:19serhiy.storchakasetmessages: + msg332333
2018-12-22 07:42:06scodersetnosy: + scoder
messages: + msg332331
2018-12-22 07:29:08xtreakcreate