classification
Title: Adopt binascii.a2b_base64's strict mode in base64.b64decode
Type: behavior Stage: commit review
Components: Library (Lib) Versions: Python 3.11
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: gregory.p.smith, idan22moral
Priority: normal Keywords: patch

Created on 2021-07-20 21:57 by idan22moral, last changed 2021-08-23 23:45 by gregory.p.smith. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 27272 merged idan22moral, 2021-07-20 22:01
Messages (3)
msg397917 - Author: Idan Moral (idan22moral) Date: 2021-07-20 21:57
This is a follow-up PR to GH-24402.

Currently, *base64.b64decode* uses a generic regex to validate *s* (when *validate* is true),
which sometimes results in unexpected behavior and exception messages.

Example:

(1)    base64.b64decode('ab==',  validate=True) # b'i'
(2)    base64.b64decode('ab3==', validate=True) # b'i\xbd'
(3)    base64.b64decode('ab=3=', validate=True) # raises binascii.Error: Non-base64 digit found
(4)    base64.b64decode('ab==3', validate=True) # raises binascii.Error: Non-base64 digit found
(5)    base64.b64decode('ab===', validate=True) # raises binascii.Error: Non-base64 digit found
(6)    base64.b64decode('=ab==', validate=True) # raises binascii.Error: Non-base64 digit found

The only strict-base64 valid example here is (1).
(2), (4) and (5) should raise 'Excess data after padding',
(3) should raise 'Discontinuous padding not allowed',
and (6) should raise 'Leading padding not allowed'.

To get this behavior, we can use the strict mode of *binascii.a2b_base64*, which was new at the time this PR was created.

I have one (minor) concern: efficiency.
I'm not experienced enough to know how fast regexes are (in Python or in general) compared to the C implementation of *binascii.a2b_base64*.
So I don't know what the impact of migrating from regex pre-validation to validation during parsing would be.
Let me know if you find it inefficient.

-----

Referenced issue (GH-24402): https://bugs.python.org/issue43086
msg400184 - Author: Gregory P. Smith (gregory.p.smith) (Python committer) Date: 2021-08-23 23:43
I'm not worried about the regex vs binascii C implementation performance at all.
msg400185 - Author: Gregory P. Smith (gregory.p.smith) (Python committer) Date: 2021-08-23 23:44
New changeset fa6304a5225787054067bb56089632146d288b20 by Idan Moral in branch 'main':
bpo-44690: Adopt binacii.a2b_base64's strict mode in base64.b64decode (GH-27272)
https://github.com/python/cpython/commit/fa6304a5225787054067bb56089632146d288b20
History
Date User Action Args
2021-08-23 23:45:16 gregory.p.smith set status: open -> closed; resolution: fixed; stage: patch review -> commit review
2021-08-23 23:44:36 gregory.p.smith set messages: + msg400185
2021-08-23 23:43:13 gregory.p.smith set nosy: + gregory.p.smith; messages: + msg400184
2021-07-20 22:01:11 idan22moral set keywords: + patch; stage: patch review; pull_requests: + pull_request25817
2021-07-20 21:57:36 idan22moral create