Message 397917 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Author	idan22moral
Recipients	idan22moral
Date	2021-07-20.21:57:36
Message-id	<1626818256.45.0.373140812425.issue44690@roundup.psfhosted.org>

Content

This is a follow-up PR to GH-24402.

Currently, *base64.b64decode* uses a generic regex to validate *s* (when *validate* is true),
which can let malformed input through or raise a misleading exception message.

Examples:

(1)    base64.b64decode('ab==',  validate=True) # b'i'
(2)    base64.b64decode('ab3==', validate=True) # b'i\xbd'
(3)    base64.b64decode('ab=3=', validate=True) # raises binascii.Error: Non-base64 digit found
(4)    base64.b64decode('ab==3', validate=True) # raises binascii.Error: Non-base64 digit found
(5)    base64.b64decode('ab===', validate=True) # raises binascii.Error: Non-base64 digit found
(6)    base64.b64decode('=ab==', validate=True) # raises binascii.Error: Non-base64 digit found

Only (1) is valid strict base64.
(2), (4) and (5) should raise 'Excess data after padding',
(3) should raise 'Discontinuous padding not allowed',
and (6) should raise 'Leading padding not allowed'.

To get this behavior, we can use the strict mode of *binascii.a2b_base64*, which is new at the time of creating this PR.

I have one (not so big) concern: efficiency.
I don't have much experience with how fast regexes are (in Python or in general) compared to the C implementation of *binascii.a2b_base64*.
So I don't know what the performance impact would be of moving from regex pre-validation to validating while parsing the input.
Let me know if you find it inefficient.

-----

Referenced issue (GH-24402): https://bugs.python.org/issue43086
