This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Excess data in not handled properly in binascii.a2b_base64()
Type: security Stage: commit review
Components: Extension Modules, Library (Lib) Versions: Python 3.11
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: gregory.p.smith Nosy List: eric.smith, gregory.p.smith, idan22moral, python-dev
Priority: normal Keywords: patch

Created on 2021-01-31 18:11 by idan22moral, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 24402 merged python-dev, 2021-01-31 18:18
PR 27249 open idan22moral, 2021-07-19 18:37
Messages (3)
msg386032 - (view) Author: Idan Moral (idan22moral) * Date: 2021-01-31 18:11
Currently, when providing binascii.a2b_base64() base-64 input with excess data after the padding ('='/'=='), the excess data is ignored.

Example:

import binascii
binascii.a2b_base64(b'aGVsbG8=')       # b'hello' (valid)
binascii.a2b_base64(b'aGVsbG8==')      # b'hello' (ignoring data)
binascii.a2b_base64(b'aGVsbG8=python') # b'hello' (ignoring data)


Note: MANY libraries (such as the all-time favorite `base64`) use this function as their decoder.


Why is it problematic:
* User input can contain additional data after base64 data, which can lead to unintended behavior in products.
* Well-crafted user input can be used to bypass conditions in code (example in the referenced tweet).
* Can be used to target vulnerable libraries and bypass authentication mechanism such as JWT (potentially).


The logic behind my fix PR on GitHub:
* Before deciding to finish the function (after knowing the fact that we passed the data padding),
  we should check if there's no more data after the padding.
* If excess data exists, we should raise an error, free the allocated writer, and return null.
* Else, everything's fine, and we can proceed to the function's end as previously.


Though not publicly disclosed, this behavior can lead to security issues in heavily-used projects.
Preventing this behavior sounds more beneficial than harmful, since there's no known good usage for this behavior.

From what I read, the python implementation in not so close (when speaking about this case of course) to the base64 RFC.
(link: https://tools.ietf.org/html/rfc4648#section-3.3)


Thanks to Ori Damari (twitter: https://twitter.com/0xrepnz) for bringing this behavior up,
and thanks to Ryan Mast (twitter: https://twitter.com/rmast), and many of the other great guys for discussing the problem in the comments.

Link to the tweet: https://twitter.com/0xrepnz/status/1355295649915404291

--------------------------

Idan Moral
Twitter: https://twitter.com/idan_moral
GitHub: https://github.com/idan22moral
msg397772 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2021-07-19 00:45
New changeset 35b98e38b6edd63153fc8e092f94cb20725dacc1 by Idan Moral in branch 'main':
bpo-43086: Add handling for out-of-spec data in a2b_base64 (GH-24402)
https://github.com/python/cpython/commit/35b98e38b6edd63153fc8e092f94cb20725dacc1
msg397773 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2021-07-19 00:56
I've merged Idan's PR adding a strict_mode parameter to a2b_base64.  It defaults to False for backwards compatibility.

From a security perspective, it'd be _ideal_ if this were True.  But I expect doing that would break a bunch of existing code and tests that has been relying on some of the former leniency behaviors so I recommended the conservative approach of the old-behavior default.  It'd be a good thing to change it to True, but disruptive.  We need motivating reason to do that.

As it is a new feature due to the new parameter, this is for 3.11.

Workaround for Pythons without this: do a validity check before calling a2b_base64.  I suspect a regex could be constructed for that if you're careful.  If you come up with one, please share it here.
History
Date User Action Args
2022-04-11 14:59:40adminsetgithub: 87252
2021-07-19 18:37:26idan22moralsetpull_requests: + pull_request25796
2021-07-19 00:56:06gregory.p.smithsetstatus: open -> closed
versions: + Python 3.11, - Python 3.10
messages: + msg397773

components: + Extension Modules
resolution: fixed
stage: patch review -> commit review
2021-07-19 00:45:34gregory.p.smithsetmessages: + msg397772
2021-03-13 09:45:31gregory.p.smithsetassignee: gregory.p.smith

nosy: + gregory.p.smith
2021-01-31 18:29:23eric.smithsetnosy: + eric.smith
2021-01-31 18:18:49python-devsetkeywords: + patch
nosy: + python-dev

pull_requests: + pull_request23216
stage: patch review
2021-01-31 18:11:52idan22moralcreate