classification
Title: Option to skip padding for base64 urlsafe encoding/decoding
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.7, Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Thorney, georg.brandl, lemburg, loewis, nneonneo, r.david.murray
Priority: normal Keywords:

Created on 2017-02-03 01:25 by Thorney, last changed 2017-08-06 23:17 by Thorney.

Messages (3)
msg286837 - Author: Brian Thorne (Thorney) Date: 2017-02-03 01:25
Suggest changing the base64 module to better handle encoding schemes that don't use padding.
RFC4648 [1] allows other RFCs that use its base64url encoding to explicitly stipulate that there is no padding, and dropping the padding is lossless when we know the length of the encoded data [2].
Several standard specifications require this, often crypto-related ones (e.g., JWS [3] or named hashes [4]).
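
Because base64 output is built from 4-character groups, the padding can be recomputed from the length of the unpadded string alone, which is why dropping it loses nothing. A minimal sketch; `strip_padding` and `restore_padding` are illustrative helper names, not part of the base64 module:

```python
import base64


def strip_padding(encoded: bytes) -> bytes:
    """Drop the trailing '=' characters from base64/base64url output."""
    return encoded.rstrip(b"=")


def restore_padding(unpadded: bytes) -> bytes:
    """Re-create the '=' padding from the length of the data alone.

    Encoded base64 comes in 4-character groups, so the number of missing
    pad characters is (-len) % 4. A length that leaves a remainder of 1
    can never be produced by a base64 encoder, so it is rejected.
    """
    if len(unpadded) % 4 == 1:
        raise ValueError("invalid unpadded base64 length")
    return unpadded + b"=" * ((-len(unpadded)) % 4)


# Round-trip inputs that need 0, 1 and 2 pad characters.
for raw in (b"", b"a", b"ab", b"abc", b"abcd"):
    padded = base64.urlsafe_b64encode(raw)
    assert restore_padding(strip_padding(padded)) == padded
```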

RFC4648 specifically makes an exemption for this, and it should be better supported in Python's standard library. A related closed issue [5] asked for the padding to be removed or altered, which wouldn't comply with the spec. This request is different: it aims to better support the wider specification.

Proposed behaviour, adapted from the resolution of the Ruby discussion on the same topic [6]:

- base64.urlsafe_b64encode(s) should continue to produce padded output, but have an additional argument, padding, which defaults to True.
- base64.urlsafe_b64decode(s) should accept both padded and unpadded inputs. It can still reject incorrectly-padded input.
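
A rough sketch of the proposed behaviour, written as standalone wrappers rather than a patch to the base64 module. `encode_b64url` and `decode_b64url` are hypothetical names, and treating "incorrectly padded" as "padding present but not the amount the length requires" is an assumption about what the proposal means:

```python
import base64
import binascii


def encode_b64url(s: bytes, padding: bool = True) -> bytes:
    """Hypothetical stand-in for urlsafe_b64encode(s, padding=...)."""
    encoded = base64.urlsafe_b64encode(s)
    # Padded output stays the default; padding=False just drops the '='.
    return encoded if padding else encoded.rstrip(b"=")


def decode_b64url(s: bytes) -> bytes:
    """Hypothetical decoder accepting padded or unpadded base64url input."""
    body = s.rstrip(b"=")
    pad = len(s) - len(body)
    # '=' anywhere but the end, or a 4k+1 length, is never valid base64.
    if b"=" in body or len(body) % 4 == 1:
        raise binascii.Error("malformed base64url input")
    expected = (-len(body)) % 4
    # Padding is optional, but if present it must be the right amount.
    if pad and pad != expected:
        raise binascii.Error("incorrect padding")
    return base64.urlsafe_b64decode(body + b"=" * expected)
```

With these, `encode_b64url(b"\xfb\xff", padding=False)` gives `b"-_8"`, and `decode_b64url` accepts both `b"-_8"` and `b"-_8="` but still rejects `b"YQ="` (one pad character where two are required).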


If that sounds sensible I'd like to put a patch/PR together.

From wikipedia [7]:

> Some variants allow or require omitting the padding '=' signs to avoid them being confused with field separators, or require that any such padding be percent-encoded. Some libraries will encode '=' to '.'.
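
The two padding variants from the quote can be reproduced with the standard library alone. This is illustrative only; which variant a given specification demands has to come from that specification:

```python
import base64
from urllib.parse import quote, unquote

token = base64.urlsafe_b64encode(b"a")            # b"YQ=="

# Variant 1: percent-encode the padding so '=' is not read as a separator.
pct = quote(token.decode("ascii"), safe="")       # "YQ%3D%3D"
assert unquote(pct).encode("ascii") == token

# Variant 2: replace '=' with '.', as some libraries do. This is reversible
# because '.' is not in the base64url alphabet.
dotted = token.replace(b"=", b".")                # b"YQ.."
assert dotted.replace(b".", b"=") == token
```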

- [1] https://tools.ietf.org/html/rfc4648#page-4
- [2] http://stackoverflow.com/questions/4080988/why-does-base64-encoding-requires-padding-if-the-input-length-is-not-divisible-b
- [3] https://tools.ietf.org/html/rfc7515
- [4] https://tools.ietf.org/html/rfc6920#section-3
- [5] http://bugs.python.org/issue1661108 
- [6] https://bugs.ruby-lang.org/issues/10740
- [7] https://en.wikipedia.org/wiki/Base64#Output_Padding
msg299726 - Author: Robert Xiao (nneonneo) * Date: 2017-08-04 00:45
This sounds reasonable. I ran into a similar issue today trying to decode a JSON Web Key. Although I don't have any real say, I'd say that if you put together a patch it may have a better chance of getting reviewed.

I wonder about the following:

- What about adding a new kwarg to b64decode, called "checkpad=True", which validates padding and is passed through by urlsafe_b64decode? Then we can just set it to False when we need to.
- At the same time it might be nice to pass "validate=False" through from urlsafe_b64decode and friends, so we can have some nicer validation of data.
- I like adding the "padding=True" arg to encode, but it may not be necessary given how easy ".rstrip(b'=')" is as an alternative. Anyway, if you do add it to encode, please add it to b64encode and pass it through from the variant encoders to unify the API somewhat.
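
For comparison with the suggestions above, a sketch of what the existing stdlib API already allows. `checkpad` is only a proposal, so it does not appear; the behaviour shown is that of the standard `base64.b64decode(s, altchars=None, validate=False)` signature:

```python
import base64
import binascii

data = b"\xfb\xff"

# Dropping the padding is already a one-liner. urlsafe_b64encode returns
# bytes, so the argument to rstrip must be bytes as well.
unpadded = base64.urlsafe_b64encode(data).rstrip(b"=")   # b"-_8"

# urlsafe_b64decode refuses unpadded input; this is the case a
# padding-check switch would relax.
try:
    base64.urlsafe_b64decode(unpadded)
except binascii.Error as exc:
    print("unpadded input rejected:", exc)

# urlsafe_b64decode has no `validate` parameter, but b64decode with the
# URL-safe alphabet passed as altchars gives strict alphabet checking.
padded = unpadded + b"=" * ((-len(unpadded)) % 4)
assert base64.b64decode(padded, altchars=b"-_", validate=True) == data

try:
    base64.b64decode(b"-_8!", altchars=b"-_", validate=True)
except binascii.Error as exc:
    print("non-alphabet character rejected:", exc)
```

Passing `validate` through from `urlsafe_b64decode` would mainly save callers from spelling out `altchars=b"-_"` themselves.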

If you are still interested in putting together a patch, post a comment. Otherwise I may work on a patch for this.
msg299819 - Author: Brian Thorne (Thorney) Date: 2017-08-06 23:17
Hi Robert, It would be at least a week or two before I could take another look at this so please feel free to work on it. Not sure why I didn't write a patch at the time!
History
Date                 User            Action  Args
2017-08-06 23:17:48  Thorney         set     messages: + msg299819
2017-08-06 14:07:09  r.david.murray  set     nosy: + r.david.murray
2017-08-04 00:45:14  nneonneo        set     nosy: + nneonneo
                                             messages: + msg299726
                                             versions: + Python 3.6
2017-02-03 01:25:17  Thorney         create