classification
Title: Base64 decoding gives incorrect outputs.
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.5, Python 2.7

process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: ShiftedBit, gdr@garethrees.org, r.david.murray
Priority: normal
Keywords:

Created on 2017-06-04 07:59 by ShiftedBit, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name: b64_decoding.py
Uploaded: ShiftedBit, 2017-06-04 09:52
Description: Implementations of the current decoding functions, which handle the error cases.
Messages (5)
msg295115 - Author: Monty Evans (ShiftedBit) Date: 2017-06-04 08:04
Tested in Python 2.7 and 3.5: the base64 module contains a couple of decoding functions, 'standard_b64decode()' and 'b64decode()', which incorrectly decode certain invalid base64 strings. This is outlined in detail here: "https://stackoverflow.com/questions/44347819/python-3-5-base64-decoding-seems-to-be-incorrect". I've checked with a few other developers, and they agree that there is an issue here. I can't see that the issue has been resolved on the bug tracker, so I've worked up an alternative version of "standard_b64decode()", which I'll upload and which ought to solve the issue.
msg295126 - Author: Gareth Rees (gdr@garethrees.org) (Python triager) Date: 2017-06-04 15:28
RFC 4648 section 3.5 says:

   The padding step in base 64 and base 32 encoding can, if improperly
   implemented, lead to non-significant alterations of the encoded data.
   For example, if the input is only one octet for a base 64 encoding,
   then all six bits of the first symbol are used, but only the first
   two bits of the next symbol are used.  These pad bits MUST be set to
   zero by conforming encoders, which is described in the descriptions
   on padding below.  If this property do not hold, there is no
   canonical representation of base-encoded data, and multiple base-
   encoded strings can be decoded to the same binary data.  If this
   property (and others discussed in this document) holds, a canonical
   encoding is guaranteed.

   In some environments, the alteration is critical and therefore
   decoders MAY chose to reject an encoding if the pad bits have not
   been set to zero.

If decoders may choose to reject non-canonical encodings, then they may also
choose to accept them. (That's the meaning of "MAY" in RFC 2119.) So I think
Python's behaviour is conforming to the standard.
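
To make the pad-bit point concrete, here is a minimal sketch using only the standard `base64` module. The helper `pad_bits` and the `ALPHABET` constant are hypothetical names introduced for illustration; the sketch assumes padded, standard-alphabet input and relies on the lenient default decoding discussed in this issue.

```python
import base64

# Standard base64 alphabet (RFC 4648, table 1); index == 6-bit value.
ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def pad_bits(encoded: bytes) -> int:
    """Return the value of the unused low bits of the last data symbol.

    Hypothetical helper: assumes `encoded` is padded standard base64 that
    contains only alphabet characters followed by zero, one or two '='.
    """
    stripped = encoded.rstrip(b"=")
    n_pad = len(encoded) - len(stripped)
    if n_pad == 0 or not stripped:
        return 0  # no padding, so every bit of the last symbol is data
    last = ALPHABET.index(stripped[-1:])
    # One '=' leaves 2 unused bits in the last symbol; two '=' leave 4.
    unused = 2 * n_pad
    return last & ((1 << unused) - 1)

# "QQ==" is the canonical encoding of b"A"; "QR==" differs only in pad bits.
for s in (b"QQ==", b"QR=="):
    print(s, base64.b64decode(s), pad_bits(s))
```

Both strings decode to the same single byte under the default settings, and only the second reports non-zero pad bits, which is the non-canonical case the RFC allows decoders to accept or reject.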
msg295131 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2017-06-04 16:42
Not only is it conforming, it is required, since the primary use (originally) of the base64 module was in the email package, where generous interpretation of the input is the standard.
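
As a rough illustration of that generous interpretation, the sketch below feeds `base64.b64decode` an input with a line break in the middle, as a wrapped mail body might contain. The sample value is made up for the example; the `validate` keyword is the existing parameter of `b64decode`.

```python
import base64
import binascii

# "QUJD" (the encoding of b"ABC") split across two lines.
wrapped = b"QU\r\nJD"

# Lenient default: bytes outside the base64 alphabet are discarded.
print(base64.b64decode(wrapped))

# With validate=True the same input is rejected instead.
try:
    base64.b64decode(wrapped, validate=True)
except binascii.Error as exc:
    print("rejected:", exc)
```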

This does not mean that adding a strict mode would be out of line, but that would be an enhancement request, and would require some discussion about the API.

I'm going to close this issue as not a bug.  If you want to submit an enhancement proposal, please do so in a new issue.  You'll probably want to wait for the design discussion to resolve before (re)writing the code :)
msg295138 - Author: Monty Evans (ShiftedBit) Date: 2017-06-04 18:12
Ah, that is enlightening. It hadn't occurred to me that you might want to allow for minor mistakes in the encoder - I must've missed that part of the standard. Thanks to both of you for clearing that up :).
msg295147 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2017-06-04 23:04
Actually, the API discussion may be short: we already have a 'validate' option whose spirit matches this, so adding the padding-bit check alongside its existing check for non-alphabet characters would seem to me to be quite reasonable.
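
One way such a combined check could look from user code is sketched below. This is not the uploaded b64_decoding.py and not an API of the base64 module; `strict_b64decode` is a hypothetical name, and the round-trip comparison is just one simple way to detect non-zero pad bits.

```python
import base64
import binascii

def strict_b64decode(data: bytes) -> bytes:
    """Decode standard base64, rejecting non-alphabet bytes and stray pad bits.

    Hypothetical helper, not part of the base64 module.
    """
    # validate=True already rejects characters outside the alphabet.
    decoded = base64.b64decode(data, validate=True)
    # A canonical encoding re-encodes to exactly the input bytes;
    # non-zero pad bits make the round trip differ.
    if base64.b64encode(decoded) != bytes(data):
        raise binascii.Error("non-canonical base64 input")
    return decoded

print(strict_b64decode(b"QQ=="))

try:
    strict_b64decode(b"QR==")
except binascii.Error as exc:
    print("rejected:", exc)
```

The round trip costs an extra encode, which seems acceptable for a sketch; an implementation inside the decoder could inspect the pad bits directly instead.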
History
Date  User  Action  Args
2022-04-11 14:58:47  admin  set  github: 74749
2017-06-04 23:04:17  r.david.murray  set  messages: + msg295147
2017-06-04 18:12:13  ShiftedBit  set  messages: + msg295138
2017-06-04 16:42:17  r.david.murray  set  status: open -> closed; nosy: + r.david.murray; messages: + msg295131; resolution: not a bug; stage: resolved
2017-06-04 15:28:17  gdr@garethrees.org  set  nosy: + gdr@garethrees.org; messages: + msg295126
2017-06-04 14:28:27  ShiftedBit  set  components: + Library (Lib)
2017-06-04 09:52:28  ShiftedBit  set  files: + b64_decoding.py
2017-06-04 08:04:10  ShiftedBit  create