classification
Title: "string".encode('base64') is not the same as base64.b64encode("string")
Type:
Stage:
Components: Library (Lib)
Versions: Python 2.7
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: mahmoudimus, terry.reedy
Priority: normal
Keywords:

Created on 2011-01-21 06:25 by mahmoudimus, last changed 2011-01-22 03:39 by mahmoudimus. This issue is now closed.

Messages (3)
msg126696 - Author: Mahmoud Abdelkader (mahmoudimus) Date: 2011-01-21 06:25
Given a string, encoding it with .encode('base64') is not the same as using base64's b64encode function. I think this is very unclear and unintuitive. 

Here's some example code to demonstrate the problem. Before I attempt to submit a patch, is this done for legacy reasons? Are there any reasons to use one over the other?

import hmac
import hashlib
import base64


signature = hmac.new('secret', 'url', hashlib.sha512).digest()
assert signature.encode('base64') == base64.b64encode(signature)
msg126811 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-01-22 01:45
Questions should generally be asked on python-list or its mirrors.

The docs do not say that the result should be exactly, byte-for-byte, the same. The base64 module refers to RFC 3548. Both our doc and the RFC describe variations. The base64 codec does 'Mime base64' (7.8.3. Standard Encodings). The RFC says things like 'MIME does not define "base 64" per se, but rather a "base 64 Content-Transfer-Encoding" for use within MIME.' It also mentions 'line-break issues'.

You neglected to identify and post what the difference is ;-).

>>> import base64
>>> s='I am a string'
>>> s.encode('base64')
'SSBhbSBhIHN0cmluZw==\n'
>>> base64.b64encode(s)
'SSBhbSBhIHN0cmluZw=='
>>> s.encode('base64')== base64.b64encode(s)+'\n'
True

The addition of '\n' for the Mime version looks to be intentional, and will not be changed for 2.7.
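
The same comparison can be sketched in a form that does not depend on the Python 2 str.encode('base64') spelling. This is only an illustration, assuming Python 2.7 or Python 3.4 and later, where codecs.encode accepts the 'base64' codec name for byte strings; the variable names are chosen here for the example.

```python
import base64
import codecs

s = b'I am a string'

# MIME-style base64 codec: the result ends with a newline.
mime = codecs.encode(s, 'base64')

# base64.b64encode: the bare encoding, with no line breaks added.
plain = base64.b64encode(s)

print(repr(mime))
print(repr(plain))

# For an input this short, the only difference is the trailing newline.
print(mime == plain + b'\n')
print(mime.rstrip(b'\n') == plain)
```

If the two results need to be compared, stripping the trailing newline from the codec output (or using b64encode throughout) avoids the mismatch for short inputs.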

(2.5 and 2.6 only get security patches now.)
msg126814 - Author: Mahmoud Abdelkader (mahmoudimus) Date: 2011-01-22 03:39
Thanks for the clarification, Terry. This is indeed not a bug. For reference, the encoded output from the code I pasted was line-wrapped after the 76th character, which was my main source of confusion.

After reading RFC 3548, I now understand that the behavior of string.encode is the correct and expected result, as the documentation (section 7.8.3) states that it is MIME base64.
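
The line wrapping mentioned above can be reproduced with the HMAC example from the first message. The following is a rough sketch rather than a definitive recipe: it assumes Python 2.7 or Python 3.4 and later, uses bytes literals and codecs.encode so that it runs on both, and the names lines and unwrapped are introduced only for this example.

```python
import base64
import codecs
import hashlib
import hmac

# Same key and message as the original report, written as bytes literals.
signature = hmac.new(b'secret', b'url', hashlib.sha512).digest()  # 64 bytes

mime = codecs.encode(signature, 'base64')   # MIME-style: wrapped lines
plain = base64.b64encode(signature)         # one unbroken 88-character string

# The MIME output is split into lines of at most 76 characters,
# each terminated by a newline.
lines = mime.splitlines()
print([len(line) for line in lines])   # [76, 12]
print(len(plain))                      # 88

# Removing the line breaks recovers the b64encode result.
unwrapped = mime.replace(b'\n', b'')
print(unwrapped == plain)              # True
```

Because a SHA-512 digest encodes to more than 76 characters, the codec output here contains a newline in the middle as well as at the end, so appending '\n' to the b64encode result is not enough to make the two equal.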
History
Date                 User         Action  Args
2011-01-22 03:39:26  mahmoudimus  set     nosy: terry.reedy, mahmoudimus
                                          messages: + msg126814
2011-01-22 01:45:21  terry.reedy  set     status: open -> closed
                                          versions: - Python 2.6, Python 2.5
                                          nosy: + terry.reedy
                                          messages: + msg126811
                                          resolution: not a bug
2011-01-21 06:25:11  mahmoudimus  create