Issue 7143: get_payload(decode=True) eats last newline in base64 encoded payload

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/51392

classification

Title:	get_payload(decode=True) eats last newline in base64 encoded payload
Type:	behavior	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 2.7

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	r.david.murray	Nosy List:	Joaquin.Cuenca.Abela, athomas, barry, esam, r.david.murray
Priority:	normal	Keywords:	easy, patch

Created on 2009-10-15 18:06 by athomas, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
test.eml	athomas, 2009-10-15 18:06	Sample email
b64crlf.patch	Joaquin.Cuenca.Abela, 2010-03-04 16:03
b64crlf-2.patch	Joaquin.Cuenca.Abela, 2010-03-04 22:22

Messages (12)
msg94100 - (view)	Author: Andreas Thomas (athomas)	Date: 2009-10-15 18:06
The attachment of my sample email contains '123\n', but get_payload (decode=True) returns '123'. The source of this error is the function _bdecode from the email.utils module, which gets called for decoding base64 encoded payloads. Here the last \n gets stripped away, because (comment in the function): # We can't quite use base64.encodestring() since it tacks on a "courtesy # newline". Blech! I don't think that's true. At least not for Python 2.5.4 and upwards. Also note the "encodestring" instead of "decodestring". In fact calling base64.decodestring(part.get_payload()) returns the correct value, so I propose to remove this function completely and just use the standard decoding method. Please note that base64.encodestring really tacks on a \n.
msg97570 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2010-01-11 01:47
The behavior you object to is in fact correct per RFC 2046, 5.5.1: The boundary delimiter MUST occur at the beginning of a line, i.e., following a CRLF, and the initial CRLF is considered to be attached to the boundary delimiter line rather than part of the preceding part. So in your example message your attatched file does not in fact have a final newline. Note that you do not get a newline regardless of whether decode is true ('p' is the part extracted from your sample message): >>> p.get_payload() 'MTIzCg==' >>> p.get_payload(decode=True) '123'
msg100338 - (view)	Author: Joaquin Cuenca Abela (Joaquin.Cuenca.Abela)	Date: 2010-03-03 11:22
Hi, RFC 2046, 5.1.1 refers to the CRLF that happens just before the boundary. It says nothing about an encoded CRLF. From Andreas example, if you have: Content-Type: text/plain; name="test.txt" Content-Transfer-Encoding: base64 Content-Disposition: inline; filename="test.txt" MTIzCg== --------------040103020004000509010404-- You are correctly eating the CRLF that exists between MTIzCg== and --------------040103020004000509010404--, because it's part of the boundary. You are also eating the CRLF that is inside the base64 encoded text, and I agree with Andreas that this is incorrect. Will you please consider reopening this bug?
msg100343 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2010-03-03 14:08
Ah, I misunderstood, and did not see that the newline in question was inside the base64 string. Thank you for pointing out my mistake. Would either of you like to propose a patch, including a test case? (I've removed 2.5 because it is in security-fix-only mode).
msg100394 - (view)	Author: Joaquin Cuenca Abela (Joaquin.Cuenca.Abela)	Date: 2010-03-04 16:03
Hi, I've never before made a patch to Python, so take it with care. A couple of comments, I reused a test where all the attachments contained an ending newline, except for the base64 one (conveniently...) I think the comment in _bdecode that Andreas quoted before refers to base64.encodestring add an extra \n to de encoded string, but base64.decodestring can work with and without this extra \n: >>> import base64 >>> base64.decodestring('MTIzCg==') '123\n' >>> base64.decodestring('MTIzCg==\n') '123\n' So it's not necessary to work around this in _bdecode. Let me know if it looks good to you.
msg100403 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2010-03-04 18:01
Thanks for working on this. For the most part your patch looks fine. Two comments: (1) it concerns me that by co-opting the existing test, we are no longer testing that decoding does not introduce a spurious newline :). (2) I think we should add a comment in _bdecode that it used to do more and is retained now only for backward compatibility. I don't particularly like proliferating test files, and I plan to fix that in email6, but for now I think you should just add a new test file based on msg_10, and a test that uses it.
msg100427 - (view)	Author: Joaquin Cuenca Abela (Joaquin.Cuenca.Abela)	Date: 2010-03-04 22:22
I added a new subpart to msg_10.txt, that keeps the previous test and also tests the new behavior. Let me know if it's ok like this or if you still prefer to create a different msg file for testing this. Thanks,
msg100592 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2010-03-07 16:58
Your patch looks good, thank you. I just realized that Barry isn't nosy on this issue. I've checked, and the code in question dates back to email version 1.0...code of that long standing that exists specifically to implement the behavior we propose to change makes me wonder what we will break if we fix it. Yet the current behavior seems clearly contrary to the RFCs to me. Barry? Any memory of why _bdecode exists and what will break if we fix it? In any case I suspect that backporting this fix might be a bad idea, since existing code may be compensating for the current behavior and break if the behavior is fixed.
msg100596 - (view)	Author: Joaquin Cuenca Abela (Joaquin.Cuenca.Abela)	Date: 2010-03-07 18:04
Unfortunately the only way that I can see to reliably work around this is to bypass entirely get_payload, in this case fixing this bug will not affect people that do that negatively. Some people may have more control over their attachments and they are unconditionally adding a \n to certain type of attachments. In this case they will get an extra \n after this patch. If they are doing this, they certainly realized there was a bug, because this is only necessary for base64 encoded attachments and not for any other encoding. In this case I guess it's reasonable to expect that this bug will be fixed one day. But you're right that the change is risky, an that we (at least I) are not sure why the code is the way it is. Cheers,
msg100614 - (view)	Author: Barry A. Warsaw (barry) *	Date: 2010-03-07 21:46
Thanks for adding me to the nosy list. Yep, this code is pretty old so it doesn't surprise me that its implementation isn't quite right. Of course, I hate get_payload(decode=True) anyway and hope that goes away in email 6. Having said that, I don't think this can be changed in Python 2.6. It's pretty common for people to have workarounds for the devil they know and I'd bet that changing this behavior in 2.6.x would break people's code. I'm okay for doing the sensible thing in Python 2.7 (though we should probably bump the email package's micro version).
msg100626 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2010-03-08 02:23
OK, patch (with comment tweaks to refer to this issue), and email minor version bump, applied to trunk in r78778. It turns out that bdecode was already deleted in email 5 in py3k. I did port the test in r78780. Thanks Joaquin Cuenca Abela, and you now have the interesting distinction of being the first name listed in Misc/ACKS (because we've been maintaining it in sorted order by last name....)
msg100631 - (view)	Author: Joaquin Cuenca Abela (Joaquin.Cuenca.Abela)	Date: 2010-03-08 08:14
Thanks David and Barry! As much as it flatters my ego to be in the first place is MISC/ACK, I have to confess that my family name is "Cuenca Abela", Cuenca is not a middle name. Cheers,

History
Date	User	Action	Args
2022-04-11 14:56:54	admin	set	github: 51392
2010-03-08 08:14:47	Joaquin.Cuenca.Abela	set	messages: + msg100631
2010-03-08 02:23:34	r.david.murray	set	status: open -> closed versions: - Python 3.2 messages: + msg100626 resolution: fixed stage: patch review -> resolved
2010-03-07 21:46:08	barry	set	messages: + msg100614
2010-03-07 18:04:02	Joaquin.Cuenca.Abela	set	messages: + msg100596
2010-03-07 16:58:17	r.david.murray	set	nosy: + barry messages: + msg100592 versions: - Python 2.6, Python 2.5, Python 3.1
2010-03-04 22:22:46	Joaquin.Cuenca.Abela	set	files: + b64crlf-2.patch messages: + msg100427 versions: + Python 2.5
2010-03-04 18:01:59	r.david.murray	set	messages: + msg100403 stage: test needed -> patch review
2010-03-04 16:03:45	Joaquin.Cuenca.Abela	set	files: + b64crlf.patch keywords: + patch messages: + msg100394
2010-03-03 14:12:06	r.david.murray	set	versions: - Python 3.0
2010-03-03 14:08:55	r.david.murray	set	status: closed -> open assignee: r.david.murray title: get_payload(decode=True) eats last newline -> get_payload(decode=True) eats last newline in base64 encoded payload keywords: + easy resolution: not a bug -> (no value) versions: - Python 2.5 messages: + msg100343 stage: resolved -> test needed
2010-03-03 11:22:56	Joaquin.Cuenca.Abela	set	nosy: + Joaquin.Cuenca.Abela messages: + msg100338
2010-01-11 01:47:21	r.david.murray	set	status: open -> closed priority: normal nosy: + r.david.murray messages: + msg97570 resolution: not a bug stage: resolved
2009-10-16 06:47:02	esam	set	nosy: + esam
2009-10-15 18:06:47	athomas	create