Issue 9298: binary email attachment issue with base64 encoding

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/53544

classification

Title:	binary email attachment issue with base64 encoding
Type:	behavior	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.1, Python 3.2, Python 3.3

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	r.david.murray	Nosy List:	barry, python-dev, r.david.murray, vunruh, yves@zioup.com
Priority:	normal	Keywords:	patch

Created on 2010-07-19 03:04 by vunruh, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
bugs.python.org_issue9298.pdf	vunruh, 2010-07-19 14:57	PDF of this report page
issue9298-2.patch	yves@zioup.com, 2011-02-13 18:36	email/encoders.py and email/test/test_email.py	review

Messages (17)
msg110710 - (view)	Author: Vance Unruh (vunruh)	Date: 2010-07-19 03:04
I'm using Python to email a text version and a PDF version of a report. The standard way of doing things does not work with Vista's Mail program, but works fine with Mail on OSX. So, I don't know if this is a Python or a Vista Mail bug. By standard way, I mean:

    # Add the attached PDF:
    part = MIMEApplication(pdf,"pdf")
    part.add_header('Content-Disposition', 'attachment', filename=pdfFile)
    msg.attach(part)

To fix the problem, I changed C:\Python31\Lib\email\encoders.py to use encodebytes instead of b64encode in order to get mail on Windows Vista to correctly interpret the attachment. This splits the base64 encoding into many lines of some fixed length.

I can achieve the same thing adding the attachment by hand with the following code:

    from email.mime.base import MIMEBase
    part = MIMEBase("application","pdf")
    part.add_header('Content-Transfer-Encoding', 'base64')
    part.set_payload(str(base64.encodebytes(pdf),'ascii'))
    msg.attach(part)

Seems like I shouldn't need to know this much.

I'm new to Python and this is the first bug I have submitted, so if you need additional information, please let me know.
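The difference between the two encoders mentioned in the report above can be sketched as follows. This is only an illustration using the standard `base64` module; the sample data is hypothetical and stands in for the PDF bytes.

```python
import base64

# Hypothetical stand-in for the PDF contents: 512 arbitrary bytes.
data = bytes(range(256)) * 2

# b64encode returns the whole encoding as one unbroken line.
single_line = base64.b64encode(data)

# encodebytes inserts a newline after every 76 output characters
# and also ends the output with a newline.
folded = base64.encodebytes(data)

single_line_lengths = [len(line) for line in single_line.splitlines()]
folded_lengths = [len(line) for line in folded.splitlines()]

print("longest b64encode line:  ", max(single_line_lengths))
print("longest encodebytes line:", max(folded_lengths))

# Both forms decode back to the same bytes.
assert base64.b64decode(single_line) == data
assert base64.decodebytes(folded) == data
```

The single-line form is what the strict mail client in the report appears to reject, while the folded form stays within the 76-character line limit discussed later in this thread.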
msg110743 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2010-07-19 12:50
Can you try this against 3.1 from svn (or py3k from svn)? A bug was fixed that might be relevant. Alternatively a unit test that demonstrates the problem would be most helpful.
msg110762 - (view)	Author: Vance Unruh (vunruh)	Date: 2010-07-19 14:57
Here's code that attaches the pdf file the two different ways. Both attachments are OK when I read the mail on OSX, but one is corrupt when read with Windows Mail on Vista. I wasn't sure what to do with the actual sending of the mail to the server. You'll have to change the code to use your account or something.

    def emailReport(pdf):
        """Email the report as multi-part MIME"""

        from email.mime.multipart import MIMEMultipart
        msg = MIMEMultipart()
        msg['Subject'] = 'Corrupt PDF'
        msg['From'] = 'Me <me@myProvider.net>'
        msg['To'] = 'You <you@yourProvider.com>'

        # Add the PDF the easy way that fails:
        from email.mime.application import MIMEApplication
        fp = open(pdf, 'rb')
        part = MIMEApplication(fp.read(),"pdf")
        fp.close()
        part.add_header('Content-Disposition', 'attachment',filename='This one fails.pdf')
        msg.attach(part)

        # Add the PDF the hard way using the legacy base64 encoder
        from email.mime.base import MIMEBase
        part = MIMEBase("application","pdf")
        part.add_header('Content-Transfer-Encoding', 'base64')
        part.add_header('Content-Disposition', 'attachment',filename='This one works.pdf')
        import base64
        fp = open(pdf, 'rb')
        part.set_payload(str(base64.encodebytes(fp.read()),'ascii'))
        fp.close()
        msg.attach(part)

        # Send the email
        from smtplib import SMTP
        server = SMTP('smtpauth.provider.net')
        server.login(user,password)
        server.sendmail(msg['From'], recipient, msg.as_string())
        server.quit()

    emailReport('bugs.python.org_issue9298.pdf')
msg128220 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2011-02-09 16:19
See also duplicate issue 11156.
msg128254 - (view)	Author: Yves Dorfsman (yves@zioup.com)	Date: 2011-02-10 01:29
Solution:

In /usr/lib/python3.1/email/encoders.py, use encodebytes instead of b64encode:

    --- encoders.py 2011-02-08 09:37:21.025030051 -0700
    +++ encoders.py.yves 2011-02-08 09:38:04.945608365 -0700
    @@ -12,7 +12,7 @@
     ]
    -from base64 import b64encode as _bencode
    +from base64 import encodebytes as _bencode
     from quopri import encodestring as _encode
msg128255 - (view)	Author: Yves Dorfsman (yves@zioup.com)	Date: 2011-02-10 01:30
    #!/usr/bin/python3.1

    import unittest
    import email.mime.image

    class emailEncoderTestCase(unittest.TestCase):
        def setUp(self):
            # point to an image
            binaryfile = '/usr/share/openclipart/png/animals/mammals/happy_monkey_benji_park_01.png'
            while len(binaryfile) == 0:
                print('Enter the name of an image: ')
                binaryfile = raw_input()
            fp = open('/usr/share/openclipart/png/animals/mammals/happy_monkey_benji_park_01.png', 'rb')
            self.bindata = fp.read()

        def test_convert(self):
            mimed = email.mime.image.MIMEImage(self.bindata, _subtype='png')
            base64ed = mimed.get_payload()
            # print (base64ed)

            # work out if any line within the string is > 76 chars
            chopped = base64ed.split('\n')
            lengths = [ len(x) for x in chopped ]
            toolong = [ x for x in lengths if x > 76 ]

            msg = 'There is at least one line of ' + str(max(lengths)) + ' chars.'
            self.assertEqual(0, len(toolong), msg)

    if __name__ == '__main__':
        unittest.main()
msg128256 - (view)	Author: Yves Dorfsman (yves@zioup.com)	Date: 2011-02-10 01:36
Here's a better version (sorry, I don't know how to remove msg128255):
msg128257 - (view)	Author: Yves Dorfsman (yves@zioup.com)	Date: 2011-02-10 01:37
    #!/usr/bin/python3.1

    import unittest
    import email.mime.image

    class emailEncoderTestCase(unittest.TestCase):
        def setUp(self):
            # point to an image
            binaryfile = ''
            #binaryfile = '/usr/share/openclipart/png/animals/mammals/happy_monkey_benji_park_01.png'
            while len(binaryfile) == 0:
                print('Enter the name of an image: ')
                binaryfile = input()
            fp = open(binaryfile, 'rb')
            self.bindata = fp.read()

        def test_convert(self):
            mimed = email.mime.image.MIMEImage(self.bindata, _subtype='png')
            base64ed = mimed.get_payload()
            # print (base64ed)

            # work out if any line within the string is > 76 chars
            chopped = base64ed.split('\n')
            lengths = [ len(x) for x in chopped ]
            toolong = [ x for x in lengths if x > 76 ]

            msg = 'There is at least one line of ' + str(max(lengths)) + ' chars.'
            self.assertEqual(0, len(toolong), msg)

    if __name__ == '__main__':
        unittest.main()
msg128363 - (view)	Author: Yves Dorfsman (yves@zioup.com)	Date: 2011-02-11 06:02
Tests whether email.encoders.encode_base64 returns a single-line string or a string broken up into lines of at most 76 characters, as per the RFC.
msg128364 - (view)	Author: Yves Dorfsman (yves@zioup.com)	Date: 2011-02-11 06:04
Replaces b64encode with encodebytes.
msg128404 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2011-02-11 15:43
Yves: thanks for the patches. If you feel like redoing the test one as a patch against Lib/email/test/test_email.py, that would be great. I'd suggest having the test just split the lines and do assertLessEqual(max([len(x) for x in lines]), 76) or something along those lines. As for the fix, encoders used to use encodebytes, but it tacks on a trailing newline that, according to the old code (see r57800) is unwanted. So perhaps we need to revert that part of r57800 instead.
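The shape of the test suggested above might look roughly like the following. This is a sketch only, not the test that was eventually committed; the class name, method name, and sample bytes are hypothetical, and it assumes a Python version that already contains the fix for this issue.

```python
import unittest
from email.mime.application import MIMEApplication


class Base64FoldingTest(unittest.TestCase):
    """Hypothetical test: base64 bodies must be folded at 76 characters."""

    def test_lines_at_most_76_chars(self):
        # Arbitrary binary sample, long enough to need several lines.
        data = b"\x00\xff" * 500
        part = MIMEApplication(data)

        # Split the encoded payload into lines and check the longest one.
        lines = part.get_payload().splitlines()
        self.assertLessEqual(max(len(x) for x in lines), 76)

        # The payload must still decode to the original bytes.
        self.assertEqual(part.get_payload(decode=True), data)


# Run the single test case explicitly and keep the result for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(Base64FoldingTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Checking only the maximum line length keeps the test independent of whether the encoder emits a trailing newline.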
msg128471 - (view)	Author: Yves Dorfsman (yves@zioup.com)	Date: 2011-02-12 23:14
I will.

Please don't use my patch yet, it breaks something else in the test_email:

    ./python Lib/test/regrtest.py test_email
    [1/1] test_email
    test test_email failed -- Traceback (most recent call last):
      File "/export/incoming/python/py3k/Lib/email/test/test_email.py", line 1146, in test_body
        eq(msg.get_payload(), '+vv8/f7/')
    AssertionError: '+vv8/f7/\n' != '+vv8/f7/'
    - +vv8/f7/
    ?         -
    + +vv8/f7/

    1 test failed:
        test_email

This is with my code patch, not the test patch. I'll look at it, and post again, could be the extra newline you were talking about.
msg128481 - (view)	Author: Yves Dorfsman (yves@zioup.com)	Date: 2011-02-13 07:17
I've got two issues with this code (Lib/email/test/test_email.py):

    1128     def test_body(self):
    1129         eq = self.assertEqual
    1130         bytes = b'\xfa\xfb\xfc\xfd\xfe\xff'
    1131         msg = MIMEApplication(bytes)
    1132         eq(msg.get_payload(), '+vv8/f7/')
    1133         eq(msg.get_payload(decode=True), bytes)

1) Even though it works, I find the use of a defined type as the name of a variable confusing (line 1130, bytes).

2) The test on line 1132 fails if the base64 payload has an extra newline at the end, but newlines are not an issue in base64 and are actually expected. In fact the test at line 1133 shows that once decoded, the bytes are reverted to their original form.

Is there a way to find who is the author of this test, and what was the intent?

Would the following test be acceptable (still testing a valid base64 encoding):

    eq(msg.get_payload().strip(), '+vv8/f7/')

Thanks.
msg128515 - (view)	Author: Yves Dorfsman (yves@zioup.com)	Date: 2011-02-13 18:36
encoders.py: Fixes the issue of base64'ed output being > 76 chars.

test_email.py:
- Tests that base64'ed binary is split into 76-char lines.
- WARNING: Changes the test for MIMEApplication.test_body:
  - it changes the name of the variable 'bytes' to 'bytesdata'
  - it strip()s the base64ed payload before it compares it to the expected payload.

With the change above (using encodebytes instead of b64encode in encoders.py), this test, as is, fails, because there is an extra newline at the end. Extra newlines are part of base64 and should not be an issue; as a matter of fact, you obtain the original bytes when you decode, regardless of having extra newlines.

It would be good to know the intent of the original author of this test. Was the intent to ensure there were no newlines? If so, why? Or was the intent to simply confirm that the base64 encoding conforms to the standard? If the latter, my change should not be an issue.

All tests ("make test") passed with this patch.
msg130655 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2011-03-12 03:13
Unfortunately we don't have enough history information to determine who wrote the original _bencode function, although very likely it was Barry. As for the test, that seems to have been written during the python3 translation to make sure that the behavior implemented by _bencode was preserved. Python2 has no such test: if you remove the newline check from _bencode, the test suite passes.

Checking with RFC 2045, we find this:

    The encoded output stream must be represented in lines of no more than 76 characters each. All line breaks or other characters not found in Table 1 must be ignored by decoding software. In base64 data, characters other than those in Table 1, line breaks, and other white space probably indicate a transmission error, about which a warning message or even a message rejection might be appropriate under some circumstances.

"All line breaks..." seems pretty unambiguous: an extra trailing newline should be ignored by any compliant email agent. That does not eliminate the possibility that a non-compliant email agent would tack on an extra newline if there is one after the base64 encoded text, but it seems very very unlikely.

I am therefore inclined to fix the test, as you suggest. I hope that Barry can remember why _bencode was introduced in the first place, since clearly there was some reason.
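The point about line breaks being ignored on decoding can be checked with a short sketch. It reuses the six bytes from test_body quoted earlier in the thread; the variable names are illustrative, and the behaviour shown is that of the standard `base64` module's default (non-validating) decoder, not of any particular mail client.

```python
import base64

# The same six bytes used by test_body in test_email.py.
raw = b"\xfa\xfb\xfc\xfd\xfe\xff"

unfolded = base64.b64encode(raw)     # no trailing newline
folded = base64.encodebytes(raw)     # trailing newline appended

# The two encodings differ only by the final newline...
assert unfolded == b"+vv8/f7/"
assert folded == b"+vv8/f7/\n"

# ...and a decoder that ignores line breaks recovers identical bytes,
# even when an extra newline appears in the middle of the data.
decoded_unfolded = base64.b64decode(unfolded)
decoded_folded = base64.b64decode(folded)
decoded_split = base64.b64decode(b"+vv8\n/f7/\n")

print(decoded_unfolded == decoded_folded == decoded_split == raw)
```

This is why comparing `msg.get_payload().strip()` against the expected string, as proposed earlier, still tests a valid base64 encoding.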
msg131167 - (view)	Author: Roundup Robot (python-dev)	Date: 2011-03-16 20:15
New changeset 062d09d7bf94 by R David Murray in branch '3.1':
#9298: restore proper folding of base64 encoded bodies.
http://hg.python.org/cpython/rev/062d09d7bf94

New changeset c34320d9095e by R David Murray in branch '3.2':
Merge #9298 fix.
http://hg.python.org/cpython/rev/c34320d9095e

New changeset de2cd04e5101 by R David Murray in branch 'default':
Merge #9298 fix.
http://hg.python.org/cpython/rev/de2cd04e5101
msg131170 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2011-03-16 20:21
I talked with Barry, who could find no relevant discussions in his email logs. We decided that _bencode was misguided in the first place (this is reinforced by the bug I fixed a year ago where email was stripping the final newline off a body after decoding it). So I've committed Yves' fix. Thanks, Yves.

History
Date	User	Action	Args
2022-04-11 14:57:03	admin	set	github: 53544
2011-03-16 20:21:28	r.david.murray	set	status: open -> closed messages: + msg131170 resolution: fixed stage: test needed -> resolved
2011-03-16 20:15:21	python-dev	set	nosy: + python-dev messages: + msg131167
2011-03-12 03:13:34	r.david.murray	set	messages: + msg130655
2011-02-13 18:36:13	yves@zioup.com	set	files: + issue9298-2.patch messages: + msg128515
2011-02-13 18:23:34	yves@zioup.com	set	files: - issue9298.patch
2011-02-13 18:23:25	yves@zioup.com	set	files: - issue9298-test.py
2011-02-13 07:17:17	yves@zioup.com	set	messages: + msg128481
2011-02-12 23:14:04	yves@zioup.com	set	messages: + msg128471
2011-02-11 15:43:44	r.david.murray	set	nosy: + barry messages: + msg128404
2011-02-11 06:04:55	yves@zioup.com	set	files: + issue9298.patch messages: + msg128364 keywords: + patch
2011-02-11 06:02:45	yves@zioup.com	set	files: + issue9298-test.py messages: + msg128363
2011-02-10 01:37:00	yves@zioup.com	set	messages: + msg128257
2011-02-10 01:36:29	yves@zioup.com	set	messages: + msg128256
2011-02-10 01:30:04	yves@zioup.com	set	messages: + msg128255
2011-02-10 01:29:35	yves@zioup.com	set	messages: + msg128254
2011-02-09 16:20:38	r.david.murray	link	issue11156 superseder
2011-02-09 16:19:59	r.david.murray	set	versions: + Python 3.3
2011-02-09 16:19:48	r.david.murray	set	nosy: + yves@zioup.com messages: + msg128220
2010-12-27 18:30:50	r.david.murray	set	nosy: r.david.murray, vunruh versions: + Python 3.2
2010-07-19 14:57:01	vunruh	set	files: + bugs.python.org_issue9298.pdf messages: + msg110762
2010-07-19 12:50:17	r.david.murray	set	type: behavior
2010-07-19 12:50:01	r.david.murray	set	nosy: + r.david.murray messages: + msg110743 assignee: r.david.murray stage: test needed
2010-07-19 03:04:55	vunruh	create