classification
Title: binary email attachment issue with base64 encoding
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.1, Python 3.2, Python 3.3
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: r.david.murray
Nosy List: barry, python-dev, r.david.murray, vunruh, yves@zioup.com
Priority: normal
Keywords: patch

Created on 2010-07-19 03:04 by vunruh, last changed 2011-03-16 20:21 by r.david.murray. This issue is now closed.

Files
File name                       Uploaded                           Description
bugs.python.org_issue9298.pdf   vunruh, 2010-07-19 14:57           PDF of this report page
issue9298-2.patch               yves@zioup.com, 2011-02-13 18:36   email/encoders.py and email/test/test_email.py
Messages (17)
msg110710 - Author: Vance Unruh (vunruh) Date: 2010-07-19 03:04
I'm using Python to email a text version and a PDF version of a report. The standard way of doing things does not work with Vista's Mail program, but works fine with Mail on OSX. So, I don't know if this is a Python or a Vista Mail bug. By standard way, I mean:

    # Add the attached PDF:
    part = MIMEApplication(pdf,"pdf")
    part.add_header('Content-Disposition', 'attachment', filename=pdfFile)
    msg.attach(part)


To fix the problem, I changed C:\Python31\Lib\email\encoders.py to use encodebytes instead of b64encode, so that Mail on Windows Vista interprets the attachment correctly. encodebytes splits the base64 output into multiple lines of a fixed length.

I can achieve the same thing adding the attachment by hand with the following code:

    import base64
    from email.mime.base import MIMEBase
    part = MIMEBase("application","pdf")
    part.add_header('Content-Transfer-Encoding', 'base64')
    part.set_payload(str(base64.encodebytes(pdf),'ascii'))
    msg.attach(part)

Seems like I shouldn't need to know this much.
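
For reference, the difference between the two encoders can be seen without sending any mail. This is a minimal sketch that assumes only the standard base64 module; the 256-byte `data` value is a made-up stand-in for the PDF contents:

```python
import base64

# Stand-in for the PDF bytes; any payload longer than 57 bytes shows the effect.
data = bytes(range(256))

one_line = base64.b64encode(data)   # a single unbroken line
folded = base64.encodebytes(data)   # a newline after every 76 output characters

longest_unfolded = max(len(line) for line in one_line.splitlines())
longest_folded = max(len(line) for line in folded.splitlines())

print(longest_unfolded)  # 344
print(longest_folded)    # 76
```

Both forms decode back to the same bytes; only the line layout differs.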

I'm new to Python and this is the first bug I have submitted, so if you need additional information, please let me know.
msg110743 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-07-19 12:50
Can you try this against 3.1 from svn (or py3k from svn)?  A bug was fixed that might be relevant.  Alternatively a unit test that demonstrates the problem would be most helpful.
msg110762 - Author: Vance Unruh (vunruh) Date: 2010-07-19 14:57
Here's code that attaches the pdf file the two different ways. Both attachments are OK when I read the mail on OSX, but one is corrupt when read with Windows Mail on Vista. I wasn't sure what to do with the actual sending of the mail to the server. You'll have to change the code to use your account or something.

def emailReport(pdf):
    """Email the report as multi-part MIME"""

    from email.mime.multipart import MIMEMultipart  
    msg = MIMEMultipart()
    msg['Subject'] = 'Corrupt PDF'
    msg['From'] = 'Me <me@myProvider.net>'
    msg['To'] = 'You <you@yourProvider.com>'

    # Add the PDF the easy way that fails:
    from email.mime.application import MIMEApplication
    fp = open(pdf, 'rb')
    part = MIMEApplication(fp.read(),"pdf")
    fp.close()
    part.add_header('Content-Disposition', 'attachment',filename='This one fails.pdf')
    msg.attach(part)

    # Add the PDF the hard way using the legacy base64 encoder
    from email.mime.base import MIMEBase
    part = MIMEBase("application","pdf")
    part.add_header('Content-Transfer-Encoding', 'base64')
    part.add_header('Content-Disposition', 'attachment',filename='This one works.pdf')
    import base64
    fp = open(pdf, 'rb')
    part.set_payload(str(base64.encodebytes(fp.read()),'ascii'))
    fp.close()
    msg.attach(part)

    # Send the email (user, password and recipient are placeholders to fill in)
    from smtplib import SMTP
    server = SMTP('smtpauth.provider.net')
    server.login(user,password)
    server.sendmail(msg['From'], recipient, msg.as_string())
    server.quit()



emailReport('bugs.python.org_issue9298.pdf')
msg128220 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-02-09 16:19
See also duplicate issue 11156.
msg128254 - Author: Yves Dorfsman (yves@zioup.com) Date: 2011-02-10 01:29
Solution:
In /usr/lib/python3.1/email/encoders.py, use encodebytes instead of b64encode:

--- encoders.py 2011-02-08 09:37:21.025030051 -0700
+++ encoders.py.yves    2011-02-08 09:38:04.945608365 -0700
@@ -12,7 +12,7 @@
     ]
 
 
-from base64 import b64encode as _bencode
+from base64 import encodebytes as _bencode
 from quopri import encodestring as _encode
msg128255 - Author: Yves Dorfsman (yves@zioup.com) Date: 2011-02-10 01:30
#!/usr/bin/python3.1

import unittest
import email.mime.image


class emailEncoderTestCase(unittest.TestCase):
  def setUp(self):

    # point to an image
    binaryfile = '/usr/share/openclipart/png/animals/mammals/happy_monkey_benji_park_01.png'

    while len(binaryfile) == 0:
      print('Enter the name of an image: ')
      binaryfile = input()  # raw_input() does not exist in Python 3

    fp = open(binaryfile, 'rb')
    self.bindata = fp.read()


  def test_convert(self):

    mimed = email.mime.image.MIMEImage(self.bindata, _subtype='png')

    base64ed = mimed.get_payload()
    # print (base64ed)

    # work out if any line within the string is > 76 chars
    chopped = base64ed.split('\n')
    lengths = [ len(x) for x in chopped ]
    toolong = [ x for x in lengths if x > 76 ]

    msg = 'There is at least one line of ' + str(max(lengths)) + ' chars.'
    self.assertEqual(0, len(toolong), msg)



if __name__ == '__main__':
  unittest.main()
msg128256 - Author: Yves Dorfsman (yves@zioup.com) Date: 2011-02-10 01:36
Here's a better version (sorry, I don't know how to remove msg128255):
msg128257 - Author: Yves Dorfsman (yves@zioup.com) Date: 2011-02-10 01:37
#!/usr/bin/python3.1

import unittest
import email.mime.image


class emailEncoderTestCase(unittest.TestCase):
  def setUp(self):

    # point to an image
    binaryfile = ''
    #binaryfile = '/usr/share/openclipart/png/animals/mammals/happy_monkey_benji_park_01.png'

    while len(binaryfile) == 0:
      print('Enter the name of an image: ')
      binaryfile = input()

    fp = open(binaryfile, 'rb')
    self.bindata = fp.read()


  def test_convert(self):

    mimed = email.mime.image.MIMEImage(self.bindata, _subtype='png')

    base64ed = mimed.get_payload()
    # print (base64ed)

    # work out if any line within the string is > 76 chars
    chopped = base64ed.split('\n')
    lengths = [ len(x) for x in chopped ]
    toolong = [ x for x in lengths if x > 76 ]

    msg = 'There is at least one line of ' + str(max(lengths)) + ' chars.'
    self.assertEqual(0, len(toolong), msg)



if __name__ == '__main__':
  unittest.main()
msg128363 - Author: Yves Dorfsman (yves@zioup.com) Date: 2011-02-11 06:02
Tests whether email.encoders.encode_base64 returns a single-line string or a string broken into lines of at most 76 characters, as per the RFC.
msg128364 - Author: Yves Dorfsman (yves@zioup.com) Date: 2011-02-11 06:04
Replaces b64encode with encodebytes.
msg128404 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-02-11 15:43
Yves: thanks for the patches.  If you feel like redoing the test one as a patch against Lib/email/test/test_email.py, that would be great.  I'd suggest having the test just split the lines and do

  assertLessEqual(max([len(x) for x in lines]), 76)

or something along those lines.
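
A sketch of what such a test might look like, written as a standalone test case rather than as a patch against test_email.py. The class and method names are hypothetical, and the test is only expected to pass on a Python whose email package folds base64 bodies:

```python
import unittest
from email.mime.application import MIMEApplication


class Base64FoldingTest(unittest.TestCase):  # hypothetical name
    def test_lines_fit_in_76_chars(self):
        # 1024 bytes encode to far more than 76 base64 characters,
        # so an unfolded payload would fail the assertion below.
        data = bytes(range(256)) * 4
        part = MIMEApplication(data)
        lines = part.get_payload().splitlines()
        self.assertLessEqual(max(len(x) for x in lines), 76)
```

Saved to a file, it can be run with `python -m unittest <file>`.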

As for the fix, encoders used to use encodebytes, but it tacks on a trailing newline that, according to the old code (see r57800), is unwanted.  So perhaps we need to revert that part of r57800 instead.
msg128471 - Author: Yves Dorfsman (yves@zioup.com) Date: 2011-02-12 23:14
I will. Please don't use my patch yet; it breaks something else in test_email:

./python Lib/test/regrtest.py test_email
[1/1] test_email
test test_email failed -- Traceback (most recent call last):
  File "/export/incoming/python/py3k/Lib/email/test/test_email.py", line 1146, in test_body
    eq(msg.get_payload(), '+vv8/f7/')
AssertionError: '+vv8/f7/\n' != '+vv8/f7/'
- +vv8/f7/
?         -
+ +vv8/f7/

1 test failed:
    test_email


This is with my code patch, not the test patch. I'll look at it, and post again, could be the extra newline you were talking about.
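
The extra newline is easy to reproduce outside the test suite. This is a small sketch that assumes the payload in test_body is the six bytes b'\xfa\xfb\xfc\xfd\xfe\xff', which is what encodes to '+vv8/f7/':

```python
import base64

raw = b'\xfa\xfb\xfc\xfd\xfe\xff'  # assumed test_body payload

unfolded = base64.b64encode(raw)
folded = base64.encodebytes(raw)

print(unfolded)  # b'+vv8/f7/'
print(folded)    # b'+vv8/f7/\n'
```

The only difference for a payload this short is the newline that encodebytes appends.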
msg128481 - Author: Yves Dorfsman (yves@zioup.com) Date: 2011-02-13 07:17
I've got two issues with this code (Lib/email/test/test_email.py):


1128     def test_body(self):
1129         eq = self.assertEqual
1130         bytes = b'\xfa\xfb\xfc\xfd\xfe\xff'
1131         msg = MIMEApplication(bytes)
1132         eq(msg.get_payload(), '+vv8/f7/')
1133         eq(msg.get_payload(decode=True), bytes)

1) Even though it works, I find the use of a defined type as the name of a variable confusing (line 1130, bytes).

2) The test on line 1132 fails if the base64 payload has an extra newline at the end, but newlines are not an issue in base64 and are actually expected. In fact the test at line 1133 shows that once decoded, the bytes are reverted to their original form.

Is there a way to find who is the author of this test, and what was the intent? Would the following test be acceptable (still testing a valid base64 encoding):

eq(msg.get_payload().strip(), '+vv8/f7/')


Thanks.
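
To illustrate point 2, here is a short sketch of the round trip. It is written so that it holds whether or not the email package appends a trailing newline; the variable names are illustrative only:

```python
from email.mime.application import MIMEApplication

data = b'\xfa\xfb\xfc\xfd\xfe\xff'
part = MIMEApplication(data)

encoded = part.get_payload()             # base64 text, may end in '\n'
decoded = part.get_payload(decode=True)  # the original bytes again

# Stripping whitespace makes the comparison independent of the newline.
print(repr(encoded.strip()))
print(decoded == data)
```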
msg128515 - Author: Yves Dorfsman (yves@zioup.com) Date: 2011-02-13 18:36
encoders.py:
Fixes the issue of base64-encoded lines being > 76 chars


test_email.py:

-tests that base64-encoded binary data is split into lines of at most 76 chars

-WARNING: Changes the test for MIMEApplication.test_body:
    -it changes the name of the variable 'bytes' to 'bytesdata'
    -it strip()s the base64-encoded payload before comparing it to the expected payload. With the change above (using encodebytes instead of b64encode in encoders.py), this test fails as is, because there is an extra newline at the end. Extra newlines are part of base64 and should not be an issue; as a matter of fact, you obtain the original bytes when you decode, regardless of any extra newlines. It would be good to know the intent of the original author of this test. Was the intent to ensure there were no newlines? If so, why? Or was the intent simply to confirm that the base64 encoding conforms to the standard? If the latter, my change should not be an issue.

All tests ("make test") passed with this patch.
msg130655 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-03-12 03:13
Unfortunately we don't have enough history information to determine who wrote the original _bencode function, although very likely it was Barry.  As for the test, that seems to have been written during the python3 translation to make sure that the behavior implemented by _bencode was preserved.  Python2 has no such test: if you remove the newline check from _bencode, the test suite passes.

Checking with RFC 2045, we find this:

   The encoded output stream must be represented in lines of no more
   than 76 characters each.  All line breaks or other characters not
   found in Table 1 must be ignored by decoding software.  In base64
   data, characters other than those in Table 1, line breaks, and other
   white space probably indicate a transmission error, about which a
   warning message or even a message rejection might be appropriate
   under some circumstances.

"All line breaks..." seems pretty unambiguous: an extra trailing newline should be ignored by any compliant email agent.  That does not eliminate the possibility that a non-compliant email agent would tack on an extra newline if there is one after the base64 encoded text, but it seems very very unlikely.  I am therefore inclined to fix the test, as you suggest.
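
The decoding rule quoted above can be checked against Python's own decoder. This is a minimal sketch in which base64.b64decode, in its default non-validating mode, stands in for a compliant mail client; real clients may of course behave differently:

```python
import base64

data = bytes(range(200))  # arbitrary stand-in payload

folded = base64.encodebytes(data)        # LF after every 76 characters
unfolded = folded.replace(b'\n', b'')    # one long line
crlf = folded.replace(b'\n', b'\r\n')    # CRLF line endings
trailing = folded + b'\n\n'              # extra blank lines at the end

# By default b64decode discards characters outside the base64 alphabet,
# so every variant should decode to the same bytes.
results = [base64.b64decode(v) for v in (folded, unfolded, crlf, trailing)]
print(all(r == data for r in results))
```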

I hope that Barry can remember why _bencode was introduced in the first place, since clearly there was *some* reason.
msg131167 - Author: Roundup Robot (python-dev) Date: 2011-03-16 20:15
New changeset 062d09d7bf94 by R David Murray in branch '3.1':
#9298: restore proper folding of base64 encoded bodies.
http://hg.python.org/cpython/rev/062d09d7bf94

New changeset c34320d9095e by R David Murray in branch '3.2':
Merge #9298 fix.
http://hg.python.org/cpython/rev/c34320d9095e

New changeset de2cd04e5101 by R David Murray in branch 'default':
Merge #9298 fix.
http://hg.python.org/cpython/rev/de2cd04e5101
msg131170 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-03-16 20:21
I talked with Barry, who could find no relevant discussions in his email logs.  We decided that _bencode was misguided in the first place (this is reinforced by the bug I fixed a year ago where email was stripping the final newline off a body *after* decoding it).  So I've committed Yves' fix.  Thanks, Yves.
History
Date                 User            Action  Args
2011-03-16 20:21:28  r.david.murray  set     status: open -> closed; messages: + msg131170; resolution: fixed; stage: test needed -> resolved
2011-03-16 20:15:21  python-dev      set     nosy: + python-dev; messages: + msg131167
2011-03-12 03:13:34  r.david.murray  set     messages: + msg130655
2011-02-13 18:36:13  yves@zioup.com  set     files: + issue9298-2.patch; messages: + msg128515
2011-02-13 18:23:34  yves@zioup.com  set     files: - issue9298.patch
2011-02-13 18:23:25  yves@zioup.com  set     files: - issue9298-test.py
2011-02-13 07:17:17  yves@zioup.com  set     messages: + msg128481
2011-02-12 23:14:04  yves@zioup.com  set     messages: + msg128471
2011-02-11 15:43:44  r.david.murray  set     nosy: + barry; messages: + msg128404
2011-02-11 06:04:55  yves@zioup.com  set     files: + issue9298.patch; messages: + msg128364; keywords: + patch
2011-02-11 06:02:45  yves@zioup.com  set     files: + issue9298-test.py; messages: + msg128363
2011-02-10 01:37:00  yves@zioup.com  set     messages: + msg128257
2011-02-10 01:36:29  yves@zioup.com  set     messages: + msg128256
2011-02-10 01:30:04  yves@zioup.com  set     messages: + msg128255
2011-02-10 01:29:35  yves@zioup.com  set     messages: + msg128254
2011-02-09 16:20:38  r.david.murray  link    issue11156 superseder
2011-02-09 16:19:59  r.david.murray  set     versions: + Python 3.3
2011-02-09 16:19:48  r.david.murray  set     nosy: + yves@zioup.com; messages: + msg128220
2010-12-27 18:30:50  r.david.murray  set     nosy: r.david.murray, vunruh; versions: + Python 3.2
2010-07-19 14:57:01  vunruh          set     files: + bugs.python.org_issue9298.pdf; messages: + msg110762
2010-07-19 12:50:17  r.david.murray  set     type: behavior
2010-07-19 12:50:01  r.david.murray  set     nosy: + r.david.murray; messages: + msg110743; assignee: r.david.murray; stage: test needed
2010-07-19 03:04:55  vunruh          create