email.generator.BytesGenerator fails with bytes payload #60768
I'm trying to use the email.* functions to craft HTTP POST data for file upload. Trying something like

    import email.encoders
    from io import BytesIO
    from email.generator import BytesGenerator
    from email.mime.application import MIMEApplication
    from email.mime.multipart import MIMEMultipart

    filedata = open("data", "rb").read()
    postdata = MIMEMultipart()
    fileattachment = MIMEApplication(filedata, _encoder=email.encoders.encode_noop)
    postdata.attach(fileattachment)
    fp = BytesIO()
    g = BytesGenerator(fp)
    g.flatten(postdata, unixfrom=False)

fails with

Traceback (most recent call last):
  File "./minetest.py", line 30, in <module>
    g.flatten(postdata, unixfrom=False)
  File "/usr/lib/python3.2/email/generator.py", line 91, in flatten
    self._write(msg)
  File "/usr/lib/python3.2/email/generator.py", line 137, in _write
    self._dispatch(msg)
  File "/usr/lib/python3.2/email/generator.py", line 163, in _dispatch
    meth(msg)
  File "/usr/lib/python3.2/email/generator.py", line 224, in _handle_multipart
    g.flatten(part, unixfrom=False, linesep=self._NL)
  File "/usr/lib/python3.2/email/generator.py", line 91, in flatten
    self._write(msg)
  File "/usr/lib/python3.2/email/generator.py", line 137, in _write
    self._dispatch(msg)
  File "/usr/lib/python3.2/email/generator.py", line 163, in _dispatch
    meth(msg)
  File "/usr/lib/python3.2/email/generator.py", line 192, in _handle_text
    raise TypeError('string payload expected: %s' % type(payload))
TypeError: string payload expected: <class 'bytes'>

This is because BytesGenerator._handle_text() expects a str payload in which byte values that are non-printable in ASCII have been replaced by surrogates. The example above creates a bytes payload, however, for which super(BytesGenerator, self)._handle_text(msg), i.e. Generator._handle_text(msg), throws the exception.

Note that using any of the email.encoders other than encode_noop does not really fit the HTTP POST bill, as those define a Content-Transfer-Encoding which HTTP does not know.

It would seem better to me to let BytesGenerator accept a bytes payload and just copy that to the output, rather than making the application encode the bytes as a string, hopefully in a way that s.encode('ascii', 'surrogateescape') can invert. E.g., a workaround class I use now does

    class FixedBytesGenerator(BytesGenerator):
        def _handle_bytes(self, msg):
            payload = msg.get_payload()
            if payload is None:
                return
            if isinstance(payload, bytes):
                self._fp.write(payload)
            elif isinstance(payload, str):
                super(FixedBytesGenerator, self)._handle_text(msg)
            else:
                # Payload is neither bytes nor string - this can't be right
                raise TypeError('bytes or str payload expected: %s' % type(payload))
        _writeBody = _handle_bytes
Yes, the way BytesGenerator works is basically a hack to get the email package itself working. Use cases outside the email package were not really considered in the (short) timeframe during which it was implemented. The longer term plan calls for redoing the way payloads are handled to generalize the whole process. I'd like to see this happen for 3.4, but I'm not sure I'm going to have the time to finish the work (I'm hopeful that I will, though).

In the meantime, while your suggestion is a good one, I'm ambivalent about applying it as a bug fix. It is on the border between a fix and a feature, since the email package in 3.x hasn't ever supported bytes payloads, only encoded payloads.
Hmm. Let me rephrase that. *Internally* it doesn't support bytes payloads; it "encodes" bytes payloads as surrogateescaped ASCII, as you have observed. Which is why this is on the borderline, and could possibly be considered a bug fix, because from an external point of view it does support parsing and generating 8bit payloads. I need to give it some thought, and perhaps others will weigh in with opinions.
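As an illustration of the surrogateescape "encoding" described above, here is a minimal sketch that uses only the built-in str/bytes codecs and does not touch the email package. The variable names are illustrative and not taken from the email source.

```python
# Non-ASCII bytes decoded with the 'surrogateescape' error handler become
# lone surrogate code points (U+DC80..U+DCFF) instead of raising an error.
raw = b'\xfa\xfb\xfc\xfd\xfe\xff'
as_text = raw.decode('ascii', 'surrogateescape')

# Encoding with the same handler maps the surrogates back to the
# original byte values, so the round trip is lossless.
round_tripped = as_text.encode('ascii', 'surrogateescape')

print(ascii(as_text))
print(round_tripped == raw)
```

This is the inversion that s.encode('ascii', 'surrogateescape') is expected to perform in the original report.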
Looking at the documentation, it is clear that (a) what you are trying to do is documented as being correct and (b) it worked in Python 2, making this a regression. I've attached a patch to fix this, which also probably fixes some bugs with BytesGenerator handling of non-text CTE 8bit parts created by BytesParser, but I haven't added tests to confirm that.
Updated patch after review by Ezio and Serhiy.
>>> import io, email.generator, email.mime.application
>>> from email import encoders
>>> bytesdata = b'\xfa\xfb\xfc\xfd\xfe\xff'
>>> msg = email.mime.application.MIMEApplication(bytesdata, _encoder=encoders.encode_7or8bit)
>>> s = io.BytesIO()
>>> g = email.generator.BytesGenerator(s)
>>> g.flatten(msg)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/serhiy/py/cpython3.2/Lib/email/generator.py", line 91, in flatten
    self._write(msg)
  File "/home/serhiy/py/cpython3.2/Lib/email/generator.py", line 137, in _write
    self._dispatch(msg)
  File "/home/serhiy/py/cpython3.2/Lib/email/generator.py", line 163, in _dispatch
    meth(msg)
  File "/home/serhiy/py/cpython3.2/Lib/email/generator.py", line 393, in _handle_text
    if _has_surrogates(msg._payload):
TypeError: can't use a string pattern on a bytes-like object
While related, that is a different bug, so I'd rather open a new issue for it.
New changeset 30f92600df9d by R David Murray in branch '2.7':
New changeset a1a04f76d08c by R David Murray in branch '3.2':
New changeset 2b1edefc1e99 by R David Murray in branch '3.3':
New changeset 5a0478bd5f11 by R David Murray in branch 'default':
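For reference, a minimal sketch of the round trip that the original report expected to work, modelled on the test case quoted later in this thread. It assumes a Python 3 version that includes the changesets above; the names buf and msg2 are illustrative.

```python
from io import BytesIO
import email
from email import encoders
from email.generator import BytesGenerator
from email.mime.application import MIMEApplication

# Binary payload attached without any Content-Transfer-Encoding.
bytesdata = b'\xfa\xfb\xfc\xfd\xfe\xff'
msg = MIMEApplication(bytesdata, _encoder=encoders.encode_noop)

# Flatten to wire form; with the fix this no longer raises TypeError.
buf = BytesIO()
BytesGenerator(buf).flatten(msg)
wireform = buf.getvalue()

# Parse the wire form back and compare the decoded payload.
msg2 = email.message_from_bytes(wireform)
print(msg2.get_payload(decode=True) == bytesdata)
```

As the test's own comment notes, the result is not an RFC-valid message, since a valid one would need a CTE of binary.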
I've opened bpo-17171 for the similar encode_7or8bit problem.
New changeset 64e004737837 by R David Murray in branch '3.3':
It seems to me that this issue is not fixed correctly yet. I've tried Python 3.3.2. When modifying the test case in Lib/test/test_email/test_email.py like this:

--- Lib/test/test_email/test_email.py 2013-05-15 18:32:55.000000000 +0200
+++ Lib/test/test_email/test_email_mine.py 2013-09-10 14:22:08.160089440 +0200
@@ -1461,17 +1461,17 @@
         # Issue 16564: This does not produce an RFC valid message, since to be
         # valid it should have a CTE of binary. But the below works in
         # Python2, and is documented as working this way.
-        bytesdata = b'\xfa\xfb\xfc\xfd\xfe\xff'
+        bytesdata = b'\x0b\xfa\xfb\xfc\xfd\xfe\xff'
         msg = MIMEApplication(bytesdata, _encoder=encoders.encode_noop)
         # Treated as a string, this will be invalid code points.
-        self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
+        # self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
         self.assertEqual(msg.get_payload(decode=True), bytesdata)
         s = BytesIO()
         g = BytesGenerator(s)
         g.flatten(msg)
         wireform = s.getvalue()
         msg2 = email.message_from_bytes(wireform)
-        self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
+        # self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
         self.assertEqual(msg2.get_payload(decode=True), bytesdata)

then running:

./python ./Tools/scripts/run_tests.py test_email

results in:

======================================================================
Traceback (most recent call last):
  File "/localdisk/kruppaal/build/Python-3.3.2/Lib/test/test_email/test_email_mine.py", line 1475, in test_binary_body_with_encode_noop
    self.assertEqual(msg2.get_payload(decode=True), bytesdata)
AssertionError: b'\x0b\n\xfa\xfb\xfc\xfd\xfe\xff' != b'\x0b\xfa\xfb\xfc\xfd\xfe\xff'

The '\x0b' byte is incorrectly translated to '\x0b\n', i.e., a newline character is inserted. Encoding the bytes array:

results in output data (MIME header stripped):

0000000: 0001 0203 0405 0607 0809 0a0b 0a0c 0a0a ................

That is, a '\n' is inserted after '\x0b', '\x1c', '\x1d', and '\x1e',
That's a different bug, and is probably due to the fact that \x0b is considered a line-ending character by the 'splitlines' method. Could you please open a new issue for this? It could be that this can't be fixed in Python 3 until support for the 'binary' CTE is added.
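The splitlines behaviour referred to above can be seen in isolation with a small sketch. It only shows how str.splitlines and bytes.splitlines differ on \x0b; whether this is the actual cause inside the email package is the hypothesis stated in the comment, not something this snippet demonstrates. The variable names are illustrative.

```python
data = b'\x0b\xfa'
text = data.decode('ascii', 'surrogateescape')

# str.splitlines treats \x0b (vertical tab) as a line boundary ...
str_lines = text.splitlines(True)
# ... while bytes.splitlines only splits on \n, \r and \r\n.
bytes_lines = data.splitlines(True)

print(len(str_lines))    # 2
print(len(bytes_lines))  # 1
```

Any code path that splits a surrogateescaped payload with str.splitlines and rejoins the pieces with a line separator would therefore insert a newline after such bytes.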
Opened bpo-19003. |