Message 191526 - Python tracker


Author	mlalic
Recipients	barry, mlalic, r.david.murray
Date	2013-06-20.15:39:01
Message-id	<1371742742.04.0.192610725927.issue18271@psf.upfronthosting.co.za>

Content
When a message's Content-Transfer-Encoding is set to 8bit, get_payload(decode=True) returns the payload encoded with raw-unicode-escape rather than with the message's declared charset. As a result, the returned bytes cannot be decoded using the charset reported by get_content_charset().
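To make the mismatch concrete, here is a small sketch that uses only the built-in codecs (no email parsing involved) and compares the bytes produced by raw-unicode-escape with the bytes a utf-8 payload would contain. The variable names are illustrative only.

```python
text = "ünicöde data.."

# raw-unicode-escape maps code points up to U+00FF to a single byte each.
escaped = text.encode("raw-unicode-escape")
# utf-8 uses two bytes for each of the non-ASCII characters here.
utf8 = text.encode("utf-8")

print(escaped)  # b'\xfcnic\xf6de data..'
print(utf8)     # b'\xc3\xbcnic\xc3\xb6de data..'

# Code points above U+00FF become a literal backslash-u escape sequence,
# so the result is not valid text in any ordinary charset.
euro = "€".encode("raw-unicode-escape")
print(euro)     # b'\\u20ac'
```

The two byte strings differ at every non-ASCII character, which is why decoding the first one as utf-8 fails.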

It seems this should be fixed so that get_payload(decode=True) returns the bytes as found in the payload when Content-Transfer-Encoding is 8bit, exactly as Python 2.7 does.

>>> from email import message_from_string
>>> message = message_from_string("""MIME-Version: 1.0
... Content-Type: text/plain; charset=utf-8
... Content-Disposition: inline
... Content-Transfer-Encoding: 8bit
... 
... ünicöde data..""")
>>> message.get_content_charset()
'utf-8'
>>> message.get_payload(decode=True)
b'\xfcnic\xf6de data..'
>>> message.get_payload(decode=True).decode(message.get_content_charset())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 0: invalid start byte
>>> message.get_payload(decode=True).decode('raw-unicode-escape')
'ünicöde data..'
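For comparison, a possible workaround sketch, assuming the message is available as bytes (for example, as read from a file or socket): parsing with message_from_bytes instead of message_from_string should let get_payload(decode=True) hand back the original payload bytes, which then decode with the declared charset. The helper name decode_text_payload and the "ascii" fallback are assumptions made for this illustration and are not part of the email package.

```python
from email import message_from_bytes

RAW = (
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Disposition: inline\n"
    "Content-Transfer-Encoding: 8bit\n"
    "\n"
    "ünicöde data.."
)


def decode_text_payload(raw_bytes):
    """Hypothetical helper: parse raw bytes and return the body as str."""
    message = message_from_bytes(raw_bytes)
    # Fall back to ascii when no charset is declared (an assumption).
    charset = message.get_content_charset() or "ascii"
    # Parsed from bytes, the payload keeps its original byte values.
    return message.get_payload(decode=True).decode(charset)


text = decode_text_payload(RAW.encode("utf-8"))
print(text)
```

This does not address the behavior reported above for messages parsed from a str; it only avoids that code path by keeping the data as bytes until the charset is known.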
