
Author mlalic
Recipients barry, mlalic, r.david.murray, serhiy.storchaka
Date 2013-06-20.19:15:23
Message-id <1371755723.91.0.402566650656.issue18271@psf.upfronthosting.co.za>
Content
That will work fine only as long as every character is actually in the Latin-1 range. We cannot forget the rest of Unicode. Consider::

>>> message = message_from_string("""MIME-Version: 1.0
... Content-Type: text/plain; charset=utf-8
... Content-Disposition: inline
... Content-Transfer-Encoding: 8bit
... 
... 한글ᥡ╥ສए""")
>>> message.get_payload(decode=True).decode('latin1')
'\\ud55c\\uae00\\u1961\\u2565\\u0eaa\\u090f'
>>> message.get_payload(decode=True).decode('raw-unicode-escape')
'한글ᥡ╥ສए'

However, even if latin1 did work, the main point stands: to decode the bytes to a Unicode string, you must use an encoding other than the one the message itself specifies (utf-8 here).
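The difference can be reproduced with the codecs alone, without the email package. This is a minimal sketch, assuming the payload bytes come from the raw-unicode-escape codec as the session above suggests; the variable names are illustrative only.

```python
# Code points below U+0100 are written by raw-unicode-escape as single
# bytes, which is why decoding with latin1 appears to work for them.
latin_bytes = "é".encode("raw-unicode-escape")
latin_via_latin1 = latin_bytes.decode("latin1")

# Code points above U+00FF are written as literal \uXXXX escape sequences.
hangul_bytes = "한글".encode("raw-unicode-escape")

# latin1 maps each byte to one character, so the escapes stay as plain text.
hangul_via_latin1 = hangul_bytes.decode("latin1")

# Only the matching codec turns the escapes back into the original string.
hangul_via_raw = hangul_bytes.decode("raw-unicode-escape")

print(repr(latin_via_latin1))
print(repr(hangul_via_latin1))
print(repr(hangul_via_raw))
```

Neither decode uses utf-8, the charset the message declares, which is the point made above.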