Author jaraco
Date 2014-02-06.14:44:53
As reported in, email.parser no longer accepts Unicode content as it did in Python 3.3. I searched the What's New and the module documentation but found no indication that this behavior is no longer supported, so it appears to be a regression. If the change is intentional, it should be documented in one of those documents.

Consider this simple test case:

# -*- coding: utf-8 -*-
import email.parser
meta = """
Header: ☃
"""
email.parser.Parser().parsestr(meta)

Run that on Python 3.3.3 or Python 2 and it executes silently. Run it on Python 3.4.0b3 and it produces this traceback:

Traceback (most recent call last):
  File "C:\Users\jaraco\projects\public\wheel\", line 6, in <module>
  File "C:\Program Files\Python34\lib\email\", line 70, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
  File "C:\Program Files\Python34\lib\email\", line 60, in parse
    return feedparser.close()
  File "C:\Program Files\Python34\lib\email\", line 170, in close
  File "C:\Program Files\Python34\lib\email\", line 163, in _call_parse
  File "C:\Program Files\Python34\lib\email\", line 449, in _parsegen
  File "C:\Program Files\Python34\lib\email\", line 311, in set_payload
    " payload") from None
TypeError: charset argument must be specified when non-ASCII characters are used in the payload
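
To compare interpreters without reading a full traceback, the test case can be wrapped in a small check. This is only a sketch: `parses_without_error` is a hypothetical helper name, not part of the email package, and it assumes the failure always surfaces as the TypeError shown above.

```python
# -*- coding: utf-8 -*-
import email.parser
import sys


def parses_without_error(text):
    """Return True if Parser().parsestr(text) completes, False on TypeError.

    Hypothetical helper for illustration; not part of the email package.
    """
    try:
        email.parser.Parser().parsestr(text)
    except TypeError as exc:
        # Assumption: the regression shows up as a TypeError from set_payload.
        print("TypeError: %s" % exc)
        return False
    return True


if __name__ == "__main__":
    # Same content as the test case: a blank line, then non-ASCII text.
    meta = """
Header: ☃
"""
    ok = parses_without_error(meta)
    status = "parsed silently" if ok else "failed"
    print("Python %s: %s" % (sys.version.split()[0], status))
```

Running the script under each interpreter prints one line per version, which makes it easy to see where the behavior changes.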
