classification
Title: email.message_from_string no longer working in Python 3.4
Type: behavior
Stage: resolved
Components: email
Versions: Python 3.4

process
Status: closed
Resolution: fixed
Dependencies: (none)
Superseder: issue 20531, "TypeError in e-mail.parser when non-ASCII is present"
Assigned To: (none)
Nosy List: apollo13, barry, r.david.murray
Priority: normal
Keywords: (none)

Created on 2013-12-28 14:09 by apollo13, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (5)
msg207028 - Author: Florian Apolloner (apollo13) Date: 2013-12-28 14:09
Given this email:
---
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: =?utf-8?q?Ch=C3=A8re_maman?=
From: from@example.com
To: to@example.com
Date: Sat, 28 Dec 2013 13:08:07 -0000
Message-ID: <20131228130807.3669.79195@localhost>

Je t'aime très fort
---

I get this traceback:
---
  File "/home/florian/sources/cpython/Lib/email/__init__.py", line 40, in message_from_string
    return Parser(*args, **kws).parsestr(s)
  File "/home/florian/sources/cpython/Lib/email/parser.py", line 70, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
  File "/home/florian/sources/cpython/Lib/email/parser.py", line 60, in parse
    return feedparser.close()
  File "/home/florian/sources/cpython/Lib/email/feedparser.py", line 170, in close
    self._call_parse()
  File "/home/florian/sources/cpython/Lib/email/feedparser.py", line 163, in _call_parse
    self._parse()
  File "/home/florian/sources/cpython/Lib/email/feedparser.py", line 449, in _parsegen
    self._cur.set_payload(EMPTYSTRING.join(lines))
  File "/home/florian/sources/cpython/Lib/email/message.py", line 311, in set_payload
    " payload") from None
TypeError: charset argument must be specified when non-ASCII characters are used in the payload
---

This is new in 3.4, since that is the first version in which set_payload requires a charset argument when the payload contains non-ASCII characters. IMO message_from_string should figure the charset out from the message itself.
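
The traceback suggests a check of roughly the following shape. This is only a sketch: the function name check_payload is invented for illustration and this is not the stdlib source. It shows why a parser that hands the joined body lines to set_payload without a charset would fail on a non-ASCII body.

```python
def check_payload(payload, charset=None):
    """Hypothetical stand-in for the check that raised in the traceback.

    Not the real email.message code; it only mirrors the reported behaviour:
    a str payload with non-ASCII characters and no charset is rejected.
    """
    if isinstance(payload, str) and charset is None:
        try:
            payload.encode("ascii")
        except UnicodeEncodeError:
            raise TypeError(
                "charset argument must be specified when non-ASCII "
                "characters are used in the payload"
            ) from None
    return payload


# ASCII-only text passes, with or without a charset.
check_payload("plain ascii body")

# Non-ASCII text passes once a charset is supplied.
check_payload("Je t'aime très fort", charset="utf-8")

# The parser path in the traceback corresponds to this call: no charset given.
try:
    check_payload("Je t'aime très fort")
except TypeError as exc:
    print("rejected:", exc)
```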
msg207031 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-12-28 15:43
Hmm.  There's definitely a backward compatibility issue here of some sort, but how are you parsing this email?  And does it work or fail in some other way on 3.3 tip?
msg207032 - Author: Florian Apolloner (apollo13) Date: 2013-12-28 15:44
Yes, it works on Python 3.3 (from Debian); I am parsing directly via email.message_from_string:
Python 3.3.3 (default, Dec  8 2013, 14:51:59) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import email
>>> msg="""Content-Type: text/plain; charset="utf-8"
... MIME-Version: 1.0
... Content-Transfer-Encoding: 8bit
... Subject: =?utf-8?q?Ch=C3=A8re_maman?=
... From: from@example.com
... To: to@example.com
... Date: Sat, 28 Dec 2013 13:08:07 -0000
... Message-ID: <20131228130807.3669.79195@localhost>
... 
... Je t'aime très fort"""
>>> email.message_from_string(msg)
<email.message.Message object at 0x7fcfbcbf9090>
msg207033 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-12-28 15:47
Never mind, I failed to notice the message_from_string part of the traceback.

Different question: what are you doing with the message after you parse it?  It is not an RFC-valid message if you parse it from a string, so the only way to get RFC-valid output from it is to emit it as a string *and* encode that output to utf-8.

I'll have to think about how this "should" work...a clearer error message may be the answer, but if so I suppose I'll need an actual deprecation period before shipping the charset fix for set_payload.
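
A minimal sketch of the "emit as a string and encode to utf-8" route described above, assuming an interpreter on which message_from_string accepts this input (3.3, or a release with the check reverted). The names RAW, wire and reparsed are illustrative only.

```python
import email

# Shortened version of the message from the report (assumed sample data).
RAW = (
    'Content-Type: text/plain; charset="utf-8"\n'
    "MIME-Version: 1.0\n"
    "Content-Transfer-Encoding: 8bit\n"
    "Subject: =?utf-8?q?Ch=C3=A8re_maman?=\n"
    "\n"
    "Je t'aime très fort"
)

# Parse from str: the body is kept as text, non-ASCII characters included.
msg = email.message_from_string(RAW)

# Emit as a string, then encode with the charset the headers declare.
wire = msg.as_string().encode("utf-8")

# Parsing those bytes back and decoding the payload recovers the text.
reparsed = email.message_from_bytes(wire)
body = reparsed.get_payload(decode=True).decode(reparsed.get_content_charset())
print(body)
```

Encoding with anything other than the charset named in Content-Type would leave the headers and the body bytes disagreeing, which is why utf-8 is the only consistent choice for this message.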
msg210514 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-02-07 18:37
This check has been reverted in issue 20531.
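
With the check reverted, the original reproduction should parse again. The following sketch, which assumes a Python version that includes the revert, also shows that the declared charset is available from the parsed headers. RAW is a shortened copy of the reported message.

```python
import email

# Shortened version of the message from the report (assumed sample data).
RAW = (
    'Content-Type: text/plain; charset="utf-8"\n'
    "MIME-Version: 1.0\n"
    "Content-Transfer-Encoding: 8bit\n"
    "Subject: =?utf-8?q?Ch=C3=A8re_maman?=\n"
    "\n"
    "Je t'aime très fort"
)

msg = email.message_from_string(RAW)

# The charset parameter of Content-Type, lower-cased by the email package.
charset = msg.get_content_charset()

# For a str-parsed message the payload is returned as text, unchanged.
body = msg.get_payload()

print(charset)
print(body)
```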
History
Date                 User            Action  Args
2022-04-11 14:57:56  admin           set     github: 64288
2014-02-07 18:37:51  r.david.murray  set     status: open -> closed; superseder: TypeError in e-mail.parser when non-ASCII is present; messages: + msg210514; type: behavior; resolution: fixed; stage: resolved
2013-12-28 15:47:35  r.david.murray  set     messages: + msg207033
2013-12-28 15:44:40  apollo13        set     messages: + msg207032
2013-12-28 15:43:15  r.david.murray  set     messages: + msg207031
2013-12-28 14:20:57  apollo13        set     nosy: + barry, r.david.murray; components: + email; versions: + Python 3.4
2013-12-28 14:09:03  apollo13        create