classification
Title: Exception when parsing an email using email.parser.BytesParser
Type: behavior
Stage: resolved
Components: email
Versions: Python 3.4
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: Elmer, barry, python-dev, r.david.murray
Priority: normal
Keywords:

Created on 2015-03-23 07:43 by Elmer, last changed 2015-03-30 01:57 by r.david.murray. This issue is now closed.

Files
File name | Uploaded | Description
testmail.eml | Elmer, 2015-03-23 07:43 | email source that triggers exception
Messages (3)
msg238987 - Author: Elmer (Elmer) Date: 2015-03-23 07:43
I am working with a large dataset of emails and loading one of them resulted in an exception: "TypeError: unorderable types: ValueTerminal() < CFWSList()"

I have attached the (anonymised and minimised) email source of the email that triggered the exception.

$ python
Python 3.4.2 (default, Nov 12 2014, 18:23:59) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.54)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import email
>>> from email import parser, policy
>>> 
>>> f = open("testmail.eml",'rb')
>>> src = f.read()
>>> f.close()
>>> 
>>> msg = email.parser.BytesParser(_class=email.message.EmailMessage, policy=email.policy.default).parsebytes(src)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/email/parser.py", line 124, in parsebytes
    return self.parser.parsestr(text, headersonly)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/email/parser.py", line 68, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/email/parser.py", line 57, in parse
    feedparser.feed(data)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/email/feedparser.py", line 178, in feed
    self._call_parse()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/email/feedparser.py", line 182, in _call_parse
    self._parse()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/email/feedparser.py", line 384, in _parsegen
    for retval in self._parsegen():
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/email/feedparser.py", line 255, in _parsegen
    if self._cur.get_content_type() == 'message/delivery-status':
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/email/message.py", line 579, in get_content_type
    value = self.get('content-type', missing)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/email/message.py", line 472, in get
    return self.policy.header_fetch_parse(k, v)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/email/policy.py", line 145, in header_fetch_parse
    return self.header_factory(name, ''.join(value.splitlines()))
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/email/headerregistry.py", line 583, in __call__
    return self[name](name, value)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/email/headerregistry.py", line 194, in __new__
    cls.parse(value, kwds)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/email/headerregistry.py", line 441, in parse
    kwds['decoded'] = str(parse_tree)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/email/_header_value_parser.py", line 195, in __str__
    return ''.join(str(x) for x in self)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/email/_header_value_parser.py", line 195, in <genexpr>
    return ''.join(str(x) for x in self)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/email/_header_value_parser.py", line 1136, in __str__
    for name, value in self.params:
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/email/_header_value_parser.py", line 1101, in params
    parts = sorted(parts)
TypeError: unorderable types: ValueTerminal() < CFWSList()
msg239555 - Author: Roundup Robot (python-dev) Date: 2015-03-30 01:54
New changeset dc10c52c6539 by R David Murray in branch '3.4':
#23745: handle duplicate MIME parameter names in new parser.
https://hg.python.org/cpython/rev/dc10c52c6539

New changeset fe9a578d5f38 by R David Murray in branch 'default':
Merge: #23745: handle duplicate MIME parameter names in new parser.
https://hg.python.org/cpython/rev/fe9a578d5f38
msg239557 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2015-03-30 01:57
The issue arose from the duplicated parameter name in the Content-Type header.  I fixed it by (mostly) copying the error recovery used by the older API (get_param).
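
A minimal sketch of the situation described above, for readers who want to check their own Python build. The message bytes are a hypothetical stand-in, not the attached testmail.eml, and the charset values are made up; the only point is that the same parameter name appears twice. On an interpreter without the fix, the policy.default call may raise the TypeError from the report. Which of the duplicate values is kept is not stated in this thread, so the sketch only prints it.

```python
from email import message_from_bytes, policy

# Hypothetical minimal message: the "charset" parameter name appears twice.
RAW = (
    b'Content-Type: text/plain; charset="utf-8"; charset="latin-1"\r\n'
    b"\r\n"
    b"body\r\n"
)

# Older API: the default compat32 policy plus Message.get_param().
legacy = message_from_bytes(RAW)
legacy_charset = legacy.get_param("charset")

# Newer API: policy.default parses the header into a structured object.
modern = message_from_bytes(RAW, policy=policy.default)
modern_type = modern.get_content_type()
modern_charset = modern["Content-Type"].params.get("charset")

print(legacy_charset, modern_type, modern_charset)
```

If both calls complete and report the same content type, the new parser is recovering from the duplicate instead of failing.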

Note that you don't need to specify both policy and _class.  If you use one of the new policies (such as default), the parser automatically uses EmailMessage for the _class.
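
As a small illustration of that note, the call from the report can be reduced to passing the policy alone. The message bytes here are a hypothetical stand-in for the contents of a file such as testmail.eml.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser

# Hypothetical stand-in for the bytes read from a file such as testmail.eml.
src = b"Subject: example\r\nContent-Type: text/plain\r\n\r\nbody\r\n"

# Only the policy is passed; no _class argument.
msg = BytesParser(policy=policy.default).parsebytes(src)
uses_email_message = isinstance(msg, EmailMessage)

print(type(msg).__name__, uses_email_message)
```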
History
Date | User | Action | Args
2015-03-30 01:57:00 | r.david.murray | set | status: open -> closed; resolution: fixed; messages: + msg239557; stage: resolved
2015-03-30 01:54:50 | python-dev | set | nosy: + python-dev; messages: + msg239555
2015-03-23 07:43:43 | Elmer | create |