classification
Title: get_addresses results in traceback with an addrspec with an empty local part.
Type: behavior
Stage: patch review
Components: email
Versions: Python 3.6, Python 3.4, Python 3.5
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: barry, frispete, python-dev, r.david.murray, sjt
Priority: normal
Keywords: patch

Created on 2016-06-07 19:24 by frispete, last changed 2020-02-28 17:16 by python-dev.

Files
File name Uploaded Description
lkml-exception.mail frispete, 2016-06-07 19:24
email_flatten.py frispete, 2016-06-16 15:33
Pull Requests
URL Status Linked
PR 18687 closed python-dev, 2020-02-28 17:16
Messages (6)
msg267733 - Author: Hans-Peter Jansen (frispete) * Date: 2016-06-07 19:24
While replacing an old Python 2.7 email filter tool with a rewritten Python 3 version, I stumbled across an ugly case, where a header such as:

To: unlisted-recipients: ;,
        ""@pop.kundenserver.de (no To-header on input)

results in a traceback:

Traceback (most recent call last):
  File "./mail_filter.py", line 606, in <module>
    ret = main.run()
  File "./mail_filter.py", line 595, in run
    self.process(fp)
  File "./mail_filter.py", line 520, in process
    config.recipients = self.get_addresses('to', msg)
  File "./mail_filter.py", line 103, in get_addresses
    vals = msg.get_all(field, [])
  File "/usr/lib64/python3.4/email/message.py", line 511, in get_all
    values.append(self.policy.header_fetch_parse(k, v))
  File "/usr/lib64/python3.4/email/policy.py", line 145, in header_fetch_parse
    return self.header_factory(name, ''.join(value.splitlines()))
  File "/usr/lib64/python3.4/email/headerregistry.py", line 584, in __call__
    return self[name](name, value)
  File "/usr/lib64/python3.4/email/headerregistry.py", line 195, in __new__
    cls.parse(value, kwds)
  File "/usr/lib64/python3.4/email/headerregistry.py", line 342, in parse
    for mb in addr.all_mailboxes]))
  File "/usr/lib64/python3.4/email/headerregistry.py", line 342, in <listcomp>
    for mb in addr.all_mailboxes]))
  File "/usr/lib64/python3.4/email/_header_value_parser.py", line 837, in local_part
    return self[0].local_part
  File "/usr/lib64/python3.4/email/_header_value_parser.py", line 889, in local_part
    return self[0].local_part
  File "/usr/lib64/python3.4/email/_header_value_parser.py", line 984, in local_part
    tok[0].token_type == 'cfws'):
IndexError: list index out of range

I'm not completely sure whether the To header, as added by my email provider, is perfectly valid, but no other part of my mail infrastructure complains about such headers or behaves strangely with them.

This happens with 3.4.4, and also when testing with the email module from current hg.
msg267844 - Author: Stephen J. Turnbull (sjt) * (Python triager) Date: 2016-06-08 13:13
In Python 3.5, both entering the problematic header by hand with a trivial body and using email.message_from_string to parse it, and calling email.message_from_file on lkml-exception.mail, produce an email.message.Message with no defects and no traceback.

Without access to mail_filter.py, it's not clear what the defect might be.
msg267854 - Author: Stephen J. Turnbull (sjt) * (Python triager) Date: 2016-06-08 14:04
OK, I can reproduce now.

$ python3.5
Python 3.5.0 (v3.5.0:374f501f4567, Sep 17 2015, 17:04:56) 
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import email
>>> import email.policy
>>> with open(b'lkml-exception.mail', mode = 'r') as f:
...  msg = email.message_from_file(f, policy=email.policy.SMTP)
... 
>>> msg.get_all('to')
Traceback (most recent call last):

and (except for a slight skew in line-numbering) the rest is the same as the tail of the OP.

The crucial part is the policy=email.policy.SMTP argument, and evidently what's happening is that the parser assumes that the local-part of the addr-spec is non-empty.  RFC 5322 does permit a quoted-string to be empty, so this is a bug in the email module's parser.  (I don't have a patch, sorry.)

Aside: although strictly speaking it's hold-your-nose-and-avert-your-eyes legal according to RFC 5322, RFC 5321 (SMTP) does say:

   While the above definition for Local-part is relatively permissive,
   for maximum interoperability, a host that expects to receive mail
   SHOULD avoid defining mailboxes where the Local-part requires (or
   uses) the Quoted-string form[...].

I don't see a good reason for the usage in the test case, so I'd call this nonconformant to RFC 5321.  I think the right way to handle it is to register a defect but let the parse succeed.
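
For readers without the attachment, the reproduction above can be approximated from an inline string. This is only a sketch: the message text is rebuilt from the header quoted in the first message plus a trivial body, RAW_MESSAGE and fetch_to_headers are hypothetical names, and whether the IndexError still shows up depends on the Python version in use.

```python
import email
import email.policy

# Header copied from the original report, followed by a trivial body.
RAW_MESSAGE = (
    'To: unlisted-recipients: ;,\n'
    '        ""@pop.kundenserver.de (no To-header on input)\n'
    '\n'
    'body\n'
)


def fetch_to_headers(policy):
    """Parse RAW_MESSAGE under `policy` and fetch its To headers.

    Returns (values, error); error is None when the fetch succeeded.
    """
    msg = email.message_from_string(RAW_MESSAGE, policy=policy)
    try:
        return msg.get_all('to', []), None
    except IndexError as exc:
        # The failure mode shown in the traceback of the first message.
        return None, exc


for name, policy in (('compat32', email.policy.compat32),
                     ('SMTP', email.policy.SMTP)):
    values, error = fetch_to_headers(policy)
    if error is None:
        print(name, 'ok:', [type(v).__name__ for v in values])
    else:
        print(name, 'failed:', repr(error))
```

Parsing the message itself is not the problem; the header is only run through the structured header parser when it is fetched, which is why the policy matters.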
msg267886 - Author: Hans-Peter Jansen (frispete) * Date: 2016-06-08 20:26
Dear Stephen,

Thanks for looking into this. I'm glad that you were able to reproduce it.

This header is added by the email provider (the biggest here in Germany), so yes, it deserves an entry in the defects list, but of course it must not produce a traceback. Otherwise, it is not expected to provide a sensible means of interoperability. The unlisted-recipients part is a bit more useful in this respect.
msg267907 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-06-08 22:01
Yeah, it never occurred to me that the local part could be empty, so I never made a test case for that.  The correct behavior should indeed be to register a defect and set the local part to blank.  I will not be surprised if there are other bits of the code (on the output side) that expect the local part to be non-blank, so there may be some additional test cases and fixes needed.
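
Until the parser behaves that way, a caller can guard the header fetch on its own side. The following is a sketch of such a workaround, not the fix for the email package: get_all_or_raw and RAW_MESSAGE are hypothetical names, the except clause assumes the failure is the IndexError from the traceback, and raw_items() is described in its docstring as an internal API, so relying on it is a judgment call.

```python
import email
import email.policy

# Header copied from the original report, followed by a trivial body.
RAW_MESSAGE = (
    'To: unlisted-recipients: ;,\n'
    '        ""@pop.kundenserver.de (no To-header on input)\n'
    '\n'
    'body\n'
)


def get_all_or_raw(msg, field):
    """Return the parsed values of `field`, or the raw text if parsing fails."""
    try:
        return [str(value) for value in msg.get_all(field, [])]
    except IndexError:
        # Assumption: the structured parser fails with IndexError, as in the
        # reported traceback. raw_items() yields the stored (name, value)
        # pairs without running them through the header parser.
        wanted = field.lower()
        return [value for name, value in msg.raw_items()
                if name.lower() == wanted]


msg = email.message_from_string(RAW_MESSAGE, policy=email.policy.SMTP)
recipients = get_all_or_raw(msg, 'to')
print(recipients)
```

In the fallback case the caller gets the unparsed header text, so any address extraction still has to be done separately.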
msg268674 - Author: Hans-Peter Jansen (frispete) * Date: 2016-06-16 15:33
Sorry guys for not providing this earlier.

It turned out that the suboptimal behaviour is related to an unfortunate policy choice: email.policy.SMTP.
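
For comparison, here is a sketch of the same header handled without that policy. It assumes the default compat32 policy, under which get_all returns plain strings, and hands those to the legacy parser in email.utils.getaddresses. RAW_MESSAGE is a hypothetical stand-in for the attached mail, and the exact pairs returned are not claimed here.

```python
import email
from email.utils import getaddresses

# Header copied from the original report, followed by a trivial body.
RAW_MESSAGE = (
    'To: unlisted-recipients: ;,\n'
    '        ""@pop.kundenserver.de (no To-header on input)\n'
    '\n'
    'body\n'
)

# No policy argument: message_from_string falls back to compat32, so the
# header values come back as unparsed strings.
msg = email.message_from_string(RAW_MESSAGE)
raw_values = msg.get_all('to', [])

# getaddresses() uses the older, lenient address parser and returns
# (realname, address) pairs.
pairs = getaddresses(raw_values)
for realname, address in pairs:
    print(repr(realname), repr(address))
```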
History
Date                 User            Action  Args
2020-02-28 17:16:10  python-dev      set     keywords: + patch; nosy: + python-dev; pull_requests: + pull_request18046; stage: test needed -> patch review
2016-06-16 15:33:06  frispete        set     files: + email_flatten.py; messages: + msg268674
2016-06-08 22:01:49  r.david.murray  set     messages: + msg267907; title: get_addresses results in traceback with a valid? header -> get_addresses results in traceback with an addrspec with an empty local part.
2016-06-08 20:26:49  frispete        set     messages: + msg267886
2016-06-08 14:04:56  sjt             set     messages: + msg267854
2016-06-08 13:13:46  sjt             set     nosy: + sjt; messages: + msg267844; type: behavior; stage: test needed
2016-06-07 19:24:53  frispete        create