classification
Title: Exception parsing certain invalid email address headers
Type: behavior
Stage: resolved
Components: email
Versions: Python 3.7, Python 3.6, Python 3.5
process
Status: closed
Resolution: out of date
Dependencies:
Superseder:
Assigned To:
Nosy List: barry, iritkatriel, ncoghlan, r.david.murray, timb07, xtreak
Priority: normal
Keywords:

Created on 2017-06-19 09:25 by timb07, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (5)
msg296309 - Author: Tim Bell (timb07) Date: 2017-06-19 09:25
According to RFC 5322, an email address like this isn't valid:

user@example.com <user@example.com>

(The display-name "user@example.com" contains "@", which isn't in the set of atext characters used to form an atom.)

How it's handled by the email package varies by policy:

>>> import email
>>> from email.policy import default
>>> email.message_from_bytes(b'To: user@example.com <user@example.com>')['to']
'user@example.com <user@example.com>'
>>> email.message_from_bytes(b'To: user@example.com <user@example.com>', policy=default)['to']
'user@example.com'
>>> email.message_from_bytes(b'To: user@example.com <user@example.com>', policy=default).defects
[]

The difference between the behaviour under the compat32 vs "default" policy may or may not be significant.

However, when this is combined with a second invalid feature, namely a space after the ">", accessing the header under the "default" policy raises an exception:

>>> email.message_from_bytes(b'To: user@example.com <user@example.com> ')['to']
'user@example.com <user@example.com> '
>>> email.message_from_bytes(b'To: user@example.com <user@example.com> ', policy=default)['to']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/message.py", line 391, in __getitem__
    return self.get(name)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/policy.py", line 162, in header_fetch_parse
    return self.header_factory(name, value)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/headerregistry.py", line 586, in __call__
    return self[name](name, value)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/headerregistry.py", line 197, in __new__
    cls.parse(value, kwds)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/headerregistry.py", line 337, in parse
    kwds['parse_tree'] = address_list = cls.value_parser(value)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/headerregistry.py", line 328, in value_parser
    address_list, value = parser.get_address_list(value)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/_header_value_parser.py", line 2368, in get_address_list
    token, value = get_invalid_mailbox(value, ',')
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/_header_value_parser.py", line 2166, in get_invalid_mailbox
    token, value = get_phrase(value)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/_header_value_parser.py", line 1770, in get_phrase
    token, value = get_word(value)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/_header_value_parser.py", line 1745, in get_word
    if value[0]=='"':
IndexError: string index out of range
>>> email.message_from_bytes(b'To: user@example.com <user@example.com> ', policy=default).defects
[]

I believe that the preferred behaviour would be to add a defect to the message object during parsing instead of raising an exception when the invalid header value is accessed.
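
Until the parser records a defect instead, one possible workaround is to guard the header access. The following is only a minimal sketch: `safe_header` is a hypothetical helper name, not part of the email package, and falling back to `Message.raw_items()` (which returns the stored name/value pairs without running the header parser) is an assumption about what a caller would want.

```python
import email
from email.policy import default


def safe_header(msg, name):
    """Return the parsed header value as a str, or the raw stored
    value if parsing the header raises. Returns None if absent."""
    try:
        value = msg[name]  # header parsing happens here, on access
    except Exception:  # e.g. the IndexError shown in the traceback
        # raw_items() yields the stored (name, value) pairs unparsed.
        for raw_name, raw_value in msg.raw_items():
            if raw_name.lower() == name.lower():
                return str(raw_value)
        return None
    return None if value is None else str(value)


msg = email.message_from_bytes(
    b'To: user@example.com <user@example.com> ', policy=default)
print(repr(safe_header(msg, 'to')))
```

The printed value depends on the Python version: on versions where the parser raises, the raw header text comes back; otherwise it is whatever the parser produces.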
msg296371 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2017-06-19 19:42
Yep, you found an edge case I didn't write a test for.  The defect should get added to the header object during parsing.  (Those are supposed to get copied to the message object...)
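
To check where defects actually end up, something along these lines could be used. It is a sketch under assumptions: `header_defects` is a hypothetical helper, not an email package API. It reads the `defects` attribute of each parsed header object and, if accessing a header raises, records the exception in place of the defect list.

```python
import email
from email.policy import default


def header_defects(msg):
    """Map each header name to the defects found on its header object.

    Only the first header of each name is inspected. If accessing a
    header raises, the exception itself is stored in the list.
    """
    found = {}
    for name in msg.keys():  # keys() does not parse header values
        try:
            header = msg[name]  # parsing happens on access
        except Exception as exc:
            found[name] = [exc]
            continue
        found[name] = list(getattr(header, 'defects', ()))
    return found


msg = email.message_from_bytes(
    b'To: user@example.com <user@example.com> ', policy=default)
print('message defects:', msg.defects)
print('header defects:', header_defects(msg))
```

Comparing the two printed values shows whether a defect was recorded on the header object, on the message object, or on neither.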
msg296397 - Author: Tim Bell (timb07) Date: 2017-06-20 03:43
I'm using the email package to ingest a firehose of spam; spammers aren't known for following norms or standards, so it's not surprising that I'm discovering lots of edge cases.

I'll supply fixes for what I find where I can, time permitting.
msg352238 - Author: Tim Bell (timb07) Date: 2019-09-13 03:27
This appears to be the same issue as subsequently reported in #34155.
msg382343 - Author: Irit Katriel (iritkatriel) (Python committer) Date: 2020-12-02 21:25
I don't see the error now; I think this has been fixed.
History
Date | User | Action | Args
2022-04-11 14:58:47 | admin | set | github: 74886
2021-01-08 14:13:55 | iritkatriel | set | status: pending -> closed; resolution: out of date; stage: resolved
2020-12-02 21:25:18 | iritkatriel | set | status: open -> pending; nosy: + iritkatriel; messages: + msg382343
2019-09-13 03:27:38 | timb07 | set | messages: + msg352238
2018-09-22 19:08:27 | xtreak | set | nosy: + xtreak
2017-08-08 01:35:21 | ncoghlan | set | nosy: + ncoghlan
2017-06-20 03:43:15 | timb07 | set | messages: + msg296397
2017-06-19 19:42:37 | r.david.murray | set | messages: + msg296371
2017-06-19 09:25:47 | timb07 | create |