
Author dxn126
Recipients barry, dxn126, r.david.murray
Date 2020-11-27.12:36:58
Message-id <1606480619.96.0.472096710567.issue42484@roundup.psfhosted.org>
Content
parse_message_id in the email module crashes with bogus message-id

A Message-ID of '<[>' raises IndexError: list index out of range.

This happens when:
- creating an EmailMessage with that Message-ID:
    msg = EmailMessage()
    msg['Message-ID'] = '<[>'

- accessing such a bogus Message-ID (for example, in a parsed message) through
    msg.items()
or
    msg.get('Message-ID')
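A minimal reproduction of both paths might look like the sketch below. It assumes Python 3.8 or later and `email.policy.default`; the helper names `try_message_id` and `try_parsed_message_id` are hypothetical. Because the outcome depends on whether your interpreter already includes a fix for this issue, the script reports what happened instead of assuming a crash.

```python
from email import policy
from email.message import EmailMessage
from email.parser import Parser


def try_message_id(value):
    """Hypothetical helper: assign Message-ID on a new EmailMessage and
    report either the stored value or the name of the exception raised."""
    msg = EmailMessage()
    try:
        # Path 1: assignment parses the header immediately.
        msg['Message-ID'] = value
        return ('ok', str(msg['Message-ID']))
    except Exception as exc:
        return ('error', type(exc).__name__)


def try_parsed_message_id(value):
    """Hypothetical helper: parse a raw message containing the header and
    then access it. With policy.default the header is parsed on access."""
    raw = 'Message-ID: {}\n\nbody\n'.format(value)
    msg = Parser(policy=policy.default).parsestr(raw)
    try:
        # Path 2: lazy parsing happens inside get().
        return ('ok', str(msg.get('Message-ID')))
    except Exception as exc:
        return ('error', type(exc).__name__)


for v in ['<abc@example.com>', '<[>']:
    print(v, try_message_id(v), try_parsed_message_id(v))
```

On an affected interpreter the second line should report `('error', 'IndexError')` for both paths.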

This does not happen with Python 3.6 or 3.7, where MessageIDHeader did not exist yet.

The header is registered in Lib/email/headerregistry.py, line 542:

_default_header_map = {
    ....
    'message-id': MessageIDHeader,
    }
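To check which class your interpreter uses for Message-ID, you can ask a fresh `HeaderRegistry` directly. This is a small sketch; the 3.8 cut-off in it is taken from the version observation above, and on 3.6/3.7 the lookup should fall back to `UnstructuredHeader`.

```python
import sys
from email.headerregistry import HeaderRegistry

# A fresh registry is populated from the module's default header map.
registry = HeaderRegistry()

# Lookup lowercases the name; unmapped names fall back to UnstructuredHeader.
header_cls = registry['Message-ID']
base_names = [c.__name__ for c in header_cls.__mro__]

# Assumption: MessageIDHeader is mapped starting with Python 3.8.
expected = 'MessageIDHeader' if sys.version_info >= (3, 8) else 'UnstructuredHeader'
print(expected in base_names, base_names)
```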

-------------------------------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 2069, in get_msg_id
    token, value = get_dot_atom_text(value)
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 1334, in get_dot_atom_text
    raise errors.HeaderParseError("expected atom at a start of "
email.errors.HeaderParseError: expected atom at a start of dot-atom-text but found '[>'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 4, in <module>
    msg['Message-ID'] = '<[>'
  File "/usr/lib/python3.8/email/message.py", line 409, in __setitem__
    self._headers.append(self.policy.header_store_parse(name, val))
  File "/usr/lib/python3.8/email/policy.py", line 148, in header_store_parse
    return (name, self.header_factory(name, value))
  File "/usr/lib/python3.8/email/headerregistry.py", line 607, in __call__
    return self[name](name, value)
  File "/usr/lib/python3.8/email/headerregistry.py", line 202, in __new__
    cls.parse(value, kwds)
  File "/usr/lib/python3.8/email/headerregistry.py", line 535, in parse
    kwds['parse_tree'] = parse_tree = cls.value_parser(value)
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 2126, in parse_message_id
    token, value = get_msg_id(value)
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 2073, in get_msg_id
    token, value = get_obs_local_part(value)
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 1516, in get_obs_local_part
    if (obs_local_part[0].token_type == 'dot' or
IndexError: list index out of range
-------------------------------------------------------------------------------------------

As the traceback shows, when get_dot_atom_text() fails, get_msg_id() falls back to
get_obs_local_part(), which contains this:

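def get_obs_local_part(value):

    obs_local_part = ObsLocalPart()

    while value and (value[0]=='\\' or value[0] not in PHRASE_ENDS):
        ...
    # If the loop body never ran, obs_local_part is still empty here,
    # so obs_local_part[0] raises IndexError.
    if (obs_local_part[0].token_type == 'dot' or
            ...):
        ...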

If value fails the while condition on the first check, the loop body never runs,
obs_local_part stays empty, and obs_local_part[0] raises IndexError.
In my example value is '[>' (from the Message-ID '<[>'), and '[' is in PHRASE_ENDS.
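The failure mode can be reproduced without the email package. The sketch below is a simplified, hypothetical model of the loop in `get_obs_local_part()`: `SPECIALS` and `PHRASE_ENDS` are assumed to mirror the constants in `email/_header_value_parser.py` (check your version), and it collects single characters instead of real tokens. It shows that for `'[>'` the loop never runs, and how a length check (in the hypothetical `checked_local_part`) would turn the `IndexError` into a `HeaderParseError`.

```python
from email.errors import HeaderParseError

# Assumed to mirror the constants in email/_header_value_parser.py.
SPECIALS = set(r'()<>@,:;.\"[]')
PHRASE_ENDS = SPECIALS - set('."(')


def collect_local_part(value):
    """Simplified model: consume characters while the loop condition holds."""
    tokens = []
    while value and (value[0] == '\\' or value[0] not in PHRASE_ENDS):
        tokens.append(value[0])
        value = value[1:]
    return tokens, value


def checked_local_part(value):
    """Same loop, but raise a parse error instead of indexing an empty list."""
    tokens, rest = collect_local_part(value)
    if not tokens:
        raise HeaderParseError(
            "expected obs-local-part but found {!r}".format(value))
    return tokens, rest


tokens, rest = collect_local_part('[>')
print(tokens, rest)  # [] [>
try:
    tokens[0]  # the equivalent of obs_local_part[0] on an empty list
except IndexError as exc:
    print('IndexError:', exc)
```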

Shouldn't this raise a proper error, or fall back to no parsing when parsing fails?
There is no way to bypass the parser and get the raw Message-ID, and
I can't even handle the error with try/except.
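One possible workaround, assuming you can choose the policy used to build or parse the message, is to map `message-id` back to `UnstructuredHeader` in a custom `HeaderRegistry` and pass it as the policy's `header_factory`. This is a sketch rather than an endorsed fix: it restores the 3.6/3.7 behaviour of treating the value as plain text, so `MessageIDHeader`'s validation and defect reporting are lost. The name `lenient` is illustrative.

```python
from email import policy
from email.headerregistry import HeaderRegistry, UnstructuredHeader
from email.message import EmailMessage
from email.parser import Parser

# Start from the default header map, then override only Message-ID.
registry = HeaderRegistry()
registry.map_to_type('Message-ID', UnstructuredHeader)
lenient = policy.default.clone(header_factory=registry)

# Assignment on a new message no longer goes through parse_message_id.
msg = EmailMessage(policy=lenient)
msg['Message-ID'] = '<[>'
print(msg['Message-ID'])

# The same policy applies when parsing raw input.
parsed = Parser(policy=lenient).parsestr('Message-ID: <[>\n\nbody\n')
print(parsed.get('Message-ID'))
```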
History
Date                 User    Action  Args
2020-11-27 12:37:00  dxn126  set     recipients: + dxn126, barry, r.david.murray
2020-11-27 12:36:59  dxn126  set     messageid: <1606480619.96.0.472096710567.issue42484@roundup.psfhosted.org>
2020-11-27 12:36:59  dxn126  link    issue42484 messages
2020-11-27 12:36:58  dxn126  create