parse_message_id in the email module crashes with bogus message-id
Having a Message-ID '<[>' gives me an IndexError: list index out of range.
This happens when
- creating an EmailMessage with the said Message-ID:
    msg = EmailMessage()
    msg['Message-ID'] = '<[>'
- accessing the bogus Message-ID through
    msg.items()
  or
    msg.get('Message-ID')
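A minimal reproduction sketch of the two paths above. The probe() helper and the way the second message is built (parsing a small text message with policy.default) are my own assumptions for illustration, not part of the original script. On an affected 3.8 build I would expect both probes to report IndexError; on other versions they may report nothing.

```python
import email
from email import policy
from email.message import EmailMessage


def probe(action):
    """Run action() and return the name of the exception it raises, or None."""
    try:
        action()
    except Exception as exc:  # broad on purpose: this is only a diagnostic probe
        return type(exc).__name__
    return None


def set_header():
    # Path 1: assign the bogus value to a fresh EmailMessage.
    msg = EmailMessage()
    msg['Message-ID'] = '<[>'


def read_header():
    # Path 2 (assumed setup): parse a message that already carries the
    # bogus value, then read the header back.
    msg = email.message_from_string('Message-ID: <[>\n\nbody\n',
                                    policy=policy.default)
    msg.get('Message-ID')


if __name__ == '__main__':
    print('set :', probe(set_header))
    print('read:', probe(read_header))
```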
This doesn't happen with Python 3.6 or 3.7, where MessageIDHeader didn't exist yet.
In 3.8 it is registered in Lib/email/headerregistry.py, line 542:
_default_header_map = {
    ....
    'message-id': MessageIDHeader,
}
-------------------------------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 2069, in get_msg_id
    token, value = get_dot_atom_text(value)
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 1334, in get_dot_atom_text
    raise errors.HeaderParseError("expected atom at a start of "
email.errors.HeaderParseError: expected atom at a start of dot-atom-text but found '[>'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 4, in <module>
    msg['Message-ID'] = '<[>'
  File "/usr/lib/python3.8/email/message.py", line 409, in __setitem__
    self._headers.append(self.policy.header_store_parse(name, val))
  File "/usr/lib/python3.8/email/policy.py", line 148, in header_store_parse
    return (name, self.header_factory(name, value))
  File "/usr/lib/python3.8/email/headerregistry.py", line 607, in __call__
    return self[name](name, value)
  File "/usr/lib/python3.8/email/headerregistry.py", line 202, in __new__
    cls.parse(value, kwds)
  File "/usr/lib/python3.8/email/headerregistry.py", line 535, in parse
    kwds['parse_tree'] = parse_tree = cls.value_parser(value)
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 2126, in parse_message_id
    token, value = get_msg_id(value)
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 2073, in get_msg_id
    token, value = get_obs_local_part(value)
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 1516, in get_obs_local_part
    if (obs_local_part[0].token_type == 'dot' or
IndexError: list index out of range
-------------------------------------------------------------------------------------------
As you can see in the traceback, get_msg_id() calls get_obs_local_part(),
and get_obs_local_part() contains this:
def get_obs_local_part(value):
    obs_local_part = ObsLocalPart()
    while value and (value[0]=='\\' or value[0] not in PHRASE_ENDS):
        ...
    if obs_local_part[0].token_type == 'dot':
        ...
If value does not satisfy the while-loop condition on entry, the loop body never runs,
obs_local_part stays empty, and obs_local_part[0] raises the IndexError.
(The value in my example is '[>', from the Message-ID '<[>'.)
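To illustrate the kind of guard I have in mind, here is a simplified, standalone sketch. It is not the stdlib code: collect_tokens_sketch and SKETCH_PHRASE_ENDS are hypothetical stand-ins for get_obs_local_part and PHRASE_ENDS, and the loop body is reduced to collecting single characters. The only point is the emptiness check before indexing.

```python
from email.errors import HeaderParseError

# Hypothetical stand-in for the parser's PHRASE_ENDS set; the real set is
# defined in email/_header_value_parser.py and may differ.
SKETCH_PHRASE_ENDS = set(')<>@,:;[]')


def collect_tokens_sketch(value):
    """Toy version of the loop: gather characters until a phrase end."""
    tokens = []
    while value and (value[0] == '\\' or value[0] not in SKETCH_PHRASE_ENDS):
        tokens.append(value[0])
        value = value[1:]
    # Guard: without this check, tokens[0] below would raise IndexError
    # whenever the loop body never ran.
    if not tokens:
        raise HeaderParseError(
            "expected obs-local-part but found {!r}".format(value))
    first = tokens[0]  # safe now, the list is known to be non-empty
    return first, tokens, value
```

With this guard the caller sees a HeaderParseError, which it can catch or turn into a defect, instead of a bare IndexError.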
Shouldn't we get a proper error, or fall back to no parsing, when parsing fails?
There's no way of bypassing the parser and getting the Message-ID, and
I can't even handle the error with a try/except.
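One possible stopgap, sketched here under the assumption that the message is parsed from text rather than built with EmailMessage(): the compat32 policy does not go through the header registry, so the header value comes back as a plain string. raw_message_id is a hypothetical helper name, and this does not help code that needs the EmailMessage API.

```python
import email
from email import policy


def raw_message_id(text):
    """Return the Message-ID header as an unparsed string, or None."""
    # compat32 stores and returns header values without structured parsing.
    msg = email.message_from_string(text, policy=policy.compat32)
    return msg.get('Message-ID')


if __name__ == '__main__':
    print(raw_message_id('Message-ID: <[>\n\nbody\n'))
```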