This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: get_obs_local_part fails to handle empty local part
Type: behavior Stage: patch review
Components: email Versions: Python 3.9, Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: ZackerySpytz, barry, dxn126, r.david.murray
Priority: normal Keywords: patch

Created on 2020-11-27 12:36 by dxn126, last changed 2022-04-11 14:59 by admin.

Pull Requests
URL       Status  Linked
PR 24669  open    ZackerySpytz, 2021-02-28 16:44
Messages (2)
msg381947 - Author: Dickson Chan (dxn126) Date: 2020-11-27 12:36
parse_message_id in the email module crashes on a bogus Message-ID.

A Message-ID of '<[>' gives me IndexError: list index out of range.

This happens when
- creating an EmailMessage with that Message-ID:
    msg = EmailMessage()
    msg['Message-ID'] = '<[>'

- accessing the bogus Message-ID through
    msg.items()
or
    msg.get('Message-ID')
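Both paths above can be checked with a small script. This is a sketch: the helper names crashes_on_set and crashes_on_get are hypothetical, and whether they return True depends on whether the running Python still has this bug. The second helper parses a raw message, because with the default policy the header string is only handed to the header parser when it is fetched.

```python
import email
from email import policy
from email.message import EmailMessage


def crashes_on_set(raw_id: str) -> bool:
    """Hypothetical helper: does assigning raw_id as Message-ID raise IndexError?"""
    msg = EmailMessage()
    try:
        msg['Message-ID'] = raw_id  # parsed immediately by header_store_parse
    except IndexError:
        return True
    return False


def crashes_on_get(raw_id: str) -> bool:
    """Hypothetical helper: does fetching a parsed Message-ID raise IndexError?"""
    source = f'Message-ID: {raw_id}\n\nbody\n'
    msg = email.message_from_string(source, policy=policy.default)
    try:
        msg.get('Message-ID')  # header_fetch_parse runs the header parser here
    except IndexError:
        return True
    return False


print(crashes_on_set('<[>'), crashes_on_get('<[>'))
```

Both helpers feed the same string to the same header class, so they should agree on any given interpreter.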

This doesn't happen with Python 3.6 or 3.7, where MessageIDHeader didn't exist.

Lib/email/headerregistry.py line 542

_default_header_map = {
    ....
    'message-id': MessageIDHeader,
    }
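To confirm which header class a given interpreter uses for Message-ID, one can ask a fresh HeaderRegistry directly. This is a sketch; the function name message_id_header_class is hypothetical, while HeaderRegistry and its default_class fallback are part of email.headerregistry.

```python
import email.headerregistry
from email.headerregistry import HeaderRegistry


def message_id_header_class():
    """Hypothetical helper: the class a fresh HeaderRegistry builds for Message-ID."""
    registry = HeaderRegistry()
    # Lookup lowercases the name; names missing from the map fall back to
    # registry.default_class (UnstructuredHeader), which does no structured parsing.
    return registry['Message-ID']


cls = message_id_header_class()
# If MessageIDHeader does not exist (per this report, 3.6/3.7), the getattr
# default of () makes issubclass() return False.
parsed = issubclass(cls, getattr(email.headerregistry, 'MessageIDHeader', ()))
print(parsed)  # True on interpreters that register MessageIDHeader
```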

-------------------------------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 2069, in get_msg_id
    token, value = get_dot_atom_text(value)
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 1334, in get_dot_atom_text
    raise errors.HeaderParseError("expected atom at a start of "
email.errors.HeaderParseError: expected atom at a start of dot-atom-text but found '[>'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 4, in <module>
    msg['Message-ID'] = '<[>'
  File "/usr/lib/python3.8/email/message.py", line 409, in __setitem__
    self._headers.append(self.policy.header_store_parse(name, val))
  File "/usr/lib/python3.8/email/policy.py", line 148, in header_store_parse
    return (name, self.header_factory(name, value))
  File "/usr/lib/python3.8/email/headerregistry.py", line 607, in __call__
    return self[name](name, value)
  File "/usr/lib/python3.8/email/headerregistry.py", line 202, in __new__
    cls.parse(value, kwds)
  File "/usr/lib/python3.8/email/headerregistry.py", line 535, in parse
    kwds['parse_tree'] = parse_tree = cls.value_parser(value)
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 2126, in parse_message_id
    token, value = get_msg_id(value)
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 2073, in get_msg_id
    token, value = get_obs_local_part(value)
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 1516, in get_obs_local_part
    if (obs_local_part[0].token_type == 'dot' or
IndexError: list index out of range
-------------------------------------------------------------------------------------------

As you can see in the traceback, get_msg_id() calls get_obs_local_part(),
and get_obs_local_part() contains this:

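def get_obs_local_part(value):
    obs_local_part = ObsLocalPart()
    # Consume tokens until value starts with a phrase-ending character.
    while value and (value[0]=='\\' or value[0] not in PHRASE_ENDS):
        ...
    # If the loop body never ran, obs_local_part is still empty here,
    # so obs_local_part[0] raises IndexError.
    if (obs_local_part[0].token_type == 'dot' or ...):
        ...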

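If value does not satisfy the while-loop condition on its first character,
the loop body never runs, obs_local_part stays empty, and obs_local_part[0]
raises IndexError. In my example the value is '[>' (from the Message-ID '<[>'),
and '[' is one of the PHRASE_ENDS characters.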
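The failure can be reproduced outside the email package with a heavily simplified model of that loop. This is a sketch: scan_obs_local_part is hypothetical and treats each character as a token, whereas the real parser consumes words, dots and CFWS; the character sets are assumed to match email._header_value_parser.

```python
from typing import List, Tuple

# Assumed to mirror email._header_value_parser; the raw string contains '\\' and '"'.
SPECIALS = set(r'()<>@,:;.\"[]')
PHRASE_ENDS = SPECIALS - set('."(')


def scan_obs_local_part(value: str) -> Tuple[List[str], str]:
    """Hypothetical one-character-per-token stand-in for the while loop."""
    tokens: List[str] = []
    while value and (value[0] == '\\' or value[0] not in PHRASE_ENDS):
        tokens.append(value[0])
        value = value[1:]
    return tokens, value


tokens, rest = scan_obs_local_part('[>')
print(tokens, rest)  # [] [>
# Indexing tokens[0] at this point fails just like obs_local_part[0].
```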

Shouldn't we raise a proper error, or fall back to not parsing the value, when parsing fails?
There's no way to bypass the parser to get at the Message-ID, and
I can't even handle the error with a try/except.
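As for falling back to no parsing, one possible opt-out is to map Message-ID to UnstructuredHeader in a custom header registry. This is a sketch of a workaround under that assumption, not a fix for the parser: lenient_message_id_policy is a hypothetical name, while HeaderRegistry.map_to_type, UnstructuredHeader and policy.clone are standard email APIs. The trade-off is that Message-ID loses its structured parsing and MessageIDHeader's max_count of 1.

```python
from email import policy
from email.headerregistry import HeaderRegistry, UnstructuredHeader
from email.message import EmailMessage


def lenient_message_id_policy():
    """Hypothetical helper: policy.default, but Message-ID stays unstructured."""
    registry = HeaderRegistry()
    # Route Message-ID to the plain header class used for unknown names,
    # so MessageIDHeader's parser is never invoked.
    registry.map_to_type('message-id', UnstructuredHeader)
    return policy.default.clone(header_factory=registry)


msg = EmailMessage(policy=lenient_message_id_policy())
msg['Message-ID'] = '<[>'
print(msg['Message-ID'])  # <[>
```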
msg382153 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2020-11-30 14:36
Yep, you've found another in a category of bugs that have shown up in the parser: places where there is a missing check for there being any value at all before checking character [0].

In this case, the fix should be to add

    if not obs_local_part:
        return obs_local_part, value

just before the if statement that is blowing up.
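The effect of that guard can be seen on a simplified model. This is a sketch: obs_local_part_with_guard is hypothetical, uses one character per token, and records defects as plain strings; the character sets are assumed to match email._header_value_parser. When nothing is consumed, the empty result and the untouched value go back to the caller, which is then presumably in a position to record a defect instead of crashing.

```python
from typing import List, Tuple

# Assumed to mirror email._header_value_parser; the raw string contains '\\' and '"'.
SPECIALS = set(r'()<>@,:;.\"[]')
PHRASE_ENDS = SPECIALS - set('."(')


def obs_local_part_with_guard(value: str) -> Tuple[List[str], str, List[str]]:
    """Hypothetical model returning (tokens, remaining value, defect notes)."""
    tokens: List[str] = []
    defects: List[str] = []
    while value and (value[0] == '\\' or value[0] not in PHRASE_ENDS):
        tokens.append(value[0])
        value = value[1:]
    # The guard: nothing was consumed, so return instead of indexing tokens[0].
    if not tokens:
        return tokens, value, defects
    if tokens[0] == '.':
        defects.append("leading '.' in local part")
    return tokens, value, defects


print(obs_local_part_with_guard('[>'))  # ([], '[>', [])
```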
History
Date                 User            Action  Args
2022-04-11 14:59:38  admin           set     github: 86650
2021-02-28 16:44:21  ZackerySpytz    set     keywords: + patch
                                             nosy: + ZackerySpytz
                                             pull_requests: + pull_request23455
                                             stage: patch review
2020-11-30 14:36:29  r.david.murray  set     messages: + msg382153
                                             title: parse_message_id, get_msg_id, get_obs_local_part is poorly written -> get_obs_local_part fails to handle empty local part
2020-11-27 12:36:59  dxn126          create