Classification
Title: IndexError thrown on email.message.Message.get
Type: behavior
Stage: patch review
Components: email
Versions: Python 3.7, Python 3.6, Python 3.5

Process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: barry, r.david.murray, uckelman, xiang.zhang
Priority: normal
Keywords: patch

Created on 2017-02-01 13:53 by uckelman, last changed 2018-05-16 14:29 by TThurk360.

Files
File name Uploaded Description
29412.patch uckelman, 2017-02-07 16:04 patch
Pull Requests
URL Status Linked
PR 6907 open TThurk360, 2018-05-16 14:22
Messages (7)
msg286631 - Author: Joel Uckelman (uckelman) Date: 2017-02-01 13:53
Test case:

  import email
  import email.policy
 
  txt = '''From: juckelman@strozfriedberg.co.uk
  To: (Recipient list suppressed)
  Date: Thu, 22 Aug 2013 04:13:02 +0000
  Subject: ADSF-1082
 
  Hey!
  '''
 
  msg = email.message_from_string(txt)
  msg.get('to')
  msg = email.message_from_string(txt, policy=email.policy.default)
  msg.get('to')

The second msg.get() throws an IndexError:

  Traceback (most recent call last):
    File "test.py", line 16, in <module>
      print(msg.get('to'))    # throws IndexError
    File "/usr/lib64/python3.5/email/message.py", line 472, in get
      return self.policy.header_fetch_parse(k, v)
    File "/usr/lib64/python3.5/email/policy.py", line 153, in header_fetch_parse
      return self.header_factory(name, ''.join(value.splitlines()))
    File "/usr/lib64/python3.5/email/headerregistry.py", line 586, in __call__
      return self[name](name, value)
    File "/usr/lib64/python3.5/email/headerregistry.py", line 197, in __new__
      cls.parse(value, kwds)
    File "/usr/lib64/python3.5/email/headerregistry.py", line 337, in parse
      kwds['parse_tree'] = address_list = cls.value_parser(value)
    File "/usr/lib64/python3.5/email/headerregistry.py", line 328, in value_parser
      address_list, value = parser.get_address_list(value)
    File "/usr/lib64/python3.5/email/_header_value_parser.py", line 2336, in get_address_list
      token, value = get_address(value)
    File "/usr/lib64/python3.5/email/_header_value_parser.py", line 2313, in get_address
      token, value = get_group(value)
    File "/usr/lib64/python3.5/email/_header_value_parser.py", line 2269, in get_group
      token, value = get_display_name(value)
    File "/usr/lib64/python3.5/email/_header_value_parser.py", line 2095, in get_display_name
      token, value = get_phrase(value)
    File "/usr/lib64/python3.5/email/_header_value_parser.py", line 1770, in get_phrase
      token, value = get_word(value)
    File "/usr/lib64/python3.5/email/_header_value_parser.py", line 1745, in get_word
      if value[0]=='"':
  IndexError: string index out of range

The docs say that email.policy.default has raise_on_defect set to False, hence parse errors ought to be reported via EmailMessage.defects, not by throwing an exception.
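
Until the parser handles this input, calling code may want to guard header access. The following is a minimal sketch, not a fix: `get_header_safely` is a hypothetical helper name, and the only assumption about the email package is that header objects returned under the newer policies expose a `defects` attribute. It returns the header value together with a list of problems, whether they surfaced as recorded defects or as the IndexError reported here.

```python
import email
import email.policy


def get_header_safely(msg, name, default=None):
    """Hypothetical helper: return (value, problems) for one header."""
    try:
        value = msg.get(name, default)
    except IndexError as exc:
        # On affected versions the header parser itself fails on this input.
        return default, ["header parser raised IndexError: %s" % exc]
    # Header objects produced by the newer policies carry a `defects` tuple;
    # plain strings (compat32 policy) and the default value do not.
    return value, [repr(d) for d in getattr(value, "defects", ())]


TXT = (
    "From: juckelman@strozfriedberg.co.uk\n"
    "To: (Recipient list suppressed)\n"
    "Date: Thu, 22 Aug 2013 04:13:02 +0000\n"
    "Subject: ADSF-1082\n"
    "\n"
    "Hey!\n"
)

msg = email.message_from_string(TXT, policy=email.policy.default)
for name in ("subject", "to"):
    value, problems = get_header_safely(msg, name)
    print(name, repr(value), problems)
```

What is printed for the To header depends on the Python version in use: affected versions report the IndexError, while a version with a corrected parser would be expected to return a header object instead.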
msg286633 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2017-02-01 14:09
Does the patch from issue 27931 fix your problem as well?  I haven't looked closely enough to see if I think it should, I'm just hoping :)
msg286635 - Author: Joel Uckelman (uckelman) Date: 2017-02-01 14:17
No dice. I get the same exception with issue27931_v2.patch. I briefly looked at the other two, and don't expect those will help, either.
msg286845 - Author: Xiang Zhang (xiang.zhang) (Python committer) Date: 2017-02-03 06:45
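This doesn't seem to be related to #27931.

The problem is that the To header's value consists only of CFWS (comments and folding whitespace). Once the CFWS is stripped, the value is effectively empty, and under the default policy `get_word` then evaluates `value[0] == '"'` on an empty string.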
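
To illustrate why that check fails, here is a small standalone sketch. The two function names are hypothetical and this is not the code from the email package or from any patch attached to this issue; it only contrasts indexing an empty string with a check that tolerates one.

```python
def starts_with_dquote_unsafe(value):
    # Same shape as the check quoted above: indexing fails on an empty string.
    return value[0] == '"'


def starts_with_dquote_safe(value):
    # str.startswith simply returns False for an empty string.
    return value.startswith('"')


for candidate in ('"quoted" rest', 'atom rest', ''):
    try:
        unsafe = starts_with_dquote_unsafe(candidate)
    except IndexError as exc:
        unsafe = "IndexError: %s" % exc
    print(repr(candidate), unsafe, starts_with_dquote_safe(candidate))
```

A real fix would also have to decide what to do once it knows the value is empty, for example recording a defect, which this sketch does not attempt.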
msg286867 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2017-02-03 13:49
I'm very short on time even to review patches these days, but I'll try to pry some loose if someone wants to propose a patch.
msg287232 - Author: Joel Uckelman (uckelman) Date: 2017-02-07 13:10
I'm working on a patch now.
msg287242 - Author: Joel Uckelman (uckelman) Date: 2017-02-07 16:04
Here's a patch, complete with tests.

If the value is all CFWS, then get_cfws(value)[1], which is what's left after the CFWS is extracted, is the empty string. That is why value[0] raises IndexError in this case.
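
The behaviour of get_cfws described above can be checked directly. This is only a sketch for inspection: `email._header_value_parser` is a private module (it is the one named in the traceback), so its interface is not guaranteed, and the second sample value is made up for contrast.

```python
from email import _header_value_parser as parser

# The header value from the test case: a single comment and nothing else.
cfws, remainder = parser.get_cfws('(Recipient list suppressed)')
print(repr(remainder))

# A made-up value with a comment followed by an address, for contrast.
cfws2, remainder2 = parser.get_cfws('(a comment) someone@example.com')
print(repr(remainder2))
```

In the first case nothing is left for get_word to inspect, so any unguarded value[0] lookup on the remainder fails.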
History
Date User Action Args
2018-05-16 14:29:07 TThurk360 set pull_requests: - pull_request6509
2018-05-16 14:22:53 TThurk360 set pull_requests: + pull_request6575
2018-05-14 19:41:00 TThurk360 set stage: patch review
    pull_requests: + pull_request6509
2017-02-07 16:04:35 uckelman set files: + 29412.patch
    keywords: + patch
    messages: + msg287242
2017-02-07 13:10:26 uckelman set messages: + msg287232
2017-02-03 13:49:25 r.david.murray set messages: + msg286867
    versions: + Python 3.6, Python 3.7
2017-02-03 06:45:08 xiang.zhang set nosy: + xiang.zhang
    messages: + msg286845
2017-02-01 14:44:15 uckelman set title: IndexError thrown on email.message.EmailMessage.get -> IndexError thrown on email.message.Message.get
2017-02-01 14:17:25 uckelman set messages: + msg286635
2017-02-01 14:09:35 r.david.murray set messages: + msg286633
2017-02-01 13:53:52 uckelman set title: IndexError thrown on email.message.M -> IndexError thrown on email.message.EmailMessage.get
2017-02-01 13:53:21 uckelman create