classification
Title: String index out of range in get_group(), email/_header_value_parser.py
Type: behavior Stage: patch review
Components: email Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Cacadril, barry, corona10, r.david.murray
Priority: normal Keywords: easy, patch

Created on 2018-05-13 00:50 by Cacadril, last changed 2018-06-12 08:18 by corona10.

Files
File name Uploaded Description
email.eml Cacadril, 2018-05-20 12:12 email with a group of recipients but missing the final ";"
Pull Requests
URL Status Linked
PR 7484 open corona10, 2018-06-07 15:34
Messages (6)
msg316440 - (view) Author: Enrique Perez-Terron (Cacadril) * Date: 2018-05-13 00:50
When an address group is missing its final ';', 'value' is an empty string by the time get_group() evaluates value[0], which raises IndexError. I suggest the following patch:

$ diff -u _save_header_value_parser.py _header_value_parser.py
--- _save_header_value_parser.py        2018-03-14 01:07:54.000000000 +0100
+++ _header_value_parser.py     2018-05-13 02:17:13.830053600 +0200
@@ -1876,7 +1876,7 @@
     if not value:
         group.defects.append(errors.InvalidHeaderDefect(
             "end of header in group"))
-    if value[0] != ';':
+    elif value[0] != ';':
         raise errors.HeaderParseError(
             "expected ';' at end of group but found {}".format(value))
     group.append(ValueTerminal(';', 'group-terminator'))
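
To make the control-flow change easier to see outside the parser, here is a minimal toy sketch of the two variants. The function names, the plain-list "defects" and the use of ValueError are hypothetical stand-ins, not the real email._header_value_parser API; only the if/elif structure mirrors the patch above.

```python
# Toy model of the terminator check at the end of get_group().
# All names here are hypothetical; a plain list stands in for
# group.defects and ValueError stands in for HeaderParseError.

def check_terminator_unpatched(value, defects):
    """Two independent 'if' statements, as in the code before the patch."""
    if not value:
        defects.append("end of header in group")
    if value[0] != ';':  # IndexError when value == ''
        raise ValueError(
            "expected ';' at end of group but found {}".format(value))
    return value[1:]


def check_terminator_patched(value, defects):
    """Same check with 'elif', so an empty value never reaches value[0]."""
    if not value:
        defects.append("end of header in group")
    elif value[0] != ';':
        raise ValueError(
            "expected ';' at end of group but found {}".format(value))
    return value[1:]  # ''[1:] is '', so the empty case is safe here
```

In this sketch the unpatched variant records the defect and then still indexes the empty string, while the patched variant records the defect and carries on.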
msg316538 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-05-14 16:53
The fix is most likely correct, so we need a PR for this that includes tests.
msg317165 - (view) Author: Enrique Perez-Terron (Cacadril) * Date: 2018-05-20 12:12
Unsure how to issue a "PR" (Problem Report?) with a test case.

Here is my best effort:

Create a file "email.eml" in the current directory, as attached.
(The contents were lifted from RFC2822 section A.1.3, but I deleted the ";" at the end of the "To" header. The file has CRLF line endings.)

Then run the following test program (it appears that I can only attach one file to this issue).

$ cat test-bug.py
from email.policy import default
import email

with open('email.eml', 'rb') as f:
    msg = email.message_from_binary_file(f, policy=default)
    toheader = msg['To']
    for addr in toheader.addresses:
        print(addr)

#----------------------------------------------------
# Output without the fix:

$ python3.6.5 test-bug.py
Traceback (most recent call last):
  File "test-bug.py", line 6, in <module>
    toheader = msg['To']
  File "C:\Program Files\Python36\lib\email\message.py", line 391, in __getitem__
    return self.get(name)
  File "C:\Program Files\Python36\lib\email\message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "C:\Program Files\Python36\lib\email\policy.py", line 162, in header_fetch_parse
    return self.header_factory(name, value)
  File "C:\Program Files\Python36\lib\email\headerregistry.py", line 589, in __call__
    return self[name](name, value)
  File "C:\Program Files\Python36\lib\email\headerregistry.py", line 197, in __new__
    cls.parse(value, kwds)
  File "C:\Program Files\Python36\lib\email\headerregistry.py", line 340, in parse
    kwds['parse_tree'] = address_list = cls.value_parser(value)
  File "C:\Program Files\Python36\lib\email\headerregistry.py", line 331, in value_parser
    address_list, value = parser.get_address_list(value)
  File "C:\Program Files\Python36\lib\email\_header_value_parser.py", line 1931, in get_address_list
    token, value = get_address(value)
  File "C:\Program Files\Python36\lib\email\_header_value_parser.py", line 1908, in get_address
    token, value = get_group(value)
  File "C:\Program Files\Python36\lib\email\_header_value_parser.py", line 1879, in get_group
    if value[0] != ';':
IndexError: string index out of range

#-----------------------------------------------------
# Output with the fix:

$ test-bug.py
Chris Jones <c@a.test>
joe@where.test
John <jdoe@one.test>
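
For anyone who wants to try this without the attachment, the following is a possible self-contained variant of the test program. The RAW string is an assumed stand-in for email.eml (only the To header with the missing ';' matters), and try_parse_to is a hypothetical helper name. It catches IndexError so that it runs on both affected and fixed interpreters.

```python
import email
from email.policy import default

# Assumed stand-in for the attached email.eml: the group in the To
# header deliberately lacks its closing ';'.
RAW = (
    "From: Pete <pete@silly.example>\n"
    "To: A Group:Chris Jones <c@a.test>,joe@where.test,John <jdoe@one.test>\n"
    "\n"
    "Testing.\n"
)


def try_parse_to(raw):
    """Return ("ok", [addresses]) or ("IndexError", None) for the To header."""
    msg = email.message_from_string(raw, policy=default)
    try:
        # Header parsing happens lazily, when msg['To'] is accessed.
        return "ok", [str(addr) for addr in msg['To'].addresses]
    except IndexError:
        return "IndexError", None


status, addresses = try_parse_to(RAW)
print(status, addresses)
```

On an interpreter without the fix this should report IndexError; with the fix it should list the three addresses shown above.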
msg318049 - (view) Author: Dong-hee Na (corona10) * Date: 2018-05-29 15:56
@Cacadril

Please send a pull request (PR) through the CPython GitHub repository (https://github.com/python/cpython).

It will be a great experience :)
msg319007 - (view) Author: Dong-hee Na (corona10) * Date: 2018-06-08 02:14
@r.david.murray

Please take a look at PR 7484 :)
msg319368 - (view) Author: Dong-hee Na (corona10) * Date: 2018-06-12 08:18
@r.david.murray
Can I get a review for PR 7484?
History
Date User Action Args
2018-06-12 08:18:19 corona10 set messages: + msg319368
2018-06-08 02:14:47 corona10 set messages: + msg319007
2018-06-07 15:34:51 corona10 set keywords: + patch
    stage: test needed -> patch review
    pull_requests: + pull_request7109
2018-05-29 15:56:15 corona10 set nosy: + corona10
    messages: + msg318049
2018-05-20 12:12:18 Cacadril set files: + email.eml
    messages: + msg317165
2018-05-14 16:53:34 r.david.murray set type: crash -> behavior
2018-05-14 16:53:25 r.david.murray set keywords: + easy
    stage: test needed
    messages: + msg316538
    versions: + Python 3.7, Python 3.8
2018-05-13 00:50:02 Cacadril create