classification
Title: IndexError in get_bare_quoted_string
Type: behavior Stage: resolved
Components: email Versions: Python 3.9, Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: barry, matrixise, maxking, r.david.murray, terry.reedy
Priority: normal Keywords: patch

Created on 2019-07-03 08:19 by maxking, last changed 2019-08-01 12:28 by r.david.murray. This issue is now closed.

Files
File name Uploaded Description
fix-indexerror.patch maxking, 2019-07-09 18:11
Pull Requests
URL Status Linked
PR 14813 merged maxking, 2019-07-17 13:58
PR 14820 merged miss-islington, 2019-07-17 16:51
PR 14819 merged miss-islington, 2019-07-17 16:51
Messages (7)
msg347190 - (view) Author: Abhilash Raj (maxking) * (Python committer) Date: 2019-07-03 08:19
from email.parser import BytesParser, Parser
from email.policy import default

payload = 'Content-Type:x;\x1b*="\'G\'\\"""""'
msg = Parser(policy=default).parsestr(payload)
print(msg.get('content-type'))


While reviewing the PR for bpo-37461, I found another bug: an IndexError
is raised if the input message has no closing quote character.

Suggested patch:

@@ -1191,7 +1192,7 @@ def get_bare_quoted_string(value):
             "expected '\"' but found '{}'".format(value))
     bare_quoted_string = BareQuotedString()
     value = value[1:]
-    if value[0] == '"':
+    if value and value[0] == '"':
         token, value = get_qcontent(value)
         bare_quoted_string.append(token)
     while value and value[0] != '"':
msg347454 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2019-07-06 22:46
A Python exception is not a crash; a crash is the program stopping without an exception and proper cleanup.

If s is a string (rather than, for instance, None),
s and (s[0] == char) is equivalent to s[0:1] == char
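
A minimal sketch of that equivalence, assuming char is a single character. The helper names first_is_guarded and first_is_sliced are hypothetical and exist only for this illustration; they are not part of the email package.

```python
def first_is_guarded(s, char):
    # Short-circuit form: s[0] is evaluated only when s is non-empty.
    return bool(s and s[0] == char)


def first_is_sliced(s, char):
    # Slice form: s[0:1] is '' for an empty string, so no IndexError is raised.
    return s[0:1] == char


# Both forms agree on empty and non-empty strings alike.
for sample in ['', '"', '"abc', 'abc"']:
    assert first_is_guarded(sample, '"') == first_is_sliced(sample, '"')
```

The bool() call matters only because `s and ...` returns the empty string itself, not False, when s is empty; in an if statement the two forms behave the same.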
msg347529 - (view) Author: Abhilash Raj (maxking) * (Python committer) Date: 2019-07-09 09:02
Thanks for the explanation Terry!

In this case, value becomes None (I think), which causes the IndexError.
msg347565 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2019-07-09 16:32
To avoid such questions, bug reports should contain exception messages and usually at least some of the tracebacks.

>>> ''[0]
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    ''[0]
IndexError: string index out of range
>>> x=None
>>> x[0]
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    x[0]
TypeError: 'NoneType' object is not subscriptable

IndexError should mean object was indexable.
Operations on None should give TypeError.
msg347566 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2019-07-09 16:47
Also, quotes should be attributed to a file, and patch snippets should indicate the target.

As for the bug, the author(s) of the expressions "value[1:]" and "value[0]" presumably *expected* value to initially have length 2 so that it would be non-empty after clipping.  In the absence of additional information, it is possible that the bug is in the unquoted code that produced value.  This is potentially true whenever a function or expression raises.
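
The clipping behavior can be sketched in isolation. The function clip_and_peek below is a hypothetical stand-in for the two expressions quoted above, not the actual code in get_bare_quoted_string; it only shows what happens when value is a lone opening quote.

```python
def clip_and_peek(value):
    """Hypothetical stand-in: drop the opening quote, then peek at the next char."""
    rest = value[1:]  # a one-character input leaves an empty string here
    # Guarded peek, in the style of the suggested patch: '' is falsy,
    # so rest[0] is never evaluated on an empty remainder.
    starts_with_quote = bool(rest and rest[0] == '"')
    return rest, starts_with_quote


print(clip_and_peek('"'))     # remainder is empty, nothing to peek at
print(clip_and_peek('""'))    # remainder is the closing quote
print(clip_and_peek('"ab"'))  # remainder starts with ordinary content

# The unguarded peek fails on the one-character input.
try:
    '"'[1:][0]
except IndexError as exc:
    print("unguarded peek raised IndexError:", exc)
```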
msg347572 - (view) Author: Abhilash Raj (maxking) * (Python committer) Date: 2019-07-09 18:11
I wanted to file the report before I forgot, and so I missed some details. It turns out the bug report was slightly wrong too: the test case I provided wasn't right.

Here is the right test case to reproduce the exception with master:

# bpo_37491.py
from email.parser import BytesParser, Parser
from email.policy import default

payload = 'Content-Type:"'
msg = Parser(policy=default).parsestr(payload)
print(msg.get('content-type'))


$ ./python bpo_37491.py
Traceback (most recent call last):
  File "bpo_37491.py", line 5, in <module>
    msg = Parser(policy=default).parsestr(payload)
  File "/home/maxking/Documents/cpython/Lib/email/parser.py", line 68, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
  File "/home/maxking/Documents/cpython/Lib/email/parser.py", line 58, in parse
    return feedparser.close()
  File "/home/maxking/Documents/cpython/Lib/email/feedparser.py", line 187, in close
    self._call_parse()
  File "/home/maxking/Documents/cpython/Lib/email/feedparser.py", line 180, in _call_parse
    self._parse()
  File "/home/maxking/Documents/cpython/Lib/email/feedparser.py", line 256, in _parsegen
    if self._cur.get_content_type() == 'message/delivery-status':
  File "/home/maxking/Documents/cpython/Lib/email/message.py", line 578, in get_content_type
    value = self.get('content-type', missing)
  File "/home/maxking/Documents/cpython/Lib/email/message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "/home/maxking/Documents/cpython/Lib/email/policy.py", line 163, in header_fetch_parse
    return self.header_factory(name, value)
  File "/home/maxking/Documents/cpython/Lib/email/headerregistry.py", line 602, in __call__
    return self[name](name, value)
  File "/home/maxking/Documents/cpython/Lib/email/headerregistry.py", line 197, in __new__
    cls.parse(value, kwds)
  File "/home/maxking/Documents/cpython/Lib/email/headerregistry.py", line 447, in parse
    kwds['decoded'] = str(parse_tree)
  File "/home/maxking/Documents/cpython/Lib/email/_header_value_parser.py", line 126, in __str__
    return ''.join(str(x) for x in self)
  File "/home/maxking/Documents/cpython/Lib/email/_header_value_parser.py", line 126, in <genexpr>
    return ''.join(str(x) for x in self)
  File "/home/maxking/Documents/cpython/Lib/email/_header_value_parser.py", line 796, in __str__
    for name, value in self.params:
  File "/home/maxking/Documents/cpython/Lib/email/_header_value_parser.py", line 770, in params
    value = param.param_value
  File "/home/maxking/Documents/cpython/Lib/email/_header_value_parser.py", line 679, in param_value
    return token.stripped_value
  File "/home/maxking/Documents/cpython/Lib/email/_header_value_parser.py", line 710, in stripped_value
    token = self[0]
IndexError: list index out of range


This time I attached a correct patch instead of copy-pasting only the diff :)

About IndexError, I agree, it must be an empty string value and not a None value.
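
For anyone who wants to check a given interpreter, the test case above can be wrapped in a small helper. This is only a sketch: the name raises_index_error is hypothetical, and the result depends on whether the Python build in use includes the fix.

```python
from email.parser import Parser
from email.policy import default


def raises_index_error(payload):
    """Return True if parsing payload with the default policy raises IndexError."""
    try:
        Parser(policy=default).parsestr(payload)
    except IndexError:
        return True
    return False


# The payload from the test case above: a Content-Type value that is
# just an opening quote with no closing quote.
print(raises_index_error('Content-Type:"'))
```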
msg348842 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2019-08-01 06:35
@barry

The 3 PRs have been merged; do you think we could close this issue?

Thank you
History
Date User Action Args
2019-08-01 12:28:25 r.david.murray set status: open -> closed
resolution: fixed
stage: patch review -> resolved
2019-08-01 06:35:30 matrixise set nosy: + matrixise
messages: + msg348842
2019-07-17 16:51:51 miss-islington set pull_requests: + pull_request14614
2019-07-17 16:51:03 miss-islington set pull_requests: + pull_request14613
2019-07-17 13:58:03 maxking set stage: patch review
pull_requests: + pull_request14607
2019-07-09 18:11:10 maxking set files: + fix-indexerror.patch
keywords: + patch
messages: + msg347572
2019-07-09 16:47:17 terry.reedy set messages: + msg347566
2019-07-09 16:32:54 terry.reedy set messages: + msg347565
2019-07-09 09:02:23 maxking set messages: + msg347529
2019-07-06 22:46:39 terry.reedy set type: crash -> behavior
messages: + msg347454
nosy: + terry.reedy
2019-07-03 08:19:31 maxking create