This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: httplib client and statusline
Type: behavior Stage: resolved
Components: Versions: Python 3.2
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: karlcow, orsenthil, pitrou, r.david.murray
Priority: normal Keywords:

Created on 2012-08-28 20:31 by karlcow, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (11)
msg169293 - (view) Author: karl (karlcow) * Date: 2012-08-28 20:31
The current parsing of HTTP status line seems strange with regards to its definition in HTTP.

http://hg.python.org/cpython/file/3.2/Lib/http/client.py#l307

Currently the code is 
version, status, reason = line.split(None, 2)

>>> status1 = "HTTP/1.1 200 OK"
>>> status2 = "HTTP/1.1 200 "
>>> status3 = "HTTP/1.1 200"
>>> status1.split(None, 2)
['HTTP/1.1', '200', 'OK']
>>> status2.split(None, 2)
['HTTP/1.1', '200']
>>> status3.split(None, 2)
['HTTP/1.1', '200']

According to the production rules of HTTP/1.1 bis only status1 and status2 are valid.

  status-line = HTTP-version SP status-code SP reason-phrase CRLF
  — http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-20#section-3.1.2

with reason-phrase  = *( HTAB / SP / VCHAR / obs-text ) aka 0 or more characters.

I'm also not sure what are the expected ValueError with additional parsing rules which seems even more bogus.

First modification should be

>>> status1.split(' ', 2)
['HTTP/1.1', '200', 'OK']
>>> status2.split(' ', 2)
['HTTP/1.1', '200', '']

Which would be correct for the first two, with an empty reason-phrase
The third one is still no good.

>>> status3.split(' ', 2)
['HTTP/1.1', '200']

An additional check could be done with 

len(status.split(' ', 2)) == 3

Will return False in the third case.

Do you want me to create a patch and a test for it?
msg169294 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-08-28 20:39
Could you explain the error you are seeing in more detail first?  You are talk about parsing and fixes here, but I'm not clear on what the actual bug is you are reporting.
msg169297 - (view) Author: karl (karlcow) * Date: 2012-08-28 20:59
ok.

status lines 1 and 2 are valid. 
the third one is invalid and should trigger a 

raise BadStatusLine(line)

The code at line 318 is bogus as it will parse happily the third line without raising an exception.
http://hg.python.org/cpython/file/3.2/Lib/http/client.py#l318
msg169298 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-08-28 21:01
Why should it raise an error?  The postel principle suggests that we should treat it as equivalent to the second line.
msg169299 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-08-28 21:03
I would point out that the goal of the http module is to provide interoperability with real-life servers, rather than be a strict implementation of RFCs. So, starting to raise errors for "HTTP/1.1 200" while "HTTP/1.1 200 " remains ok might not be the best idea.
msg169305 - (view) Author: karl (karlcow) * Date: 2012-08-28 21:28
Fair enough, it could be a warning when

* more than one space in between http version and status code
* if there is a missing space after the status code


I'm not advocating for being strict only. I'm advocating for giving the tools to developer to assess that things are right and choose or not to ignore and having to avoid to patch the libraries or rewrite modules when you create code which needs to be strict for specifically validating responses and requests. :)

ps: I haven't checked yet if the server counter part of httplib was strict in the production rule.
msg169453 - (view) Author: karl (karlcow) * Date: 2012-08-30 13:59
So what do we do with it? 
Do I created a patch or do we close that bug? :)
No hard feelings about it.
msg169454 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-08-30 14:02
Well, I'd like to hear Senthil's opinion, since he's the main http / urllib maintainer these days.
msg169455 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-08-30 14:06
I'm inclined to think it isn't worth the effort, myself.  I think a "validating" client would be a valuable tool, but that that isn't what the stdlib is focused on.  But yes, let's hear Senthil's opinion.

(That said, I am in fact adding a lot of validation capabilities to the email package...but in general those are only a matter of exposing the information about parsing failures that the code then has to recover from, rather than adding extra parsing to detect non-compliance.)
msg227504 - (view) Author: karl (karlcow) * Date: 2014-09-25 04:12
Let's close this.

>>> "HTTP/1.1    301 ".split(None, 2)
['HTTP/1.1', '301']
>>> "HTTP/1.1    301 ".split(' ', 2)
['HTTP/1.1', '', '  301 ']

I think it would be nice to have a way to warn without stopping, but the last comment from r.david.murray makes sense too. :)
msg227587 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2014-09-26 07:04
Sorry that I did not get involved earlier.

It is difficult to prove any problem with the current behavior and it is rightly closed. The issue which was originally raised seems to me a cosmetic one, which won't get exhibited as well.

Here is  simple test case and the output with the current behavior.

# testcases.py

testcases = ["HTTP/1.1 200", "HTTP/1.1 200 OK", "HTTP/1.1 200  ", "HTTP/1.1   200"]
for tc in testcases:
    try:
        version, status, reason = tc.split(None, 2)
        print "%s (%s,%s,%s)" % (tc, version, status, reason)
    except ValueError:
        version, status = tc.split(None, 1)
        print "%s (%s, %s)" % (tc, version, status)


$ python testcases.py
HTTP/1.1 200 (HTTP/1.1, 200)
HTTP/1.1 200 OK (HTTP/1.1,200,OK)
HTTP/1.1 200   (HTTP/1.1, 200  )
HTTP/1.1   200 (HTTP/1.1, 200)

The problem is the status code (str at the moment) has a trailing spaces. 
And now, the status code is not used as string. Just after the parsing, the status is converted to integer, Line 337: status = int(status) and rest of urllib and http/client's behavior use status as int rather than as string.

Thanks!
History
Date User Action Args
2022-04-11 14:57:35adminsetgithub: 60003
2014-09-26 07:04:11orsenthilsettype: enhancement -> behavior
messages: + msg227587
2014-09-25 14:23:43r.david.murraysetstage: resolved
2014-09-25 04:12:40karlcowsetstatus: open -> closed
resolution: not a bug
messages: + msg227504
2012-08-30 14:06:31r.david.murraysetmessages: + msg169455
versions: + Python 3.2, - Python 3.3
2012-08-30 14:02:30pitrousetmessages: + msg169454
versions: + Python 3.3, - Python 3.2
2012-08-30 13:59:48karlcowsetmessages: + msg169453
2012-08-28 21:28:10karlcowsetmessages: + msg169305
2012-08-28 21:03:11pitrousetnosy: + orsenthil, pitrou
messages: + msg169299
2012-08-28 21:01:06r.david.murraysetmessages: + msg169298
2012-08-28 20:59:21karlcowsetmessages: + msg169297
2012-08-28 20:39:16r.david.murraysetnosy: + r.david.murray
messages: + msg169294
2012-08-28 20:31:30karlcowcreate