classification
Title: Handle empty port after port delimiter in httplib
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: orsenthil
Nosy List: eric.araujo, georg.brandl, lukasz.langa, orsenthil, python-dev, sligocki
Priority: low
Keywords: patch

Created on 2011-01-07 19:14 by sligocki, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded
issue.10860.patch sligocki, 2011-01-08 00:13
Messages (15)
msg125687 - Author: Shawn Ligocki (sligocki) Date: 2011-01-07 19:14
urllib2 crashes with a stack trace on the legal URL http://118114.cn

Transcript:

Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib2
>>> urllib2.urlopen("http://118114.cn")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.6/urllib2.py", line 397, in open
    response = meth(req, response)
  File "/usr/lib/python2.6/urllib2.py", line 510, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.6/urllib2.py", line 429, in error
    result = self._call_chain(*args)
  File "/usr/lib/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.6/urllib2.py", line 605, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.6/urllib2.py", line 1161, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.6/urllib2.py", line 1107, in do_open
    h = http_class(host, timeout=req.timeout) # will parse host:port
  File "/usr/lib/python2.6/httplib.py", line 657, in __init__
    self._set_hostport(host, port)
  File "/usr/lib/python2.6/httplib.py", line 682, in _set_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
httplib.InvalidURL: nonnumeric port: ''
>>> 


I think the problem is that "http://118114.cn" says it redirects to "http://www.118114.cn:", but it seems like urllib2 should be able to deal with that or at least report back a more useful error message.

$ nc 118114.cn 80
GET / HTTP/1.1
Host: 118114.cn   
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) Gecko/20101206 Ubuntu/10.04 (lucid) Firefox/3.6.13

HTTP/1.1 301 Moved Permanently
Server: nginx/0.7.64
Date: Fri, 07 Jan 2011 19:06:32 GMT
Content-Type: text/html
Content-Length: 185
Connection: keep-alive
Keep-Alive: timeout=60
Location: http://www.118114.cn:

<html>
<head><title>301 Moved Permanently</title></head>
<body bgcolor="white">
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx/0.7.64</center>
</body>
</html>
msg125691 - Author: Éric Araujo (eric.araujo) (Python committer) Date: 2011-01-07 19:22
Thanks for the report.  Can you test it with current versions (2.7, 3.1 or 3.2)?
msg125701 - Author: Éric Araujo (eric.araujo) (Python committer) Date: 2011-01-07 20:01
The redirection is a red herring; the error is clearly that an empty port after a port delimiter (as in http://host:) breaks httplib.  Do you want to work on a patch?  Helpful guidelines are found at http://www.python.org/dev/patches/
msg125704 - Author: Shawn Ligocki (sligocki) Date: 2011-01-07 20:27
Sure, I can work on a patch.

Should an empty port default to 80? In other words, does "http://foo.com/" == "http://foo.com:/"?
msg125706 - Author: Éric Araujo (eric.araujo) (Python committer) Date: 2011-01-07 20:33
Yes.
msg125710 - Author: Georg Brandl (georg.brandl) (Python committer) Date: 2011-01-07 21:04
Except if it's an HTTPS URL :)
msg125733 - Author: Shawn Ligocki (sligocki) Date: 2011-01-08 00:13
Here's a patch for 2.7 (from the hg checkout http://code.python.org/hg/branches/release2.7-maint/)

How does it look? Apparently there was already a testcase for "www.python.org:" failing!
msg125760 - Author: Senthil Kumaran (orsenthil) (Python committer) Date: 2011-01-08 08:11
On Fri, Jan 07, 2011 at 07:14:30PM +0000, Shawn Ligocki wrote:
> 
> I think the problem is that "http://118114.cn" says it redirects to
> "http://www.118114.cn:", but it seems like urllib2 should be able to
> deal with that or at least report back a more useful error message.

I think it is improper for the server to redirect to a URL where
':' is provided but the port is missing.

Any client that follows the redirect transparently can be expected to fail.

senthil@rubuntu:~$ curl -L http://118114.cn
curl: (6) Couldn't resolve host 'www.118114.cn:'

The redirected URL has invalid syntax, and I don't think we should
fix this by providing a default port on the urllib2 side. What can be
done is to fail with a timeout or raise an exception for an invalid URL.
msg125763 - Author: Senthil Kumaran (orsenthil) (Python committer) Date: 2011-01-08 08:25
The title got reset by my previous response. Changing it again.

I vote for closing this report as invalid, since the error message that is raised is proper and meaningful:

httplib.InvalidURL: nonnumeric port: ''
msg127868 - Author: Éric Araujo (eric.araujo) (Python committer) Date: 2011-02-04 04:16
From RFC 3986, section 6.2.3 “Scheme-Based Normalization”:

   The syntax and semantics of URIs vary from scheme to scheme, as
   described by the defining specification for each scheme.
   Implementations may use scheme-specific rules, at further processing
   cost, to reduce the probability of false negatives.  For example,
   because the "http" scheme makes use of an authority component, has a
   default port of "80", and defines an empty path to be equivalent to
   "/", the following four URIs are equivalent:

      http://example.com
      http://example.com/
      http://example.com:/
      http://example.com:80/

IOW, the empty string is not an invalid port.  The patch fixes that.  It includes tests but lacks a doc update.  I think it works for https URIs too, but I’d like a test to make sure.
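
The equivalence quoted from the RFC can be sketched as a small host/port splitter. This is only an illustration of the rule, not the code from the attached patch: `split_hostport` and `DEFAULT_PORTS` are hypothetical names, and the sketch assumes Python 3.

```python
# Hypothetical helper illustrating the RFC 3986 rule; not the httplib code.
DEFAULT_PORTS = {"http": 80, "https": 443}


def split_hostport(netloc, scheme="http"):
    """Split 'host[:port]'; an empty port falls back to the scheme default."""
    default_port = DEFAULT_PORTS[scheme]
    i = netloc.rfind(":")
    j = netloc.rfind("]")  # IPv6 literals are wrapped in [...]
    if i > j:
        host, port_text = netloc[:i], netloc[i + 1:]
        if port_text == "":
            # "http://example.com:/" is equivalent to "http://example.com/"
            port = default_port
        else:
            try:
                port = int(port_text)
            except ValueError:
                raise ValueError("nonnumeric port: %r" % port_text)
    else:
        # No port delimiter at all: use the scheme default.
        host, port = netloc, default_port
    if host.startswith("[") and host.endswith("]"):
        host = host[1:-1]  # strip the brackets around an IPv6 literal
    return host, port
```

Passing the scheme explicitly covers the HTTPS case raised earlier in the thread: `split_hostport("example.com:", "https")` falls back to 443 rather than 80.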
msg145815 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-10-18 15:17
New changeset a3e48273dce3 by Łukasz Langa in branch '2.7':
Fixes #10860: Handle empty port after port delimiter in httplib
http://hg.python.org/cpython/rev/a3e48273dce3
msg145879 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-10-19 00:07
New changeset 6ac59218c049 by Łukasz Langa in branch '3.2':
Fixes #10860: Handle empty port after port delimiter in httplib
http://hg.python.org/cpython/rev/6ac59218c049

New changeset 18dc3811f2b8 by Łukasz Langa in branch 'default':
Merged fix for #10860 from 3.2
http://hg.python.org/cpython/rev/18dc3811f2b8
msg145880 - Author: Łukasz Langa (lukasz.langa) (Python committer) Date: 2011-10-19 00:09
Patch merged. Thanks, Shawn.
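
On an interpreter that includes this fix, the behaviour can be checked without network access, because constructing a connection object only parses the host string. The sketch below assumes Python 3, where `httplib` is named `http.client`; the HTTPS check is guarded in case the build lacks SSL support.

```python
import http.client

# No connection is opened here; the constructor only parses "host:port".
conn = http.client.HTTPConnection("www.python.org:")
print(conn.host, conn.port)  # www.python.org 80

# HTTPSConnection exists only when Python was built with SSL support.
if hasattr(http.client, "HTTPSConnection"):
    secure = http.client.HTTPSConnection("www.python.org:")
    print(secure.host, secure.port)  # www.python.org 443
```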
msg145885 - Author: Shawn Ligocki (sligocki) Date: 2011-10-19 02:06
Great! Glad it landed :)
msg146449 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-10-26 18:31
New changeset e0499b2b28aa by Petri Lehtinen in branch '2.7':
Issue #10860: Skip the new test if HTTPS is not available
http://hg.python.org/cpython/rev/e0499b2b28aa

New changeset a3939c2f6727 by Petri Lehtinen in branch '3.2':
Issue #10860: Skip the new test if HTTPS is not available
http://hg.python.org/cpython/rev/a3939c2f6727

New changeset 2dd106799aa9 by Petri Lehtinen in branch 'default':
Issue #10860: Skip the new test if HTTPS is not available
http://hg.python.org/cpython/rev/2dd106799aa9
History
Date User Action Args
2022-04-11 14:57:11 admin set github: 55069
2011-10-26 18:31:04 python-dev set messages: + msg146449
2011-10-19 02:06:06 sligocki set messages: + msg145885
2011-10-19 00:09:35 lukasz.langa set status: open -> closed
    versions: - Python 3.1
    nosy: + lukasz.langa
    messages: + msg145880
    resolution: fixed
    stage: patch review -> resolved
2011-10-19 00:07:17 python-dev set messages: + msg145879
2011-10-18 15:17:45 python-dev set nosy: + python-dev
    messages: + msg145815
2011-03-18 02:10:09 orsenthil set assignee: orsenthil
    nosy: georg.brandl, orsenthil, sligocki, eric.araujo
2011-02-04 04:16:44 eric.araujo set status: pending -> open
    versions: + Python 3.3
    nosy: georg.brandl, orsenthil, sligocki, eric.araujo
    messages: + msg127868
    resolution: not a bug -> (no value)
    stage: needs patch -> patch review
2011-01-08 08:25:54 orsenthil set status: open -> pending
    title: urllib2 crashes on valid URL -> Handle empty port after port delimiter in httplib
    nosy: georg.brandl, orsenthil, sligocki, eric.araujo
    messages: + msg125763
    priority: normal -> low
    resolution: not a bug
2011-01-08 08:11:38 orsenthil set nosy: georg.brandl, orsenthil, sligocki, eric.araujo
    messages: + msg125760
    title: Handle empty port after port delimiter in httplib -> urllib2 crashes on valid URL
2011-01-08 00:13:17 sligocki set files: + issue.10860.patch
    messages: + msg125733
    keywords: + patch
    nosy: georg.brandl, orsenthil, sligocki, eric.araujo
2011-01-07 21:04:44 georg.brandl set nosy: + georg.brandl
    messages: + msg125710
2011-01-07 20:33:55 eric.araujo set nosy: orsenthil, sligocki, eric.araujo
    messages: + msg125706
2011-01-07 20:27:06 sligocki set nosy: orsenthil, sligocki, eric.araujo
    messages: + msg125704
2011-01-07 20:01:31 eric.araujo set title: urllib2 crashes on valid URL -> Handle empty port after port delimiter in httplib
    nosy: orsenthil, sligocki, eric.araujo
    messages: + msg125701
    type: crash -> behavior
    stage: needs patch
2011-01-07 19:22:56 eric.araujo set nosy: + eric.araujo, orsenthil
    messages: + msg125691
    versions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.6
2011-01-07 19:14:28 sligocki create