classification
Title: urlib{, 2} returns a pair of integers as the content-length value
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 2.7, Python 2.6
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: Billy.Saelim, jwhisnant, orsenthil, rhettinger, santoso.wijaya
Priority: normal Keywords: patch

Created on 2011-03-23 16:43 by Billy.Saelim, last changed 2011-04-15 09:26 by orsenthil. This issue is now closed.

Files
File name Uploaded Description
issue11652.patch santoso.wijaya, 2011-03-24 00:56
Messages (9)
msg131894 - Author: Billy Saelim (Billy.Saelim) Date: 2011-03-23 16:43
urlopen does not always return a single value for 'content-length'.  For example:


>>> import urllib2
>>> request = 'http://wwwsearch.sourceforge.net/mechanize/src/mechanize-0.1.11.zip'
>>> fp = urllib2.urlopen(request)
>>> fp.info().dict
{'content-length': '289519, 289519', 'x-varnish': '929586024', 'via': '1.1 varnish', 'age': '0', 'expires': 'Fri, 25 Mar 2011 14:36:43 GMT', 'server': 'Apache/2.2.3 (CentOS)', 'last-modified': 'Sat, 07 Feb 2009 19:15:15 GMT', 'connection': 'close', 'etag': '"46aef-46258f510b6c0"', 'date': 'Wed, 23 Mar 2011 14:36:43 GMT', 'content-type': 'application/zip'}
msg131897 - Author: Santoso Wijaya (santoso.wijaya) * Date: 2011-03-23 17:33
Python 2.7.1 (r271:86832, Nov 27 2010, 17:19:03) [MSC v.1500 64 bit (AMD64)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib2
>>> request = 'http://wwwsearch.sourceforge.net/mechanize/src/mechanize-0.1.11.zip'
>>> fp = urllib2.urlopen(request)
>>> fp.info()['content-length']
'289519, 289519'
>>>

Not reproducible on Python 3.2+ (presumably 3.x, as well).
msg131898 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-03-23 17:47
Interesting, the Content-Length header was sent twice:

HTTP/1.1 200 OK
Server: Apache/2.2.3 (CentOS)
Last-Modified: Sat, 07 Feb 2009 19:15:15 GMT
ETag: "46aef-46258f510b6c0"
Content-Length: 289519
Expires: Fri, 25 Mar 2011 17:32:49 GMT
Content-Type: application/zip
Content-Length: 289519
Date: Wed, 23 Mar 2011 17:32:49 GMT
X-Varnish: 930137777
Age: 0
Via: 1.1 varnish
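This may also explain why the problem does not reproduce on Python 3 (msg131897). Below is a minimal offline sketch, assuming Python 3's `http.client`, that feeds the header block above (status line omitted) to `http.client.parse_headers`. The parsed message keeps both `Content-Length` fields as separate entries instead of joining them, and subscript access returns only the first one. It uses a captured copy of the headers, not the live URL, which no longer sends the duplicate.

```python
import io
import http.client

# Header block copied from the response above; parse_headers reads
# lines until the blank line that terminates the header section.
RAW_HEADERS = (
    b"Server: Apache/2.2.3 (CentOS)\r\n"
    b"Last-Modified: Sat, 07 Feb 2009 19:15:15 GMT\r\n"
    b'ETag: "46aef-46258f510b6c0"\r\n'
    b"Content-Length: 289519\r\n"
    b"Expires: Fri, 25 Mar 2011 17:32:49 GMT\r\n"
    b"Content-Type: application/zip\r\n"
    b"Content-Length: 289519\r\n"
    b"Date: Wed, 23 Mar 2011 17:32:49 GMT\r\n"
    b"X-Varnish: 930137777\r\n"
    b"Age: 0\r\n"
    b"Via: 1.1 varnish\r\n"
    b"\r\n"
)

headers = http.client.parse_headers(io.BytesIO(RAW_HEADERS))

# Both occurrences survive as separate fields.
print(headers.get_all("Content-Length"))  # ['289519', '289519']
# Subscript access returns the first matching field only.
print(headers["Content-Length"])          # 289519
```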
msg131900 - Author: Santoso Wijaya (santoso.wijaya) * Date: 2011-03-23 18:12
This affects urllib, as well:

C:\Users\santa>python
Python 2.7.1 (r271:86832, Nov 27 2010, 17:19:03) [MSC v.1500 64 bit (AMD64)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib
>>> request = 'http://wwwsearch.sourceforge.net/mechanize/src/mechanize-0.1.11.zip'
>>> fp = urllib.urlopen(request)
>>> fp.headers['content-length']
'289519, 289519'
msg131935 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-03-23 22:58
Yes, it is interesting that Content-Length is returned as a comma-separated pair of integers.
This behavior is normally seen for headers that can have multiple values: urllib appends each subsequent value of such a header to the first, e.g. for the Content-Type header.

curl seems to get this case right: it returns only one Content-Length header for the above URL. urllib needs to be modified to handle this scenario: if the server returns multiple Content-Length headers and we want to mimic curl's behavior, just use the last one. (AFAIK, the standard is to combine them comma-separated, but in this case that won't be useful.)
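To make the "combine them comma-separated" behavior concrete, here is a minimal sketch of folding repeated header fields into a dict. The function `combine_headers` is hypothetical and is not urllib's actual code; it only reproduces the joined value seen in msg131894, assuming lower-cased keys as in the `fp.info().dict` output there.

```python
def combine_headers(fields):
    """Fold (name, value) pairs into a dict, joining repeats with ', '.

    Keys are lower-cased because HTTP header names are case-insensitive.
    """
    combined = {}
    for name, value in fields:
        key = name.lower()
        if key in combined:
            # A repeated header: append its value to the earlier one(s).
            combined[key] = combined[key] + ", " + value
        else:
            combined[key] = value
    return combined


# The duplicated Content-Length fields from the Varnish response in msg131898.
fields = [
    ("Content-Length", "289519"),
    ("Content-Type", "application/zip"),
    ("Content-Length", "289519"),
]
print(combine_headers(fields))
# {'content-length': '289519, 289519', 'content-type': 'application/zip'}
```

Joining is harmless for headers whose grammar allows a comma-separated list, but it turns Content-Length into a string that is no longer a valid integer.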
msg131946 - Author: Santoso Wijaya (santoso.wijaya) * Date: 2011-03-24 00:56
One way I can think of is to resort to a list of exceptions. Not quite elegant, but it works...
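The contents of issue11652.patch are not reproduced on this page, so the following is only a hypothetical sketch of the "list of exceptions" idea, not the patch itself. It assumes the rule is: comma-join repeated headers, except for names in an exception set, where the last value wins (the curl-like behavior suggested in msg131935). The names `SINGLE_VALUED` and `combine_headers_with_exceptions` are illustrative.

```python
# Hypothetical exception list: headers that must carry a single value,
# so repeats are not comma-joined. Only Content-Length is assumed here.
SINGLE_VALUED = frozenset({"content-length"})


def combine_headers_with_exceptions(fields, single_valued=SINGLE_VALUED):
    """Fold (name, value) pairs into a dict keyed by lower-cased name.

    Repeated headers are joined with ', ', except those listed in
    single_valued, where the last occurrence replaces earlier ones.
    """
    combined = {}
    for name, value in fields:
        key = name.lower()
        if key in combined and key not in single_valued:
            combined[key] = combined[key] + ", " + value
        else:
            # First occurrence, or a single-valued header: store/overwrite.
            combined[key] = value
    return combined


# The duplicated header from msg131898 collapses to one value.
print(combine_headers_with_exceptions([
    ("Content-Length", "289519"),
    ("Content-Type", "application/zip"),
    ("Content-Length", "289519"),
]))
# {'content-length': '289519', 'content-type': 'application/zip'}

# Hypothetical conflicting values: the earlier one is silently dropped.
print(combine_headers_with_exceptions([
    ("Content-Length", "100"),
    ("Content-Length", "200"),
]))
# {'content-length': '200'}
```

The second call shows the weak spot of this approach: when the duplicates disagree, the earlier value is discarded without any warning.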
msg131950 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-03-24 01:28
Yes, this is good enough for non-standard behavior.

However, I would be curious to know what curl did. Did it do something based on the value or on the order in which the headers were received, or did it just overwrite it?
msg133034 - Author: James Whisnant (jwhisnant) Date: 2011-04-05 14:31
Varnish on the sourceforge server has been upgraded and/or reconfigured (yesterday) to fix the issue that was happening with this file (and others).

Just an FYI that you will no longer be able to re-create the triggering error.

'content-length': '289519',
'via': '1.1 varnish'
msg133798 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-04-15 09:26
It is better to close this issue, as it was a server error. The standard just says that when there are two headers with different values, they are combined comma-separated, as urllib2 does. Making a special-case exception for the 'Content-Length' header when the server is at fault would be a bad idea: we would not know which value to choose if the values were different.

Closing this bug as Invalid.


>>> import urllib2
>>> req = urllib2.urlopen('http://wwwsearch.sourceforge.net/mechanize/src/mechanize-0.1.11.zip')
>>> req.info()['content-length']
'289519'
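Since the library keeps the joined value, application code that needs a usable length can guard against a misbehaving server itself. The helper below, `parse_content_length`, is hypothetical (it is not part of urllib or urllib2). It follows the rule later written into RFC 7230 section 3.3.2: a Content-Length consisting of a list of identical decimal values may be collapsed to that single value, while differing values are treated as invalid. That mirrors the point above that there is no way to choose between conflicting lengths.

```python
def parse_content_length(value):
    """Return one Content-Length from a possibly comma-joined header value.

    Accepts '289519' or '289519, 289519'. Raises ValueError if any part is
    not a decimal number, or if the listed lengths disagree.
    """
    parts = [part.strip() for part in value.split(",")]
    if not all(part.isdigit() for part in parts):
        raise ValueError("invalid Content-Length: %r" % value)
    lengths = {int(part) for part in parts}
    if len(lengths) != 1:
        # Conflicting values: refuse to guess which one is right.
        raise ValueError("conflicting Content-Length values: %r" % value)
    return lengths.pop()


print(parse_content_length("289519, 289519"))  # 289519
```

Raising on disagreement, rather than picking the first or last value, keeps the decision with the caller, which is the same reasoning used to close this issue.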
History
Date User Action Args
2011-04-15 09:26:31 orsenthil set status: open -> closed
    resolution: not a bug
    messages: + msg133798
    stage: resolved
2011-04-05 14:31:04 jwhisnant set nosy: + jwhisnant
    messages: + msg133034
2011-03-24 01:28:55 orsenthil set messages: + msg131950
    title: urlib{,2} returns a pair of integers as the content-length value -> urlib{, 2} returns a pair of integers as the content-length value
2011-03-24 00:56:14 santoso.wijaya set files: + issue11652.patch
    keywords: + patch
    messages: + msg131946
2011-03-23 22:58:51 orsenthil set assignee: orsenthil
    messages: + msg131935
    nosy: + orsenthil
2011-03-23 18:12:34 santoso.wijaya set messages: + msg131900
    title: urlib2 returns a pair of integers as the content-length value -> urlib{,2} returns a pair of integers as the content-length value
2011-03-23 17:47:25 rhettinger set nosy: + rhettinger
    messages: + msg131898
2011-03-23 17:33:29 santoso.wijaya set nosy: + santoso.wijaya
    messages: + msg131897
    versions: + Python 2.7
2011-03-23 16:43:47 Billy.Saelim create