HTTPMessage not documented and has inconsistent API across Py2/Py3 #49023
A file-like object u returned by the urlopen() function in both Python
::: Python 2.6
>>> from urllib2 import urlopen
>>> u = urlopen("http://www.python.org")
>>> u.info()
<httplib.HTTPMessage instance at 0xce5738>
>>>
::: Python 3.0
>>> from urllib.request import urlopen
>>> u = urlopen("http://www.python.org")
>>> u.info()
<http.client.HTTPMessage object at 0x4bfa10>
>>>
So far, so good. HTTPMessage is defined in two different modules, but
Two major problems:
:::Python 2.6
>>> dir(u.info())
['__contains__', '__delitem__', '__doc__', '__getitem__', '__init__',
'__iter__', '__len__', '__module__', '__setitem__', '__str__',
'addcontinue', 'addheader', 'dict', 'encodingheader', 'fp', 'get',
'getaddr', 'getaddrlist', 'getallmatchingheaders', 'getdate',
'getdate_tz', 'getencoding', 'getfirstmatchingheader', 'getheader',
'getheaders', 'getmaintype', 'getparam', 'getparamnames', 'getplist',
'getrawheader', 'getsubtype', 'gettype', 'has_key', 'headers',
'iscomment', 'isheader', 'islast', 'items', 'keys', 'maintype',
'parseplist', 'parsetype', 'plist', 'plisttext', 'readheaders',
'rewindbody', 'seekable', 'setdefault', 'startofbody', 'startofheaders',
'status', 'subtype', 'type', 'typeheader', 'unixfrom', 'values']
:::Python 3.0
>>> dir(u.info())
['__class__', '__contains__', '__delattr__', '__delitem__', '__dict__',
'__doc__', '__eq__', '__format__', '__ge__', '__getattribute__',
'__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__',
'__len__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__',
'__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__',
'__str__', '__subclasshook__', '__weakref__', '_charset',
'_default_type', '_get_params_preserve', '_headers', '_payload',
'_unixfrom', 'add_header', 'as_string', 'attach', 'defects',
'del_param', 'epilogue', 'get', 'get_all', 'get_boundary',
'get_charset', 'get_charsets', 'get_content_charset',
'get_content_maintype', 'get_content_subtype', 'get_content_type',
'get_default_type', 'get_filename', 'get_param', 'get_params',
'get_payload', 'get_unixfrom', 'getallmatchingheaders', 'is_multipart',
'items', 'keys', 'preamble', 'replace_header', 'set_boundary',
'set_charset', 'set_default_type', 'set_param', 'set_payload',
'set_type', 'set_unixfrom', 'values', 'walk']
I know that getting rid of mimetools was desired, but I have no idea if
I ran into this problem with code that was trying to properly determine
I haven't checked whether 2to3 deals with this or not, but it might be
Verified that 2to3 does not fix this.
ISTM that these issues tend to go all the way up to test coverage and
No deep thought was given to the HTTPMessage API. Here's the extent of
A: mimetools.Message is compatible with email.message.Message, right?
A plausible solution is to pick some core set of functionality that we
I propose that you only document the getitem header access API. I.e.
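For illustration, a minimal sketch of what the "getitem header access API" might look like in Python 3. The header block below is made up for the example, and `http.client.parse_headers()` is used only to build an `HTTPMessage` without a network connection; this is an assumption about how to demonstrate it, not the documentation that was proposed.

```python
import io
import http.client

# Hypothetical raw header block, standing in for a real server response.
raw = (
    b"Date: Fri, 27 Mar 2009 00:41:34 GMT\r\n"
    b"Content-Type: text/html; charset=ISO-8859-1\r\n"
    b"\r\n"
)

# parse_headers() returns an http.client.HTTPMessage instance.
headers = http.client.parse_headers(io.BytesIO(raw))

# Mapping-style access; header names are matched case-insensitively.
date = headers["Date"]
content_type = headers["content-type"]

# Membership tests and get() with a default also work.
has_server = "Server" in headers
server = headers.get("Server", "unknown")

# keys() preserves the order and spelling found in the message.
names = headers.keys()

print(date, content_type, has_server, server, names, sep="\n")
```

Restricting code to this subset (`[]`, `in`, `get()`, `keys()`, `items()`, `values()`) stays within names that appear in both the 2.6 and 3.0 `dir()` listings above.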
On Thu, Mar 26, 2009 at 4:29 PM, Barry A. Warsaw <report@bugs.python.org>wrote:
As I've found myself in the awkward position of having to explain the new
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'addinfourl' object is unsubscriptable
I wish I knew what an addinfourl object was.
'Fri, 27 Mar 2009 00:41:34 GMT'
'Fri, 27 Mar 2009 00:41:34 GMT'
['Date', 'Server', 'Last-Modified', 'ETag', 'Accept-Ranges',
Using x.headers over x.info() makes the most sense to me, but I don't know
'text/html; charset=ISO-8859-1'
I guess technically this is correct since the charset is part of the
As you follow this road, you have the confusing option of these three calls:
'iso-8859-1'
>>> x.headers.get_charsets()
['iso-8859-1']
I think it should be a bug that get_charset() does not return anything in
Brad
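A small sketch of the three charset-related calls mentioned above, run against a hand-built header block rather than a live response. The header value is taken from the output quoted in the comment; building the message with `http.client.parse_headers()` is an assumption made to keep the example offline.

```python
import io
import http.client

# Hypothetical header block carrying the Content-Type quoted above.
raw = b"Content-Type: text/html; charset=ISO-8859-1\r\n\r\n"
headers = http.client.parse_headers(io.BytesIO(raw))

# The raw header value still includes the charset parameter.
raw_value = headers["Content-Type"]

# get_content_type() strips parameters and lowercases the type.
content_type = headers.get_content_type()

# get_content_charset() reads the charset parameter, lowercased.
content_charset = headers.get_content_charset()

# get_charsets() returns one entry per message part (one part here).
all_charsets = headers.get_charsets()

# get_charset() returns the Charset object set via set_charset(),
# which is None for a message that was only parsed.
charset_obj = headers.get_charset()

print(raw_value, content_type, content_charset, all_charsets, charset_obj, sep="\n")
```

This is consistent with the behaviour described in the comment: `get_charset()` reports the payload charset object, not the `charset` parameter of the Content-Type header, so it stays `None` here.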
The attached file is vaguely related to the current discussion. I'd
Apologies for the noise in this patch. I was on a plane, and I don't
I spent some time on the patch which replaces the self.msg usage with
There is a difference in what HTTPResponse.getheaders() returns.
Python 2.7.2 (default, Jun 12 2011, 14:24:46) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import httplib
>>> c = httplib.HTTPConnection('www.joelverhagen.com')
>>> c.request('GET', '/sandbox/tests/cookies.php')
>>> c.getresponse().getheaders()
[('content-length', '0'), ('set-cookie', 'test_cookie1=foobar; expires=Fri, 02-Mar-2012 16:54:15 GMT, test_cookie2=barfoo; expires=Fri, 02-Mar-2012 16:54:15 GMT'), ('vary', 'Accept-Encoding'), ('server', 'Apache'), ('date', 'Fri, 02 Mar 2012 16:53:15 GMT'), ('content-type', 'text/html')]
Python 3.2.2 (default, Sep 4 2011, 09:07:29) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from http import client
>>> c = client.HTTPConnection('www.joelverhagen.com')
>>> c.request('GET', '/sandbox/tests/cookies.php')
>>> c.getresponse().getheaders()
[('Date', 'Fri, 02 Mar 2012 16:56:40 GMT'), ('Server', 'Apache'), ('Set-Cookie', 'test_cookie1=foobar; expires=Fri, 02-Mar-2012 16:57:40 GMT'), ('Set-Cookie', 'test_cookie2=barfoo; expires=Fri, 02-Mar-2012 16:57:40 GMT'), ('Vary', 'Accept-Encoding'), ('Content-Length', '0'), ('Content-Type', 'text/html')]
As you can see, HTTPResponse.getheaders() in 2.7.2 joins headers with the same name by ", ". In 3.2.2, the headers are kept separate as two or more 2-tuples. This causes problems if you convert the list of 2-tuples to a dict, because the keys collide (causing all but one of the values associated with the non-unique keys to be overwritten). It looks like this problem is caused by using the email header parser (which keeps the keys and values as separate 2-tuples). In Python 2.7.2, the HTTPMessage.addheader(...) function does the comma-separating. Is this API change intentional? Should HTTPResponse.getheaders() comma-separate the values like the HTTPResponse.getheader(...) function (in both 2.7.2 and 3.2.2)? See also:
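The dict collision described above can be reproduced offline. The sketch below uses a shortened, made-up header block and builds the message with `http.client.parse_headers()` instead of a real `HTTPResponse`, so it only illustrates the Python 3 `HTTPMessage` side of the behaviour.

```python
import io
import http.client

# Hypothetical response headers with a repeated Set-Cookie field.
raw = (
    b"Set-Cookie: test_cookie1=foobar\r\n"
    b"Set-Cookie: test_cookie2=barfoo\r\n"
    b"Vary: Accept-Encoding\r\n"
    b"\r\n"
)
headers = http.client.parse_headers(io.BytesIO(raw))

# items() keeps each header line as its own 2-tuple.
pairs = headers.items()

# Converting to a dict silently drops all but the last duplicate.
as_dict = dict(pairs)

# get_all() returns every value for a repeated header name.
cookies = headers.get_all("Set-Cookie")

# Plain indexing returns only the first matching value.
first_cookie = headers["Set-Cookie"]

print(pairs, as_dict, cookies, first_cookie, sep="\n")
```

Code that needs every value of a repeated header can use `get_all()` on the message object rather than converting the pairs to a dict.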
Now that two Python 3 releases have been made, I don’t know if changing the code is still an option. The doc can certainly still be improved. Adding Ezio to nosy; I think it’s you who opened a bug report about removing superfluous getter methods in the addinfourl class (and other ugliness).
Yep, bpo-12707.
@joel.verhagen "Should HTTPResponse.getheaders() comma-separate the values (...)" No, it should not. RFC 2616 states: "Multiple message-header fields with the same field-name MAY be present in a message if and only if the entire field-value for that header field is defined as a comma-separated list [i.e., #(values)]." As field-values for some header fields ('Set-Cookie' being an example) are not defined as a comma-separated list such fields must not be merged. Side note:
...continuing my previous comment Joining headers with the same name by ", " by HTTPResponse.getheaders() in Python 2.7 is wrong and there's a bug for this - see http://bugs.python.org/issue1660009
I just caught a bug because on Python 3
Jeremy’s patch appears to have been merged in revision 9eceb618274a. A documentation entry for the HTTPMessage class was also added in 2009, pointing back to email.message.Message. So is there anything left to do for this issue?
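One way to see why merging Set-Cookie values with ", " is lossy: the `expires` attribute itself contains a comma, so the joined string cannot be split back reliably. The sketch below uses the cookie values from the 2.7.2 output quoted earlier; the naive split is only an illustration, not something either library version does.

```python
# Cookie values as they appeared in the 2.7.2 output above.
cookies = [
    "test_cookie1=foobar; expires=Fri, 02-Mar-2012 16:54:15 GMT",
    "test_cookie2=barfoo; expires=Fri, 02-Mar-2012 16:54:15 GMT",
]

# What joining same-named headers with ", " produces.
joined = ", ".join(cookies)

# A naive attempt to recover the original values.
pieces = joined.split(", ")

# The comma inside each expires date produces extra fragments,
# so the round trip does not return the original two values.
print(len(cookies), len(pieces))
for piece in pieces:
    print(repr(piece))
```

Keeping each Set-Cookie line as a separate value, as Python 3 does, avoids this ambiguity.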