HTTPMessage not documented and has inconsistent API across Py2/Py3 #49023

beazley · 2008-12-29T21:42:32Z

BPO	4773
Nosy	@warsaw, @birkenfeld, @orsenthil, @devdanzin, @ezio-melotti, @merwok, @akheron, @vadmium
Dependencies	bpo-3428: httplib.HTTPMessage undocumented
Files	addinfourl_removal.diff

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2015-02-14.18:34:37.829>
created_at = <Date 2008-12-29.21:42:31.884>
labels = ['type-bug', 'library', 'docs']
title = 'HTTPMessage not documented and has inconsistent API across Py2/Py3'
updated_at = <Date 2017-04-28.09:20:24.183>
user = 'https://bugs.python.org/beazley'

bugs.python.org fields:

activity = <Date 2017-04-28.09:20:24.183>
actor = 'Socob'
assignee = 'jhylton'
closed = True
closed_date = <Date 2015-02-14.18:34:37.829>
closer = 'berker.peksag'
components = ['Documentation', 'Library (Lib)']
creation = <Date 2008-12-29.21:42:31.884>
creator = 'beazley'
dependencies = ['3428']
files = ['13473']
hgrepos = []
issue_num = 4773
keywords = ['patch']
message_count = 16.0
messages = ['78486', '78491', '81799', '84217', '84219', '84223', '84243', '84577', '84783', '154781', '154835', '154850', '179377', '179379', '185458', '235603']
nosy_count = 17.0
nosy_names = ['jhylton', 'barry', 'georg.brandl', 'beazley', 'jjlee', 'orsenthil', 'ajaksu2', 'bmiller', 'ezio.melotti', 'eric.araujo', 'srid', 'petri.lehtinen', 'martin.panter', 'piotr.dobrogost', 'joel.verhagen', 'hltbra', 'Socob']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue4773'
versions = ['Python 2.7', 'Python 3.2', 'Python 3.3']

beazley · 2008-12-29T21:42:29Z

A file-like object u returned by the urlopen() function in both Python
2.6 and 3.0 has a method info() that returns an 'HTTPMessage' object. For
example:

::: Python 2.6
>>> from urllib2 import urlopen
>>> u = urlopen("http://www.python.org")
>>> u.info()
<httplib.HTTPMessage instance at 0xce5738>
>>> 

::: Python 3.0
>>> from urllib.request import urlopen
>>> u = urlopen("http://www.python.org")
>>> u.info()
<http.client.HTTPMessage object at 0x4bfa10>
>>>

So far, so good. HTTPMessage is defined in two different modules, but
that's fine (it's just library reorganization).

Two major problems:

1. There is no documentation whatsoever on HTTPMessage. There is no
   description in the docs for httplib (Python 2.6) or http.client
   (Python 3.0).

2. The HTTPMessage object in Python 2.6 derives from mimetools.Message
   and has a totally different programming interface than HTTPMessage in
   Python 3.0, which derives from email.message.Message. Check it out:

:::Python 2.6
>>> dir(u.info())
['__contains__', '__delitem__', '__doc__', '__getitem__', '__init__', 
'__iter__', '__len__', '__module__', '__setitem__', '__str__', 
'addcontinue', 'addheader', 'dict', 'encodingheader', 'fp', 'get', 
'getaddr', 'getaddrlist', 'getallmatchingheaders', 'getdate', 
'getdate_tz', 'getencoding', 'getfirstmatchingheader', 'getheader', 
'getheaders', 'getmaintype', 'getparam', 'getparamnames', 'getplist', 
'getrawheader', 'getsubtype', 'gettype', 'has_key', 'headers', 
'iscomment', 'isheader', 'islast', 'items', 'keys', 'maintype', 
'parseplist', 'parsetype', 'plist', 'plisttext', 'readheaders', 
'rewindbody', 'seekable', 'setdefault', 'startofbody', 'startofheaders', 
'status', 'subtype', 'type', 'typeheader', 'unixfrom', 'values']

:::Python 3.0
>>> dir(u.info())
['__class__', '__contains__', '__delattr__', '__delitem__', '__dict__', 
'__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', 
'__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', 
'__len__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', 
'__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', 
'__str__', '__subclasshook__', '__weakref__', '_charset', 
'_default_type', '_get_params_preserve', '_headers', '_payload', 
'_unixfrom', 'add_header', 'as_string', 'attach', 'defects', 
'del_param', 'epilogue', 'get', 'get_all', 'get_boundary', 
'get_charset', 'get_charsets', 'get_content_charset', 
'get_content_maintype', 'get_content_subtype', 'get_content_type', 
'get_default_type', 'get_filename', 'get_param', 'get_params', 
'get_payload', 'get_unixfrom', 'getallmatchingheaders', 'is_multipart', 
'items', 'keys', 'preamble', 'replace_header', 'set_boundary', 
'set_charset', 'set_default_type', 'set_param', 'set_payload', 
'set_type', 'set_unixfrom', 'values', 'walk']

I know that getting rid of mimetools was desired, but I have no idea if
changing the API on HTTPMessage was intended or not. In any case, it's
one of the only cases in the entire library where the programming
interface to an object radically changes from 2.6 -> 3.0.

I ran into this problem with code that was trying to properly determine
the charset encoding of the byte string returned by urlopen().
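A minimal sketch of the kind of charset lookup described here, written against the two interfaces listed above: on Python 3 it calls email.message.Message.get_content_charset(), and on Python 2 it falls back to mimetools.Message.getparam('charset'). The helper name header_charset is hypothetical, and the duck-typing check is an assumed way of bridging the two classes, not an official compatibility API. The example parses header text locally instead of calling urlopen().

```python
import email
import http.client


def header_charset(msg, default=None):
    """Return the lower-cased charset parameter of Content-Type, or default.

    Hypothetical helper: works with either HTTPMessage flavour by checking
    which method the object actually has.
    """
    if hasattr(msg, "get_content_charset"):
        # Python 3: http.client.HTTPMessage is an email.message.Message.
        return msg.get_content_charset(default)
    # Python 2: httplib.HTTPMessage is a mimetools.Message, whose getparam()
    # returns a Content-Type parameter value, or None if it is absent.
    charset = msg.getparam("charset")
    return charset.lower() if charset else default


# Python 3 example, parsing raw header text instead of hitting the network.
msg = email.message_from_string(
    "Content-Type: text/html; charset=ISO-8859-1\n\n",
    _class=http.client.HTTPMessage,
)
print(header_charset(msg))  # iso-8859-1
```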

I haven't checked whether 2to3 deals with this or not, but it might be
something for someone to look at in their copious amounts of spare time.

beazley · 2008-12-29T22:20:27Z

Verified that 2to3 does not fix this.

devdanzin · 2009-02-12T18:44:49Z

ISTM that these issues tend to go all the way up to test coverage and
organization :/

jhylton · 2009-03-26T21:03:45Z

No deep thought was given to the HTTPMessage API. Here's the extent of
the discussion that I can find. I've changed the names, but you can
find the full discussion at http://bugs.python.org/issue2848

A: mimetools.Message is compatible with email.message.Message, right?
B: I don't know how compatible it is.
C: The APIs are a bit different, but it should be possible to migrate
from the old to the new.

jhylton · 2009-03-26T21:04:35Z

A plausible solution is to pick some core set of functionality that we
think people need and document that API. We can modify one or both of
the current implementations to include that functionality. What do we need?

warsaw · 2009-03-26T21:29:57Z

I propose that you only document the getitem header access API. I.e.
the thing that info() gives you can be used to access the message
headers via message['content-type']. That's an API common to both
rfc822.Messages (the ultimate base class of mimetools.Message) and
email.message.Message.
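A minimal sketch of the proposed item-access API, using a Python 3 HTTPMessage parsed from local header text (the header values are illustrative). Lookups are case-insensitive in both versions. One difference worth knowing: email.message.Message returns None for a missing header, whereas rfc822.Message raises KeyError, so .get(), which appears in both dir() listings above, is the form that behaves the same on both.

```python
import email
import http.client

raw = (
    "Content-Type: text/html; charset=ISO-8859-1\n"
    "Content-Length: 1234\n"
    "\n"
)
msg = email.message_from_string(raw, _class=http.client.HTTPMessage)

# Item access is case-insensitive and returns the header value as a string.
print(msg["content-type"])    # text/html; charset=ISO-8859-1
print(msg["Content-Length"])  # 1234

# On Python 3 a missing header gives None instead of raising KeyError.
print(msg["x-missing"])       # None

# get() with an explicit default behaves the same way in both versions.
print(msg.get("X-Missing", "n/a"))  # n/a
```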

bmiller · 2009-03-27T01:00:25Z

On Thu, Mar 26, 2009 at 4:29 PM, Barry A. Warsaw <report@bugs.python.org> wrote:

Barry A. Warsaw <barry@python.org> added the comment:

I propose that you only document the getitem header access API. I.e.
the thing that info() gives you can be used to access the message
headers via message['content-type']. That's an API common to both
rfc822.Messages (the ultimate base class of mimetools.Message) and
email.message.Message.

As I've found myself in the awkward position of having to explain the new
3.0 API to my students, I've thought about this and have some
ideas/questions.
I'm also willing to help with the documentation or any enhancements.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'addinfourl' object is unsubscriptable

I wish I knew what an addinfourl object was.
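For context on the traceback: urllib.response.addinfourl is a small wrapper that pairs a file-like body with its headers, URL and status code. It exposes the headers through info() and a headers attribute, not through item access. The sketch below builds one by hand, reusing the Date value quoted in this message. In later Python 3 releases, urlopen() returns an http.client.HTTPResponse for HTTP(S) URLs and keeps addinfourl for file, FTP and data URLs.

```python
import email
import http.client
import io
import urllib.response

headers = email.message_from_string(
    "Date: Fri, 27 Mar 2009 00:41:34 GMT\nContent-Type: text/html\n\n",
    _class=http.client.HTTPMessage,
)

# addinfourl(fp, headers, url, code) bundles the body with its metadata.
resp = urllib.response.addinfourl(
    io.BytesIO(b"<html></html>"), headers, "http://www.python.org/", 200
)

print(resp.read())                  # b'<html></html>'
print(resp.info() is resp.headers)  # True
print(resp.info()["Date"])          # Fri, 27 Mar 2009 00:41:34 GMT

try:
    resp["Date"]  # the response object itself is not a mapping
except TypeError as exc:
    print("TypeError:", exc)
```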

'Fri, 27 Mar 2009 00:41:34 GMT'

['Date', 'Server', 'Last-Modified', 'ETag', 'Accept-Ranges',
'Content-Length', 'Connection', 'Content-Type']

Using x.headers over x.info() makes the most sense to me, but I don't know
that I can give any good rationale. Which would we want to document?

'text/html; charset=ISO-8859-1'

I guess this is technically correct, since the charset is part of the
Content-Type header in HTTP, but it does make life difficult for what I
think will be a pretty common use case in this new urllib: read from the
URL (as bytes) and then decode the bytes into a string using the
appropriate character set.
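A minimal sketch of that read-then-decode use case on Python 3, assuming the headers object is an email.message.Message subclass such as http.client.HTTPMessage. The helper name decode_body and the ISO-8859-1 fallback are assumptions: that fallback mirrors RFC 2616's old default for text/* types, and the LookupError branch covers charset names Python has no codec for.

```python
import email
import http.client


def decode_body(body, headers, default="iso-8859-1"):
    """Decode response bytes using the Content-Type charset (hypothetical helper)."""
    charset = headers.get_content_charset(default)
    try:
        return body.decode(charset, errors="replace")
    except LookupError:
        # The header named a charset with no matching codec; use the fallback.
        return body.decode(default, errors="replace")


headers = email.message_from_string(
    "Content-Type: text/html; charset=utf-8\n\n",
    _class=http.client.HTTPMessage,
)
print(decode_body("caf\u00e9".encode("utf-8"), headers))  # café
```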

As you follow this road, you have the confusing option of these three calls:

'iso-8859-1'
>>> x.headers.get_charsets()
['iso-8859-1']

I think it is a bug that get_charset() returns None in this case. It is
not at all clear why get_content_charset() and get_charset() should behave
differently.
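The three calls can be compared on a parsed Python 3 HTTPMessage. In email.message.Message, get_charset() returns the Charset object previously attached with set_charset() (a message-building feature), so a parsed message has none. get_content_charset() reads the charset parameter of the Content-Type header, and get_charsets() collects that value for every part the message walks over. The header value below is the one quoted above.

```python
import email
import http.client

msg = email.message_from_string(
    "Content-Type: text/html; charset=ISO-8859-1\n\n",
    _class=http.client.HTTPMessage,
)

print(msg.get_content_charset())  # iso-8859-1  (parsed from the header)
print(msg.get_charsets())         # ['iso-8859-1']  (one entry per part)
print(msg.get_charset())          # None  (nothing was set with set_charset())
```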

Brad


jhylton · 2009-03-30T16:32:39Z

The attached file is vaguely related to the current discussion. I'd
like to document the API for the urllib response, but I'd also like to
simplify the implementation on the py3k side. We can document the
simple API on the py3k side, then support some version of that API on
the py2k side.

Apologies for the noise in this patch. I was on a plane, and I don't
understand DVCS yet.

orsenthil · 2009-03-31T14:09:18Z

I spent some time on the patch, which replaces the self.msg usage with
self.headers in http.client. Everything looks fine.
The next step is to provide an interface in the urllib.response and the
equivalent changes to py2k.

joelverhagen · 2012-03-02T17:05:26Z

There is a difference in what HTTPResponse.getheaders() returns.

Python 2.7.2 (default, Jun 12 2011, 14:24:46) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import httplib
>>> c = httplib.HTTPConnection('www.joelverhagen.com')
>>> c.request('GET', '/sandbox/tests/cookies.php')
>>> c.getresponse().getheaders()
[('content-length', '0'), ('set-cookie', 'test_cookie1=foobar; expires=Fri, 02-Mar-2012 16:54:15 GMT, test_cookie2=barfoo; expires=Fri, 02-Mar-2012 16:54:15 GMT'), ('vary', 'Accept-Encoding'), ('server', 'Apache'), ('date', 'Fri, 02 Mar 2012 16:53:15 GMT'), ('content-type', 'text/html')]

Python 3.2.2 (default, Sep  4 2011, 09:07:29) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from http import client
>>> c = client.HTTPConnection('www.joelverhagen.com')
>>> c.request('GET', '/sandbox/tests/cookies.php')
>>> c.getresponse().getheaders()
[('Date', 'Fri, 02 Mar 2012 16:56:40 GMT'), ('Server', 'Apache'), ('Set-Cookie', 'test_cookie1=foobar; expires=Fri, 02-Mar-2012 16:57:40 GMT'), ('Set-Cookie', 'test_cookie2=barfoo; expires=Fri, 02-Mar-2012 16:57:40 GMT'), ('Vary', 'Accept-Encoding'), ('Content-Length', '0'), ('Content-Type', 'text/html')]

As you can see, in 2.7.2 HTTPResponse.getheaders() joins headers with the same name using ", ". In 3.2.2, the headers are kept separate, as two or more 2-tuples.
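The Python 3 behaviour can be reproduced without a network connection by feeding a canned response to http.client.HTTPResponse. FakeSocket below is a hypothetical stand-in for a socket, since HTTPResponse only calls makefile() on the object it is given. The cookie values are shortened versions of the ones above.

```python
import http.client
import io


class FakeSocket:
    """Hypothetical socket stand-in: HTTPResponse only needs makefile()."""

    def __init__(self, data):
        self._data = data

    def makefile(self, mode, *args):
        return io.BytesIO(self._data)


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Set-Cookie: test_cookie1=foobar\r\n"
    b"Set-Cookie: test_cookie2=barfoo\r\n"
    b"Content-Length: 0\r\n"
    b"\r\n"
)
resp = http.client.HTTPResponse(FakeSocket(raw))
resp.begin()  # parse the status line and headers

# getheaders() keeps one 2-tuple per header line...
print(resp.getheaders())
# [('Set-Cookie', 'test_cookie1=foobar'), ('Set-Cookie', 'test_cookie2=barfoo'), ('Content-Length', '0')]

# ...while getheader() joins repeated values with ", ".
print(resp.getheader("Set-Cookie"))
# test_cookie1=foobar, test_cookie2=barfoo
```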

This causes problems if you convert the list of 2-tuples to a dict, because the keys collide (all but one of the values associated with a non-unique key are overwritten). It looks like this is caused by using the email header parser, which keeps each header line as a separate (name, value) 2-tuple. In Python 2.7.2, the HTTPMessage.addheader(...) method does the comma-joining.
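One way to avoid the collision on Python 3 is to group values into lists instead of calling dict() on the 2-tuples. The helper name headers_to_multidict is hypothetical, and lower-casing the keys is an assumption chosen to match HTTP's case-insensitive header names.

```python
import email
import http.client


def headers_to_multidict(msg):
    """Group every header value under its lower-cased name, keeping duplicates."""
    result = {}
    for name, value in msg.items():
        result.setdefault(name.lower(), []).append(value)
    return result


msg = email.message_from_string(
    "Set-Cookie: test_cookie1=foobar\n"
    "Set-Cookie: test_cookie2=barfoo\n"
    "Content-Type: text/html\n"
    "\n",
    _class=http.client.HTTPMessage,
)

# A plain dict keeps only the last Set-Cookie value.
print(dict(msg.items())["Set-Cookie"])          # test_cookie2=barfoo
# Grouping keeps both.
print(headers_to_multidict(msg)["set-cookie"])  # ['test_cookie1=foobar', 'test_cookie2=barfoo']
```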

Is this API change intentional? Should HTTPResponse.getheaders() comma-separate the values like the HTTPResponse.getheader(...) function (in both 2.7.2 and 3.2.2)?


beazley mannequin commented Dec 29, 2008

beazley mannequin commented Dec 29, 2008

beazley mannequin commented Dec 29, 2008

devdanzin mannequin commented Feb 12, 2009

jhylton mannequin commented Mar 26, 2009

jhylton mannequin commented Mar 26, 2009

warsaw commented Mar 26, 2009

bmiller mannequin commented Mar 27, 2009

jhylton mannequin commented Mar 30, 2009

orsenthil commented Mar 31, 2009

joelverhagen mannequin commented Mar 2, 2012

merwok commented Mar 3, 2012

ezio-melotti commented Mar 3, 2012

piotrdobrogost mannequin commented Jan 8, 2013

piotrdobrogost mannequin commented Jan 8, 2013

hltbra mannequin commented Mar 28, 2013

vadmium commented Feb 9, 2015
