classification
Title: HTTPMessage not documented and has inconsistent API across Py2/Py3
Type: behavior Stage: patch review
Components: Documentation, Library (Lib) Versions: Python 3.3, Python 3.2, Python 2.7
process
Status: open Resolution:
Dependencies: 3428 Superseder:
Assigned To: jhylton Nosy List: ajaksu2, barry, beazley, bmiller, eric.araujo, ezio.melotti, georg.brandl, hltbra, jhylton, jjlee, joel.verhagen, orsenthil, petri.lehtinen, piotr.dobrogost, srid
Priority: normal Keywords: patch

Created on 2008-12-29 21:42 by beazley, last changed 2013-03-28 17:11 by hltbra.

Files
File name Uploaded Description Edit
addinfourl_removal.diff jhylton, 2009-03-30 16:32 review
Messages (15)
msg78486 - (view) Author: David M. Beazley (beazley) Date: 2008-12-29 21:42
A file-like object u returned by the urlopen() function in both Python 
2.6/3.0 has a method info() that returns a 'HTTPMessage' object.  For 
example:

::: Python 2.6
>>> from urllib2 import urlopen
>>> u = urlopen("http://www.python.org")
>>> u.info()
<httplib.HTTPMessage instance at 0xce5738>
>>> 

::: Python 3.0
>>> from urllib.request import urlopen
>>> u = urlopen("http://www.python.org")
>>> u.info()
<http.client.HTTPMessage object at 0x4bfa10>
>>>

So far, so good.  HTTPMessage is defined in two different modules, but 
that's fine (it's just library reorganization).

Two major problems:

1. There is no documentation whatsoever on HTTPMessage.  No description 
in the docs for httplib (python 2.6) or http.client (python 3.0).

2. The HTTPMessage object in Python 2.6 derives from mimetools.Message 
and has a totally different programming interface than HTTPMessage in 
Python 3.0 which derives from email.message.Message.  Check it out:

:::Python 2.6
>>> dir(u.info())
['__contains__', '__delitem__', '__doc__', '__getitem__', '__init__', 
'__iter__', '__len__', '__module__', '__setitem__', '__str__', 
'addcontinue', 'addheader', 'dict', 'encodingheader', 'fp', 'get', 
'getaddr', 'getaddrlist', 'getallmatchingheaders', 'getdate', 
'getdate_tz', 'getencoding', 'getfirstmatchingheader', 'getheader', 
'getheaders', 'getmaintype', 'getparam', 'getparamnames', 'getplist', 
'getrawheader', 'getsubtype', 'gettype', 'has_key', 'headers', 
'iscomment', 'isheader', 'islast', 'items', 'keys', 'maintype', 
'parseplist', 'parsetype', 'plist', 'plisttext', 'readheaders', 
'rewindbody', 'seekable', 'setdefault', 'startofbody', 'startofheaders', 
'status', 'subtype', 'type', 'typeheader', 'unixfrom', 'values']

:::Python 3.0
>>> dir(u.info())
['__class__', '__contains__', '__delattr__', '__delitem__', '__dict__', 
'__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', 
'__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', 
'__len__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', 
'__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', 
'__str__', '__subclasshook__', '__weakref__', '_charset', 
'_default_type', '_get_params_preserve', '_headers', '_payload', 
'_unixfrom', 'add_header', 'as_string', 'attach', 'defects', 
'del_param', 'epilogue', 'get', 'get_all', 'get_boundary', 
'get_charset', 'get_charsets', 'get_content_charset', 
'get_content_maintype', 'get_content_subtype', 'get_content_type', 
'get_default_type', 'get_filename', 'get_param', 'get_params', 
'get_payload', 'get_unixfrom', 'getallmatchingheaders', 'is_multipart', 
'items', 'keys', 'preamble', 'replace_header', 'set_boundary', 
'set_charset', 'set_default_type', 'set_param', 'set_payload', 
'set_type', 'set_unixfrom', 'values', 'walk']

I know that getting rid of mimetools was desired, but I have no idea if 
changing the API on HTTPMessage was intended or not.  In any case, it's 
one of the only cases in the entire library where the programming 
interface to an object radically changes from 2.6 -> 3.0.  

I ran into this problem with code that was trying to properly determine 
the charset encoding of the byte string returned by urlopen(). 

I haven't checked whether 2to3 deals with this or not, but it might be 
something for someone to look at in their copious amounts of spare time.
msg78491 - (view) Author: David M. Beazley (beazley) Date: 2008-12-29 22:20
Verified that 2to3 does not fix this.
msg81799 - (view) Author: Daniel Diniz (ajaksu2) Date: 2009-02-12 18:44
ISTM that these issues tend to go all the way up to test coverage and
organization :/
msg84217 - (view) Author: Jeremy Hylton (jhylton) Date: 2009-03-26 21:03
No deep thought was given to the HTTPMessage API.  Here's the extent of
the discussion that I can find.  I've changed the names, but you can
find the full discussion at http://bugs.python.org/issue2848

A: mimetools.Message is compatible with email.message.Message, right?
B: I don't know how compatible it is.
C: The APIs are a bit different, but it should be possible to migrate from 
the old to the new.
msg84219 - (view) Author: Jeremy Hylton (jhylton) Date: 2009-03-26 21:04
A plausible solution is to pick some core set of functionality that we
think people need and document that API.  We can modify one or both of
the current implementations to include that functionality.  What do we need?
msg84223 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2009-03-26 21:29
I propose that you only document the getitem header access API.  I.e.
the thing that info() gives you can be used to access the message
headers via message['content-type'].  That's an API common to both
rfc822.Messages (the ultimate base class of mimetools.Message) and
email.message.Message.
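
A minimal sketch of the item-access API proposed above, shown for Python 3 only. The message is built by hand so the snippet runs without network access; in real code the object would come from urlopen(...).info(). The header values are illustrative, not taken from a live response.

```python
from http.client import HTTPMessage

# Hand-built stand-in for the object returned by urlopen(...).info().
msg = HTTPMessage()
msg["Content-Type"] = "text/html; charset=ISO-8859-1"
msg["Content-Length"] = "0"

# Header lookup by item access is case-insensitive.
content_type = msg["content-type"]

# On Python 3, item access returns None for a header that is absent.
missing = msg["x-not-there"]

print(content_type)
print(missing)
```

Since get() appears in both dir() listings earlier in this report, msg.get('content-type') is probably the safer spelling when a header may be missing, on the assumption that the Python 2 class handles a missing key differently under item access.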
msg84243 - (view) Author: Brad Miller (bmiller) Date: 2009-03-27 01:00
As I've found myself in the awkward position of having to explain the new
3.0 api to my students I've thought about this and have some
ideas/questions.
I'm also willing to help with the documentation or any enhancements.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'addinfourl' object is unsubscriptable

I wish I knew what an addinfourl object was.

'Fri, 27 Mar 2009 00:41:34 GMT'

'Fri, 27 Mar 2009 00:41:34 GMT'

['Date', 'Server', 'Last-Modified', 'ETag', 'Accept-Ranges',
'Content-Length', 'Connection', 'Content-Type']

Using x.headers over x.info()  makes the most sense to me, but I don't know
that I can give any good rationale.  Which would we want to document?

'text/html; charset=ISO-8859-1'

I guess technically this is correct since the charset is part of the
Content-Type header in HTTP but it does make life difficult for what I think
will be a pretty common use case in this new urllib:  read from the url (as
bytes) and then decode them into a string using the appropriate character
set.

As you follow this road, you have the confusing option of these three calls:

'iso-8859-1'
>>> x.headers.get_charsets()
['iso-8859-1']

I think it should be a bug that get_charset() does not return anything in
this case.  It is not at all clear why get_content_charset() and
get_charset() should have different behavior.
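
A short sketch of how the three calls differ on Python 3, using a hand-built message instead of a live response. The byte string and the "utf-8" fallback are assumptions made for illustration only.

```python
from http.client import HTTPMessage

# Hand-built stand-in for x.headers from a real response.
msg = HTTPMessage()
msg["Content-Type"] = "text/html; charset=ISO-8859-1"

# charset parameter of the Content-Type header, lowercased.
declared = msg.get_content_charset()

# One entry per message part; a non-multipart message has a single part.
per_part = msg.get_charsets()

# Charset object attached to the payload via set_charset(); nothing was
# attached here, so this is None regardless of the Content-Type header.
payload_charset = msg.get_charset()

# Decode a response body using the declared charset.
body = b"caf\xe9"                        # stand-in for u.read()
text = body.decode(declared or "utf-8")  # the fallback is an assumption
```

get_charset() reports what was set on the message payload, not what the header declares, which would explain why it returns nothing for headers parsed from an HTTP response.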

Brad

msg84577 - (view) Author: Jeremy Hylton (jhylton) Date: 2009-03-30 16:32
The attached file is vaguely related to the current discussion.  I'd
like to document the API for the urllib response, but I'd also like to
simplify the implementation on the py3k side.  We can document the
simple API on the py3k side, then support some version of that API on
the py2k side.

Apologies for the noise in this patch.  I was on a plane, and I don't
understand DVCS yet.
msg84783 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2009-03-31 14:09
I spent some time on the patch which replaces the self.msg usage with
self.headers in http.client. Everything is fine.
The next step is to provide an interface in the urllib.response and the
equivalent changes to py2k.
msg154781 - (view) Author: Joel Verhagen (joel.verhagen) Date: 2012-03-02 17:05
There is a difference in what HTTPResponse.getheaders() returns.

Python 2.7.2 (default, Jun 12 2011, 14:24:46) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import httplib
>>> c = httplib.HTTPConnection('www.joelverhagen.com')
>>> c.request('GET', '/sandbox/tests/cookies.php')
>>> c.getresponse().getheaders()
[('content-length', '0'), ('set-cookie', 'test_cookie1=foobar; expires=Fri, 02-Mar-2012 16:54:15 GMT, test_cookie2=barfoo; expires=Fri, 02-Mar-2012 16:54:15 GMT'), ('vary', 'Accept-Encoding'), ('server', 'Apache'), ('date', 'Fri, 02 Mar 2012 16:53:15 GMT'), ('content-type', 'text/html')]

Python 3.2.2 (default, Sep  4 2011, 09:07:29) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from http import client
>>> c = client.HTTPConnection('www.joelverhagen.com')
>>> c.request('GET', '/sandbox/tests/cookies.php')
>>> c.getresponse().getheaders()
[('Date', 'Fri, 02 Mar 2012 16:56:40 GMT'), ('Server', 'Apache'), ('Set-Cookie', 'test_cookie1=foobar; expires=Fri, 02-Mar-2012 16:57:40 GMT'), ('Set-Cookie', 'test_cookie2=barfoo; expires=Fri, 02-Mar-2012 16:57:40 GMT'), ('Vary', 'Accept-Encoding'), ('Content-Length', '0'), ('Content-Type', 'text/html')]

As you can see, HTTPResponse.getheaders() in 2.7.2 joins headers with the same name using ", ". In 3.2.2, the headers are kept separate, as two or more 2-tuples.

This causes problems if you convert the list of 2-tuples to a dict, because the keys collide (causing all but one of the values associated with the non-unique keys to be overwritten).  It looks like this problem is caused by using the email header parser (which keeps the keys and values as separate 2-tuples). In Python 2.7.2, the HTTPMessage.addheader(...) function does the comma-separating.

Is this API change intentional? Should HTTPResponse.getheaders() comma-separate the values like the HTTPResponse.getheader(...) function (in both 2.7.2 and 3.2.2)?
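
The key collision described above can be reproduced without a server. This is only a sketch: the header list is a shortened copy of the 3.2.2 output, and grouping values into lists under lowercased names is one possible workaround, not something the library does.

```python
from collections import defaultdict

# Shortened copy of the list of 2-tuples returned on Python 3.2.2.
headers = [
    ("Set-Cookie", "test_cookie1=foobar"),
    ("Set-Cookie", "test_cookie2=barfoo"),
    ("Content-Type", "text/html"),
]

# Naive conversion: the second Set-Cookie overwrites the first.
flat = dict(headers)

# Workaround: keep every value, grouped under a lowercased header name.
grouped = defaultdict(list)
for name, value in headers:
    grouped[name.lower()].append(value)

print(flat["Set-Cookie"])
print(grouped["set-cookie"])
```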

See also:
https://github.com/shazow/urllib3/issues/3#issuecomment-3008415
msg154835 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-03-03 13:18
Now that two Python 3 releases have been made, I don’t know if changing the code is still an option.  The doc can certainly still be improved.

Adding Ezio to nosy; I think it’s you who opened a bug report about removing superfluous getter methods in the addinfourl class (and other ugliness).
msg154850 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-03-03 20:09
Yep, #12707.
msg179377 - (view) Author: Piotr Dobrogost (piotr.dobrogost) Date: 2013-01-08 22:02
@joel.verhagen

"Should HTTPResponse.getheaders() comma-separate the values (...)"

No, it should not. RFC 2616 states:

"Multiple message-header fields with the same field-name MAY be present in a message if and only if the entire field-value for that header field is defined as a comma-separated list [i.e., #(values)]."

As field-values for some header fields ('Set-Cookie' being an example) are not defined as a comma-separated list such fields must not be merged.

Side note:
RFC 2616 is very soon to be obsoleted by the new RFC from the httpbis working group. However, in the current/newest draft (http://trac.tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-21#section-3.2), although the wording is different, the sense is the same.
msg179379 - (view) Author: Piotr Dobrogost (piotr.dobrogost) Date: 2013-01-08 22:07
...continuing my previous comment

Joining headers with the same name by ", " by HTTPResponse.getheaders() in Python 2.7 is wrong and there's a bug for this - see http://bugs.python.org/issue1660009
msg185458 - (view) Author: Hugo Lopes Tavares (hltbra) Date: 2013-03-28 17:11
I just caught a bug because on Python 3 `HTTPMessage` has `get_param`, while on Python 2 there is `getparam`, with a different method signature. I am trying to figure out a solution so my code can run on both Python 2 and 3 without ifs on the Python version.
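
One possible approach, sketched here under assumptions: look for the method on the object rather than checking the interpreter version. content_type_param is a hypothetical helper name, and only the get_param branch is exercised below because the snippet runs on Python 3; the getparam branch assumes the Python 2 method takes the parameter name as its only argument.

```python
from http.client import HTTPMessage


def content_type_param(msg, name):
    """Return a Content-Type parameter from either flavour of HTTPMessage.

    Hypothetical helper: feature detection instead of a version check.
    """
    getter = getattr(msg, "get_param", None)
    if getter is not None:
        # email.message.Message API (Python 3).
        return getter(name)
    # mimetools.Message API (Python 2); assumed signature getparam(name).
    return msg.getparam(name)


# Hand-built stand-in for the object returned by urlopen(...).info().
msg = HTTPMessage()
msg["Content-Type"] = "text/html; charset=ISO-8859-1"

charset = content_type_param(msg, "charset")
boundary = content_type_param(msg, "boundary")  # not present in the header
```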
History
Date User Action Args
2013-03-28 17:11:21  hltbra  set  nosy: + hltbra
    messages: + msg185458
2013-01-08 22:07:40  piotr.dobrogost  set  messages: + msg179379
2013-01-08 22:02:10  piotr.dobrogost  set  nosy: + piotr.dobrogost
    messages: + msg179377
2012-03-03 20:09:37  ezio.melotti  set  messages: + msg154850
2012-03-03 13:18:59  eric.araujo  set  files: - unnamed
2012-03-03 13:18:44  eric.araujo  set  versions: + Python 2.7, Python 3.2, Python 3.3, - Python 2.6, Python 3.0
    nosy: + ezio.melotti
    title: HTTPMessage not documented and has inconsistent API across 2.6/3.0 -> HTTPMessage not documented and has inconsistent API across Py2/Py3
    messages: + msg154835
    resolution: accepted ->
    stage: test needed -> patch review
2012-03-02 17:05:27  joel.verhagen  set  nosy: + joel.verhagen
    messages: + msg154781
2011-11-23 19:38:02  petri.lehtinen  set  nosy: + petri.lehtinen
2010-02-16 05:31:33  eric.araujo  set  nosy: + eric.araujo
2009-09-09 19:51:33  srid  set  nosy: + srid
2009-03-31 14:09:18  orsenthil  set  assignee: georg.brandl -> jhylton
    resolution: accepted
    messages: + msg84783
2009-03-30 16:32:51  jhylton  set  files: + addinfourl_removal.diff
    keywords: + patch
    messages: + msg84577
2009-03-27 01:00:26  bmiller  set  files: + unnamed
    messages: + msg84243
2009-03-26 21:29:57  barry  set  nosy: + barry
    messages: + msg84223
2009-03-26 21:04:35  jhylton  set  messages: + msg84219
2009-03-26 21:03:46  jhylton  set  nosy: + jhylton
    messages: + msg84217
2009-03-18 15:43:57  bmiller  set  nosy: + bmiller
2009-02-13 01:47:44  ajaksu2  set  nosy: + jjlee
2009-02-12 19:03:48  ajaksu2  set  dependencies: + httplib.HTTPMessage undocumented
2009-02-12 18:44:49  ajaksu2  set  assignee: georg.brandl
    messages: + msg81799
    nosy: + ajaksu2, georg.brandl, orsenthil
    components: + Documentation
    stage: test needed
2008-12-29 22:20:27  beazley  set  messages: + msg78491
2008-12-29 21:42:31  beazley  create