classification
Title: HTTPMessage not documented and has inconsistent API across 2.6/3.0
Type: behavior Stage: test needed
Components: Documentation, Library (Lib) Versions: Python 3.0, Python 2.6
process
Status: open Resolution: accepted
Dependencies: httplib.HTTPMessage undocumented
View: 3428
Superseder:
Assigned To: jhylton Nosy List: ajaksu2, barry, beazley, bmiller, eric.araujo, georg.brandl, jhylton, jjlee, orsenthil, petri.lehtinen, srid
Priority: normal Keywords: patch

Created on 2008-12-29 21:42 by beazley, last changed 2011-11-23 19:38 by petri.lehtinen.

Files
File name                Uploaded                   Description
unnamed                  bmiller, 2009-03-27 01:00
addinfourl_removal.diff  jhylton, 2009-03-30 16:32
Messages (9)
msg78486 - (view) Author: David M. Beazley (beazley) Date: 2008-12-29 21:42
A file-like object u returned by the urlopen() function in both Python 
2.6 and 3.0 has a method info() that returns an 'HTTPMessage' object.  For 
example:

::: Python 2.6
>>> from urllib2 import urlopen
>>> u = urlopen("http://www.python.org")
>>> u.info()
<httplib.HTTPMessage instance at 0xce5738>
>>> 

::: Python 3.0
>>> from urllib.request import urlopen
>>> u = urlopen("http://www.python.org")
>>> u.info()
<http.client.HTTPMessage object at 0x4bfa10>
>>>

So far, so good.  HTTPMessage is defined in two different modules, but 
that's fine (it's just library reorganization).

Two major problems:

1. There is no documentation whatsoever on HTTPMessage.  It is not described 
in the docs for httplib (Python 2.6) or http.client (Python 3.0).

2. The HTTPMessage object in Python 2.6 derives from mimetools.Message 
and has a totally different programming interface than HTTPMessage in 
Python 3.0 which derives from email.message.Message.  Check it out:

::: Python 2.6
>>> dir(u.info())
['__contains__', '__delitem__', '__doc__', '__getitem__', '__init__', 
'__iter__', '__len__', '__module__', '__setitem__', '__str__', 
'addcontinue', 'addheader', 'dict', 'encodingheader', 'fp', 'get', 
'getaddr', 'getaddrlist', 'getallmatchingheaders', 'getdate', 
'getdate_tz', 'getencoding', 'getfirstmatchingheader', 'getheader', 
'getheaders', 'getmaintype', 'getparam', 'getparamnames', 'getplist', 
'getrawheader', 'getsubtype', 'gettype', 'has_key', 'headers', 
'iscomment', 'isheader', 'islast', 'items', 'keys', 'maintype', 
'parseplist', 'parsetype', 'plist', 'plisttext', 'readheaders', 
'rewindbody', 'seekable', 'setdefault', 'startofbody', 'startofheaders', 
'status', 'subtype', 'type', 'typeheader', 'unixfrom', 'values']

::: Python 3.0
>>> dir(u.info())
['__class__', '__contains__', '__delattr__', '__delitem__', '__dict__', 
'__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', 
'__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', 
'__len__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', 
'__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', 
'__str__', '__subclasshook__', '__weakref__', '_charset', 
'_default_type', '_get_params_preserve', '_headers', '_payload', 
'_unixfrom', 'add_header', 'as_string', 'attach', 'defects', 
'del_param', 'epilogue', 'get', 'get_all', 'get_boundary', 
'get_charset', 'get_charsets', 'get_content_charset', 
'get_content_maintype', 'get_content_subtype', 'get_content_type', 
'get_default_type', 'get_filename', 'get_param', 'get_params', 
'get_payload', 'get_unixfrom', 'getallmatchingheaders', 'is_multipart', 
'items', 'keys', 'preamble', 'replace_header', 'set_boundary', 
'set_charset', 'set_default_type', 'set_param', 'set_payload', 
'set_type', 'set_unixfrom', 'values', 'walk']

I know that getting rid of mimetools was desired, but I have no idea if 
changing the API on HTTPMessage was intended or not.  In any case, it's 
one of the only cases in the entire library where the programming 
interface to an object radically changes from 2.6 -> 3.0.  

I ran into this problem with code that was trying to properly determine 
the charset encoding of the byte string returned by urlopen(). 

I haven't checked whether 2to3 deals with this or not, but it might be 
something for someone to look at in their copious amounts of spare time.
msg78491 - (view) Author: David M. Beazley (beazley) Date: 2008-12-29 22:20
Verified that 2to3 does not fix this.
msg81799 - (view) Author: Daniel Diniz (ajaksu2) Date: 2009-02-12 18:44
ISTM that these issues tend to go all the way up to test coverage and
organization :/
msg84217 - (view) Author: Jeremy Hylton (jhylton) Date: 2009-03-26 21:03
No deep thought was given to the HTTPMessage API.  Here's the extent of
the discussion that I can find.  I've changed the names, but you can
find the full discussion at http://bugs.python.org/issue2848

A: mimetools.Message is compatible with email.message.Message, right?
B: I don't know how compatible it is.
C: The APIs are a bit different, but it should be possible to migrate from 
the old to the new.
msg84219 - (view) Author: Jeremy Hylton (jhylton) Date: 2009-03-26 21:04
A plausible solution is to pick some core set of functionality that we
think people need and document that API.  We can modify one or both of
the current implementations to include that functionality.  What do we need?
msg84223 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2009-03-26 21:29
I propose that you only document the getitem header access API.  I.e.
the thing that info() gives you can be used to access the message
headers via message['content-type'].  That's an API common to both
rfc822.Messages (the ultimate base class of mimetools.Message) and
email.message.Message.
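
The getitem-style access proposed above can be sketched on the Python 3 side as follows. This is only an illustration under stated assumptions: the header text is made up, `build_message` is a hypothetical helper, and the message is parsed locally rather than fetched with urlopen() so that the snippet runs without a network connection.

```python
import email
import http.client

# Made-up header block standing in for a real HTTP response.
RAW_HEADERS = (
    "Date: Fri, 27 Mar 2009 00:41:34 GMT\n"
    "Content-Type: text/html; charset=ISO-8859-1\n"
    "\n"
)


def build_message(raw):
    """Hypothetical helper: parse header text into an HTTPMessage."""
    return email.message_from_string(raw, _class=http.client.HTTPMessage)


msg = build_message(RAW_HEADERS)

# Header lookup by subscript is case-insensitive.
content_type = msg["content-type"]
date = msg["DATE"]

# A header that is not present yields None instead of raising KeyError.
missing = msg["x-not-present"]

print(content_type)
print(date)
print(missing)
```

With a real response, the same subscript access would apply to the object returned by u.info().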
msg84243 - (view) Author: Brad Miller (bmiller) Date: 2009-03-27 01:00
On Thu, Mar 26, 2009 at 4:29 PM, Barry A. Warsaw <report@bugs.python.org> wrote:

>
> Barry A. Warsaw <barry@python.org> added the comment:
>
> I propose that you only document the getitem header access API.  I.e.
> the thing that info() gives you can be used to access the message
> headers via message['content-type'].  That's an API common to both
> rfc822.Messages (the ultimate base class of mimetools.Message) and
> email.message.Message.
>

As I've found myself in the awkward position of having to explain the new
3.0 API to my students, I've thought about this and have some
ideas/questions. I'm also willing to help with the documentation or any
enhancements.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'addinfourl' object is unsubscriptable

I wish I knew what an addinfourl object was.

'Fri, 27 Mar 2009 00:41:34 GMT'

'Fri, 27 Mar 2009 00:41:34 GMT'

['Date', 'Server', 'Last-Modified', 'ETag', 'Accept-Ranges',
'Content-Length', 'Connection', 'Content-Type']

Using x.headers over x.info() makes the most sense to me, but I don't know
that I can give any good rationale.  Which would we want to document?

'text/html; charset=ISO-8859-1'

I guess technically this is correct, since the charset is part of the
Content-Type header in HTTP, but it does make life difficult for what I think
will be a pretty common use case in this new urllib: read from the URL (as
bytes) and then decode the bytes into a string using the appropriate character
set.
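
For the use case just described (read bytes, then decode them with the declared character set), a minimal Python 3 sketch might look like this. `decode_body` and its `default` parameter are hypothetical names, not part of urllib; falling back to UTF-8 when no charset is declared is an assumption; and the headers object is built locally as a stand-in for what info() returns.

```python
import email
import http.client


def decode_body(body, headers, default="utf-8"):
    """Decode response bytes using the charset named in Content-Type.

    Hypothetical helper. `default` is an assumed fallback for responses
    that do not declare a charset.
    """
    charset = headers.get_content_charset() or default
    return body.decode(charset, errors="replace")


# Stand-in for the object returned by urlopen(...).info() in Python 3.
headers = email.message_from_string(
    "Content-Type: text/html; charset=ISO-8859-1\n\n",
    _class=http.client.HTTPMessage,
)

# Latin-1 encoded bytes, decoded with the declared charset.
text = decode_body(b"caf\xe9", headers)

# No Content-Type header at all: the assumed UTF-8 fallback is used.
fallback = decode_body(b"caf\xc3\xa9", http.client.HTTPMessage())
```

With a real response this would be called as decode_body(u.read(), u.info()). A charset name that Python does not recognize would still raise LookupError, which this sketch does not handle.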

As you follow this road, you have the confusing option of these three calls:

'iso-8859-1'
>>> x.headers.get_charsets()
['iso-8859-1']

I think it should be a bug that get_charset() does not return anything in
this case.  It is not at all clear why get_content_charset() and
get_charset() should have different behavior.

Brad

msg84577 - (view) Author: Jeremy Hylton (jhylton) Date: 2009-03-30 16:32
The attached file is vaguely related to the current discussion.  I'd
like to document the API for the urllib response, but I'd also like to
simplify the implementation on the py3k side.  We can document the
simple API on the py3k side, then support some version of that API on
the py2k side.

Apologies for the noise in this patch.  I was on a plane, and I don't
understand DVCS yet.
msg84783 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2009-03-31 14:09
I spent some time on the patch which replaces the self.msg usage with
self.headers in http.client. Everything is fine.
The next step is to provide an interface in the urllib.response and the
equivalent changes to py2k.
History
Date                 User            Action  Args
2011-11-23 19:38:02  petri.lehtinen  set     nosy: + petri.lehtinen
2010-02-16 05:31:33  eric.araujo     set     nosy: + eric.araujo
2009-09-09 19:51:33  srid            set     nosy: + srid
2009-03-31 14:09:18  orsenthil       set     assignee: georg.brandl -> jhylton
                                             resolution: accepted
                                             messages: + msg84783
2009-03-30 16:32:51  jhylton         set     files: + addinfourl_removal.diff
                                             keywords: + patch
                                             messages: + msg84577
2009-03-27 01:00:26  bmiller         set     files: + unnamed
                                             messages: + msg84243
2009-03-26 21:29:57  barry           set     nosy: + barry
                                             messages: + msg84223
2009-03-26 21:04:35  jhylton         set     messages: + msg84219
2009-03-26 21:03:46  jhylton         set     nosy: + jhylton
                                             messages: + msg84217
2009-03-18 15:43:57  bmiller         set     nosy: + bmiller
2009-02-13 01:47:44  ajaksu2         set     nosy: + jjlee
2009-02-12 19:03:48  ajaksu2         set     dependencies: + httplib.HTTPMessage undocumented
2009-02-12 18:44:49  ajaksu2         set     assignee: georg.brandl
                                             messages: + msg81799
                                             nosy: + ajaksu2, georg.brandl, orsenthil
                                             components: + Documentation
                                             stage: test needed
2008-12-29 22:20:27  beazley         set     messages: + msg78491
2008-12-29 21:42:31  beazley         create