The file-like object u returned by the urlopen() function in both
Python 2.6 and 3.0 has a method info() that returns an 'HTTPMessage'
object. For example:
::: Python 2.6
>>> from urllib2 import urlopen
>>> u = urlopen("http://www.python.org")
>>> u.info()
<httplib.HTTPMessage instance at 0xce5738>
>>>
::: Python 3.0
>>> from urllib.request import urlopen
>>> u = urlopen("http://www.python.org")
>>> u.info()
<http.client.HTTPMessage object at 0x4bfa10>
>>>
So far, so good. HTTPMessage is defined in two different modules, but
that's fine (it's just library reorganization).
Two major problems:
1. There is no documentation whatsoever on HTTPMessage. It is not
described in the docs for httplib (Python 2.6) or http.client (Python 3.0).
2. The HTTPMessage object in Python 2.6 derives from mimetools.Message
and has a totally different programming interface from HTTPMessage in
Python 3.0, which derives from email.message.Message. Check it out:
::: Python 2.6
>>> dir(u.info())
['__contains__', '__delitem__', '__doc__', '__getitem__', '__init__',
'__iter__', '__len__', '__module__', '__setitem__', '__str__',
'addcontinue', 'addheader', 'dict', 'encodingheader', 'fp', 'get',
'getaddr', 'getaddrlist', 'getallmatchingheaders', 'getdate',
'getdate_tz', 'getencoding', 'getfirstmatchingheader', 'getheader',
'getheaders', 'getmaintype', 'getparam', 'getparamnames', 'getplist',
'getrawheader', 'getsubtype', 'gettype', 'has_key', 'headers',
'iscomment', 'isheader', 'islast', 'items', 'keys', 'maintype',
'parseplist', 'parsetype', 'plist', 'plisttext', 'readheaders',
'rewindbody', 'seekable', 'setdefault', 'startofbody', 'startofheaders',
'status', 'subtype', 'type', 'typeheader', 'unixfrom', 'values']
::: Python 3.0
>>> dir(u.info())
['__class__', '__contains__', '__delattr__', '__delitem__', '__dict__',
'__doc__', '__eq__', '__format__', '__ge__', '__getattribute__',
'__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__',
'__len__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__',
'__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__',
'__str__', '__subclasshook__', '__weakref__', '_charset',
'_default_type', '_get_params_preserve', '_headers', '_payload',
'_unixfrom', 'add_header', 'as_string', 'attach', 'defects',
'del_param', 'epilogue', 'get', 'get_all', 'get_boundary',
'get_charset', 'get_charsets', 'get_content_charset',
'get_content_maintype', 'get_content_subtype', 'get_content_type',
'get_default_type', 'get_filename', 'get_param', 'get_params',
'get_payload', 'get_unixfrom', 'getallmatchingheaders', 'is_multipart',
'items', 'keys', 'preamble', 'replace_header', 'set_boundary',
'set_charset', 'set_default_type', 'set_param', 'set_payload',
'set_type', 'set_unixfrom', 'values', 'walk']
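To make the difference concrete, here is a minimal sketch of reading a repeated header through either interface. The helper name header_values() is hypothetical, and the Python 2.6 branch relies only on the getheaders() method visible in the listing above; it has not been exercised here. The sample message is built by hand with email.message_from_string() so that no network access is needed.

```python
from email import message_from_string
from http.client import HTTPMessage


def header_values(msg, name):
    """Return every value of header `name` as a list (hypothetical helper)."""
    if hasattr(msg, "get_all"):
        # Python 3.x: email.message.Message interface.
        # get_all() returns the fallback (here an empty list) if the
        # header is absent.
        return msg.get_all(name, [])
    # Python 2.x: mimetools.Message interface (assumed, see listing above).
    return msg.getheaders(name)


# Hand-built headers standing in for u.info(); the values are made up.
raw = "Set-Cookie: a=1\nSet-Cookie: b=2\nContent-Type: text/html\n\n"
msg = message_from_string(raw, _class=HTTPMessage)
print(header_values(msg, "Set-Cookie"))
```

The hasattr() check probes for the method instead of testing the Python version, so the helper only depends on which interface the object actually exposes.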
I know that getting rid of mimetools was desired, but I have no idea if
changing the API on HTTPMessage was intended or not. In any case, it's
one of the few cases in the entire library where the programming
interface to an object changes radically between 2.6 and 3.0.
I ran into this problem with code that was trying to properly determine
the charset encoding of the byte string returned by urlopen().
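For illustration, a rough sketch of what that kind of code ends up looking like when it has to cope with both interfaces. The helper name response_charset() and the "iso-8859-1" fallback are assumptions made for this example, not something either library prescribes. The 3.0 branch uses get_content_charset() and the 2.6 branch uses getparam(), both of which appear in the dir() listings above; only the 3.x branch is exercised here.

```python
from email import message_from_string
from http.client import HTTPMessage


def response_charset(msg, default="iso-8859-1"):
    """Return the charset named in Content-Type, or `default` if none.

    `default` is an assumption for this sketch; pick whatever suits
    the application.
    """
    if hasattr(msg, "get_content_charset"):
        # Python 3.x: email.message.Message interface.
        charset = msg.get_content_charset()
    else:
        # Python 2.x: mimetools.Message interface.
        charset = msg.getparam("charset")
    return charset or default


# Hand-built headers standing in for u.info(), so no network is needed.
raw = "Content-Type: text/html; charset=UTF-8\n\n"
msg = message_from_string(raw, _class=HTTPMessage)
print(response_charset(msg))
```

With a real response in Python 3.x, the result would be used along the lines of u.read().decode(response_charset(u.info())).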
I haven't checked whether 2to3 deals with this or not, but it might be
something for someone to look at in their copious amounts of spare time.