Issue4773
Created on 2008-12-29 21:42 by beazley, last changed 2013-03-28 17:11 by hltbra.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| addinfourl_removal.diff | jhylton, 2009-03-30 16:32 | review | ||
| Messages (15) | |||
|---|---|---|---|
| msg78486 - (view) | Author: David M. Beazley (beazley) | Date: 2008-12-29 21:42 | |
A file-like object u returned by the urlopen() function in both Python
2.6/3.0 has a method info() that returns a 'HTTPMessage' object. For
example:
::: Python 2.6
>>> from urllib2 import urlopen
>>> u = urlopen("http://www.python.org")
>>> u.info()
<httplib.HTTPMessage instance at 0xce5738>
>>>
::: Python 3.0
>>> from urllib.request import urlopen
>>> u = urlopen("http://www.python.org")
>>> u.info()
<http.client.HTTPMessage object at 0x4bfa10>
>>>
So far, so good. HTTPMessage is defined in two different modules, but
that's fine (it's just library reorganization).
Two major problems:
1. There is no documentation whatsoever on HTTPMessage. No description
in the docs for httplib (python 2.6) or http.client (python 3.0).
2. The HTTPMessage object in Python 2.6 derives from mimetools.Message
and has a totally different programming interface than HTTPMessage in
Python 3.0 which derives from email.message.Message. Check it out:
::: Python 2.6
>>> dir(u.info())
['__contains__', '__delitem__', '__doc__', '__getitem__', '__init__',
'__iter__', '__len__', '__module__', '__setitem__', '__str__',
'addcontinue', 'addheader', 'dict', 'encodingheader', 'fp', 'get',
'getaddr', 'getaddrlist', 'getallmatchingheaders', 'getdate',
'getdate_tz', 'getencoding', 'getfirstmatchingheader', 'getheader',
'getheaders', 'getmaintype', 'getparam', 'getparamnames', 'getplist',
'getrawheader', 'getsubtype', 'gettype', 'has_key', 'headers',
'iscomment', 'isheader', 'islast', 'items', 'keys', 'maintype',
'parseplist', 'parsetype', 'plist', 'plisttext', 'readheaders',
'rewindbody', 'seekable', 'setdefault', 'startofbody', 'startofheaders',
'status', 'subtype', 'type', 'typeheader', 'unixfrom', 'values']
::: Python 3.0
>>> dir(u.info())
['__class__', '__contains__', '__delattr__', '__delitem__', '__dict__',
'__doc__', '__eq__', '__format__', '__ge__', '__getattribute__',
'__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__',
'__len__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__',
'__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__',
'__str__', '__subclasshook__', '__weakref__', '_charset',
'_default_type', '_get_params_preserve', '_headers', '_payload',
'_unixfrom', 'add_header', 'as_string', 'attach', 'defects',
'del_param', 'epilogue', 'get', 'get_all', 'get_boundary',
'get_charset', 'get_charsets', 'get_content_charset',
'get_content_maintype', 'get_content_subtype', 'get_content_type',
'get_default_type', 'get_filename', 'get_param', 'get_params',
'get_payload', 'get_unixfrom', 'getallmatchingheaders', 'is_multipart',
'items', 'keys', 'preamble', 'replace_header', 'set_boundary',
'set_charset', 'set_default_type', 'set_param', 'set_payload',
'set_type', 'set_unixfrom', 'values', 'walk']
I know that getting rid of mimetools was desired, but I have no idea if
changing the API on HTTPMessage was intended or not. In any case, it's
one of the only cases in the entire library where the programming
interface to an object radically changes from 2.6 -> 3.0.
I ran into this problem with code that was trying to properly determine
the charset encoding of the byte string returned by urlopen().
I haven't checked whether 2to3 deals with this or not, but it might be
something for someone to look at in their copious amounts of spare time.
|
|||
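A minimal sketch of the charset lookup described above, using feature detection rather than a version check. It assumes only the method names visible in the two dir() listings (`getparam` on 2.6, `get_content_charset` on 3.0); the helper name `charset_of` and its `default` argument are hypothetical, and an `HTTPMessage` built by hand stands in for `u.info()` so the example runs offline.

```python
from http.client import HTTPMessage


def charset_of(info, default="utf-8"):
    """Return the charset named in Content-Type, or `default`.

    `info` is whatever urlopen(...).info() returned. This helper is
    illustrative only; it is not part of urllib.
    """
    if hasattr(info, "get_content_charset"):
        # email.message.Message interface (Python 3.x)
        charset = info.get_content_charset()
    else:
        # mimetools.Message interface (Python 2.x)
        charset = info.getparam("charset")
    return charset or default


# Offline stand-in for u.info() on Python 3.
info = HTTPMessage()
info["Content-Type"] = "text/html; charset=ISO-8859-1"
found = charset_of(info)

# A response without a Content-Type header falls back to the default.
fallback = charset_of(HTTPMessage())
```

Note that `get_content_charset()` lower-cases its result, so callers comparing charset names should not rely on the server's capitalization.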
| msg78491 - (view) | Author: David M. Beazley (beazley) | Date: 2008-12-29 22:20 | |
Verified that 2to3 does not fix this. |
|||
| msg81799 - (view) | Author: Daniel Diniz (ajaksu2) | Date: 2009-02-12 18:44 | |
ISTM that these issues tend to go all the way up to test coverage and organization :/ |
|||
| msg84217 - (view) | Author: Jeremy Hylton (jhylton) | Date: 2009-03-26 21:03 | |
No deep thought was given to the HTTPMessage API. Here's the extent of the discussion that I can find. I've changed the names, but you can find the full discussion at http://bugs.python.org/issue2848
A: mimetools.Message is compatible with email.message.Message, right?
B: I don't know how compatible it is.
C: The APIs are a bit different, but it should be possible to migrate from the old to the new. |
|||
| msg84219 - (view) | Author: Jeremy Hylton (jhylton) | Date: 2009-03-26 21:04 | |
A plausible solution is to pick some core set of functionality that we think people need and document that API. We can modify one or both of the current implementations to include that functionality. What do we need? |
|||
| msg84223 - (view) | Author: Barry A. Warsaw (barry) | Date: 2009-03-26 21:29 | |
I propose that you only document the getitem header access API. I.e. the thing that info() gives you can be used to access the message headers via message['content-type']. That's an API common to both rfc822.Messages (the ultimate base class of mimetools.Message) and email.message.Message. |
|||
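The item-access API proposed above can be sketched on the Python 3 side as follows. The header values are made up for illustration, and the `HTTPMessage` is constructed by hand instead of coming from a real response.

```python
from http.client import HTTPMessage

# Hand-built stand-in for the object returned by info().
headers = HTTPMessage()
headers["Content-Type"] = "text/html; charset=ISO-8859-1"
headers["Content-Length"] = "0"

content_type = headers["content-type"]      # lookup ignores case
has_length = "CONTENT-LENGTH" in headers    # so does membership testing
missing = headers.get("x-not-sent", "n/a")  # get() with an explicit default
```

One caveat, stated from memory rather than from this thread: indexing a missing header returns None on the email.message.Message side, while the rfc822-based class in Python 2 raises KeyError, so `get()` with an explicit default is probably the safer call to document.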
| msg84243 - (view) | Author: Brad Miller (bmiller) | Date: 2009-03-27 01:00 | |
On Thu, Mar 26, 2009 at 4:29 PM, Barry A. Warsaw <report@bugs.python.org> wrote:
> I propose that you only document the getitem header access API. I.e.
> the thing that info() gives you can be used to access the message
> headers via message['content-type']. That's an API common to both
> rfc822.Messages (the ultimate base class of mimetools.Message) and
> email.message.Message.
As I've found myself in the awkward position of having to explain the new 3.0 API to my students, I've thought about this and have some ideas/questions. I'm also willing to help with the documentation or any enhancements.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'addinfourl' object is unsubscriptable
I wish I knew what an addinfourl object was.
'Fri, 27 Mar 2009 00:41:34 GMT'
'Fri, 27 Mar 2009 00:41:34 GMT'
['Date', 'Server', 'Last-Modified', 'ETag', 'Accept-Ranges', 'Content-Length', 'Connection', 'Content-Type']
Using x.headers over x.info() makes the most sense to me, but I don't know that I can give any good rationale. Which would we want to document?
'text/html; charset=ISO-8859-1'
I guess technically this is correct, since the charset is part of the Content-Type header in HTTP, but it does make life difficult for what I think will be a pretty common use case in this new urllib: read from the URL (as bytes) and then decode them into a string using the appropriate character set. As you follow this road, you have the confusing option of these three calls:
'iso-8859-1'
>>> x.headers.get_charsets()
['iso-8859-1']
I think it should be a bug that get_charset() does not return anything in this case. It is not at all clear why get_content_charset() and get_charset() should have different behavior.
Brad |
|||
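The three similarly named calls can be compared side by side in a short sketch. It uses a hand-built `HTTPMessage` and a sample byte string in place of a real response body, both of which are assumptions made so the example runs offline.

```python
from http.client import HTTPMessage

headers = HTTPMessage()
headers["Content-Type"] = "text/html; charset=ISO-8859-1"

from_header = headers.get_content_charset()  # parsed from the Content-Type header
per_part = headers.get_charsets()            # one entry per MIME part
payload_charset = headers.get_charset()      # only set through set_charset()

# Sample bytes standing in for the result of u.read().
body = b"caf\xe9"
text = body.decode(from_header or "utf-8")
```

`get_charset()` reports the Charset object attached to the message payload, which nothing sets when headers are merely parsed. That appears to be why it returns None here while `get_content_charset()` does not.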
| msg84577 - (view) | Author: Jeremy Hylton (jhylton) | Date: 2009-03-30 16:32 | |
The attached file is vaguely related to the current discussion. I'd like to document the API for the urllib response, but I'd also like to simplify the implementation on the py3k side. We can document the simple API on the py3k side, then support some version of that API on the py2k side. Apologies for the noise in this patch. I was on a plane, and I don't understand DVCS yet. |
|||
| msg84783 - (view) | Author: Senthil Kumaran (orsenthil) | Date: 2009-03-31 14:09 | |
I spent some time on the patch that replaces the self.msg usage with self.headers in http.client. Everything is fine. The next step is to provide an interface in urllib.response and the equivalent changes to py2k. |
|||
| msg154781 - (view) | Author: Joel Verhagen (joel.verhagen) | Date: 2012-03-02 17:05 | |
There is a difference in what HTTPResponse.getheaders() returns.
Python 2.7.2 (default, Jun 12 2011, 14:24:46) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import httplib
>>> c = httplib.HTTPConnection('www.joelverhagen.com')
>>> c.request('GET', '/sandbox/tests/cookies.php')
>>> c.getresponse().getheaders()
[('content-length', '0'), ('set-cookie', 'test_cookie1=foobar; expires=Fri, 02-Mar-2012 16:54:15 GMT, test_cookie2=barfoo; expires=Fri, 02-Mar-2012 16:54:15 GMT'), ('vary', 'Accept-Encoding'), ('server', 'Apache'), ('date', 'Fri, 02 Mar 2012 16:53:15 GMT'), ('content-type', 'text/html')]
Python 3.2.2 (default, Sep 4 2011, 09:07:29) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from http import client
>>> c = client.HTTPConnection('www.joelverhagen.com')
>>> c.request('GET', '/sandbox/tests/cookies.php')
>>> c.getresponse().getheaders()
[('Date', 'Fri, 02 Mar 2012 16:56:40 GMT'), ('Server', 'Apache'), ('Set-Cookie', 'test_cookie1=foobar; expires=Fri, 02-Mar-2012 16:57:40 GMT'), ('Set-Cookie', 'test_cookie2=barfoo; expires=Fri, 02-Mar-2012 16:57:40 GMT'), ('Vary', 'Accept-Encoding'), ('Content-Length', '0'), ('Content-Type', 'text/html')]
As you can see, in 2.7.2 HTTPResponse.getheaders() joins headers with the same name using ", ". In 3.2.2, the headers are kept separate, as two or more 2-tuples.
This causes problems if you convert the list of 2-tuples to a dict, because the keys collide (causing all but one of the values associated with the non-unique keys to be overwritten). It looks like this problem is caused by using the email header parser (which keeps the keys and values as separate 2-tuples). In Python 2.7.2, the HTTPMessage.addheader(...) function does the comma-separating.
Is this API change intentional? Should HTTPResponse.getheaders() comma-separate the values like the HTTPResponse.getheader(...) function (in both 2.7.2 and 3.2.2)?
See also:
https://github.com/shazow/urllib3/issues/3#issuecomment-3008415
|
|||
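The key collision described above, and one way around it, can be sketched with plain data. The pair list imitates the shape of the 3.2.2 `getheaders()` output with shortened cookie values; `group_headers` is a hypothetical helper, not a library function.

```python
from collections import defaultdict

# Shape of getheaders() output on Python 3: repeated names stay separate.
pairs = [
    ("Set-Cookie", "test_cookie1=foobar"),
    ("Set-Cookie", "test_cookie2=barfoo"),
    ("Content-Type", "text/html"),
]

# Naive conversion: the second Set-Cookie overwrites the first.
lossy = dict(pairs)


def group_headers(pairs):
    """Collect every value for each header name, keyed case-insensitively."""
    grouped = defaultdict(list)
    for name, value in pairs:
        grouped[name.lower()].append(value)
    return dict(grouped)


grouped = group_headers(pairs)
```

Keeping a list per name preserves every value without deciding whether a given header may legally be comma-joined.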
| msg154835 - (view) | Author: Éric Araujo (eric.araujo) | Date: 2012-03-03 13:18 | |
Now that two Python 3 releases have been made, I don’t know if changing the code is still an option. The doc can certainly still be improved. Adding Ezio to nosy; I think it’s you who opened a bug report about removing superfluous getter methods in the addinfourl class (and other ugliness). |
|||
| msg154850 - (view) | Author: Ezio Melotti (ezio.melotti) | Date: 2012-03-03 20:09 | |
Yep, #12707. |
|||
| msg179377 - (view) | Author: Piotr Dobrogost (piotr.dobrogost) | Date: 2013-01-08 22:02 | |
@joel.verhagen
"Should HTTPResponse.getheaders() comma-separate the values (...)"
No, it should not. RFC 2616 states:
"Multiple message-header fields with the same field-name MAY be present in a message if and only if the entire field-value for that header field is defined as a comma-separated list [i.e., #(values)]."
As field-values for some header fields ('Set-Cookie' being an example) are not defined as a comma-separated list, such fields must not be merged.
Side note:
RFC 2616 is very soon to be obsoleted by the new RFC from the httpbis working group. However, in the current/newest draft (http://trac.tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-21#section-3.2), although the wording is different, the sense is the same.
|
|||
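A short sketch of why merging Set-Cookie is lossy, using the cookie values from the 3.2.2 output earlier in this thread. The `HTTPMessage` is built by hand so the example runs offline; on a real Python 3 response the same `get_all()` call would be made on the response's header object.

```python
from http.client import HTTPMessage

cookies = [
    "test_cookie1=foobar; expires=Fri, 02-Mar-2012 16:57:40 GMT",
    "test_cookie2=barfoo; expires=Fri, 02-Mar-2012 16:57:40 GMT",
]

headers = HTTPMessage()
for value in cookies:
    # Item assignment appends another header; it does not replace.
    headers["Set-Cookie"] = value

# Each Set-Cookie field comes back as its own list entry.
separate = headers.get_all("Set-Cookie")

# Merging with ", " cannot be undone, because the expires date
# already contains a comma followed by a space.
merged = ", ".join(cookies)
pieces = merged.split(", ")
```

Splitting the merged string yields four fragments instead of the two original cookies, which is the ambiguity the RFC wording avoids by forbidding the merge for such fields.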
| msg179379 - (view) | Author: Piotr Dobrogost (piotr.dobrogost) | Date: 2013-01-08 22:07 | |
...continuing my previous comment:
Joining headers with the same name using ", " in HTTPResponse.getheaders() in Python 2.7 is wrong, and there's a bug for this; see http://bugs.python.org/issue1660009 |
|||
| msg185458 - (view) | Author: Hugo Lopes Tavares (hltbra) | Date: 2013-03-28 17:11 | |
I just caught a bug because on Python 3 `HTTPMessage` has `get_param`, while on Python 2 there is `getparam`, with a different method signature. I am trying to figure out a solution so my code can run on both Python 2 and 3 without branching on the Python version. |
|||
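One possible way to avoid both `getparam` and `get_param` on the response object is to read the raw header value through `get()`, which appears in both dir() listings in this thread, and re-parse it with a fresh `email.message.Message`. This is a sketch under that assumption; `header_param` is a hypothetical helper, and a plain dict stands in for a Python 2 style header object.

```python
from email.message import Message


def header_param(headers, name, default=None, header="content-type"):
    """Return parameter `name` of `header`, or `default` if absent.

    `headers` only needs a get() method, so the same code path serves
    both the mimetools-based and the email-based HTTPMessage.
    """
    raw = headers.get(header)
    if raw is None:
        return default
    parsed = Message()
    parsed[header] = raw
    # email.message.Message.get_param(param, failobj, header)
    return parsed.get_param(name, default, header)


# A plain dict is enough to stand in for the header object here.
fake_headers = {"content-type": "text/html; charset=ISO-8859-1"}
charset = header_param(fake_headers, "charset")
boundary = header_param(fake_headers, "boundary", default="none")
```

Unlike `get_content_charset()`, `get_param()` keeps the value's original capitalization, and for RFC 2231 encoded parameters it can return a 3-tuple instead of a string, so callers may want to normalize the result.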
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2013-03-28 17:11:21 | hltbra | set | nosy: + hltbra messages: + msg185458 |
| 2013-01-08 22:07:40 | piotr.dobrogost | set | messages: + msg179379 |
| 2013-01-08 22:02:10 | piotr.dobrogost | set | nosy: + piotr.dobrogost messages: + msg179377 |
| 2012-03-03 20:09:37 | ezio.melotti | set | messages: + msg154850 |
| 2012-03-03 13:18:59 | eric.araujo | set | files: - unnamed |
| 2012-03-03 13:18:44 | eric.araujo | set | versions: + Python 2.7, Python 3.2, Python 3.3, - Python 2.6, Python 3.0 nosy: + ezio.melotti title: HTTPMessage not documented and has inconsistent API across 2.6/3.0 -> HTTPMessage not documented and has inconsistent API across Py2/Py3 messages: + msg154835 resolution: accepted -> stage: test needed -> patch review |
| 2012-03-02 17:05:27 | joel.verhagen | set | nosy: + joel.verhagen messages: + msg154781 |
| 2011-11-23 19:38:02 | petri.lehtinen | set | nosy: + petri.lehtinen |
| 2010-02-16 05:31:33 | eric.araujo | set | nosy: + eric.araujo |
| 2009-09-09 19:51:33 | srid | set | nosy: + srid |
| 2009-03-31 14:09:18 | orsenthil | set | assignee: georg.brandl -> jhylton resolution: accepted messages: + msg84783 |
| 2009-03-30 16:32:51 | jhylton | set | files: + addinfourl_removal.diff keywords: + patch messages: + msg84577 |
| 2009-03-27 01:00:26 | bmiller | set | files: + unnamed messages: + msg84243 |
| 2009-03-26 21:29:57 | barry | set | nosy: + barry messages: + msg84223 |
| 2009-03-26 21:04:35 | jhylton | set | messages: + msg84219 |
| 2009-03-26 21:03:46 | jhylton | set | nosy: + jhylton messages: + msg84217 |
| 2009-03-18 15:43:57 | bmiller | set | nosy: + bmiller |
| 2009-02-13 01:47:44 | ajaksu2 | set | nosy: + jjlee |
| 2009-02-12 19:03:48 | ajaksu2 | set | dependencies: + httplib.HTTPMessage undocumented |
| 2009-02-12 18:44:49 | ajaksu2 | set | assignee: georg.brandl messages: + msg81799 nosy: + ajaksu2, georg.brandl, orsenthil components: + Documentation stage: test needed |
| 2008-12-29 22:20:27 | beazley | set | messages: + msg78491 |
| 2008-12-29 21:42:31 | beazley | create | |
