Title: httplib getheader() throws error instead of default
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.1, Python 3.2
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: Walter.Woods, barry, dstanek, eric.araujo, hdiogenes, l0nwlf, orsenthil, r.david.murray
Priority: normal Keywords: easy, patch

Created on 2010-04-29 15:57 by Walter.Woods, last changed 2010-08-02 17:15 by orsenthil. This issue is now closed.

File name Uploaded Description Edit
issue8572.diff orsenthil, 2010-05-01 15:47
issue8572-alt.diff Walter.Woods, 2010-05-05 15:31
issue8572-unfortunate.diff Walter.Woods, 2010-05-05 15:42
8572.diff dstanek, 2010-08-01 03:26
8572-with-docs.diff dstanek, 2010-08-01 05:21
Messages (30)
msg104532 - (view) Author: Walter Woods (Walter.Woods) Date: 2010-04-29 15:57
Calling HTTPResponse.getheader('Testabnew', 'Default') throws an exception rather than returning default, python 3.1.2.  Problem code appears to be line 601 of

    def getheader(self, name, default=None):
        if self.headers is None:
            raise ResponseNotReady()
-        return ', '.join(self.headers.get_all(name, default))

Which should probably changed to:

+        result = self.headers.get_all(name, None)
+        if result:
+            return ', '.join(result)
+        else:
+            return default

Not as elegant, but what appears to happen is if default is not an array, then get_all returns it, and the string join fails with a TypeError.
msg104551 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-04-29 18:07
For backward compatibility this would have to be something more like:

    if not isinstance(default, list):
        if isinstance(default, string):
           default = [default]
           default = list(default)
    return ', '.join(self.headers_get_all(name, default))

Care to work up a patch, including unit test (it looks like the unit tests live in test_httplib)?
msg104554 - (view) Author: Walter Woods (Walter.Woods) Date: 2010-04-29 18:13
Good point.  Yeah, I could do that within a week or two at the latest.  Just attach a .patch here when done?  And, what should the base be for the patch . . . the Lib/http directory?  Never submitted anything to python before.
msg104557 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-04-29 18:19
Walter, have a look at document. It is really easy and attach it when its ready.
1) Create a patch against trunk (or py3k).
2) Patch against trunk/ or (py3k/ if its py3k only)
3) Lib/test/ might have similar tests, so build on top of  some existing test.
4) NEWS if applicable.
msg104568 - (view) Author: Walter Woods (Walter.Woods) Date: 2010-04-29 18:57
David: Your example tests specifically for a string, but what about just converting default to a string always if it's not a list?  Otherwise the string join is just going to fail anyway (suppose an integer is passed, which is the case with couchdb-python).  Wouldn't

if not isinstance(default, list):
    default = [str(default)]

Be safer?
msg104585 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-04-29 20:43
No, because my code is a backward compatibility hack.  Currently if someone is passing a default successfully they must be doing it by passing in a list or tuple consisting of one or more strings.  So your code would result in something like:

  >>> ', '.join([str(['abc'])])

However, you are correct that if default is some object other than a list, it is either going to work in the ', ' or it isn't, so there's no need to call list on it.  So my code should be simplified to:

    if isinstance(default, string):
       default = [default]
    return ', '.join(self.headers_get_all(name, default))
msg104586 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-04-29 20:52
And also, the code which is using getheader of HTTPResponse object would mostly (but incorrectly in py3k) be passing a string as default. The HTTP headers have been a dict with key,values as strings. Counchdb-python - does this make HTTP reqs? I am not sure if having int-values in the header values is proper.

So, David's last e.g seems just fine. Check if it is string and for the code's requirement, make it a list.
msg104588 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-04-29 21:05
@senthil: I'm not quite sure what your sentence referencing couchdb is getting at, but the headers that are being queried are in an email.Message object, and contain only string values unless some code is misusing the API and setting new non-string values after the initial header parse is has been done.

Regardless, though, my suggested change makes the code work with the documented API (which strongly implies that default is a string) while continuing to accept any previously working values that existing code may be passing in.
msg104590 - (view) Author: Walter Woods (Walter.Woods) Date: 2010-04-29 21:13
couchdb-python makes http requests, yes.  

Passing in a number might be mildly inappropriate, BUT I'd like to point out that the default might be used to find out if the header is not set (hence its default value of None).  

It should not assume that a string is the proper input; any sentinel value should work (especially as python is dynamic, and the default parameter is not actually used for the function's logic).

Which goes back to the original patch, unless a list or tuple is provided, in which case the list is comma joined for backwards compatibility as suggested by David.
msg104592 - (view) Author: Walter Woods (Walter.Woods) Date: 2010-04-29 21:20
Little more info:

The actual snippet from couchdb-python is:

  if int(resp.getheader('content-length', sys.maxsize)) < CHUNK_SIZE:

Which should be valid python . . . calling int() on an integer should work, and calling int() on a string expected to be an integer should work.
msg104608 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-04-30 01:55
I seem to have been missing some context here.  I now understand that this is a regression relative to Python 2.x.  It seems to me that the translation from rfc822.Message to email.Message was done incorrectly.  In the 2.x code getheader returns only the value of the first header with the given name, in the 3.x code it returns the comma separated concatenation of the values of all headers with that name, and breaks the handling of default.

Any idea which handling of header values is actually the correct behavior vis-a-vis the http protocol?

Either way, the translation broke 'default', and that should be fixed, especially since we know there is 2.x code depending on the documented behavior, and the documented behavior hasn't changed in 3.x.  Since it is a regression, I'm inclined to think we can get away with fixing it without preserving backward compatibility with 3.1.2 (ie: it's a bug, and no one is likely to have written code to work around the bug, they'd report it instead as Walter did). But my opinion on that should perhaps be verified by appeal to python-dev, once we've agreed on what the correct behavior of the routine should be given multiple headers with the same name.

I'm adding hdiogenes and barry as nosy since they were the principle people involved in the translation, but they may not remember why this particular change was made.  As far as I can see the change is in the patch Barry applied but not in hdiogenes' original patch in the relevant issue (issue 2848), so there is probably some discussion missing from the issue.
msg104609 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-04-30 02:26
On Fri, Apr 30, 2010 at 01:56:01AM +0000, R. David Murray wrote:
> I seem to have been missing some context here. 

I was referring to Walter's comment on default being an int, like
HTTPResponse.getheader('Fake-Content-Length',default=42). It was
surprising to me, but the he provided a sample snippet and pointed out
that default value can be None. So, yes, just checking of string value
might not give the correct results in all circumstances (esp
default=None case which is strong).

>In the 2.x code getheader returns only the value of the first header
>with the given name, in the 3.x code it returns the comma separated
>concatenation of the values of all headers with that name, and breaks
>the handling of default.
> Any idea which handling of header values is actually the correct
> behavior vis-a-vis the http protocol?

I believe, 2.x version is proper. I was surprised at get_all() method
introduced too, instead of getheader (if its present in Message)

Headers is a simply a dict and getheader is similar to dict.get(key,
default). I shall look up the HTTP RFC and confirm what should be done
if multiple values exists for same header.
msg104729 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-05-01 15:44
It seems that 3.x behavior is correct. I am quoting a relevant section from rfc 2616.

It MUST be possible to combine the multiple header fields into one "field-name: field-value" pair, without changing the semantics of the message, by appending each subsequent field-value to the first, each
separated by a comma.

- Should we change the 2.x behavior then? (This can break apps, which might be depending upon the existing behavior of 2.x code)

Also, this bug is on default value, it fair to assume that the default might given as None, or a string and rarely as int. I don't see anyone would have passed it (correctly for current code) as list.

Why don't we handle it at email.message.Message's get_all method?
get_all is supposed to return a list or None.

Attaching a patch (which is almost David's first suggestion).
msg104730 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-05-01 15:47
previous patch had a typo and a mistake. uploading the correct one.
msg105044 - (view) Author: Walter Woods (Walter.Woods) Date: 2010-05-05 14:57
Sorry I'm just getting back to this . . . Senthil, doesn't list(None) throw an exception?  That was the whole problem with list()ing the default argument.

And I don't think the problem should be fixed in email.message.Message.get_all() . . . that function works exactly as it says it should.  Its behavior is consistent.  This issue should not change that.  And even WITH changing that function, the patch would still need to fix http.client.HTTPResponse.getheader().  

Just check python 2.6, and it looks like that function works correctly.  If a number is passed, it returns a number as the default.  We'd be preserving backwards compatibility, not destroying it, by returning the default parameter unchanged in 3.X when the specified header does not exist.

I'll try attaching a patch before too long.
msg105046 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-05-05 15:06
Walter, just to address one of your point:

+            if failobj and not isinstance(failobj, list):
+                if isinstance(failobj, str):
+                    failobj = [failobj]
+                else:
+                    failobj = list(failobj)
             return failobj   

If the failobj is None, list(None) does not occur here and no exception will be raised. 

Yes, please attach a patch to this issue as you consider would be appropriate. We can see the differences. In py2.6, HTTPResponse.getheader did not invoke get_all method at all, it invokes a geheader method which is a simple dict lookup with default return.
msg105047 - (view) Author: Walter Woods (Walter.Woods) Date: 2010-05-05 15:07
Senthil, you are correct, I gave a bad example.  However, try list()ing an integer :)
msg105050 - (view) Author: Walter Woods (Walter.Woods) Date: 2010-05-05 15:31
Relevant part from trunk/Lib/ illustrating that returning default unchanged is the legacy/defined behavior:

        return self.dict.get(name.lower(), default)

See attached patch for py3k.  This preserves backwards compatibility (aside from the comma-joined list) and uses the default argument appropriately (consistently with the default argument for dict.get, for instance).

On another note, should we be patching Python 2.7 as well, since RFC 2616 states that the 3.X behavior of a comma-joined list is appropriate (as cited  by Senthil)?
msg105052 - (view) Author: Walter Woods (Walter.Woods) Date: 2010-05-05 15:42
And if you really think that the unfortunate behavior of 

   >>> getheader('a', ['a', 'b'])
   'a, b'

Is desired, issue8572-unfortunate.diff retains that behavior.  I think that is counter-intuitive though, and it is doubtful that many would be relying on this functionality, given that the default value of None throws an exception at the moment.
msg105060 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-05-05 17:12
Given what we've learned, I think that Walter's first patch is the best fix.  No one should be relying on the current actual behavior of the default argument, and the fix makes it work as documented.

As for backporting to 2.7, I don't think so.  2.7 is in feature freeze, and the API change would be a feature and not a bug fix.  (You could argue that, though, based on the someone can appeal to Benjamin if desired.)

Walter, re the patch, why pass a default to get_all at all?  It is going to return None if there are no such headers.  Then you could test 'if headers is None' and the logic would be a little more pedantically safe.

Also, we need some unit tests for this.
msg105061 - (view) Author: Walter Woods (Walter.Woods) Date: 2010-05-05 17:24
I'll add unit tests later if no one else gets to it first; and they have the same functionality, but I could change the default back to None and use is None if that would be clearer (it would).
msg112259 - (view) Author: David Stanek (dstanek) Date: 2010-08-01 03:26
I created some tests for the existing behavior and the expected behavior. One of my apps was passing in a tuple and the default. Since this already worked and there are probably others besides me expecting this behavior I changed the patch to handle this.

The new patch treats any non-string value as an iterable and performs a ', '.join() on it. This is basically the old behavior.

This patch is against the py3k branch.
msg112266 - (view) Author: David Stanek (dstanek) Date: 2010-08-01 05:21
Adding a patch that includes a documentation change.
msg112391 - (view) Author: Walter Woods (Walter.Woods) Date: 2010-08-01 22:18
Hi David,

I like most of your patch (especially since it has unit tests), and if people like yourself are actually using the current functionality then that's fine, but one recommendation: why not change this line:

if not headers or isinstance(headers, str):

To also include the clause ``or getattr(headers, '__iter__', False)``.  That way, other default values (such as numbers) would work as expected rather than throw an error.  What do you think of that?
msg112392 - (view) Author: Walter Woods (Walter.Woods) Date: 2010-08-01 22:19
Sigh.  I meant ``or not`` instead of ``or``.  Clearly, not having an iterator would be a case to not iterate over the default :)
msg112411 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-08-02 00:37
Joining the iterator contents if default is an iterator is an ugly backward compatibility hack.  The default should really be returned unchanged.  But given there is code in the field working around the current bug, I guess we have to retain it.  Of course, it will fail if the iterator contains non-strings...oh well.

Perhaps we could add a deprecation warning when an iterator is used as a default, and in 3.3 stop special casing iterators.

And I agree that only iterators should be special cased, since anything else would currently be failing, and should instead be returned unchanged.
msg112412 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-08-02 00:40
I'm changing this to release blocker because I don't want to see the erroneous behavior make it into 3.2.
msg112451 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-08-02 12:12
Fixed in revision 83521 (py3k) and 83522 (release31-maint).
I made slight modifications to the patch to include non-iterable values,like int, for default too.

If you feel the documentation could be made better, please suggest the wordings for the sentence. I am leaving the bug open for this, if the current one is fine, we can close the bug.

David (RDM): I am not sure if we need a Deprecation Warning here.
If you actually see getheader() is supposed to return a string only as Headers are strings. I don't think someone should be consciously passing a list of items and excepting the list of items back.

', '.join of header values seem to be for situations in which get_all returns list of values, which is possible when there are different values for same header key.  This is a correct behavior IMO. The 2.x one is a mistake where it returned only the first value as a string.
msg112469 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-08-02 13:58
Right, the join behavior is correct.  But the definition of a *default* (aka sentinel) value is that it is returned *unchanged*.  It is counter intuitive that if you pass a list of strings as a default, that what you get back is those strings joined by ','.  But that is the released behavior. so we have to live with it.

I think the doc patch is OK except that it should be 'iterable' rather than 'iterator'.  However, it might make the whole method clearer if we say instead:

Return the value of the header *name*, or *default* if there is no header matching *name*.  If there is more than one header with the name *name*, return all of the values joined by ', '.  If 'default' is any iterable other than a single string, its elements are similarly returned joined by commas.
msg112501 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-08-02 17:15
David, your suggested description was much better.
Modified the documentation in r83529 and r83530.
Date User Action Args
2010-08-02 17:15:19orsenthilsetstatus: open -> closed

messages: + msg112501
2010-08-02 13:59:00r.david.murraysetpriority: normal

messages: + msg112469
2010-08-02 12:12:45orsenthilsetpriority: release blocker -> (no value)
resolution: fixed
messages: + msg112451

stage: test needed -> resolved
2010-08-02 00:40:51r.david.murraysetpriority: high -> release blocker

messages: + msg112412
2010-08-02 00:37:34r.david.murraysetmessages: + msg112411
2010-08-01 22:19:47Walter.Woodssetmessages: + msg112392
2010-08-01 22:18:19Walter.Woodssetmessages: + msg112391
2010-08-01 05:21:50dstaneksetfiles: + 8572-with-docs.diff

messages: + msg112266
2010-08-01 03:26:01dstaneksetfiles: + 8572.diff
nosy: + dstanek
messages: + msg112259

2010-06-12 13:33:31l0nwlfsetnosy: + l0nwlf
2010-06-12 13:31:59eric.araujosetkeywords: + easy
2010-06-12 12:42:33eric.araujosetkeywords: - easy
nosy: + eric.araujo
2010-05-05 17:24:18Walter.Woodssetmessages: + msg105061
2010-05-05 17:12:40r.david.murraysetmessages: + msg105060
2010-05-05 15:42:17Walter.Woodssetfiles: + issue8572-unfortunate.diff

messages: + msg105052
2010-05-05 15:31:42Walter.Woodssetfiles: + issue8572-alt.diff

messages: + msg105050
2010-05-05 15:07:59Walter.Woodssetmessages: + msg105047
2010-05-05 15:06:22orsenthilsetassignee: orsenthil
messages: + msg105046
2010-05-05 14:57:27Walter.Woodssetmessages: + msg105044
2010-05-01 15:47:30orsenthilsetfiles: + issue8572.diff

messages: + msg104730
2010-05-01 15:46:40orsenthilsetfiles: - issue8572.diff
2010-05-01 15:44:08orsenthilsetfiles: + issue8572.diff
keywords: + patch
messages: + msg104729
2010-04-30 02:26:04orsenthilsetmessages: + msg104609
2010-04-30 01:55:52r.david.murraysetpriority: normal -> high
versions: + Python 3.2
nosy: + barry, hdiogenes

messages: + msg104608
2010-04-29 21:20:03Walter.Woodssetmessages: + msg104592
2010-04-29 21:13:14Walter.Woodssetmessages: + msg104590
2010-04-29 21:05:37r.david.murraysetmessages: + msg104588
2010-04-29 20:52:14orsenthilsetmessages: + msg104586
2010-04-29 20:43:39r.david.murraysetmessages: + msg104585
2010-04-29 18:57:57Walter.Woodssetmessages: + msg104568
2010-04-29 18:19:31orsenthilsetnosy: + orsenthil
messages: + msg104557
2010-04-29 18:13:36Walter.Woodssetmessages: + msg104554
2010-04-29 18:07:23r.david.murraysettype: crash -> behavior
components: + Library (Lib)

keywords: + easy
nosy: + r.david.murray
messages: + msg104551
stage: test needed
2010-04-29 16:01:42Walter.Woodssettitle: httplib getheader() does not work as documented -> httplib getheader() throws error instead of default
2010-04-29 15:57:12Walter.Woodscreate