Support iterable bodies in httplib #47493

catlee · 2008-06-30T18:01:18Z

BPO	3243
Nosy	@birkenfeld, @rhettinger, @orsenthil, @pitrou, @catlee, @merwok, @IIIIllllIIIIllllIIIIllllIIIIllllIIIIll, @sorcio, @tzickel
PRs	[2.7] bpo-3243: Support iterable bodies in httplib #10226
Files	python-3243.patch: Patch for python2.7 issue_3243_py3k.diff issue_3243_py3k_2.patch issue_3243_py3k_3.patch Issue3243-4.patch issue_3243_py3k_5.patch Issue3243-6.patch issue_3243_py3k_6.patch issue_3243_py3k_7.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/orsenthil'
closed_at = <Date 2011-01-15.18:26:50.823>
created_at = <Date 2008-06-30.18:01:18.309>
labels = ['easy', 'type-feature', 'library']
title = 'Support iterable bodies in httplib'
updated_at = <Date 2018-10-30.20:47:43.149>
user = 'https://github.com/catlee'

bugs.python.org fields:

activity = <Date 2018-10-30.20:47:43.149>
actor = 'tzickel'
assignee = 'orsenthil'
closed = True
closed_date = <Date 2011-01-15.18:26:50.823>
closer = 'orsenthil'
components = ['Library (Lib)']
creation = <Date 2008-06-30.18:01:18.309>
creator = 'catlee'
dependencies = []
files = ['12241', '19852', '19873', '19879', '19890', '19891', '19900', '19901', '19902']
hgrepos = []
issue_num = 3243
keywords = ['patch', 'easy', 'needs review']
message_count = 62.0
messages = ['69014', '77037', '84304', '100817', '110675', '120999', '122647', '122707', '122756', '122784', '122787', '122819', '122822', '122878', '122898', '122904', '122905', '122906', '122907', '122909', '122910', '122911', '122912', '122915', '122921', '122970', '122973', '122986', '122987', '122996', '122998', '122999', '123001', '123038', '123039', '123040', '123043', '123044', '123045', '123046', '123047', '123049', '123050', '123051', '123052', '123053', '123069', '123073', '124339', '124340', '124341', '124342', '124343', '124344', '124345', '124355', '124588', '126326', '126342', '126344', '126346', '328946']
nosy_count = 11.0
nosy_names = ['jhylton', 'georg.brandl', 'rhettinger', 'orsenthil', 'pitrou', 'catlee', 'eric.araujo', 'rcoyner', 'xuanji', 'davide.rizzo', 'tzickel']
pr_nums = ['10226']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue3243'
versions = ['Python 3.2']

catlee · 2008-06-30T18:01:17Z

httplib should support requests whose bodies are iterable objects. This
would facilitate doing large file uploads via HTTP since you wouldn't
have to load the entire file into memory to create the request string.

Index: Lib/httplib.py
===================================================================
--- Lib/httplib.py (revision 64600)
+++ Lib/httplib.py (working copy)
@@ -688,7 +688,12 @@
self.__state = _CS_IDLE

    def send(self, str):
-        """Send `str' to the server."""
+        """Send `str` to the server.
+
+        ``str`` can be a string object, a file-like object that supports
+        a .read() method, or an iterable object that supports a .next()
+        method.
+        """
        if self.sock is None:
            if self.auto_open:
                self.connect()
@@ -710,6 +715,10 @@
                while data:
                    self.sock.sendall(data)
                    data=str.read(blocksize)
+            elif hasattr(str,'next'):
+                if self.debuglevel > 0: print "sendIng an iterable"
+                for data in str:
+                    self.sock.sendall(data)
            else:
                self.sock.sendall(str)
        except socket.error, v:

catlee · 2008-12-05T16:43:40Z

The attached patch implements this for python 2.7. It also adds support
for iterable bodies in urllib2, where it is more generally useful.
urllib2 enforces the presence of a Content-Length header in the request
if the body is an iterable, whereas httplib does not.

The patch also includes updates to docs and tests (which all pass as of
r67584 on my macbook)

jhylton · 2009-03-28T12:21:07Z

Seems like a reasonable feature request. I'm going to apply a variant
of the patch in 3.1 first.

rcoyner · 2010-03-11T00:02:12Z

This patch and its tests still work. Any particular reason why it hasn't been adopted yet?

BreamoreBoy · 2010-07-18T19:32:09Z

Could someone with knowledge of httplib please move this forward, all the comments I see are positive.

merwok · 2010-11-12T00:46:30Z

The patch needs to be ported to the py3k branch.

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll · 2010-11-28T10:39:09Z

Hi, attached is a ported version of the patch that applies against py3k.

rhettinger · 2010-11-28T19:10:13Z

Instead of
hasattr(str,'next')

consider using
isinstance(str, collections.Iterable)

Also consider changing the variable name from the now overly type specific, "str" to something like "source" to indicate the significance of the data, not its type.

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll · 2010-11-29T01:46:29Z

Hi Raymond, I assume you're referring to catlee's patch? 'str' has been changed to 'data' in python 3.2

While porting the patch I ran into this issue, which is that isinstance(str, collections.Iterable) doesn't behave exactly like hasattr(str,'next')

>>> hasattr("lol", '__next__')
False
>>> isinstance("lol", collections.Iterable)
True

so using the isinstance method would actually match against strings, which iirc would make the program fail. I can confirm later if needed.

orsenthil · 2010-11-29T09:15:07Z

xuanji, the issue you stumbled upon was just fixed by Raymond for the report bpo-10565.

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll · 2010-11-29T11:23:42Z

What a timely coincidence. I'll try out the change soon.

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll · 2010-11-29T15:53:47Z

Changed according to Raymond's suggestion. I realized that there are some classes (str, bytes, array.array) that are iterable but should not be handled by the iteration logic provided by catlee. eg: if we iterate through b'a message' we'd get a bunch of ints and sending them is wrong. patch attached.

pitrou · 2010-11-29T16:00:04Z

Thanks for the patch.

First, you don't need to support str, since sockets only accept binary strings (not unicode).

Second, I think it's simpler and more generic to do something like:

try:
self.sock.sendall(data)
except TypeError:
try:
it = iter(data)
except TypeError:
raise TypeError("data should be a bytes-like object or "
"an iterable, got %r" % type(it))
for d in t:
self.sock.sendall(d)

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll · 2010-11-30T03:20:15Z

pitrou: that sounds good. I attached another patch.

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll · 2010-11-30T11:00:02Z

pitrou: actually that seems a bit suspect now... you need to handle 'data' differently depending on its type, and while you can determine the type by finding out when 'data' throws certain exceptions, it doesn't seem like what exceptions were meant for.

pitrou · 2010-11-30T13:27:48Z

pitrou: actually that seems a bit suspect now... you need to handle
'data' differently depending on its type,

Yes, but you can't know all appropriate types in advance, so it's better
to try and catch the TypeError.

I don't understand your changes in Lib/urllib/request.py. len(data) will
raise anyway.

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll · 2010-11-30T14:53:45Z

I don't fully understand Lib/urllib/request.py either, I just ported it and ran the unittests... it seems like what it does is that if you send an iterator through as 'data' you can't know the length in advance, and rather than let the len(data) raise an exception catlee thought it's better to raise an exception to tell the user exactly why the code failed (ie, because the user sent an iterator and there's no way to meaningfully find the Content-Length of that).

As for the catching exceptions vs using isinstance: I thought about it for a while, I think something like this feels right to me:

try:
self.sock.sendall(data)
except TypeError:

      if isinstance(data, collections.Iterable):
          for d in t:
              self.sock.sendall(d)
      else:
          raise TypeError("data should be a bytes-like object or an iterable, got %r" % type(it))

anyway, calling iter(data) is equivalent to calling data.__iter__(), so catching the exception is equivalent to hasattr(data, '__iter__'), which is roughly the same as isinstance(data, collections.Iterable). so we try the most straightforward method (sending everything) then if that fails, data is either an iterator or a wrong type.

sorcio · 2010-11-30T15:35:35Z

len(data) will raise anyway.

No, it won't, if the iterable happens to be a sequence.

pitrou · 2010-11-30T15:43:50Z

> len(data) will raise anyway.

No, it won't, if the iterable happens to be a sequence.

Well, it seems the patch is confused between iterable and iterator. Only
iterators have a __next__, but they usually don't have a __len__.

The patch should really check for iterables, so it should use:

if isinstance(data, collections.Iterable)
    raise ValueError#etc.

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll · 2010-11-30T15:55:42Z

davide: yeah, hasattr("lol", '__next__') == False, even though strings are Iterable; so for strings and other such sequences the len(data) line will be executed. So technically we shouldn't say "No Content-Length specified for iterable body" but we should say "No Content-Length specified for iterable body that is not a sequence".

Basically, this whole patch (both parts of it) will be much better off iif there is a clean way to say "a is an iterable but a is not a sequence", because even though b'this is a message' is Iterable, we want to treat it differently compared to, say, a generator object; we do NOT want to use the Iterator features (iter, next) of it, we want to use the sequencey features (by sending the whole chunk of it, by calling len)

orsenthil · 2010-11-30T16:03:06Z

Xuanji, a wording which does convey the approximate meaning is fine. I think, the Exception error messages will help the people based on the Context.

Lets have the ValueError raised from the urllib/request.py. Changing it to isinstance(data,collections.Iterable) as Antoine suggested is okay here too.
Same change for http.client code checking for an Iterable as specified in msg122905

tests are fine. Docs and NEWS should be updated.

Georg: Is it okay, if we push this feature in before Dec 4th, beta1 release?

pitrou · 2010-11-30T16:07:32Z

One way to check that it's bytes-compatible is to take a memoryview of it:

>>> memoryview(b"abc")
<memory at 0x1cf5120>
>>> memoryview(bytearray(b"abc"))
<memory at 0x1cf55a0>
>>> memoryview(array.array('b', b"abc"))
<memory at 0x1cf52a0>

>>> memoryview([b"abc"])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot make memory view because object does not have the buffer interface
>>> memoryview("abc")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot make memory view because object does not have the buffer interface

pitrou · 2010-11-30T16:10:26Z

Lets have the ValueError raised from the urllib/request.py. Changing
it to isinstance(data,collections.Iterable) as Antoine suggested is
okay here too.

Xuanji is right: it's not. We want bytes to be accepted, and it's an
iterable.

pitrou · 2010-11-30T16:37:47Z

Answering to myself, sorry. memoryview() does return the right answer of whether the object supports the buffer interface, *however* it doesn't mean the len() will be right. For example, take an array.array of ints:

>>> memoryview(array.array("I", [1,2,3]))
<memory at 0x1cf5720>
>>> len(array.array("I", [1,2,3]))
3
>>> len(memoryview(array.array("I", [1,2,3])))
3
>>> len(bytes(array.array("I", [1,2,3])))
12

len() returns 3 but the number of bytes written out by sendall() will really be 12...

*However*, the right len can be calculated using the memoryview:

>>> m = memoryview(array.array("I", [1,2,3]))
>>> len(m) * m.itemsize
12

(without actually converting to a bytes object, so all this is cheap even for very large buffers)

orsenthil · 2010-12-02T03:14:10Z

Updated patch after correcting the mistake (bytes vs str) in the previous one.

orsenthil · 2010-12-02T03:15:47Z

On Thu, Dec 02, 2010 at 03:08:55AM +0000, Xuanji Li wrote:

req = Request("http://example.com/", "")

That should be:

req = Request("http://example.com/", b"")

I updated some of those in the latest updated patch.

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll · 2010-12-02T03:16:50Z

And my version too...

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll · 2010-12-02T03:19:50Z

Actually I don't think you can go around changing test_urllib2.py, they are after all regression tests... and surely some users will send "" as data.

My patch handles the case of 0-length data specially, removing the need for 2) Can call len but not buffer: assume len == #bytes

orsenthil · 2010-12-02T03:22:29Z

On Thu, Dec 02, 2010 at 03:16:53AM +0000, Xuanji Li wrote:

And my version too...

+ if hasattr(data, '__len__') and not len(data):
+ request.add_unredirected_header('Content-length', '0')

This is very special case. It should not be so. There was wrong
examples in the test_urllib2 which I just corrected.

Expect for the difference in (it = iter(data) - Which I am seeking
some advice too). Rest of the things in both patches are same.

Thank you.

orsenthil · 2010-12-02T03:23:45Z

On Thu, Dec 02, 2010 at 03:20:05AM +0000, Xuanji Li wrote:

Actually I don't think you can go around changing test_urllib2.py,
they are after all regression tests... and surely some users will
send "" as data.

Think about it this way. In py3k, socket.send can handle only bytes,
not string. So sending "" as data to the socket.send is wrong.

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll · 2010-12-02T03:28:46Z

thought of a better way, we can check if data is true; this will cover "" and b"". this is in issue_3243_py3k_7.patch

orsenthil · 2010-12-02T03:45:32Z

Glaring at my mistakes.

There was wrong examples in the test_urllib2 which I just corrected.

There were..

Expect for the difference in (it = iter(data) - Which I am seeking

Except for ...

merwok · 2010-12-02T12:47:01Z

> What is __read__ supposed to be?
I don't think is required.

The point is that Python does not define a __read__ magic method. Only read exists, on file objects.

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll · 2010-12-02T14:12:19Z

eric: sorry, that has been fixed in issue_3243_py3k_7.patch

orsenthil · 2010-12-19T10:52:36Z

This is committed in r87399. Documentation and NEWS is added. Thanks for the patch and review comments.

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll · 2010-12-19T12:24:31Z

Hello, sorry for commenting on a closed issue... but I think the documentation change is incorrect. In urllib.request.rst, it says data is a string. However as seen in the changes to test_urllib2.py, data must be a bytes object rather than a string object. I think we should reopen this issue and change the documentation.

Thanks!

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll · 2010-12-19T12:28:50Z

Also, the patch for request.py contains a debug statement, print(data)

birkenfeld · 2010-12-19T12:34:03Z

Good catch, fixed in r87400.

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll · 2010-12-19T12:47:10Z

Also, I am not familiar with the backward-comparability requirements of py3k, but orsenthil's patch will break py3k code that relies on data being a string, shouldn't this be mentioned somewhere?

Finally, I do not understand why my proposed change, which is to add

+ if not data:
+ request.add_unredirected_header('Content-length', '0')

so that code that relies on being able to pass a blank string as data (for instance, the code in test_urllib2.py) is still able to do so.

birkenfeld · 2010-12-19T12:58:51Z

Hmm, indeed: Senthil, could data be a string in earlier versions?

If yes, the code should be changed to still allow that. (But after beta2 please, it's already tagged.)

birkenfeld · 2010-12-19T13:03:17Z

Raising priority so that this gets sorted out before final.

orsenthil · 2010-12-19T14:46:42Z

Xuanji, Thanks for the comments on 'data' being bytes. I had just cared to add the feature information. I think that data detail should have been updated too.

I think for your other two questions, we discussed it msg123051 - socket in py3k handles only bytes, sending string was wrong and test_urllib2 had the mistake in sending zero length strings which weren't detected.

However, let me see if we have to accommodate those "very special case" where data can be a zero length string just to accommodate the mistake it was present in the earlier version.

orsenthil · 2010-12-24T04:05:30Z

A mistake with Content-Length in the previous commit resolved in revision 87469.

birkenfeld · 2011-01-15T13:12:13Z

Anything left to do here, Senthil?

orsenthil · 2011-01-15T17:32:41Z

Hi Georg,

In the previous comment, I had written that

'let me see if we have to accommodate those "very special case"
where data can be a zero length string just to accommodate the mistake
it was present in the earlier version.'

More, I think about it, the more it seems to me that accommodating that
special wrong case is not required.

The decision boils down to this.

In py3k, data for POST should be bytes.
But urllib.request had a 'bug/hole' that when a zero length string
was passed as a data, it did not raise an exception. There were cases
in test_urllib2 where zero length string was passed. I argue that it
was more of a mistake than, what we actually wanted to test it.

Because there is NO practical scenario where Zero length data as a
POST is useful.

Now, with the introduction of this feature requested in this issue,
this zero length string would raise an Exception and would demand that
even if it is zero length, please send it as bytes.

IMO, this is correct behavior and we need not accommodate the previous
one.

So, I would recommend closing this bug as Fixed without further change.
At most, a NEWS item can be added to explain point 2.

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll · 2011-01-15T17:41:11Z

Yes, I think we should close it.

orsenthil · 2011-01-15T18:26:51Z

Thanks for the note. I shall some details to the NEWS entry before the release. Closing this report.

tzickel · 2018-10-30T20:47:43Z

This patch was opened for 2.7 but never applied there ?

#10226

This causes a bug with requests HTTP library (and others as well as httplib) when you want to send an iterable object as POST data (with a non-chunked way), it works in Python 3 but not 2, and this effects behaviour and performance...

catlee mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Jun 30, 2008

jhylton mannequin self-assigned this Mar 28, 2009

merwok added the easy label Nov 28, 2010

orsenthil assigned orsenthil and unassigned jhylton Nov 28, 2010

orsenthil closed this as completed Dec 19, 2010

birkenfeld reopened this Dec 19, 2010

birkenfeld added the deferred-blocker label Dec 19, 2010

birkenfeld removed the deferred-blocker label Jan 15, 2011

orsenthil closed this as completed Jan 15, 2011

ezio-melotti transferred this issue from another repository Apr 10, 2022

Support iterable bodies in httplib #47493

Support iterable bodies in httplib #47493

Comments

catlee mannequin commented Jun 30, 2008

catlee mannequin commented Jun 30, 2008

catlee mannequin commented Dec 5, 2008

jhylton mannequin commented Mar 28, 2009

rcoyner mannequin commented Mar 11, 2010

BreamoreBoy mannequin commented Jul 18, 2010

merwok commented Nov 12, 2010

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll mannequin commented Nov 28, 2010

rhettinger commented Nov 28, 2010

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll mannequin commented Nov 29, 2010

orsenthil commented Nov 29, 2010

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll mannequin commented Nov 29, 2010

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll mannequin commented Nov 29, 2010

pitrou commented Nov 29, 2010

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll mannequin commented Nov 30, 2010

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll mannequin commented Nov 30, 2010

pitrou commented Nov 30, 2010

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll mannequin commented Nov 30, 2010

sorcio mannequin commented Nov 30, 2010

pitrou commented Nov 30, 2010

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll mannequin commented Nov 30, 2010

orsenthil commented Nov 30, 2010

pitrou commented Nov 30, 2010

pitrou commented Nov 30, 2010

pitrou commented Nov 30, 2010

orsenthil commented Dec 2, 2010

orsenthil commented Dec 2, 2010

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll mannequin commented Dec 2, 2010

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll mannequin commented Dec 2, 2010

orsenthil commented Dec 2, 2010

orsenthil commented Dec 2, 2010

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll mannequin commented Dec 2, 2010

orsenthil commented Dec 2, 2010

merwok commented Dec 2, 2010

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll mannequin commented Dec 2, 2010

orsenthil commented Dec 19, 2010

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll mannequin commented Dec 19, 2010

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll mannequin commented Dec 19, 2010

birkenfeld commented Dec 19, 2010

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll mannequin commented Dec 19, 2010

birkenfeld commented Dec 19, 2010

birkenfeld commented Dec 19, 2010

orsenthil commented Dec 19, 2010

orsenthil commented Dec 24, 2010

birkenfeld commented Jan 15, 2011

orsenthil commented Jan 15, 2011

IIIIllllIIIIllllIIIIllllIIIIllllIIIIll mannequin commented Jan 15, 2011

orsenthil commented Jan 15, 2011

tzickel mannequin commented Oct 30, 2018