New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support iterable bodies in httplib #47493
Comments
httplib should support requests whose bodies are iterable objects. This Index: Lib/httplib.py def send(self, str):
- """Send `str' to the server."""
+ """Send `str` to the server.
+
+ ``str`` can be a string object, a file-like object that supports
+ a .read() method, or an iterable object that supports a .next()
+ method.
+ """
if self.sock is None:
if self.auto_open:
self.connect()
@@ -710,6 +715,10 @@
while data:
self.sock.sendall(data)
data=str.read(blocksize)
+ elif hasattr(str,'next'):
+ if self.debuglevel > 0: print "sendIng an iterable"
+ for data in str:
+ self.sock.sendall(data)
else:
self.sock.sendall(str)
except socket.error, v: |
The attached patch implements this for python 2.7. It also adds support The patch also includes updates to docs and tests (which all pass as of |
Seems like a reasonable feature request. I'm going to apply a variant |
This patch and its tests still work. Any particular reason why it hasn't been adopted yet? |
Could someone with knowledge of httplib please move this forward, all the comments I see are positive. |
The patch needs to be ported to the py3k branch. |
Hi, attached is a ported version of the patch that applies against py3k. |
Instead of consider using Also consider changing the variable name from the now overly type specific, "str" to something like "source" to indicate the significance of the data, not its type. |
Hi Raymond, I assume you're referring to catlee's patch? 'str' has been changed to 'data' in python 3.2 While porting the patch I ran into this issue, which is that isinstance(str, collections.Iterable) doesn't behave exactly like hasattr(str,'next') >>> hasattr("lol", '__next__')
False
>>> isinstance("lol", collections.Iterable)
True so using the isinstance method would actually match against strings, which iirc would make the program fail. I can confirm later if needed. |
xuanji, the issue you stumbled upon was just fixed by Raymond for the report bpo-10565. |
What a timely coincidence. I'll try out the change soon. |
Changed according to Raymond's suggestion. I realized that there are some classes (str, bytes, array.array) that are iterable but should not be handled by the iteration logic provided by catlee. eg: if we iterate through b'a message' we'd get a bunch of ints and sending them is wrong. patch attached. |
Thanks for the patch. First, you don't need to support str, since sockets only accept binary strings (not unicode). Second, I think it's simpler and more generic to do something like: try: |
pitrou: that sounds good. I attached another patch. |
pitrou: actually that seems a bit suspect now... you need to handle 'data' differently depending on its type, and while you can determine the type by finding out when 'data' throws certain exceptions, it doesn't seem like what exceptions were meant for. |
Yes, but you can't know all appropriate types in advance, so it's better I don't understand your changes in Lib/urllib/request.py. len(data) will |
I don't fully understand Lib/urllib/request.py either, I just ported it and ran the unittests... it seems like what it does is that if you send an iterator through as 'data' you can't know the length in advance, and rather than let the len(data) raise an exception catlee thought it's better to raise an exception to tell the user exactly why the code failed (ie, because the user sent an iterator and there's no way to meaningfully find the Content-Length of that). As for the catching exceptions vs using isinstance: I thought about it for a while, I think something like this feels right to me: try: if isinstance(data, collections.Iterable):
for d in t:
self.sock.sendall(d)
else:
raise TypeError("data should be a bytes-like object or an iterable, got %r" % type(it)) anyway, calling iter(data) is equivalent to calling data.__iter__(), so catching the exception is equivalent to hasattr(data, '__iter__'), which is roughly the same as isinstance(data, collections.Iterable). so we try the most straightforward method (sending everything) then if that fails, data is either an iterator or a wrong type. |
No, it won't, if the iterable happens to be a sequence. |
Well, it seems the patch is confused between iterable and iterator. Only The patch should really check for iterables, so it should use:
|
davide: yeah, hasattr("lol", '__next__') == False, even though strings are Iterable; so for strings and other such sequences the len(data) line will be executed. So technically we shouldn't say "No Content-Length specified for iterable body" but we should say "No Content-Length specified for iterable body that is not a sequence". Basically, this whole patch (both parts of it) will be much better off iif there is a clean way to say "a is an iterable but a is not a sequence", because even though b'this is a message' is Iterable, we want to treat it differently compared to, say, a generator object; we do NOT want to use the Iterator features (iter, next) of it, we want to use the sequencey features (by sending the whole chunk of it, by calling len) |
Xuanji, a wording which does convey the approximate meaning is fine. I think, the Exception error messages will help the people based on the Context.
tests are fine. Docs and NEWS should be updated. Georg: Is it okay, if we push this feature in before Dec 4th, beta1 release? |
One way to check that it's bytes-compatible is to take a memoryview of it: >>> memoryview(b"abc")
<memory at 0x1cf5120>
>>> memoryview(bytearray(b"abc"))
<memory at 0x1cf55a0>
>>> memoryview(array.array('b', b"abc"))
<memory at 0x1cf52a0>
>>> memoryview([b"abc"])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: cannot make memory view because object does not have the buffer interface
>>> memoryview("abc")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: cannot make memory view because object does not have the buffer interface |
Xuanji is right: it's not. We want bytes to be accepted, and it's an |
Answering to myself, sorry. memoryview() does return the right answer of whether the object supports the buffer interface, *however* it doesn't mean the len() will be right. For example, take an array.array of ints: >>> memoryview(array.array("I", [1,2,3]))
<memory at 0x1cf5720>
>>> len(array.array("I", [1,2,3]))
3
>>> len(memoryview(array.array("I", [1,2,3])))
3
>>> len(bytes(array.array("I", [1,2,3])))
12 len() returns 3 but the number of bytes written out by sendall() will really be 12... *However*, the right len can be calculated using the memoryview: >>> m = memoryview(array.array("I", [1,2,3]))
>>> len(m) * m.itemsize
12 (without actually converting to a bytes object, so all this is cheap even for very large buffers) |
Updated patch after correcting the mistake (bytes vs str) in the previous one. |
On Thu, Dec 02, 2010 at 03:08:55AM +0000, Xuanji Li wrote:
That should be:
I updated some of those in the latest updated patch. |
And my version too... |
Actually I don't think you can go around changing test_urllib2.py, they are after all regression tests... and surely some users will send "" as data. My patch handles the case of 0-length data specially, removing the need for 2) Can call len but not buffer: assume len == #bytes |
On Thu, Dec 02, 2010 at 03:16:53AM +0000, Xuanji Li wrote:
+ if hasattr(data, '__len__') and not len(data): This is very special case. It should not be so. There was wrong Expect for the difference in (it = iter(data) - Which I am seeking Thank you. |
On Thu, Dec 02, 2010 at 03:20:05AM +0000, Xuanji Li wrote:
Think about it this way. In py3k, socket.send can handle only bytes, |
thought of a better way, we can check if data is true; this will cover "" and b"". this is in issue_3243_py3k_7.patch |
Glaring at my mistakes.
There were..
Except for ... |
The point is that Python does not define a __read__ magic method. Only read exists, on file objects. |
eric: sorry, that has been fixed in issue_3243_py3k_7.patch |
This is committed in r87399. Documentation and NEWS is added. Thanks for the patch and review comments. |
Hello, sorry for commenting on a closed issue... but I think the documentation change is incorrect. In urllib.request.rst, it says data is a string. However as seen in the changes to test_urllib2.py, data must be a bytes object rather than a string object. I think we should reopen this issue and change the documentation. Thanks! |
Also, the patch for request.py contains a debug statement, print(data) |
Good catch, fixed in r87400. |
Also, I am not familiar with the backward-comparability requirements of py3k, but orsenthil's patch will break py3k code that relies on data being a string, shouldn't this be mentioned somewhere? Finally, I do not understand why my proposed change, which is to add + if not data: so that code that relies on being able to pass a blank string as data (for instance, the code in test_urllib2.py) is still able to do so. |
Hmm, indeed: Senthil, could data be a string in earlier versions? If yes, the code should be changed to still allow that. (But after beta2 please, it's already tagged.) |
Raising priority so that this gets sorted out before final. |
Xuanji, Thanks for the comments on 'data' being bytes. I had just cared to add the feature information. I think that data detail should have been updated too. I think for your other two questions, we discussed it msg123051 - socket in py3k handles only bytes, sending string was wrong and test_urllib2 had the mistake in sending zero length strings which weren't detected. However, let me see if we have to accommodate those "very special case" where data can be a zero length string just to accommodate the mistake it was present in the earlier version. |
A mistake with Content-Length in the previous commit resolved in revision 87469. |
Anything left to do here, Senthil? |
Hi Georg, In the previous comment, I had written that 'let me see if we have to accommodate those "very special case" More, I think about it, the more it seems to me that accommodating that The decision boils down to this.
Because there is NO practical scenario where Zero length data as a
IMO, this is correct behavior and we need not accommodate the previous So, I would recommend closing this bug as Fixed without further change. |
Yes, I think we should close it. |
Thanks for the note. I shall some details to the NEWS entry before the release. Closing this report. |
This patch was opened for 2.7 but never applied there ? This causes a bug with requests HTTP library (and others as well as httplib) when you want to send an iterable object as POST data (with a non-chunked way), it works in Python 3 but not 2, and this effects behaviour and performance... |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: