Message 272164 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	Rotkraut
Recipients	Rotkraut, demian.brecht, harobed, martin.panter, orsenthil, petri.lehtinen, piotr.dobrogost, pitrou, whitemice
Date	2016-08-08.12:19:28
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1470658769.24.0.159293634846.issue12319@psf.upfronthosting.co.za>
In-reply-to

Content
Martin, > I wonder if it would be safer to test for TextIOBase. With the > read(0) hack, the zero size may be unexpected. A file object may > need special handling to generate an empty result, or to not > interpret it as an EOF indicator. See e.g. Issue 23804, about > zero-sized SSL reads, where both of these problems happened. As long > as we document that the text file logic has changed, I think it > would be okay. IMO it is better to keep things simple and robust for > byte files than add more hacks for text files. The problem is that there is no accepted definition of what a file-like object is. As a consequence, each test method will fail in some cases. Not all file-like objects are derived from the base classes in the io module. It is common praxis to simply define an object with a read() method and to consider this a file-like. This pattern can even be found in the official Python test suite. Testing for TextIOBase will fail in these cases. There seem to be at least a minimal consensus that each file-like (that supports reading) must have the read() method implementing the interface as defined by the io base classes. The read(0) test (I wouldn't call it a hack, as it only uses this documented API) has the advantage of making no other assumptions then this. And I even would not accept Issue 23804 as a counter example as this was clearly a bug in the ssl module. The current test using the mode attribute is certainly the worst one as it even fails for standard classes from the io module. Anyway. Even though I would prefer the read(0) test, I am happy to accept any other method. Please tell me what I should use. > However I still don’t see why the checks are worthwhile. Sure having > both is a technical violation of the current HTTP protocols. But if > a caller is likely to end up in one of these situations, it may > indicate a problem with the API that we should fix instead. If the > checks are unlikely to ever fail, then drop them. The question is whether we should apply sanity checks on caller's input or whether we should rather send bullshit out and wait for the request to fail. I'd rather give the caller a somewhat helpful error message rather then a lesser helpful "400 Bad Request" from the server. But if you really insist, I can also remove these checks. I don't understand your remark on the API and what the likeliness of rubbish header values set by the caller has to do with it, sorry. > Otherwise, I don’t think any of the bugs I pointed out have been > addressed. Could you elaborate on what bugs you are talking about? I really don't know, sorry. > I don’t like converting bytes-like to bytes. I understand one of the > main purposes of accepting bytes-like objects is to avoid copying > them, but converting to bytes defeats that. Also, the copying is > inconsistent with the lack of copying in _read_iterable(). I added this conversion because you mentioned Issue 27340 and the problem to send byte-like objects with the ssl module. I agree to leave this to Issue 27340 to find a better solution and will remove the conversion. > I am wondering if it would be best to keep new logic out of send(); > just put it into _send_output() instead. Once upon a time, send() > was a simple wrapper around socket.sendall() without much bloat. But > with the current patch, in order to send a single buffer, we have to > jump through a bunch of checks, create a dummy lambda and singleton > tuple, loop over the singleton, call the lambda, realize there is no > chunked encoding, etc. IMO the existing support for iterables should > have been put in _send_output() too (when that was an option). There > is no need to use send() to send the whole body when endheaders() > already supports that. No problem, I can move all this to _send_output(). The only problem that I see then, is that the current library version of send() already accepts file-like (both text and byte streams), bytes, and iterables. So we will end up with two different methods accepting this variety of data types and behaving differently in the respective cases. Maybe, as a consequence, one should restrict send() to accept only bytes in order to tighten consistency. But this is not in the scope of this Issue and will not be covered by my patch. Please note that I will be in holidays starting with next Monday. We should try to get this done during this week.

Martin,

> I wonder if it would be safer to test for TextIOBase. With the
> read(0) hack, the zero size may be unexpected. A file object may
> need special handling to generate an empty result, or to not
> interpret it as an EOF indicator. See e.g. Issue 23804, about
> zero-sized SSL reads, where both of these problems happened. As long
> as we document that the text file logic has changed, I think it
> would be okay. IMO it is better to keep things simple and robust for
> byte files than add more hacks for text files.

The problem is that there is no accepted definition of what a file-like object is.  As a consequence, each test method will fail in some cases.  Not all file-like objects are derived from the base classes in the io module.  It is common praxis to simply define an object with a read() method and to consider this a file-like.  This pattern can even be found in the official Python test suite.  Testing for TextIOBase will fail in these cases.

There seem to be at least a minimal consensus that each file-like (that supports reading) must have the read() method implementing the interface as defined by the io base classes.  The read(0) test (I wouldn't call it a hack, as it only uses this documented API) has the advantage of making no other assumptions then this.  And I even would not accept Issue 23804 as a counter example as this was clearly a bug in the ssl module.

The current test using the mode attribute is certainly the worst one as it even fails for standard classes from the io module.

Anyway.  Even though I would prefer the read(0) test, I am happy to accept any other method.  Please tell me what I should use.


> However I still don’t see why the checks are worthwhile. Sure having
> both is a technical violation of the current HTTP protocols. But if
> a caller is likely to end up in one of these situations, it may
> indicate a problem with the API that we should fix instead. If the
> checks are unlikely to ever fail, then drop them.

The question is whether we should apply sanity checks on caller's input or whether we should rather send bullshit out and wait for the request to fail.  I'd rather give the caller a somewhat helpful error message rather then a lesser helpful "400 Bad Request" from the server.  But if you really insist, I can also remove these checks.

I don't understand your remark on the API and what the likeliness of rubbish header values set by the caller has to do with it, sorry.

> Otherwise, I don’t think any of the bugs I pointed out have been
> addressed.

Could you elaborate on what bugs you are talking about?  I really don't know, sorry.

> I don’t like converting bytes-like to bytes. I understand one of the
> main purposes of accepting bytes-like objects is to avoid copying
> them, but converting to bytes defeats that. Also, the copying is
> inconsistent with the lack of copying in _read_iterable().

I added this conversion because you mentioned Issue 27340 and the problem to send byte-like objects with the ssl module.  I agree to leave this to Issue 27340 to find a better solution and will remove the conversion.

> I am wondering if it would be best to keep new logic out of send();
> just put it into _send_output() instead. Once upon a time, send()
> was a simple wrapper around socket.sendall() without much bloat. But
> with the current patch, in order to send a single buffer, we have to
> jump through a bunch of checks, create a dummy lambda and singleton
> tuple, loop over the singleton, call the lambda, realize there is no
> chunked encoding, etc. IMO the existing support for iterables should
> have been put in _send_output() too (when that was an option). There
> is no need to use send() to send the whole body when endheaders()
> already supports that.

No problem, I can move all this to _send_output().  The only problem that I see then, is that the current library version of send() already accepts file-like (both text and byte streams), bytes, and iterables.  So we will end up with two different methods accepting this variety of data types and behaving differently in the respective cases.

Maybe, as a consequence, one should restrict send() to accept only bytes in order to tighten consistency.  But this is not in the scope of this Issue and will not be covered by my patch.


Please note that I will be in holidays starting with next Monday.  We should try to get this done during this week.

History
Date	User	Action	Args
2016-08-08 12:19:29	Rotkraut	set	recipients: + Rotkraut, orsenthil, pitrou, harobed, petri.lehtinen, martin.panter, piotr.dobrogost, demian.brecht, whitemice
2016-08-08 12:19:29	Rotkraut	set	messageid: <1470658769.24.0.159293634846.issue12319@psf.upfronthosting.co.za>
2016-08-08 12:19:29	Rotkraut	link	issue12319 messages
2016-08-08 12:19:28	Rotkraut	create