classification
Title: urrlib2/httplib doesn't reset file position between requests
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 2.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: Anthony.Kong, LorenzMende, ajaksu2, dheiberg, ggenellina, jjlee, martin.panter, matejcik, nr, orsenthil
Priority: normal Keywords: easy, patch

Created on 2009-01-23 17:07 by matejcik, last changed 2019-02-17 15:03 by nr.

Files
File name Uploaded Description
auth-mmap.py martin.panter, 2018-09-02 13:32 demonstration
Pull Requests
URL Status Linked
PR 11843 closed python-dev, 2019-02-13 18:30
PR 11904 open nr, 2019-02-17 07:58
Messages (11)
msg80419 - Author: jan matejek (matejcik) * Date: 2009-01-23 17:06
Since 2.6, httplib supports reading from file-like objects.

Now consider the following situation:
There are two handlers in urllib2: the first is plain HTTP, the second is
basic auth.
I want to POST a file to a service, so I pass the open file object as the
data parameter to urllib2.urlopen.
The first handler is invoked; it sends the file data, but gets a 401
Unauthorized response code and fails with that.
The second handler in the chain is then invoked (at least that's how I
understand urllib2; please correct me if I'm talking rubbish). At that
point the open file is at EOF, so empty data is sent.

Furthermore, the obvious solution, "you can't do this through urllib, so
go read the file yourself", doesn't apply that well: the file object in
question is actually an mmap.mmap instance.
This code has been in production since Python 2.4. Until file object
support was introduced in httplib, it worked fine, handling the mmap'ed
file as a string. Now the mmap is picked up as read()-able and this
problem occurs.
The only workaround for restoring the pre-2.6 behavior that comes to mind
is building a wrapper class for the mmap object that hides its read() method.
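A minimal Python 3 sketch of the failure mode described above, assuming the retry re-reads the same mmap object from wherever the first attempt left it. `send_body` is a hypothetical stand-in for httplib's file-object path (read until `read()` returns empty), not the library's actual code.

```python
import mmap
import tempfile


def send_body(body, blocksize=8192):
    """Hypothetical stand-in for httplib's file-object path:
    read from the current position until read() returns b''."""
    sent = b""
    while True:
        block = body.read(blocksize)
        if not block:
            break
        sent += block
    return sent


with tempfile.TemporaryFile() as f:
    f.write(b"hello world")
    f.flush()
    with mmap.mmap(f.fileno(), 0) as mm:  # length 0 maps the whole file
        first = send_body(mm)   # first attempt, answered with 401 Unauthorized
        second = send_body(mm)  # retry by the auth handler: position is at EOF
        mm.seek(0)              # rewinding restores the payload
        third = send_body(mm)

print(first, second, third)
```

The second attempt sends nothing because the mmap's position is shared between attempts; only an explicit `seek(0)` brings the data back.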
msg80422 - Author: Gabriel Genellina (ggenellina) Date: 2009-01-23 23:28
This happens in other implementations too, not just urllib2.

If the server supports it, the best way is to send an 'Expect: 100-continue' header field before attempting to send the actual file.
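To illustrate the idea, a sketch of the request head a client would send, assuming a hypothetical host, path, and 11-byte body. The client transmits the body only if the server's first status line is `100 Continue`; a final status such as `401` arrives before any file data has been read, so nothing is consumed. As noted later in this thread, Python's own client does not implement the waiting step, so `build_request_head` and `should_send_body` are illustrative helpers only.

```python
def build_request_head(host, path, content_length):
    """Build an HTTP/1.1 POST head that asks the server for permission
    before the body is transmitted."""
    lines = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        f"Content-Length: {content_length}",
        "Expect: 100-continue",
        "",  # empty line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def should_send_body(status_line):
    """Decide from the server's first status line whether to send the body.

    Only an interim 100 means "go ahead"; a final status such as
    401 Unauthorized lets the client skip the upload and retry with
    credentials instead.
    """
    status = int(status_line.split()[1])
    return status == 100


head = build_request_head("example.com", "/upload", 11)
print(head)
print(should_send_body(b"HTTP/1.1 100 Continue\r\n"))
print(should_send_body(b"HTTP/1.1 401 Unauthorized\r\n"))
```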
msg185512 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2013-03-29 19:49
I think this needs triage to determine whether the feature request is still applicable. "Expect: 100-continue" is sent by httplib, and support for this was added a few years ago, much later than this bug was originally raised.
msg241191 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-04-16 02:13
Actually, I do not think any “Expect: 100-continue” headers are explicitly sent by the Python standard library. The Python client does not support waiting for a “100 Continue” response; see Issue 1346874.

There is Issue 23740 opened about fixing or clarifying the various data types accepted by “http.client”.

On the other hand, the documentation for urlopen() says only bytes and iterables are supported. If mmap objects are being treated as file objects by urlopen(), that is unexpected, and the documentation or implementation needs fixing there. Also, iterating an mmap() object is different from iterating either the equivalent bytearray() or file object, so something weird is going on there.
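The iteration difference mentioned above can be seen directly in CPython 3 with a small payload: a `bytearray` yields integers, a binary file yields lines, and an `mmap` yields one-byte `bytes` objects, even though indexing an mmap with an integer returns an int. This is a sketch for comparison only; the payload is arbitrary.

```python
import mmap
import tempfile

payload = b"ab\ncd"

# bytearray: iteration yields ints
bytearray_items = list(bytearray(payload))

with tempfile.TemporaryFile() as f:
    f.write(payload)
    f.flush()
    # binary file object: iteration yields lines
    f.seek(0)
    file_items = list(f)
    # mmap: iteration yields length-1 bytes objects
    with mmap.mmap(f.fileno(), 0) as mm:
        mmap_items = list(mm)
        first_index = mm[0]  # integer indexing returns an int

print(bytearray_items)
print(file_items)
print(mmap_items)
```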
msg324476 - Author: Lorenz Mende (LorenzMende) * Date: 2018-09-02 08:26
This issue should be closed, as no reproduction code has been provided.
No patch has been provided and there have been no comments since 2015.
msg324477 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2018-09-02 13:32
Here is a demonstration script in case it helps. I haven’t tested it with versions before Python 2.6.

Older versions send “Content-Length: 11”, but leave the server hanging trying to read the data. Newer versions (I presume since Issue 12319, 3.6+) send a valid HTTP 1.1 chunked request, but with empty data.
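For reference, a sketch of HTTP/1.1 chunked framing: each chunk is its length in hexadecimal, CRLF, the data, CRLF, and a zero-length chunk ends the body. It shows why an exhausted file still produces a syntactically valid request: the body collapses to just the terminating `0` chunk. `encode_chunked` is a hypothetical helper, not the `http.client` internals, although it follows the same wire format.

```python
def encode_chunked(chunks):
    """Frame an iterable of bytes as an HTTP/1.1 chunked message body."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            continue  # a zero-length chunk would terminate the body early
        out += f"{len(chunk):X}\r\n".encode("ascii") + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # last chunk followed by an empty trailer
    return bytes(out)


print(encode_chunked([b"hello world"]))
print(encode_chunked([]))  # what a retry from an already-exhausted file sends
```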
msg335624 - Author: nr (nr) * Date: 2019-02-15 17:28
PR 11843 should fix the issue in master; I didn't check Python 2.6 or earlier versions. In the first request sent to the HTTP service, the POST data is sent correctly. The server then responds with 401 and the request is resent, but by then the mmap file pointer is at the end of the file, because the earlier request read it completely. The PR simply seeks back to the beginning of the file after it has been read, so the request resent with the auth credentials includes the POST body.
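A minimal sketch of the rewinding approach described in this message, assuming the body object exposes `seek()`. `rewind_body` is a hypothetical helper for illustration, not the code in PR 11843.

```python
import io


def rewind_body(data):
    """Seek a file-like request body back to the start, if it supports it.

    Bytes and other objects without a seek() method are left untouched.
    """
    seek = getattr(data, "seek", None)
    if callable(seek):
        seek(0)


body = io.BytesIO(b"field=value")
first = body.read()   # first request: body sent, server answers 401
rewind_body(body)     # before resending with credentials
second = body.read()  # the retried request carries the same POST body
```

Seeking to 0 also assumes the body starts at the beginning of the file, which a caller who positioned the file elsewhere may not intend.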
msg335672 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2019-02-16 07:13
For 3.7+ (where iterable objects are supported), I suggest:

1. Document the problem as a limitation of handlers like AbstractBasicAuthHandler, and consider raising an exception instead of trying to upload a file or iterable a second time.

2. Clarify the behaviour for different types of the “urllib.request” data parameter. I understand “file-like objects” means objects with a “read” attribute, and the “read” method is called in preference to iteration or treating the parameter as a “bytes” object.
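A sketch of the precedence described in point 2, plus the check suggested in point 1, assuming "file-like" simply means "has a `read` attribute". `classify_body` and `ensure_resendable` are hypothetical names for illustration; they are not part of `urllib.request`.

```python
import io
from collections.abc import Iterable


def classify_body(data):
    """Classify a request body in the order described above:
    a read attribute wins, then bytes-like objects, then iterables."""
    if hasattr(data, "read"):
        return "file"  # an mmap lands here too, because it has read()
    try:
        memoryview(data)
    except TypeError:
        pass
    else:
        return "bytes"
    if isinstance(data, Iterable):
        return "iterable"
    raise TypeError(f"unsupported body type: {type(data).__name__}")


def ensure_resendable(data):
    """Point 1: refuse to resend a body that may already have been consumed.

    Conservatively treats every non-bytes body as possibly consumed.
    """
    kind = classify_body(data)
    if kind != "bytes":
        raise ValueError(f"cannot safely resend a {kind} body")


print(classify_body(b"payload"))
print(classify_body(io.BytesIO(b"payload")))
print(classify_body(iter([b"pay", b"load"])))
```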

Despite the bug title, I don’t think the library should mess with the file position. Certainly not when making a single request. But it should already be possible for the caller to supply a custom iterable object that resets the file position:

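class FileReiterator:
    def __init__(self, file, chunksize=8192):
        # the chunksize default is chosen for illustration
        self.file = file
        self.chunksize = chunksize

    def __iter__(self):
        # Rewind on every iteration so a resent request gets the whole body
        self.file.seek(0)
        while True:
            chunk = self.file.read(self.chunksize)
            yield chunk
            if len(chunk) < self.chunksize:
                break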
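A self-contained usage sketch of the iterable approach above, assuming a hypothetical upload URL and a small in-memory file; the constructor and its `chunksize` default are assumptions added so the class runs on its own. Because the object has no `read` attribute, it is treated as an iterable rather than a file, and each iteration starts again from position 0, so a resent request gets the full body again. No network traffic happens here.

```python
import io
import urllib.request


class FileReiterator:
    """Iterable wrapper that rewinds the file at the start of every iteration."""

    def __init__(self, file, chunksize=8192):
        self.file = file
        self.chunksize = chunksize

    def __iter__(self):
        self.file.seek(0)
        while True:
            chunk = self.file.read(self.chunksize)
            yield chunk
            if len(chunk) < self.chunksize:
                break


body = FileReiterator(io.BytesIO(b"hello world"), chunksize=4)
first_pass = b"".join(body)   # e.g. the attempt answered with 401
second_pass = b"".join(body)  # e.g. the retry with credentials

# Building the Request only stores the iterable; nothing is sent.
request = urllib.request.Request("http://example.com/upload", data=body, method="POST")
```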
msg335760 - Author: nr (nr) * Date: 2019-02-17 08:06
I added a new pull request.
Martin, you are right: looking through the code, I realized that just setting the file pointer to zero inside the HTTP library might interfere with requests that don't have authentication enabled.

The new pull request implements point 2 of your suggestion for both Basic and Digest authentication.

Can you please review the code? Thank you.
msg335767 - Author: nr (nr) * Date: 2019-02-17 10:16
I will fix the build errors first.
msg335778 - Author: nr (nr) * Date: 2019-02-17 15:03
The pull request now passes the build checks; please review the code.
History
Date                 User           Action  Args
2019-02-17 15:03:22  nr             set     messages: + msg335778
2019-02-17 10:16:06  nr             set     messages: + msg335767
2019-02-17 08:06:16  nr             set     messages: + msg335760
2019-02-17 07:58:12  nr             set     pull_requests: + pull_request11930
2019-02-16 07:13:24  martin.panter  set     messages: + msg335672
2019-02-15 17:28:06  nr             set     nosy: + nr
                                            messages: + msg335624
2019-02-13 18:30:33  python-dev     set     keywords: + patch
                                            stage: test needed -> patch review
                                            pull_requests: + pull_request11873
2019-01-23 00:57:19  dheiberg       set     nosy: + dheiberg
2018-09-02 13:32:33  martin.panter  set     files: + auth-mmap.py
                                            messages: + msg324477
2018-09-02 08:26:40  LorenzMende    set     nosy: + LorenzMende
                                            messages: + msg324476
2015-04-16 02:13:57  martin.panter  set     nosy: + martin.panter
                                            messages: + msg241191
2013-03-30 09:20:09  Anthony.Kong   set     nosy: + Anthony.Kong
2013-03-29 19:49:56  orsenthil      set     assignee: orsenthil
                                            messages: + msg185512
2009-04-22 17:24:08  ajaksu2        set     priority: normal
                                            keywords: + easy
2009-02-13 01:48:27  ajaksu2        set     nosy: + jjlee
2009-02-12 18:38:39  ajaksu2        set     nosy: + orsenthil, ajaksu2
                                            stage: test needed
2009-01-23 23:28:55  ggenellina     set     nosy: + ggenellina
                                            messages: + msg80422
2009-01-23 17:07:02  matejcik       create