This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Author orsenthil
Recipients Anrs.Hu, Jim.Jewett, hongqn, orsenthil
Date 2012-07-20.13:59:39
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1342792781.6.0.730595522936.issue14562@psf.upfronthosting.co.za>
In-reply-to
Content
I had a discussion with Anrs on this, and it went along these lines - 

I confused the buffering issue (encountered with streaming data) of urllib2 with chunked transfer encoding.

In that case, the flow is blocked at the socket level waiting for 8192 bytes. That buffer size was chosen for buffered reading in normal read scenarios; for streaming data, however, it may not be the best choice.

This is explained well here:

http://stackoverflow.com/questions/1598331/how-to-read-continous-http-streaming-data-in-python

The advice is to set the socket buffer size to 0:

import socket

# Disable read buffering globally for socket file objects (Python 2)
socket._fileobject.default_bufsize = 0

Now, coming to chunked transfer encoding: it behaves as advertised, sending one chunk at a time, but the readline limit is still capped by MAXLINE in httplib.py. For chunked transfer encoding to be recognized, the client must receive a "Transfer-Encoding: chunked" header from the server; when it does, it follows the code path of reading up to MAXLINE at a time and then returning. For small chunks with blocking behavior on the server side (as you illustrated), we may still need to set default_bufsize to 0 so that responses arrive promptly instead of waiting for the buffer to fill.
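To make the wire format concrete: each chunk the server sends is a hex size line followed by that many bytes of data, each terminated by CRLF, with a zero-size chunk ending the body. The decode_chunked() helper below is a hypothetical illustration of what httplib parses internally, not an actual httplib function:

```python
# Minimal sketch of the chunked transfer-encoding wire format.
# decode_chunked() is a hypothetical helper for illustration only;
# httplib/http.client does this parsing internally.
def decode_chunked(payload):
    body = b""
    pos = 0
    while True:
        eol = payload.index(b"\r\n", pos)
        size = int(payload[pos:eol], 16)   # chunk-size line is hexadecimal
        if size == 0:                      # "0\r\n\r\n" terminates the body
            break
        start = eol + 2
        body += payload[start:start + size]
        pos = start + size + 2             # skip the data and trailing CRLF
    return body

wire = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
decoded = decode_chunked(wire)             # b"Wikipedia"
```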

At this moment, I think the above could be documented in the urllib2 docs for the issue you raised. I am not sure whether any other approach would be suitable to handle this behavior.

Anrs (the original poster) also responded that the way he overcame this for very small chunks was to set the socket buffer size to 0 locally:

>> resp = opener.open(server, urllib.urlencode(data))
>> resp = opener.open(server)
>> resp.fp._rbufsize = 0
>> for line in iter(resp.readline, ''):
>>     yield line
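The effect of that workaround can be sketched without urllib2 at all. In this illustration a socketpair stands in for the streaming server, and an unbuffered file object lets readline() return as soon as a newline arrives rather than trying to fill a larger read buffer first:

```python
import socket

# A socketpair stands in for a streaming HTTP server and its client.
srv, cli = socket.socketpair()
srv.sendall(b"data: chunk 1\n")

# buffering=0 gives an unbuffered file object over the socket, so
# readline() returns as soon as the newline arrives.
# (Python 2 spelling: cli.makefile('rb', 0), or resp.fp._rbufsize = 0.)
fp = cli.makefile("rb", buffering=0)
line = fp.readline()

fp.close()
srv.close()
cli.close()
```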

I think this could be documented in some fashion (for example, as support for streaming reads without buffering, or for transfers of small data sizes without buffering).
History
Date User Action Args
2012-07-20 13:59:41  orsenthil  set  recipients: + orsenthil, hongqn, Jim.Jewett, Anrs.Hu
2012-07-20 13:59:41  orsenthil  set  messageid: <1342792781.6.0.730595522936.issue14562@psf.upfronthosting.co.za>
2012-07-20 13:59:40  orsenthil  link  issue14562 messages
2012-07-20 13:59:39  orsenthil  create