Issue 849407: urllib reporthook could be more informative


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/39609

classification

Title:	urllib reporthook could be more informative
Type:		Stage:
Components:	Library (Lib)	Versions:	Python 2.3

process

Status:	closed	Resolution:	rejected
Dependencies:		Superseder:
Assigned To:		Nosy List:	allanbwilson, gvanrossum, loewis
Priority:	normal	Keywords:	patch

Created on 2003-11-26 03:41 by allanbwilson, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
urllibdiff.txt	allanbwilson, 2003-11-27 20:03	diff -c ... >urllibdiff.txt

Messages (4)
msg44947	Author: Allan B. Wilson (allanbwilson)	Date: 2003-11-26 03:41
A reporthook passed to urllib.urlretrieve() (in 2.3.2) receives the maximum number of characters accepted per .read() ("bs") as its second argument. It would be more helpful to receive the number of characters actually retrieved in the most recent block. While this could perhaps break some existing code (though I can't imagine how), the minor patch below allows accurate progress updates, etc.

Thanks
Allan Wilson
------------
*** urllib.py.old	Tue Nov 25 17:42:55 2003
--- urllib.py	Tue Nov 25 18:00:50 2003
***************
*** 236,248 ****
              reporthook(0, bs, size)
          block = fp.read(bs)
          if reporthook:
!             reporthook(1, bs, size)
          while block:
              tfp.write(block)
              block = fp.read(bs)
              blocknum = blocknum + 1
              if reporthook:
!                 reporthook(blocknum, bs, size)
          fp.close()
          tfp.close()
          del fp
--- 236,248 ----
              reporthook(0, bs, size)
          block = fp.read(bs)
          if reporthook:
!             reporthook(1, len(block), size)
          while block:
              tfp.write(block)
              block = fp.read(bs)
              blocknum = blocknum + 1
              if reporthook:
!                 reporthook(blocknum, len(block), size)
          fp.close()
          tfp.close()
          del fp
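To make the difference concrete, here is a minimal sketch that replays the read loop from the diff against an in-memory file (io.BytesIO) instead of a socket and records every hook call. The function name simulate_retrieve, the patched flag and the 8-byte block size are hypothetical choices for illustration (urllib itself uses 1024*8); this is not urllib's actual code.

```python
import io

BS = 8  # hypothetical tiny block size; urllib itself used 1024*8


def simulate_retrieve(data, reporthook, patched):
    """Hypothetical replay of the urlretrieve() read loop from the diff.

    patched=False passes bs to the hook (existing behaviour);
    patched=True passes len(block) (the proposed change).
    """
    fp = io.BytesIO(data)  # stands in for the socket file object
    size = len(data)       # stands in for the Content-Length header
    blocknum = 1
    reporthook(0, BS, size)  # the initial call passes bs in both versions
    block = fp.read(BS)
    reporthook(1, len(block) if patched else BS, size)
    while block:
        # tfp.write(block) would go here; this sketch only records hook calls
        block = fp.read(BS)
        blocknum = blocknum + 1
        reporthook(blocknum, len(block) if patched else BS, size)


old_calls, new_calls = [], []
simulate_retrieve(b"x" * 20, lambda *a: old_calls.append(a), patched=False)
simulate_retrieve(b"x" * 20, lambda *a: new_calls.append(a), patched=True)
print(old_calls)  # [(0, 8, 20), (1, 8, 20), (2, 8, 20), (3, 8, 20), (4, 8, 20)]
print(new_calls)  # [(0, 8, 20), (1, 8, 20), (2, 8, 20), (3, 4, 20), (4, 0, 20)]
```

With the patch, summing the second argument over the calls with blocknum >= 1 gives exactly 20 bytes; with the existing code the same sum is 32.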
msg44948	Author: Martin v. Löwis (loewis) *	Date: 2003-11-27 19:41
Can you please attach the patch, instead of pasting it?
msg44949	Author: Guido van Rossum (gvanrossum) *	Date: 2006-11-19 16:44
I notice that the patch doesn't apply to the svn head (2.6a0). But that's easily fixed, and the idea still applies.

As the original author of the code being patched, I believe my reason for doing it the old way was that I wanted the report hook to be called before the first block, which would let a GUI open a dialog box before anything was read. The idea was that if the reads are really slow, you'd want the dialog box there right from the start. But this was rather naive, since the most likely source of delay is making the connection and getting the response header back, and the report hook isn't called at all until all the headers have been seen.

The changed API for reporthook() needs to be documented very clearly. There's one call to reporthook() that still passes the block size instead of the actual data size. A naive implementation could be confused by this call, although it is easily recognized: it is the first call and the only one with blocknum equal to zero.

I think this is a fine change, as long as it isn't backported, since it is clearly a feature change. I do wonder "why bother", since most people using urllib don't care all that much about extreme details (I can't remember the last time I specified a reporthook), and most people who care about details don't like urllib and use something else (e.g. httplib or urllib2). So I guess I'm somewhere between +0 and -0 on this.
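The caveat about the blocknum-zero call can be illustrated with a minimal sketch of a hook written against the patched API. The class name PatchedProgress and its attributes are hypothetical; the call sequence is hard-coded for a 20-byte body read in 8-byte blocks, and it assumes, as urllib does, that size is -1 when the server sends no Content-Length.

```python
class PatchedProgress:
    """Hypothetical reporthook written against the *patched* API."""

    def __init__(self):
        self.started = False
        self.total = None  # stays None if the size is unknown
        self.received = 0

    def __call__(self, blocknum, blocksize, totalsize):
        if blocknum == 0:
            # First call: headers are in, nothing has been read yet, and
            # blocksize is still bs, not a data size. A GUI would open its
            # progress dialog here instead of counting bytes.
            self.started = True
            self.total = totalsize if totalsize >= 0 else None
            return
        self.received += blocksize  # actual bytes in this block (patched API)


# Calls the patched loop would make for a 20-byte body read in 8-byte blocks.
calls = [(0, 8, 20), (1, 8, 20), (2, 8, 20), (3, 4, 20), (4, 0, 20)]

hook = PatchedProgress()
for call in calls:
    hook(*call)

naive = sum(bs for _, bs, _ in calls)  # wrongly counts the blocknum-0 call too
print(hook.received, hook.total, naive)  # 20 20 28
```

A hook that simply sums every second argument overcounts by one block size, which is exactly the confusion the message warns about.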
msg44950	Author: Martin v. Löwis (loewis) *	Date: 2006-11-21 18:29
Discussion on python-dev revealed that read() on a socket will always give you blocksize data, except for the last block. So this doesn't really change anything in practice; applications that find that the data read (blocksize*blocknumber) exceeds the amount of data expected should conclude that they saw the last block. Rejecting this patch.
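The workaround in the rejection needs no API change. A minimal sketch of a hook-side progress estimate, assuming (as the message states) that every read except the last returns a full block: blocknum * blocksize is then exact until it overshoots the total, so clamping it to totalsize gives the true count. progress_percent is a hypothetical helper name, and the call sequence is hard-coded for a 20-byte body read in 8-byte blocks under the unpatched API.

```python
def progress_percent(blocknum, blocksize, totalsize):
    """Hypothetical progress estimate for the unpatched reporthook.

    Assumes every read but the last returns a full block, so
    blocknum * blocksize is exact until it reaches totalsize;
    once it does, the last block has been seen and we clamp.
    """
    if totalsize <= 0:
        return None  # unknown size (-1) or empty body: no meaningful percentage
    read_so_far = min(blocknum * blocksize, totalsize)
    return 100.0 * read_so_far / totalsize


# Calls the unpatched loop makes for a 20-byte body read in 8-byte blocks.
calls = [(0, 8, 20), (1, 8, 20), (2, 8, 20), (3, 8, 20), (4, 8, 20)]
print([progress_percent(*c) for c in calls])  # [0.0, 40.0, 80.0, 100.0, 100.0]
```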

History
Date	User	Action	Args
2022-04-11 14:56:01	admin	set	github: 39609
2003-11-26 03:41:59	allanbwilson	create