Issue 5536: urllib: urlretrieve() does not close file objects on failure

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/49786

classification

Title:	urllib: urlretrieve() does not close file objects on failure
Type:	behavior	Stage:
Components:	Library (Lib)	Versions:	Python 3.0

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	benjamin.peterson, petr.dolezal
Priority:	normal	Keywords:

Created on 2009-03-22 12:28 by petr.dolezal, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (2)
msg83970	Author: Petr Dolezal (petr.dolezal)	Date: 2009-03-22 12:28
urllib.request.urlretrieve() does not close the file object created for the retrieval when it fails while processing the incoming data and raises an exception (e.g. on an HTTP 404 response). The file therefore remains open until the process terminates and the OS itself closes the orphaned file handle. This behaviour may result in orphaned temporary or incomplete files.

It is not just a resource leak; it has another bad side effect, at least on Windows: the file can't be deleted (due to the creation mode used) before the handle is closed. But the entire file object, including the handle, is lost due to the exception, so nobody (including the process itself) is able to delete the file until the process terminates.

Consider this code snippet demonstrating the described behaviour:

import os
import urllib.request

FILENAME = 'nonexistent.html'

try:
    # The host must be valid, else the address resolving fails
    # before the target file is even created. But existing host
    # and non-existent resource is exactly what's the problem.
    NON_EXISTENT_URL = 'http://www.python.org/nonexistent.html'
    urllib.request.urlretrieve(NON_EXISTENT_URL, FILENAME)
except Exception:
    if os.path.exists(FILENAME):
        print('File exists! Attempting to delete.')
        os.unlink(FILENAME)
        print('Succeeded.')

On Windows, the following output appears:

File exists! Attempting to delete.
Traceback (most recent call last):
  File "test.py", line 6, in <module>
    urllib.request.urlretrieve(NON_EXISTENT_URL, FILENAME)
  File "C:\Program Files\Python\lib\urllib\request.py", line 134, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "C:\Program Files\Python\lib\urllib\request.py", line 1502, in retrieve
    block = fp.read(bs)
  File "C:\Program Files\Python\lib\io.py", line 572, in read
    self._checkClosed()
  File "C:\Program Files\Python\lib\io.py", line 450, in _checkClosed
    if msg is None else msg)
ValueError: I/O operation on closed file.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 10, in <module>
    os.unlink(FILENAME)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: 'nonexistent.html'

As a quick fix, it is possible to ensure that both the source and target file objects are closed by using finally blocks. I also assume the function should delete the target file on an exception: the file is not only incomplete, but its name is also unknown to the client code when it is a temporary file made by urlretrieve() itself. If the client code is interested in partial downloads, I guess it should retrieve the resource another way, as the urlretrieve() interface doesn't appear to support anything like partial downloads.

Anyway, the proposed solution is still not optimal: a ValueError with the message "I/O operation on closed file" is really not what I would expect as a valid error when downloading a non-existent web page. I guess a check on the source file object before reading begins would discover the problem early and raise a more appropriate IOError or something like that.

Note: This bug report probably applies to older versions of urllib, but I can't verify it now. I know at least that I spotted it in 2.6 just before I upgraded to 3.0.1.
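The quick fix proposed in this message might look roughly like the sketch below. It is not the change committed to urllib; the function name copy_and_cleanup, its parameters, and the early check on the source's closed attribute are assumptions made for illustration. The sketch closes both file objects in finally blocks, removes the incomplete target on any exception, and rejects an already-closed source before the target file is created.

```python
import os


def copy_and_cleanup(source, filename, block_size=8192):
    """Copy a readable binary file object to `filename` (hypothetical helper).

    Both file objects are closed whether or not the copy succeeds, and a
    partially written target is removed if the copy raises.
    """
    try:
        # Early check (an assumption, following the report's suggestion):
        # fail before the target file is even created.
        if getattr(source, 'closed', False):
            raise OSError('source is already closed')
        target = open(filename, 'wb')
        try:
            while True:
                block = source.read(block_size)
                if not block:
                    break
                target.write(block)
        except BaseException:
            # Close before unlinking: on Windows an open file
            # cannot be deleted.
            target.close()
            os.unlink(filename)
            raise
        finally:
            # Closing an already-closed file is a harmless no-op.
            target.close()
    finally:
        source.close()
    return filename
```

With this structure, the except branch of the snippet above would find no leftover file to delete, and on Windows no open handle would block a later os.unlink().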
msg83978	Author: Benjamin Peterson (benjamin.peterson)	Date: 2009-03-22 17:45
Fixed in r70521.

History
Date	User	Action	Args
2022-04-11 14:56:46	admin	set	github: 49786
2009-03-22 17:45:19	benjamin.peterson	set	status: open -> closed nosy: + benjamin.peterson messages: + msg83978 resolution: fixed
2009-03-22 12:28:54	petr.dolezal	create