classification
Title: What is urllib.request.urlcleanup() function?
Type: Stage: resolved
Components: Library (Lib) Versions: Python 3.9
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: Nosy List: orsenthil, vstinner, xtreak
Priority: normal Keywords:

Created on 2019-07-01 18:01 by vstinner, last changed 2021-09-21 22:23 by vstinner. This issue is now closed.

Messages (5)
msg347056 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-07-01 18:01
While working on bpo-37421 "Some tests leak temporary files", I had to write PR 14529 fix: test_urllib urlretrieve() tests now explicitly call urlcleanup() to remove temporary files.

If urlretrieve() caller forgets to remove the temporary file, the "temporary" file is left in the temporary directory, until this directory is cleared which may happen soon (next reboot?) or not.

When ContentTooShortError is raised, the temporary file is left in the temporary directory: the caller must call urlcleanup() explicitly. It sounds like a bug to me.

Is it really a good API if urlcleanup() must be called explicitly? I dislike having a "separated" function for the cleanup, whereas Python could rely on object lifetime to do the cleanup.

Should we deprecate this API in favor of a better urllib API?
msg347067 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2019-07-01 19:27
I have added Senthil for thoughts. The temporary files is also stored in a module level variable. Looking into the git history there were some changes in e24f96a05973ddbb59d88c03570aef8545c5ef10 . The function is also marked as legacy in __all__ with compat code in 2to3. urlcleanup also sets the global variable _opener to None so it does the extra work of uninstalling the global variable opener set by install_opener which is also not documented.

urlretrieve enables retrieving and storing the content in a temporary file to return (tempfilename, headers) to be read later. In case the user doesn't give a filename it implicitly creates a temporary file. There is similar code in urllib.request.URLopener().retrieve [0] too where the temporary files are created implicitly but __del__ is overridden where the temp files are deleted as the program exits. I think it's better to ask the user to give filename and so that the user is responsible for the file but since the behavior is present for a long time there is backwards compatibility in changing this and there might be some code depending on the implicit temporary file created as a feature. 

One possible way would be to have a wrapper class that creates the temporary file when not given and then deletes it or calls urlcleanup on __del__ to clean it up as the GC is called like URLopener.retrieve? This would be done only when user doesn't give a file and for the temporary files generated by urlretrieve.


[0] https://github.com/python/cpython/blob/67310023f299b5a2fad71fca449b46d280036690/Lib/urllib/request.py#L1702
msg347081 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2019-07-01 22:50
Hi Victor and Karthikeyan,

Both your analysis are correct.

- This is a legacy interface, present purely for satisfying the old code, when urlretrieve was advertised as the first/easy to way to use urllib and download something.

- At this moment, I think, we should remove those legacy interfaces including urlcleanup. 

- I think, users can easily using urllib to write to the file using context manager and if desired urlretrieve can be modernized over it's use of urlcleanup.

+1 to deprecation and removal.
msg347128 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-07-02 10:44
> urlcleanup also sets the global variable _opener to None so it does the extra work of uninstalling the global variable opener set by install_opener which is also not documented.

Oh wait, urlopen() also sets this private _opener global variable and so has an effect on next calls to urlopen()... This API is really strange, I dislike it :-(

I modified my PR 14529 to call urlcleanup() is way more tests, but I also modified regrtest to test that urllib.requests._url_tempfiles and urllib.requests._opener are not modified.
msg347141 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-07-02 12:53
Another weirdness of test_urllib.py: it redefines urlopen() and actually tests its test function, rather than testing the urllib.request module.

https://github.com/python/cpython/pull/14529#issuecomment-507646868

"def urlopen(url, data=None, proxies=None):" in test_urllib.py is as old as the urllib module itself!

commit 1afc1696167547a5fa101c53e5a3ab4717f8852c
Author: Jeremy Hylton <jeremy@alum.mit.edu>
Date:   Wed Jun 18 20:49:58 2008 +0000

    Make a new urllib package .
    
    It consists of code from urllib, urllib2, urlparse, and robotparser.
    The old modules have all been removed.  The new package has five
    submodules: urllib.parse, urllib.request, urllib.response,
    urllib.error, and urllib.robotparser.  The urllib.request.urlopen()
    function uses the url opener from urllib2.
    
    Note that the unittests have not been renamed for the
    beta, but they will be renamed in the future.
    
    Joint work with Senthil Kumaran.

At least, the test function should have a different name, no?
History
Date User Action Args
2021-09-21 22:23:54vstinnersetstatus: open -> closed
resolution: out of date
stage: resolved
2019-07-02 12:53:14vstinnersetmessages: + msg347141
2019-07-02 10:44:12vstinnersetmessages: + msg347128
2019-07-01 22:50:33orsenthilsetmessages: + msg347081
2019-07-01 19:27:44xtreaksetmessages: + msg347067
2019-07-01 18:15:10xtreaksetnosy: + orsenthil
2019-07-01 18:01:48vstinnercreate