Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What is urllib.request.urlcleanup() function? #81656

Closed
vstinner opened this issue Jul 1, 2019 · 5 comments
Closed

What is urllib.request.urlcleanup() function? #81656

vstinner opened this issue Jul 1, 2019 · 5 comments
Labels
3.9 only security fixes stdlib Python modules in the Lib dir

Comments

@vstinner
Copy link
Member

vstinner commented Jul 1, 2019

BPO 37475
Nosy @orsenthil, @vstinner, @tirkarthi

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2021-09-21.22:23:54.226>
created_at = <Date 2019-07-01.18:01:48.323>
labels = ['library', '3.9']
title = 'What is urllib.request.urlcleanup() function?'
updated_at = <Date 2021-09-21.22:23:54.224>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2021-09-21.22:23:54.224>
actor = 'vstinner'
assignee = 'none'
closed = True
closed_date = <Date 2021-09-21.22:23:54.226>
closer = 'vstinner'
components = ['Library (Lib)']
creation = <Date 2019-07-01.18:01:48.323>
creator = 'vstinner'
dependencies = []
files = []
hgrepos = []
issue_num = 37475
keywords = []
message_count = 5.0
messages = ['347056', '347067', '347081', '347128', '347141']
nosy_count = 3.0
nosy_names = ['orsenthil', 'vstinner', 'xtreak']
pr_nums = []
priority = 'normal'
resolution = 'out of date'
stage = 'resolved'
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue37475'
versions = ['Python 3.9']

@vstinner
Copy link
Member Author

vstinner commented Jul 1, 2019

While working on bpo-37421 "Some tests leak temporary files", I had to write PR 14529 fix: test_urllib urlretrieve() tests now explicitly call urlcleanup() to remove temporary files.

If urlretrieve() caller forgets to remove the temporary file, the "temporary" file is left in the temporary directory, until this directory is cleared which may happen soon (next reboot?) or not.

When ContentTooShortError is raised, the temporary file is left in the temporary directory: the caller must call urlcleanup() explicitly. It sounds like a bug to me.

Is it really a good API if urlcleanup() must be called explicitly? I dislike having a "separated" function for the cleanup, whereas Python could rely on object lifetime to do the cleanup.

Should we deprecate this API in favor of a better urllib API?

@vstinner vstinner added 3.9 only security fixes stdlib Python modules in the Lib dir labels Jul 1, 2019
@tirkarthi
Copy link
Member

I have added Senthil for thoughts. The temporary files is also stored in a module level variable. Looking into the git history there were some changes in e24f96a . The function is also marked as legacy in __all__ with compat code in 2to3. urlcleanup also sets the global variable _opener to None so it does the extra work of uninstalling the global variable opener set by install_opener which is also not documented.

urlretrieve enables retrieving and storing the content in a temporary file to return (tempfilename, headers) to be read later. In case the user doesn't give a filename it implicitly creates a temporary file. There is similar code in urllib.request.URLopener().retrieve [0] too where the temporary files are created implicitly but __del__ is overridden where the temp files are deleted as the program exits. I think it's better to ask the user to give filename and so that the user is responsible for the file but since the behavior is present for a long time there is backwards compatibility in changing this and there might be some code depending on the implicit temporary file created as a feature.

One possible way would be to have a wrapper class that creates the temporary file when not given and then deletes it or calls urlcleanup on __del__ to clean it up as the GC is called like URLopener.retrieve? This would be done only when user doesn't give a file and for the temporary files generated by urlretrieve.

[0]

def __del__(self):

@orsenthil
Copy link
Member

Hi Victor and Karthikeyan,

Both your analysis are correct.

  • This is a legacy interface, present purely for satisfying the old code, when urlretrieve was advertised as the first/easy to way to use urllib and download something.

  • At this moment, I think, we should remove those legacy interfaces including urlcleanup.

  • I think, users can easily using urllib to write to the file using context manager and if desired urlretrieve can be modernized over it's use of urlcleanup.

+1 to deprecation and removal.

@vstinner
Copy link
Member Author

vstinner commented Jul 2, 2019

urlcleanup also sets the global variable _opener to None so it does the extra work of uninstalling the global variable opener set by install_opener which is also not documented.

Oh wait, urlopen() also sets this private _opener global variable and so has an effect on next calls to urlopen()... This API is really strange, I dislike it :-(

I modified my PR 14529 to call urlcleanup() is way more tests, but I also modified regrtest to test that urllib.requests._url_tempfiles and urllib.requests._opener are not modified.

@vstinner
Copy link
Member Author

vstinner commented Jul 2, 2019

Another weirdness of test_urllib.py: it redefines urlopen() and actually tests its test function, rather than testing the urllib.request module.

#14529 (comment)

"def urlopen(url, data=None, proxies=None):" in test_urllib.py is as old as the urllib module itself!

commit 1afc169
Author: Jeremy Hylton <jeremy@alum.mit.edu>
Date: Wed Jun 18 20:49:58 2008 +0000

Make a new urllib package .

It consists of code from urllib, urllib2, urlparse, and robotparser.
The old modules have all been removed.  The new package has five
submodules: urllib.parse, urllib.request, urllib.response,
urllib.error, and urllib.robotparser.  The urllib.request.urlopen()
function uses the url opener from urllib2.

Note that the unittests have not been renamed for the
beta, but they will be renamed in the future.

Joint work with Senthil Kumaran.

At least, the test function should have a different name, no?

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.9 only security fixes stdlib Python modules in the Lib dir
Projects
None yet
Development

No branches or pull requests

3 participants