New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Missing enumeration of HTTPResponse Objects methods of urllib.request.urlopen's http.client.HTTPResponse? #65427
Comments
In the Python Library documentation, in section "21.6. urllib.request — Extensible library for opening URLs", in the description of the urllib.request.urlopen() function it is writen: --------- […] For ftp, file, and data urls and requests explicity handled by legacy […] --------- The first sentence seemed to imply that something is supposed to be specified if I understand correctly. Is it me missing something ? |
I forgot to tell that it is in section "20.5. urllib.request — Extensible library for opening URLs" for Python 3.2.5 |
Hello Evens, |
I don't quite understand what you are asking me. You need a copy of the document ? You can find an example at this link: https://docs.python.org/3/library/urllib.request.html#urllib.request.urlopen |
That section of the docs is indeed rather confusing. Probably it just needs to be changed to say "for the methods supported by this object, see HTTPResponse Objects. I'd like to see the docs reorganized so that the '.. class' declaration in the docs is immediately followed by the class's methods, but that's a much bigger project. |
I interpreted it more along the lines of “. . . returns a http.client.HTTPResponse object [with] the following [additional] methods.” Indeed, a HTTP urlopen() response I just tried does have info(), geturl() and getcode() methods, and I know the info() method is used in the real world. Also, it would be good to document that the HTTP response’s “msg” attribute does not actually hold the header, despite the HTTPResponse documentation. Further, the return value of BaseHandler.default_open() is defined to be the same as urlopen(), but when a HTTP error occurs I have found the “msg” attribute is meant to be the HTTP status text phrase (e.g. “Not Found”). Perhaps it would be good to add something like these two points wherever they belong:
|
Hi, From my limited experience reporting documentation issues, I see that it's better to submit a patch than to only report an issue. So, I've tried to look into the source code to figure out what was going on. I have attached a patch that I'm submitting to you for review hoping I doing everything right. What was reported as ambiguous in this issue is the description of the return value of the function urllib.request.urlopen() for http and https URLs. It was mentionned that it should be an http.client.HTTPResponse object but it implied that something may have been different about this object. To understand why I'm may now be able to assert what's being said in that patch, follow me in the source code. It's based on revision c499cc2c4a06. If you don't care about all the walkthrough you can skip to step 9.
I hope this is what was needed to close this issue. Otherwise, just tell me what is missing. Oh and there seems that there are be many things that could be refactored. Can I do it and open issues about them ? |
With this patch, there is no longer any implication that the returned object implements the “addinfourl” interface. Perhaps that should be added back. Or maybe add it to the HTTPResponse class documentation itself? There is a comment that says the methods are there “for compatibility with old-style urllib responses”, although it seems to me they also make the class compatible with Python 3’s new “urllib”: http://hg.python.org/cpython/file/c499cc2c4a06/Lib/http/client.py#l772 It is good to document the “msg” attribute and its inconsistency, since I have found this is required to implement your own BaseHandler.default_open(). However I’m not so sure if it is necessary to document the “url” attribute. Why not encourage using geturl() instead, since it is already documented, at least for non-HTTP responses? I also saw a comment against the “msg” attribute which mentions deprecating something, but it is not clear what: |
Well, there wasn't any indication before that the returned object was implementing the "addinfourl" interface. So I don't think we have lost anything. In what situation this interface is useful ? The following comment (that you had highlighted in your comment) gives the impression that theses methods are there only to provide compatibility with clients using old-styles responses. http://hg.python.org/cpython/file/c499cc2c4a06/Lib/http/client.py#l772 That would imply that newer clients would usually not use these methods. If you want to document this, I think the "addinfourl" interface should then be better described somewhere else where it would include the fact that the HTTPResponse class implements it. Anyway, I don't see the advantage of using a getter method (like geturl()) instead of accessing directly the attribute. For me, this is less pythonic. If you ever have to attach a behaviour to the access of this attribute, a property could then be defined. |
Indeed, geturl is deprecated. I'm not sure where you see it documented, I don't see it. |
To be honest, it may be inspired by what's written a few lines lower, for ftp, files dans data urls even though the return object is not the same as the http(s) urls http://hg.python.org/cpython/file/c499cc2c4a06/Doc/library/urllib.request.rst#l75 |
Ah, I was looking at the http docs. I wonder if we just missed the urllib docs when we made the changes. Either that, or I'm misremembering things. |
By removing the “addinfourl” methods for HTTP responses, you are making it unnecessarily hard to handle header fields and metadata from the response. I do not know of any other documented way of getting the eventual redirect target, other than geturl(). And code to parse the Content-Type, which I understand is also returned for file: URLs, would start to get ugly (untested): info = getattr(response, "info", None)
if info:
type = info().get_content_type() # Easy!
else:
import cgi
type = cgi.parse_header(response.getheader("Content-Type))[0] Since a HTTP status code only seems to be returned for HTTP responses, deprecating or removing getcode() in favour of “HTTPResponse.status” might be okay, but I think info() and geturl() should stay. Maybe a “url” attribute is more Pythonic, but as far as I am aware that has never been documented for Python 2 or 3 for any URL type. I would not expect much existing code to be using it, and deprecating geturl() seems like a needless annoyance. |
I think I'm thinking of the Request API, and not the Response API. So ignore my comments about deprecation, I'm not sure what the status is. |
Look, the subject of this issue is to clarify the methods of the urllib.request.urlopen()'s return value for http(s) URLs. Nobody seemed to work on this for 4 months. That's why I tried to submit a patch after looking into the code to try to do my part to help. If you think that my patch is not clear enough and would need to be more precise or maybe even that it is plain wrong, please submit a new patch with your ideas so that we can close eventually this issue. I really didn't think that it would get this complicated… |
Fair enough, challenge accepted! Here is my attempt. I have explicitly made the info(), geturl() and getcode() methods available for all cases, and used Evens’s wording for the modified “msg” attribute, but dropped mentioning the “url” attribute. |
I'm satisfied with this new patch. |
Is there anything else that needs doing to get this committed and resolved? My patch is just a trivial reordering of Evens’s, with about one sentence deleted. |
Related: bpo-12707, about deprecating some methods in favour of attributes |
New changeset fa3c9faabfb0 by Martin Panter in branch '3.4': New changeset b55c006b79bc by Martin Panter in branch '3.5': New changeset c6930661599b by Martin Panter in branch 'default': |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: