classification
Title: request.full_url: unexpected results on assignment
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: demian.brecht, ezio.melotti, orsenthil, python-dev, r.david.murray, terry.reedy
Priority: normal Keywords: patch

Created on 2013-02-22 05:56 by demian.brecht, last changed 2014-03-10 22:11 by python-dev. This issue is now closed.

Files
File name Uploaded Description Edit
request.patch demian.brecht, 2013-02-22 05:56 review
request_17272.1.patch demian.brecht, 2013-03-26 22:04 review
request_17272.1.reusable.patch demian.brecht, 2013-04-23 22:57 apply first review
request_17272.2.full_url_w_fragment.patch demian.brecht, 2013-04-23 22:58 apply second
17272-3.patch orsenthil, 2013-05-22 13:32 review
Messages (23)
msg182642 - (view) Author: Demian Brecht (demian.brecht) * Date: 2013-02-22 05:56
When assigning a value to an already instantiated Request object's full_url, unexpected results are found as a consequence in the attributes selector, type and fragment. Selector, type and fragment are only assigned to during instantiation. Unless you know to call Request._parse() after assignment to Request.full_url, the other attributes will not be reassigned new values.

The attached patch changes full_url to be a @property, essentially moving what was going on in the constructor into the property method(s). All tests pass.

I ran into this issue when working on an OAuth 2.0 client library when attempting to mutate an already instantiated Request object, injecting the access_token into the query params.

Sandboxed code to repro the issue below:

In [1]: from urllib.request import Request
In [2]: r = Request('https://example.com?foo=bar#baz')

# as expected
In [4]: r.__dict__
Out[4]: 
{'_data': None,
 '_tunnel_host': None,
 'fragment': 'baz',
 'full_url': 'https://example.com?foo=bar',
 'headers': {},
 'host': 'example.com',
 'method': None,
 'origin_req_host': 'example.com',
 'selector': '/?foo=bar',
 'type': 'https',
 'unredirected_hdrs': {},
 'unverifiable': False}

In [5]: r.full_url = 'https://example.com?foo=bar&access_token=token_value#baz'

# unexpected results in selector
In [6]: r.__dict__
Out[6]: 
{'_data': None,
 '_tunnel_host': None,
 'fragment': 'baz',
 'full_url': 'https://example.com?foo=bar&access_token=token_value#baz',
 'headers': {},
 'host': 'example.com',
 'method': None,
 'origin_req_host': 'example.com',
 'selector': '/?foo=bar',
 'type': 'https',
 'unredirected_hdrs': {},
 'unverifiable': False}

In [7]: Request('https://example.com?foo=bar&access_token=token_value#baz')
Out[7]: <urllib.request.Request at 0x10ef6ce90>

# these results should match that of the full_url setter
In [8]: Request('https://example.com?foo=bar&access_token=token_value#baz').__dict__
Out[8]: 
{'_data': None,
 '_tunnel_host': None,
 'fragment': 'baz',
 'full_url': 'https://example.com?foo=bar&access_token=token_value',
 'headers': {},
 'host': 'example.com',
 'method': None,
 'origin_req_host': 'example.com',
 'selector': '/?foo=bar&access_token=token_value',
 'type': 'https',
 'unredirected_hdrs': {},
 'unverifiable': False}
msg182643 - (view) Author: Demian Brecht (demian.brecht) * Date: 2013-02-22 05:59
I also meant to mention that of course, the obvious workaround would simply be to instantiate a new Request object with the new URL, but in my mind, this is something that should likely be supported by the Request object itself.
msg182664 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-02-22 13:33
Well, full_url was not designed to be assignable, and is documented that way (the docs say it is the URL passed to the constructor).  Whether or not it is reasonable to make it assignable is an interesting question.
msg182670 - (view) Author: Demian Brecht (demian.brecht) * Date: 2013-02-22 15:33
It could entirely be my lack of being Dutch that didn't make this obvious without reading the docs ;)

I guess there's one of three ways that this could go:

To help clarify the intention through code, either

a) Change full_url to _full_url, indicating that this should be treated as a private member and only expose the url through get_full_url(). This will obviously have the negative side effect of breaking backwards compatibility for anyone using full_url directly.

b) Keep the property implementation in the patch, but replace the setter with a read-only exception.

And the third option is what's in this patch (there are likely other options that I'm just not seeing at the moment as well).

Having said all that, if the mutability of full_url is made explicit, then safeguards should also be put in place for the rest of the attributes.


I couldn't think of any hard reason as to why the state of a Request instance /shouldn't/ be mutable and the user should be required to instantiate a new Request in order to use a new URL.
msg182671 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-02-22 16:19
One of those other options is: do nothing :)

Python doesn't normally make things read-only just because mutating them does nothing useful.  Sometimes we make things read-only when mutating them does nasty stuff, but even then sometimes we don't.

So the real question is, do others consider it a sensible and useful API change to make setting it do something useful, and how many other changes would be required to make the rest of the API consistent with that change?

We may be in python-ideas territory here, I'm not sure.
msg182674 - (view) Author: Demian Brecht (demian.brecht) * Date: 2013-02-22 17:10
Yes, I realized that I had forgotten to add the "do nothing" option after posting but figured it was relatively obvious :)

"Python doesn't normally make things read-only just because mutating them does nothing useful.  Sometimes we make things read-only when mutating them does nasty stuff, but even then sometimes we don't."

I realize that this is a higher level question and outside of the scope of this particular issue (and likely belonging more in python-ideas), but doesn't this somewhat contradict "There should be one-- and preferably only one --obvious way to do it."? I'd assume that this should not only apply to sandboxed object implementations, but also at a higher level across /all/ objects. From your post, I assume inconsistency between modules, which would imply non-obvious ways to "do something" when moving from one module (or object) implementation to the next.

I'm definitely interested to hear whether or not others would find this change useful as I do. There /should/ be no other changes required for consistency as no other attributes of the Request class that don't already implement assignment methods (i.e. "data") are affected by side effects within __init__ (or anywhere else).
msg182696 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-02-22 20:16
I believe the issue of reusing request objects after modification has come up before, either on the tracker (but I cannot find an issue) or elsewhere. Senthil may remember better and certainly may have an opinion. I agree that python-idea would be a better place for other opinions.
msg184456 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2013-03-18 14:25
Sorry for taking long time to respond. full_url has been in the shape it is for many versions, changing it in backwards incompatible way be do more harm than good.

- I would be really interested to know why the present form of full_url presented any limitation for developing an client? Apart from get_full_url there are other ways to get full_url and url associated. Any details on "Why this is must"? The explain snippet below gives the assumption on full_url setting, I could not get need for change from this. Personally, I am 0 to -1 on this too, as in I cannot see a clear need for this change.
msg184461 - (view) Author: Demian Brecht (demian.brecht) * Date: 2013-03-18 15:28
No worries.

The change is not backwards incompatible. test_urllib2 test pass without any modification (I'm getting a segfault in test_heapq atm so can't run the full suite). I've simply moved the side effects cause by __init__ to a setter so that full_url may be set after instantiation and will still incur the same results as after initial creation.

The biggest problem with this particular attribute and the way that it's currently handled is inconsistent with the rest of the class. The only other attribute that incurs side effects when set (data) is implemented with @property getter and setters. When set, it incurs side effects on the headers (removing Content-Length). Unless I'm missing something, other attributes are directly mutable and do not incur side effects on instance state when set.

In my mind, if full_url is publicly accessible, then it should be publicly accessible /and/ settable. It currently is not without causing invalid state within a given Request instance.
msg184788 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-03-20 20:33
Having looked at the current handling of the data attribute in the context of another issue, I am now inclined to agree with you that full_url should be updated in order to have the API have a consistent behavior.  Although it isn't backward incompatible API wise, it is possible for existing code to depend on the other values *not* being recomputed (presumably unintentionally, since that would be a rather odd thing to do :), so I still think the change should only be made in 3.4.  I'm open to argument on that, though.
msg184789 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-03-20 20:38
Oh, data being a property is 3.4 only.  So if consistency is the goal, this should definately be a 3.4 only change.
msg185304 - (view) Author: Demian Brecht (demian.brecht) * Date: 2013-03-26 21:15
Based on your and Andrew's comment in issue #16464 (new behaviour so it should only apply to 3.4), I agree that this should be a 3.4-only change.
msg185309 - (view) Author: Demian Brecht (demian.brecht) * Date: 2013-03-26 22:04
Patch updated (acks update, fixed order) per Terry's instructions on core-mentorship.
msg187420 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-04-20 12:41
Thanks for working on this, Demian.  I made some review comments, mostly style things about the tests.

There's one substantial comment about the change in behaivor of the full_url property though (before patch it does not include the fragment, after the patch it does).  We need to think about the implications of that change in terms of backward compatibility.  It makes more sense, but how likely is it to break working code?
msg187654 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-04-23 15:50
Senthil, are you saying (in the review) that we should treat the current return value of full_url as a bug, and fix it in maintenance releases?  It is certainly true that its value does not match the documentation.

Ah.  I see that get_full_url used to have the same bug before you fixed it in 3.1.  So I guess I agree with you :)
msg187675 - (view) Author: Demian Brecht (demian.brecht) * Date: 2013-04-23 22:57
As suggested by Senthil, I've broken this up into two patches: One that implements this reusable Request (this one) and one that implements the new (consistent) behaviour, in having full_url return the fragment.

I will also add a bug/patch requesting the deprecation of get_full_url as it will simply be a passthrough to full_url.
msg187779 - (view) Author: Roundup Robot (python-dev) Date: 2013-04-25 12:45
New changeset 2d4189e9bbe8 by Senthil Kumaran in branch 'default':
Issue #17272: Making the urllib.request's Request.full_url a descriptor.  Fixes
http://hg.python.org/cpython/rev/2d4189e9bbe8
msg187786 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2013-04-25 14:41
I have committed the first patch which makes Request.full_url a descriptor. 
As I was looking at the changes to be introduced by second patch, I noticed that we do not have comprehensive test coverage for .full_url public attribute. All the tests are testing the methods like get_full_url.

The change for .full_url not to include fragments was introduced in 63817:bf3359b7ed2e which says that for reasons that HTTP request should not include fragments, it was done. That's correct. Now, I would fear to introduce that bug again with the second patch wherein we inadvertently send a URL with fragment in a HTTP request.  To ensure this will not be the case, I think, increase in test coverage, understanding and documenting the exact expectation will be necessary if we have to change  Request.full_url behavior. I will be spending a little more time on it. I thought I will write down the points which should be taken care.
msg188134 - (view) Author: Demian Brecht (demian.brecht) * Date: 2013-04-30 07:15
Issue 8280 reports the error by way of urlopen(). In light of that, would it not make more sense to have the *Opener be responsible for determining which parts of the url should be used?

i.e. request.py, line ~1255:

r.url = req.get_full_url()

might instead be

r.url = req.full_url.split('#')[0]

To me, it would make more sense to have the client code (in this case, meaning the consumer of the Request object, which is the *Opener) be smart enough to know what parts of the url should be used in the HTTP request than to have the Request object have inconsistencies in the set and subsequent get values for full_url.

I wouldn't have an issue at all with adding a new patch that (on top of implementing full_url):

1. Adds identical tests to get_full_url for full_url
2. Changes the client code in *Opener (and anywhere else known to exhibit the same behaviour) to strip the URL fragment for the HTTP request

I don't believe that this change would be any more of a backwards compatibility risk than what's currently in the patch as the logic in terms of urlopen is kept. The risk of dependencies on fragment-less full_urls however, would remain.
msg189815 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2013-05-22 13:32
Here is patch with tests and docs. I see no changes to opener is required and the selector which is sent to HTTP request is the correct one. I have added tests for redirect url with #fragment too (For testing scenario reported in Issue 8280).

I shall close this issue once I commit this patch. In a separate report/ commit, get_full_url should be deprecated. Also, I like to see splittag being replaced with urlparse itself, so that our test and expectation with fragment will be consistent.
msg189915 - (view) Author: Roundup Robot (python-dev) Date: 2013-05-24 16:14
New changeset 51c5870144e7 by Senthil Kumaran in branch 'default':
Fix #17272 - Make Request.full_url and Request.get_full_url return same result under all circumstances.
http://hg.python.org/cpython/rev/51c5870144e7
msg189918 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2013-05-24 16:19
This is fixed in 3.4. I dont think backporting is a good idea just to support assignment to .full_url. Thinking of this further, the idea of reusing request by assigning to .full_url may not be a good idea, because if you set .full_url to a completely different URL from different host, the request becomes meaningless.  

Closing this issue as the enhancement is done.
msg213101 - (view) Author: Roundup Robot (python-dev) Date: 2014-03-10 22:11
New changeset e6d862886e5c by R David Murray in branch 'default':
whatsnew: urllib Request objects are now reusable.
http://hg.python.org/cpython/rev/e6d862886e5c
History
Date User Action Args
2014-03-10 22:11:24python-devsetmessages: + msg213101
2013-05-24 16:19:04orsenthilsetstatus: open -> closed
resolution: fixed
messages: + msg189918

stage: patch review -> resolved
2013-05-24 16:14:34python-devsetmessages: + msg189915
2013-05-22 13:32:21orsenthilsetfiles: + 17272-3.patch

messages: + msg189815
2013-04-30 07:15:33demian.brechtsetmessages: + msg188134
2013-04-25 14:41:48orsenthilsetassignee: orsenthil
messages: + msg187786
2013-04-25 12:45:58python-devsetnosy: + python-dev
messages: + msg187779
2013-04-23 22:58:03demian.brechtsetfiles: + request_17272.2.full_url_w_fragment.patch
2013-04-23 22:57:45demian.brechtsetfiles: + request_17272.1.reusable.patch

messages: + msg187675
2013-04-23 15:50:12r.david.murraysetmessages: + msg187654
2013-04-20 12:41:29r.david.murraysetmessages: + msg187420
2013-03-26 22:04:29demian.brechtsetfiles: + request_17272.1.patch

messages: + msg185309
2013-03-26 21:15:13demian.brechtsetmessages: + msg185304
2013-03-20 20:38:02r.david.murraysetmessages: + msg184789
2013-03-20 20:33:11r.david.murraysetmessages: + msg184788
2013-03-18 15:28:17demian.brechtsetmessages: + msg184461
2013-03-18 14:25:55orsenthilsetmessages: + msg184456
2013-02-22 23:05:38ezio.melottisetnosy: + ezio.melotti
2013-02-22 20:16:37terry.reedysetnosy: + terry.reedy, orsenthil

messages: + msg182696
stage: patch review
2013-02-22 17:10:26demian.brechtsetmessages: + msg182674
2013-02-22 16:19:31r.david.murraysetmessages: + msg182671
2013-02-22 15:33:18demian.brechtsetmessages: + msg182670
2013-02-22 13:33:14r.david.murraysettype: enhancement

messages: + msg182664
nosy: + r.david.murray
2013-02-22 05:59:01demian.brechtsetmessages: + msg182643
2013-02-22 05:56:42demian.brechtcreate