Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

request.full_url: unexpected results on assignment #61474

Closed
demianbrecht mannequin opened this issue Feb 22, 2013 · 23 comments
Closed

request.full_url: unexpected results on assignment #61474

demianbrecht mannequin opened this issue Feb 22, 2013 · 23 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@demianbrecht
Copy link
Mannequin

demianbrecht mannequin commented Feb 22, 2013

BPO 17272
Nosy @terryjreedy, @orsenthil, @ezio-melotti, @bitdancer, @demianbrecht
Files
  • request.patch
  • request_17272.1.patch
  • request_17272.1.reusable.patch: apply first
  • request_17272.2.full_url_w_fragment.patch: apply second
  • 17272-3.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/orsenthil'
    closed_at = <Date 2013-05-24.16:19:04.597>
    created_at = <Date 2013-02-22.05:56:42.857>
    labels = ['type-feature', 'library']
    title = 'request.full_url: unexpected results on assignment'
    updated_at = <Date 2014-03-10.22:11:24.984>
    user = 'https://github.com/demianbrecht'

    bugs.python.org fields:

    activity = <Date 2014-03-10.22:11:24.984>
    actor = 'python-dev'
    assignee = 'orsenthil'
    closed = True
    closed_date = <Date 2013-05-24.16:19:04.597>
    closer = 'orsenthil'
    components = ['Library (Lib)']
    creation = <Date 2013-02-22.05:56:42.857>
    creator = 'demian.brecht'
    dependencies = []
    files = ['29156', '29589', '29994', '29995', '30340']
    hgrepos = []
    issue_num = 17272
    keywords = ['patch']
    message_count = 23.0
    messages = ['182642', '182643', '182664', '182670', '182671', '182674', '182696', '184456', '184461', '184788', '184789', '185304', '185309', '187420', '187654', '187675', '187779', '187786', '188134', '189815', '189915', '189918', '213101']
    nosy_count = 6.0
    nosy_names = ['terry.reedy', 'orsenthil', 'ezio.melotti', 'r.david.murray', 'python-dev', 'demian.brecht']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue17272'
    versions = ['Python 3.4']

    @demianbrecht
    Copy link
    Mannequin Author

    demianbrecht mannequin commented Feb 22, 2013

    When assigning a value to an already instantiated Request object's full_url, unexpected results are found as a consequence in the attributes selector, type and fragment. Selector, type and fragment are only assigned to during instantiation. Unless you know to call Request._parse() after assignment to Request.full_url, the other attributes will not be reassigned new values.

    The attached patch changes full_url to be a @Property, essentially moving what was going on in the constructor into the property method(s). All tests pass.

    I ran into this issue when working on an OAuth 2.0 client library when attempting to mutate an already instantiated Request object, injecting the access_token into the query params.

    Sandboxed code to repro the issue below:

    In [1]: from urllib.request import Request
    In [2]: r = Request('https://example.com?foo=bar#baz')

    # as expected
    In [4]: r.__dict__
    Out[4]:
    {'_data': None,
    '_tunnel_host': None,
    'fragment': 'baz',
    'full_url': 'https://example.com?foo=bar',
    'headers': {},
    'host': 'example.com',
    'method': None,
    'origin_req_host': 'example.com',
    'selector': '/?foo=bar',
    'type': 'https',
    'unredirected_hdrs': {},
    'unverifiable': False}

    In [5]: r.full_url = 'https://example.com?foo=bar&access_token=token_value#baz'

    # unexpected results in selector
    In [6]: r.__dict__
    Out[6]:
    {'_data': None,
    '_tunnel_host': None,
    'fragment': 'baz',
    'full_url': 'https://example.com?foo=bar&access_token=token_value#baz',
    'headers': {},
    'host': 'example.com',
    'method': None,
    'origin_req_host': 'example.com',
    'selector': '/?foo=bar',
    'type': 'https',
    'unredirected_hdrs': {},
    'unverifiable': False}

    In [7]: Request('https://example.com?foo=bar&access_token=token_value#baz')
    Out[7]: <urllib.request.Request at 0x10ef6ce90>

    # these results should match that of the full_url setter
    In [8]: Request('https://example.com?foo=bar&access_token=token_value#baz').\_\_dict__
    Out[8]:
    {'_data': None,
    '_tunnel_host': None,
    'fragment': 'baz',
    'full_url': 'https://example.com?foo=bar&access_token=token_value',
    'headers': {},
    'host': 'example.com',
    'method': None,
    'origin_req_host': 'example.com',
    'selector': '/?foo=bar&access_token=token_value',
    'type': 'https',
    'unredirected_hdrs': {},
    'unverifiable': False}

    @demianbrecht demianbrecht mannequin added the stdlib Python modules in the Lib dir label Feb 22, 2013
    @demianbrecht
    Copy link
    Mannequin Author

    demianbrecht mannequin commented Feb 22, 2013

    I also meant to mention that of course, the obvious workaround would simply be to instantiate a new Request object with the new URL, but in my mind, this is something that should likely be supported by the Request object itself.

    @bitdancer
    Copy link
    Member

    Well, full_url was not designed to be assignable, and is documented that way (the docs say it is the URL passed to the constructor). Whether or not it is reasonable to make it assignable is an interesting question.

    @bitdancer bitdancer added the type-feature A feature request or enhancement label Feb 22, 2013
    @demianbrecht
    Copy link
    Mannequin Author

    demianbrecht mannequin commented Feb 22, 2013

    It could entirely be my lack of being Dutch that didn't make this obvious without reading the docs ;)

    I guess there's one of three ways that this could go:

    To help clarify the intention through code, either

    a) Change full_url to _full_url, indicating that this should be treated as a private member and only expose the url through get_full_url(). This will obviously have the negative side effect of breaking backwards compatibility for anyone using full_url directly.

    b) Keep the property implementation in the patch, but replace the setter with a read-only exception.

    And the third option is what's in this patch (there are likely other options that I'm just not seeing at the moment as well).

    Having said all that, if the mutability of full_url is made explicit, then safeguards should also be put in place for the rest of the attributes.

    I couldn't think of any hard reason as to why the state of a Request instance /shouldn't/ be mutable and the user should be required to instantiate a new Request in order to use a new URL.

    @bitdancer
    Copy link
    Member

    One of those other options is: do nothing :)

    Python doesn't normally make things read-only just because mutating them does nothing useful. Sometimes we make things read-only when mutating them does nasty stuff, but even then sometimes we don't.

    So the real question is, do others consider it a sensible and useful API change to make setting it do something useful, and how many other changes would be required to make the rest of the API consistent with that change?

    We may be in python-ideas territory here, I'm not sure.

    @demianbrecht
    Copy link
    Mannequin Author

    demianbrecht mannequin commented Feb 22, 2013

    Yes, I realized that I had forgotten to add the "do nothing" option after posting but figured it was relatively obvious :)

    "Python doesn't normally make things read-only just because mutating them does nothing useful. Sometimes we make things read-only when mutating them does nasty stuff, but even then sometimes we don't."

    I realize that this is a higher level question and outside of the scope of this particular issue (and likely belonging more in python-ideas), but doesn't this somewhat contradict "There should be one-- and preferably only one --obvious way to do it."? I'd assume that this should not only apply to sandboxed object implementations, but also at a higher level across /all/ objects. From your post, I assume inconsistency between modules, which would imply non-obvious ways to "do something" when moving from one module (or object) implementation to the next.

    I'm definitely interested to hear whether or not others would find this change useful as I do. There /should/ be no other changes required for consistency as no other attributes of the Request class that don't already implement assignment methods (i.e. "data") are affected by side effects within __init__ (or anywhere else).

    @terryjreedy
    Copy link
    Member

    I believe the issue of reusing request objects after modification has come up before, either on the tracker (but I cannot find an issue) or elsewhere. Senthil may remember better and certainly may have an opinion. I agree that python-idea would be a better place for other opinions.

    @orsenthil
    Copy link
    Member

    Sorry for taking long time to respond. full_url has been in the shape it is for many versions, changing it in backwards incompatible way be do more harm than good.

    • I would be really interested to know why the present form of full_url presented any limitation for developing an client? Apart from get_full_url there are other ways to get full_url and url associated. Any details on "Why this is must"? The explain snippet below gives the assumption on full_url setting, I could not get need for change from this. Personally, I am 0 to -1 on this too, as in I cannot see a clear need for this change.

    @demianbrecht
    Copy link
    Mannequin Author

    demianbrecht mannequin commented Mar 18, 2013

    No worries.

    The change is not backwards incompatible. test_urllib2 test pass without any modification (I'm getting a segfault in test_heapq atm so can't run the full suite). I've simply moved the side effects cause by __init__ to a setter so that full_url may be set after instantiation and will still incur the same results as after initial creation.

    The biggest problem with this particular attribute and the way that it's currently handled is inconsistent with the rest of the class. The only other attribute that incurs side effects when set (data) is implemented with @Property getter and setters. When set, it incurs side effects on the headers (removing Content-Length). Unless I'm missing something, other attributes are directly mutable and do not incur side effects on instance state when set.

    In my mind, if full_url is publicly accessible, then it should be publicly accessible /and/ settable. It currently is not without causing invalid state within a given Request instance.

    @bitdancer
    Copy link
    Member

    Having looked at the current handling of the data attribute in the context of another issue, I am now inclined to agree with you that full_url should be updated in order to have the API have a consistent behavior. Although it isn't backward incompatible API wise, it is possible for existing code to depend on the other values *not* being recomputed (presumably unintentionally, since that would be a rather odd thing to do :), so I still think the change should only be made in 3.4. I'm open to argument on that, though.

    @bitdancer
    Copy link
    Member

    Oh, data being a property is 3.4 only. So if consistency is the goal, this should definately be a 3.4 only change.

    @demianbrecht
    Copy link
    Mannequin Author

    demianbrecht mannequin commented Mar 26, 2013

    Based on your and Andrew's comment in issue bpo-16464 (new behaviour so it should only apply to 3.4), I agree that this should be a 3.4-only change.

    @demianbrecht
    Copy link
    Mannequin Author

    demianbrecht mannequin commented Mar 26, 2013

    Patch updated (acks update, fixed order) per Terry's instructions on core-mentorship.

    @bitdancer
    Copy link
    Member

    Thanks for working on this, Demian. I made some review comments, mostly style things about the tests.

    There's one substantial comment about the change in behaivor of the full_url property though (before patch it does not include the fragment, after the patch it does). We need to think about the implications of that change in terms of backward compatibility. It makes more sense, but how likely is it to break working code?

    @bitdancer
    Copy link
    Member

    Senthil, are you saying (in the review) that we should treat the current return value of full_url as a bug, and fix it in maintenance releases? It is certainly true that its value does not match the documentation.

    Ah. I see that get_full_url used to have the same bug before you fixed it in 3.1. So I guess I agree with you :)

    @demianbrecht
    Copy link
    Mannequin Author

    demianbrecht mannequin commented Apr 23, 2013

    As suggested by Senthil, I've broken this up into two patches: One that implements this reusable Request (this one) and one that implements the new (consistent) behaviour, in having full_url return the fragment.

    I will also add a bug/patch requesting the deprecation of get_full_url as it will simply be a passthrough to full_url.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 25, 2013

    New changeset 2d4189e9bbe8 by Senthil Kumaran in branch 'default':
    Issue bpo-17272: Making the urllib.request's Request.full_url a descriptor. Fixes
    http://hg.python.org/cpython/rev/2d4189e9bbe8

    @orsenthil
    Copy link
    Member

    I have committed the first patch which makes Request.full_url a descriptor.
    As I was looking at the changes to be introduced by second patch, I noticed that we do not have comprehensive test coverage for .full_url public attribute. All the tests are testing the methods like get_full_url.

    The change for .full_url not to include fragments was introduced in 63817:bf3359b7ed2e which says that for reasons that HTTP request should not include fragments, it was done. That's correct. Now, I would fear to introduce that bug again with the second patch wherein we inadvertently send a URL with fragment in a HTTP request. To ensure this will not be the case, I think, increase in test coverage, understanding and documenting the exact expectation will be necessary if we have to change Request.full_url behavior. I will be spending a little more time on it. I thought I will write down the points which should be taken care.

    @orsenthil orsenthil self-assigned this Apr 25, 2013
    @demianbrecht
    Copy link
    Mannequin Author

    demianbrecht mannequin commented Apr 30, 2013

    bpo-8280 reports the error by way of urlopen(). In light of that, would it not make more sense to have the *Opener be responsible for determining which parts of the url should be used?

    i.e. request.py, line ~1255:

    r.url = req.get_full_url()

    might instead be

    r.url = req.full_url.split('#')[0]

    To me, it would make more sense to have the client code (in this case, meaning the consumer of the Request object, which is the *Opener) be smart enough to know what parts of the url should be used in the HTTP request than to have the Request object have inconsistencies in the set and subsequent get values for full_url.

    I wouldn't have an issue at all with adding a new patch that (on top of implementing full_url):

    1. Adds identical tests to get_full_url for full_url
    2. Changes the client code in *Opener (and anywhere else known to exhibit the same behaviour) to strip the URL fragment for the HTTP request

    I don't believe that this change would be any more of a backwards compatibility risk than what's currently in the patch as the logic in terms of urlopen is kept. The risk of dependencies on fragment-less full_urls however, would remain.

    @orsenthil
    Copy link
    Member

    Here is patch with tests and docs. I see no changes to opener is required and the selector which is sent to HTTP request is the correct one. I have added tests for redirect url with #fragment too (For testing scenario reported in bpo-8280).

    I shall close this issue once I commit this patch. In a separate report/ commit, get_full_url should be deprecated. Also, I like to see splittag being replaced with urlparse itself, so that our test and expectation with fragment will be consistent.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 24, 2013

    New changeset 51c5870144e7 by Senthil Kumaran in branch 'default':
    Fix bpo-17272 - Make Request.full_url and Request.get_full_url return same result under all circumstances.
    http://hg.python.org/cpython/rev/51c5870144e7

    @orsenthil
    Copy link
    Member

    This is fixed in 3.4. I dont think backporting is a good idea just to support assignment to .full_url. Thinking of this further, the idea of reusing request by assigning to .full_url may not be a good idea, because if you set .full_url to a completely different URL from different host, the request becomes meaningless.

    Closing this issue as the enhancement is done.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 10, 2014

    New changeset e6d862886e5c by R David Murray in branch 'default':
    whatsnew: urllib Request objects are now reusable.
    http://hg.python.org/cpython/rev/e6d862886e5c

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants