Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

urllib2 doesn't escape spaces in http requests #57568

Open
sorcio mannequin opened this issue Nov 6, 2011 · 10 comments
Open

urllib2 doesn't escape spaces in http requests #57568

sorcio mannequin opened this issue Nov 6, 2011 · 10 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@sorcio
Copy link
Mannequin

sorcio mannequin commented Nov 6, 2011

BPO 13359
Nosy @orsenthil, @ezio-melotti, @karlcow, @sandrotosi, @mmaker, @sorcio, @vadmium
Superseder
  • bpo-14826: urlopen URL with unescaped space
  • Files
  • issue13359.patch: percent encoding of urls to fix the issue reported.
  • issue13359.patch
  • issue13359_py2.patch
  • urllib-request-space-encode.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2011-11-06.20:13:46.480>
    labels = ['type-bug', 'library']
    title = "urllib2 doesn't escape spaces in http requests"
    updated_at = <Date 2017-06-03.05:54:20.991>
    user = 'https://github.com/sorcio'

    bugs.python.org fields:

    activity = <Date 2017-06-03.05:54:20.991>
    actor = 'martin.panter'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2011-11-06.20:13:46.480>
    creator = 'davide.rizzo'
    dependencies = []
    files = ['23642', '24215', '24216', '30795']
    hgrepos = []
    issue_num = 13359
    keywords = ['patch']
    message_count = 10.0
    messages = ['147180', '147349', '149441', '151126', '151127', '151129', '151131', '183576', '192400', '295066']
    nosy_count = 11.0
    nosy_names = ['orsenthil', 'kiilerix', 'ezio.melotti', 'karlcow', 'sandro.tosi', 'maker', 'davide.rizzo', 'Ramchandra Apte', 'martin.panter', 'krisys', 'senko']
    pr_nums = []
    priority = 'normal'
    resolution = 'duplicate'
    stage = 'test needed'
    status = 'open'
    superseder = '14826'
    type = 'behavior'
    url = 'https://bugs.python.org/issue13359'
    versions = ['Python 2.7', 'Python 3.2', 'Python 3.3']

    @sorcio
    Copy link
    Mannequin Author

    sorcio mannequin commented Nov 6, 2011

    urllib2.urlopen('http://foo/url and spaces') will send a HTTP request line like this to the server:

    GET /url and spaces HTTP/1.1

    which the server obviously does not understand. This contrasts with urllib's behaviour which replaces the spaces (' ') in the url with '%20'.

    Related: bpo-918368 bpo-1153027

    @sorcio sorcio mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Nov 6, 2011
    @krisys
    Copy link
    Mannequin

    krisys mannequin commented Nov 9, 2011

    I have used the quote method to percent encode the url for spaces and similar characters. This is my first patch. Please let me know if there is anything wrong. I will correct and re-submit it. I ran the test_urllib2.py which gave an OK for 34 tests.

    Changes are made in two instances:

    1. in the open method.
    2. in the __init__ of Request class to ensure that the same issue is addressed at the time of creating Request objects.

    @RamchandraApte
    Copy link
    Mannequin

    RamchandraApte mannequin commented Dec 14, 2011

    Seems good.

    @mmaker
    Copy link
    Mannequin

    mmaker mannequin commented Jan 12, 2012

    Patch attached for python3, with unit tests.

    @kiilerix
    Copy link
    Mannequin

    kiilerix mannequin commented Jan 12, 2012

    FWIW, I don't think it is a good idea to escape automatically. It will change the behaviour in a non-backward compatible way for existing applications that pass encoded urls to this function.

    I think the existing behaviour is better. The documentation and the failure mode for passing URLs with spaces could however be improved.

    @mmaker
    Copy link
    Mannequin

    mmaker mannequin commented Jan 12, 2012

    Here the patch for python2.

    kiilerix, RFC 1738 explicitly says that the space character shall not be used.

    @kiilerix
    Copy link
    Mannequin

    kiilerix mannequin commented Jan 12, 2012

    Yes, the url sent by urllib2 must not contain spaces. In my opinion the only way to handle that correctly is to not pass urls with spaces to urlopen. Escaping the urls is not a good solution - even if the API was to be designed from scratch. It would be better to raise an exception if it is passed an invalid url.

    Note for example that '/' and the %-encoding of '/' are different, and it must thus be possible to pass an url containing both to urlopen. That is not possible if it automically escapes.

    @karlcow
    Copy link
    Mannequin

    karlcow mannequin commented Mar 6, 2013

    The issue with the current patch is that it is escaping more than only the spaces, with possibly indirect border effect.
    Anne van Kesteren is in the process of creating a parsing/writing specification for URL. Not finished but putting it here for future reference.
    http://url.spec.whatwg.org/

    @senko
    Copy link
    Mannequin

    senko mannequin commented Jul 6, 2013

    I vote for the parse method converting the spaces (and only the spaces) explicitly, for the following reasons:

    • the spaces must be encoded for the server to accept them
    • no user-encoded url will ever have spaces in them
    • space quoting is idempotent: quote(quote(' ')) == quote(' ')
    • if the user did get an exception from Request in case of invalid url containing the spaces, the only thing he or she can do is to quote the url string

    Here's a patch implementing this. The change allows for any whitespace character in the selector part of the url (and in particular, '\n'), not only ' '.

    @vadmium
    Copy link
    Member

    vadmium commented Jun 3, 2017

    I think this could be merged with bpo-14826. Maybe it is sensible to handle all control characters the same way.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant