Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

urllib.parse.unquote raises incorrect errormessage when string parameter is bytes #76679

Closed
stein-k mannequin opened this issue Jan 5, 2018 · 19 comments
Closed

urllib.parse.unquote raises incorrect errormessage when string parameter is bytes #76679

stein-k mannequin opened this issue Jan 5, 2018 · 19 comments
Labels
3.8 only security fixes 3.9 only security fixes easy stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@stein-k
Copy link
Mannequin

stein-k mannequin commented Jan 5, 2018

BPO 32498
Nosy @ncoghlan, @taleinat, @bitdancer, @stein-k, @tirkarthi, @nanjekyejoannah, @iritkatriel
PRs
  • bpo-32498: urllib.parse.unquote accepts bytes #7768
  • [3.8] bpo-32498: Improve exception message on passing bytes to urllib.parse.unquote #22746
  • Files
  • urllib_parse_unquote_handle_bytes.patch: Patch adding support for bytes as input to unquote
  • urllib_parse_unquote_handle_bytes_test.patch: Added test for unquote with bytes input
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-10-18.21:07:57.854>
    created_at = <Date 2018-01-05.19:15:10.350>
    labels = ['easy', '3.8', 'type-bug', 'library', '3.9']
    title = 'urllib.parse.unquote raises incorrect errormessage when string parameter is bytes'
    updated_at = <Date 2020-10-18.21:07:57.854>
    user = 'https://github.com/stein-k'

    bugs.python.org fields:

    activity = <Date 2020-10-18.21:07:57.854>
    actor = 'taleinat'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-10-18.21:07:57.854>
    closer = 'taleinat'
    components = ['Library (Lib)']
    creation = <Date 2018-01-05.19:15:10.350>
    creator = 'stein-k'
    dependencies = []
    files = ['47395', '47397']
    hgrepos = []
    issue_num = 32498
    keywords = ['patch', 'newcomer friendly']
    message_count = 19.0
    messages = ['309517', '309524', '309566', '309625', '309633', '309641', '309657', '310252', '310370', '319814', '321286', '354468', '354624', '354625', '378784', '378858', '378870', '378898', '378899']
    nosy_count = 7.0
    nosy_names = ['ncoghlan', 'taleinat', 'r.david.murray', 'stein-k', 'xtreak', 'nanjekyejoannah', 'iritkatriel']
    pr_nums = ['7768', '22746']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue32498'
    versions = ['Python 3.8', 'Python 3.9']

    @stein-k
    Copy link
    Mannequin Author

    stein-k mannequin commented Jan 5, 2018

    urllib.parse.unquote(b'abc%20def')
    ...
    TypeError: a bytes-like object is required, not 'str'

    @stein-k stein-k mannequin added 3.7 (EOL) end of life stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Jan 5, 2018
    @bitdancer
    Copy link
    Member

    If you read the traceback the message is "correct" for some definition of correct: the right hand side controls the type of the expression, so it is objecting to trying to look for the string '%' in a bytes object.

    There are probably ways this could be improved, but I'm not sure it is worth it, since this is just a general behavior of the 'in' operator.

    @stein-k
    Copy link
    Mannequin Author

    stein-k mannequin commented Jan 6, 2018

    The message is incorrect for the input to the function, and confusing because the caller did supply a bytes-like object.

    The exception is from an implementation detail, and if the implementation changes the exception could as well.

    I suggest the implementation is changed to follow the example of urllib.parse.unquote_to_bytes, which encodes its string parameter to bytes if it receives a str object.

    Alternatively, a check for required type should be implemented and raise a TypeError with the correct error-message.

    @bitdancer
    Copy link
    Member

    We generally don't do type checking (see discussions of "duck typing"). We generally do just let the implementation detail bubble up. I don't think the encoding suggestion works, since we can't know what encoding the byte string is in, or even if it is a valid one.

    I wonder if unquote should be polymorphic: accept both bytes and strings and produce the same type of output as its input. I think there are other urllib methods that are, but I haven't checked.

    @stein-k
    Copy link
    Mannequin Author

    stein-k mannequin commented Jan 7, 2018

    The unquote function does take an encoding argument, but I am more worried about unquote_to_to_bytes which just assumes utf-8. (But this might be an implementation detail of its intended usage.)

    I must agree that a polymorphic implementation would be better. Do you think this tracker is the correct place for such a change, or would an enhancement proposal be more suited?

    @bitdancer
    Copy link
    Member

    I think Nick was the last one who touched the byte/string issues in urllib, so I've nosied him. We'll see what he thinks (but this tracker accepts enhancement proposals as long as they are smallish ones).

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Jan 8, 2018

    As David noted, we updated all the URL parsing functions to be polymorphic back in 3.2: https://docs.python.org/3/library/urllib.parse.html#parsing-ascii-encoded-bytes

    We left the quoting functions alone, because they already had their own way of dealing with the bytes-vs-str distinction (quote_from_bytes, unquote_to_bytes, etc) that meant the polymorphic approach we adopted for the parsing functions didn't make sense.

    That said, I think it would be reasonable to enhance unquote() to accept a bytes object, processing it as follows:

    unquote_to_bytes(string).decode(encoding, errors)
    

    @stein-k
    Copy link
    Mannequin Author

    stein-k mannequin commented Jan 18, 2018

    Added a patch containing the fix, is this the proper way or should I create a pull request?

    @stein-k
    Copy link
    Mannequin Author

    stein-k mannequin commented Jan 21, 2018

    Patch for tests.

    Added test for calling unquote with bytes input and removed Exception raised in original test.

    @tirkarthi
    Copy link
    Member

    @stein-k Thanks for the patch and tests. The patches apply cleanly on master branch. The project accepts pull requests and it will be helpful if you can make a PR for this on GitHub to get this merged. Since this is a bug fix I think it would require a News entry. Please find the below links for reference

    PR workflow : https://devguide.python.org/pullrequest/
    News entry reference : https://devguide.python.org/committing/?highlight=blurb#what-s-new-and-news-entries

    Thanks

    @stein-k
    Copy link
    Mannequin Author

    stein-k mannequin commented Jul 8, 2018

    I have made the News Entry and created the Pull request as described here: https://devguide.python.org/pullrequest/. Should I update the Status and Resolution of the issue here, or wait for some kind of confirmation?

    @nanjekyejoannah
    Copy link
    Member

    Should I update the Status and Resolution of the issue here, or wait for some kind of confirmation?

    The status is changed after the patch is merged. The person that merges will usually change the status of the issue or If he/she forgets, anyone with developer role can update the status too.

    @taleinat taleinat added 3.9 only security fixes and removed 3.7 (EOL) end of life labels Oct 14, 2019
    @taleinat
    Copy link
    Contributor

    So urllib.parse.unquote() will accept bytes in addition to str starting with 3.9.

    The unclear exception remains in prior versions. Would someone like to add a better exception for 3.7 and 3.8?

    (I'm marking this as "newcomer friendly", referring only to improving the exception for 3.7 and 3.8.)

    @taleinat taleinat added the easy label Oct 14, 2019
    @taleinat
    Copy link
    Contributor

    New changeset aad2ee0 by Tal Einat (Stein Karlsen) in branch 'master':
    bpo-32498: urllib.parse.unquote also accepts bytes (GH-7768)
    aad2ee0

    @iritkatriel
    Copy link
    Member

    Can this be closed? (It's intended as a 3.9-only change).

    @taleinat
    Copy link
    Contributor

    Thanks for the reminder Irit!

    No, this should not be closed yet, as there is still the possibility of improving the exception in version 3.8.

    @taleinat taleinat added the 3.8 only security fixes label Oct 18, 2020
    @iritkatriel
    Copy link
    Member

    Ah yes, I missed that. I've pushed a PR for 3.8 that does this:

    >>> urllib.parse.unquote(b'abc%20def')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Users\User\src\cpython\lib\urllib\parse.py", line 635, in unquote
        raise TypeError('Expected str, got bytes')
    TypeError: Expected str, got bytes

    @taleinat
    Copy link
    Contributor

    New changeset 1a3f7c0 by Irit Katriel in branch '3.8':
    [3.8] bpo-32498: Improve exception message on passing bytes to urllib.parse.unquote (GH-22746)
    1a3f7c0

    @taleinat
    Copy link
    Contributor

    Thanks for the report and the PR, Stein!

    And thanks for the PR for 3.8, Irit!

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes 3.9 only security fixes easy stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants