classification
Title: urllib.parse.unquote raises incorrect errormessage when string parameter is bytes
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.9, Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: iritkatriel, nanjekyejoannah, ncoghlan, r.david.murray, stein-k, taleinat, xtreak
Priority: normal Keywords: newcomer friendly, patch

Created on 2018-01-05 19:15 by stein-k, last changed 2020-10-18 21:07 by taleinat. This issue is now closed.

Files
File name Uploaded Description Edit
urllib_parse_unquote_handle_bytes.patch stein-k, 2018-01-18 19:09 Patch adding support for bytes as input to unquote
urllib_parse_unquote_handle_bytes_test.patch stein-k, 2018-01-21 12:32 Added test for unquote with bytes input
Pull Requests
URL Status Linked Edit
PR 7768 merged stein-k, 2018-06-17 19:16
PR 22746 merged iritkatriel, 2020-10-18 14:48
Messages (19)
msg309517 - (view) Author: stein-k (stein-k) * Date: 2018-01-05 19:15
urllib.parse.unquote(b'abc%20def')
...
TypeError: a bytes-like object is required, not 'str'
msg309524 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-01-05 20:52
If you read the traceback the message is "correct" for some definition of correct: the right hand side controls the type of the expression, so it is objecting to trying to look for the string '%' in a bytes object.

There are probably ways this could be improved, but I'm not sure it is worth it, since this is just a general behavior of the 'in' operator.
msg309566 - (view) Author: stein-k (stein-k) * Date: 2018-01-06 18:39
The message is incorrect for the input to the function, and confusing because the caller did supply a bytes-like object.

The exception is from an implementation detail, and if the implementation changes the exception could as well. 

I suggest the implementation is changed to follow the example of  urllib.parse.unquote_to_bytes, which encodes its string parameter to bytes if it receives a str object.

Alternatively, a check for required type should be implemented and raise a TypeError with the correct error-message.
msg309625 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-01-07 15:59
We generally don't do type checking (see discussions of "duck typing"). We generally do just let the implementation detail bubble up.  I don't think the encoding suggestion works, since we can't know what encoding the byte string is in, or even if it is a valid one.

I wonder if unquote should be polymorphic: accept both bytes and strings and produce the same type of output as its input.  I think there are other urllib methods that are, but I haven't checked.
msg309633 - (view) Author: stein-k (stein-k) * Date: 2018-01-07 19:18
The unquote function does take an encoding argument, but I am more worried about unquote_to_to_bytes which just assumes utf-8. (But this might be an implementation detail of its intended usage.)

I must agree that a polymorphic implementation would be better. Do you think this tracker is the correct place for such a change, or would an enhancement proposal be more suited?
msg309641 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-01-07 22:37
I think Nick was the last one who touched the byte/string issues in urllib, so I've nosied him.  We'll see what he thinks (but this tracker accepts enhancement proposals as long as they are smallish ones).
msg309657 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-01-08 02:39
As David noted, we updated all the URL parsing functions to be polymorphic back in 3.2: https://docs.python.org/3/library/urllib.parse.html#parsing-ascii-encoded-bytes

We left the quoting functions alone, because they already had their own way of dealing with the bytes-vs-str distinction (quote_from_bytes, unquote_to_bytes, etc) that meant the polymorphic approach we adopted for the parsing functions didn't make sense.

That said, I think it would be reasonable to enhance unquote() to accept a bytes object, processing it as follows:

    unquote_to_bytes(string).decode(encoding, errors)
msg310252 - (view) Author: stein-k (stein-k) * Date: 2018-01-18 19:09
Added a patch containing the fix, is this the proper way or should I create a pull request?
msg310370 - (view) Author: stein-k (stein-k) * Date: 2018-01-21 12:32
Patch for tests.

Added test for calling unquote with bytes input and removed Exception raised in original test.
msg319814 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2018-06-17 13:28
@stein-k Thanks for the patch and tests. The patches apply cleanly on master branch. The project accepts pull requests and it will be helpful if you can make a PR for this on GitHub to get this merged. Since this is a bug fix I think it would require a News entry. Please find the below links for reference

PR workflow : https://devguide.python.org/pullrequest/
News entry reference : https://devguide.python.org/committing/?highlight=blurb#what-s-new-and-news-entries

Thanks
msg321286 - (view) Author: stein-k (stein-k) * Date: 2018-07-08 19:41
I have made the News Entry and created the Pull request as described here: https://devguide.python.org/pullrequest/. Should I update the Status and Resolution of the issue here, or wait for some kind of confirmation?
msg354468 - (view) Author: Joannah Nanjekye (nanjekyejoannah) * (Python committer) Date: 2019-10-11 17:03
> Should I update the Status and Resolution of the issue here, or wait for some kind of confirmation?

The status is changed after the patch is merged. The person that merges will usually change the status of the issue or If he/she forgets, anyone with developer role can update the status too.
msg354624 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2019-10-14 10:27
So urllib.parse.unquote() will accept bytes in addition to str starting with 3.9.

The unclear exception remains in prior versions. Would someone like to add a better exception for 3.7 and 3.8?

(I'm marking this as "newcomer friendly", referring only to improving the exception for 3.7 and 3.8.)
msg354625 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2019-10-14 10:36
New changeset aad2ee01561f260c69af1951c0d6fcaf75c4d41b by Tal Einat (Stein Karlsen) in branch 'master':
bpo-32498: urllib.parse.unquote also accepts bytes (GH-7768)
https://github.com/python/cpython/commit/aad2ee01561f260c69af1951c0d6fcaf75c4d41b
msg378784 - (view) Author: Irit Katriel (iritkatriel) * (Python triager) Date: 2020-10-16 23:07
Can this be closed? (It's intended as a 3.9-only change).
msg378858 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2020-10-18 10:26
Thanks for the reminder Irit!

No, this should not be closed yet, as there is still the possibility of improving the exception in version 3.8.
msg378870 - (view) Author: Irit Katriel (iritkatriel) * (Python triager) Date: 2020-10-18 14:49
Ah yes, I missed that. I've pushed a PR for 3.8 that does this:

>>> urllib.parse.unquote(b'abc%20def')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\User\src\cpython\lib\urllib\parse.py", line 635, in unquote
    raise TypeError('Expected str, got bytes')
TypeError: Expected str, got bytes
msg378898 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2020-10-18 21:06
New changeset 1a3f7c042a32fb813835243bd7f96e47c665bfdc by Irit Katriel in branch '3.8':
[3.8] bpo-32498: Improve exception message on passing bytes to urllib.parse.unquote (GH-22746)
https://github.com/python/cpython/commit/1a3f7c042a32fb813835243bd7f96e47c665bfdc
msg378899 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2020-10-18 21:07
Thanks for the report and the PR, Stein!

And thanks for the PR for 3.8, Irit!
History
Date User Action Args
2020-10-18 21:07:57taleinatsetstatus: open -> closed
resolution: fixed
messages: + msg378899

stage: patch review -> resolved
2020-10-18 21:06:41taleinatsetmessages: + msg378898
2020-10-18 14:49:41iritkatrielsetmessages: + msg378870
2020-10-18 14:48:14iritkatrielsetpull_requests: + pull_request21709
2020-10-18 10:26:11taleinatsetmessages: + msg378858
versions: + Python 3.8
2020-10-16 23:07:56iritkatrielsetnosy: + iritkatriel
messages: + msg378784
2019-10-14 10:36:33taleinatsetmessages: + msg354625
2019-10-14 10:27:18taleinatsetkeywords: + newcomer friendly
nosy: + taleinat
messages: + msg354624

2019-10-14 10:13:57taleinatsetversions: + Python 3.9, - Python 3.4, Python 3.5, Python 3.6, Python 3.7
2019-10-11 17:03:13nanjekyejoannahsetnosy: + nanjekyejoannah
messages: + msg354468
2018-07-08 19:41:07stein-ksetmessages: + msg321286
2018-06-17 19:16:00stein-ksetstage: patch review
pull_requests: + pull_request7376
2018-06-17 13:28:12xtreaksetnosy: + xtreak
messages: + msg319814
2018-06-17 13:03:59r.david.murraylinkissue33846 superseder
2018-01-21 12:32:20stein-ksetfiles: + urllib_parse_unquote_handle_bytes_test.patch

messages: + msg310370
2018-01-18 19:09:20stein-ksetfiles: + urllib_parse_unquote_handle_bytes.patch
keywords: + patch
messages: + msg310252
2018-01-08 02:39:19ncoghlansetmessages: + msg309657
2018-01-07 22:37:02r.david.murraysetnosy: + ncoghlan
messages: + msg309641
2018-01-07 19:18:36stein-ksetmessages: + msg309633
2018-01-07 15:59:32r.david.murraysetmessages: + msg309625
2018-01-06 18:39:07stein-ksetmessages: + msg309566
2018-01-05 20:52:50r.david.murraysetnosy: + r.david.murray
messages: + msg309524
2018-01-05 19:15:10stein-kcreate