msg309517 - (view) |
Author: stein-k (stein-k) * |
Date: 2018-01-05 19:15 |
urllib.parse.unquote(b'abc%20def')
...
TypeError: a bytes-like object is required, not 'str'
|
msg309524 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2018-01-05 20:52 |
If you read the traceback the message is "correct" for some definition of correct: the right hand side controls the type of the expression, so it is objecting to trying to look for the string '%' in a bytes object.
There are probably ways this could be improved, but I'm not sure it is worth it, since this is just a general behavior of the 'in' operator.
|
msg309566 - (view) |
Author: stein-k (stein-k) * |
Date: 2018-01-06 18:39 |
The message is incorrect for the input to the function, and confusing because the caller did supply a bytes-like object.
The exception is from an implementation detail, and if the implementation changes the exception could as well.
I suggest the implementation is changed to follow the example of urllib.parse.unquote_to_bytes, which encodes its string parameter to bytes if it receives a str object.
Alternatively, a check for required type should be implemented and raise a TypeError with the correct error-message.
|
msg309625 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2018-01-07 15:59 |
We generally don't do type checking (see discussions of "duck typing"). We generally do just let the implementation detail bubble up. I don't think the encoding suggestion works, since we can't know what encoding the byte string is in, or even if it is a valid one.
I wonder if unquote should be polymorphic: accept both bytes and strings and produce the same type of output as its input. I think there are other urllib methods that are, but I haven't checked.
|
msg309633 - (view) |
Author: stein-k (stein-k) * |
Date: 2018-01-07 19:18 |
The unquote function does take an encoding argument, but I am more worried about unquote_to_to_bytes which just assumes utf-8. (But this might be an implementation detail of its intended usage.)
I must agree that a polymorphic implementation would be better. Do you think this tracker is the correct place for such a change, or would an enhancement proposal be more suited?
|
msg309641 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2018-01-07 22:37 |
I think Nick was the last one who touched the byte/string issues in urllib, so I've nosied him. We'll see what he thinks (but this tracker accepts enhancement proposals as long as they are smallish ones).
|
msg309657 - (view) |
Author: Nick Coghlan (ncoghlan) *  |
Date: 2018-01-08 02:39 |
As David noted, we updated all the URL parsing functions to be polymorphic back in 3.2: https://docs.python.org/3/library/urllib.parse.html#parsing-ascii-encoded-bytes
We left the quoting functions alone, because they already had their own way of dealing with the bytes-vs-str distinction (quote_from_bytes, unquote_to_bytes, etc) that meant the polymorphic approach we adopted for the parsing functions didn't make sense.
That said, I think it would be reasonable to enhance unquote() to accept a bytes object, processing it as follows:
unquote_to_bytes(string).decode(encoding, errors)
|
msg310252 - (view) |
Author: stein-k (stein-k) * |
Date: 2018-01-18 19:09 |
Added a patch containing the fix, is this the proper way or should I create a pull request?
|
msg310370 - (view) |
Author: stein-k (stein-k) * |
Date: 2018-01-21 12:32 |
Patch for tests.
Added test for calling unquote with bytes input and removed Exception raised in original test.
|
msg319814 - (view) |
Author: Karthikeyan Singaravelan (xtreak) *  |
Date: 2018-06-17 13:28 |
@stein-k Thanks for the patch and tests. The patches apply cleanly on master branch. The project accepts pull requests and it will be helpful if you can make a PR for this on GitHub to get this merged. Since this is a bug fix I think it would require a News entry. Please find the below links for reference
PR workflow : https://devguide.python.org/pullrequest/
News entry reference : https://devguide.python.org/committing/?highlight=blurb#what-s-new-and-news-entries
Thanks
|
msg321286 - (view) |
Author: stein-k (stein-k) * |
Date: 2018-07-08 19:41 |
I have made the News Entry and created the Pull request as described here: https://devguide.python.org/pullrequest/. Should I update the Status and Resolution of the issue here, or wait for some kind of confirmation?
|
msg354468 - (view) |
Author: Joannah Nanjekye (nanjekyejoannah) *  |
Date: 2019-10-11 17:03 |
> Should I update the Status and Resolution of the issue here, or wait for some kind of confirmation?
The status is changed after the patch is merged. The person that merges will usually change the status of the issue or If he/she forgets, anyone with developer role can update the status too.
|
msg354624 - (view) |
Author: Tal Einat (taleinat) *  |
Date: 2019-10-14 10:27 |
So urllib.parse.unquote() will accept bytes in addition to str starting with 3.9.
The unclear exception remains in prior versions. Would someone like to add a better exception for 3.7 and 3.8?
(I'm marking this as "newcomer friendly", referring only to improving the exception for 3.7 and 3.8.)
|
msg354625 - (view) |
Author: Tal Einat (taleinat) *  |
Date: 2019-10-14 10:36 |
New changeset aad2ee01561f260c69af1951c0d6fcaf75c4d41b by Tal Einat (Stein Karlsen) in branch 'master':
bpo-32498: urllib.parse.unquote also accepts bytes (GH-7768)
https://github.com/python/cpython/commit/aad2ee01561f260c69af1951c0d6fcaf75c4d41b
|
msg378784 - (view) |
Author: Irit Katriel (iritkatriel) *  |
Date: 2020-10-16 23:07 |
Can this be closed? (It's intended as a 3.9-only change).
|
msg378858 - (view) |
Author: Tal Einat (taleinat) *  |
Date: 2020-10-18 10:26 |
Thanks for the reminder Irit!
No, this should not be closed yet, as there is still the possibility of improving the exception in version 3.8.
|
msg378870 - (view) |
Author: Irit Katriel (iritkatriel) *  |
Date: 2020-10-18 14:49 |
Ah yes, I missed that. I've pushed a PR for 3.8 that does this:
>>> urllib.parse.unquote(b'abc%20def')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\User\src\cpython\lib\urllib\parse.py", line 635, in unquote
raise TypeError('Expected str, got bytes')
TypeError: Expected str, got bytes
|
msg378898 - (view) |
Author: Tal Einat (taleinat) *  |
Date: 2020-10-18 21:06 |
New changeset 1a3f7c042a32fb813835243bd7f96e47c665bfdc by Irit Katriel in branch '3.8':
[3.8] bpo-32498: Improve exception message on passing bytes to urllib.parse.unquote (GH-22746)
https://github.com/python/cpython/commit/1a3f7c042a32fb813835243bd7f96e47c665bfdc
|
msg378899 - (view) |
Author: Tal Einat (taleinat) *  |
Date: 2020-10-18 21:07 |
Thanks for the report and the PR, Stein!
And thanks for the PR for 3.8, Irit!
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:56 | admin | set | github: 76679 |
2020-10-18 21:07:57 | taleinat | set | status: open -> closed resolution: fixed messages:
+ msg378899
stage: patch review -> resolved |
2020-10-18 21:06:41 | taleinat | set | messages:
+ msg378898 |
2020-10-18 14:49:41 | iritkatriel | set | messages:
+ msg378870 |
2020-10-18 14:48:14 | iritkatriel | set | pull_requests:
+ pull_request21709 |
2020-10-18 10:26:11 | taleinat | set | messages:
+ msg378858 versions:
+ Python 3.8 |
2020-10-16 23:07:56 | iritkatriel | set | nosy:
+ iritkatriel messages:
+ msg378784
|
2019-10-14 10:36:33 | taleinat | set | messages:
+ msg354625 |
2019-10-14 10:27:18 | taleinat | set | keywords:
+ newcomer friendly nosy:
+ taleinat messages:
+ msg354624
|
2019-10-14 10:13:57 | taleinat | set | versions:
+ Python 3.9, - Python 3.4, Python 3.5, Python 3.6, Python 3.7 |
2019-10-11 17:03:13 | nanjekyejoannah | set | nosy:
+ nanjekyejoannah messages:
+ msg354468
|
2018-07-08 19:41:07 | stein-k | set | messages:
+ msg321286 |
2018-06-17 19:16:00 | stein-k | set | stage: patch review pull_requests:
+ pull_request7376 |
2018-06-17 13:28:12 | xtreak | set | nosy:
+ xtreak messages:
+ msg319814
|
2018-06-17 13:03:59 | r.david.murray | link | issue33846 superseder |
2018-01-21 12:32:20 | stein-k | set | files:
+ urllib_parse_unquote_handle_bytes_test.patch
messages:
+ msg310370 |
2018-01-18 19:09:20 | stein-k | set | files:
+ urllib_parse_unquote_handle_bytes.patch keywords:
+ patch messages:
+ msg310252
|
2018-01-08 02:39:19 | ncoghlan | set | messages:
+ msg309657 |
2018-01-07 22:37:02 | r.david.murray | set | nosy:
+ ncoghlan messages:
+ msg309641
|
2018-01-07 19:18:36 | stein-k | set | messages:
+ msg309633 |
2018-01-07 15:59:32 | r.david.murray | set | messages:
+ msg309625 |
2018-01-06 18:39:07 | stein-k | set | messages:
+ msg309566 |
2018-01-05 20:52:50 | r.david.murray | set | nosy:
+ r.david.murray messages:
+ msg309524
|
2018-01-05 19:15:10 | stein-k | create | |