This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: urllib.parse.parse_qsl does not handle unicode data properly
Type: behavior Stage: test needed
Components: Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: cheryl.sabella, cyrkov, maxking
Priority: normal Keywords:

Created on 2017-05-26 09:49 by maxking, last changed 2022-04-11 14:58 by admin.

Messages (3)
msg294541 - (view) Author: Abhilash Raj (maxking) * (Python committer) Date: 2017-05-26 09:49
After decoding percentage encoded `name` and `values` in the query string, it tries to `_coerce_result` or encode the result to ascii (which is the value of _implicit_encoding).

```
  File "/usr/lib/python3.6/urllib/parse.py", line 691, in parse_qsl
    value = _coerce_result(value)
  File "/usr/lib/python3.6/urllib/parse.py", line 95, in _encode_result
    return obj.encode(encoding, errors)
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 1: ordinal not in range(128)
```

As seen in the partial traceback above, it breaks things when trying to parse unicode encode query string values.
msg318050 - (view) Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2018-05-29 15:56
Would you be able to include an example for recreating this?

Looking at the code, it uses the ascii encoding for bytes (which can only contain ASCII literal characters) and should not be using that encoding for strings.

Thanks!
msg366592 - (view) Author: Dmitry Tsirkov (cyrkov) Date: 2020-04-16 10:52
I have recently stumbled upon this bug, and I can present the example and a solution I've used.
The issue happens when we try to parse x-www-form-urlencoded of type bytes:
```
>>> from urllib.parse import urlencode, parse_qs
>>> urlencode([('v', 'ö')])
'v=%C3%B6'
>>> parse_qs('v=%C3%B6')
{'v': ['ö']}
>>> parse_qs(b'v=%C3%B6')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.6/urllib/parse.py", line 669, in parse_qs
    encoding=encoding, errors=errors)
  File "/usr/lib64/python3.6/urllib/parse.py", line 722, in parse_qsl
    value = _coerce_result(value)
  File "/usr/lib64/python3.6/urllib/parse.py", line 103, in _encode_result
    return obj.encode(encoding, errors)
UnicodeEncodeError: 'ascii' codec can't encode character '\xf6' in position 0: ordinal not in range(128)
```
This happens in the parse_qsl function because _coerce_result is a synonym of _encode_result and is called with default parameter encoding='ascii'. As far as I understand, it should be called with the encoding parameter of the parse_qsl function:
```
742c742
<             name = _coerce_result(name)
---
>             name = _coerce_result(name, encoding=encoding, errors=errors)
745c745
<             value = _coerce_result(value)
---
>             value = _coerce_result(value, encoding=encoding, errors=errors)
```
I am not sure whether I should commit this to the repo and create a pull request, as described in the devguide.
History
Date User Action Args
2022-04-11 14:58:46adminsetgithub: 74668
2020-04-16 10:52:01cyrkovsetnosy: + cyrkov
messages: + msg366592
2018-05-29 15:56:39cheryl.sabellasetversions: + Python 3.7, Python 3.8, - Python 3.5
nosy: + cheryl.sabella

messages: + msg318050

type: behavior
stage: test needed
2018-05-29 12:32:20cheryl.sabellalinkissue32131 superseder
2017-05-26 09:49:46maxkingcreate