classification
Title: Codecs should raise precise UnicodeDecodeError or UnicodeEncodeError
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: benjamin.peterson, doerwalter, eamanu, ezio.melotti, lemburg, pitrou, serhiy.storchaka, thatiparthy, utk
Priority: normal Keywords: easy, patch

Created on 2020-06-25 12:49 by pitrou, last changed 2022-04-11 14:59 by admin.

Pull Requests
URL Status Linked Edit
PR 21165 open thatiparthy, 2020-06-26 06:00
PR 21170 closed utk, 2020-06-26 16:14
Messages (6)
msg372367 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2020-06-25 12:49
A number of codecs raise bare UnicodeError, rather than Unicode{Decode,Encode}Error. Example:

  File "/home/antoine/miniconda3/envs/pyarrow/lib/python3.7/encodings/utf_16.py", line 67, in _buffer_decode
    raise UnicodeError("UTF-16 stream does not start with BOM")

A more complete list can be found here:
https://gist.github.com/pitrou/60594b28d8e47edcdb97d9b15d5f9866
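
For reference, a minimal sketch that reproduces the behaviour described above, assuming the UTF-16 incremental decoder is fed input without a BOM. It only reports which exception class the running interpreter raises; the exact class may differ between Python versions. The helper name `decode_without_bom` is made up for illustration.

```python
import codecs


def decode_without_bom(data: bytes) -> UnicodeError:
    """Feed BOM-less bytes to the UTF-16 incremental decoder.

    Hypothetical helper: returns the exception the decoder raises so that
    its concrete class can be inspected.
    """
    decoder = codecs.getincrementaldecoder("utf-16")()
    try:
        decoder.decode(data)
    except UnicodeError as exc:  # also catches UnicodeDecodeError, a subclass
        return exc
    raise AssertionError("expected the decoder to reject BOM-less input")


exc = decode_without_bom(b"a\x00b\x00")
# Shows whether a bare UnicodeError or a more precise subclass was raised.
print(type(exc).__name__, "-", exc)
```

Code that catches `UnicodeDecodeError` specifically would miss a bare `UnicodeError`, which is the practical concern behind this report.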
msg372368 - (view) Author: Srinivas Reddy Thatiparthy(శ్రీనివాస్ రెడ్డి తాటిపర్తి) (thatiparthy) * Date: 2020-06-25 13:11
This looks like an easy task. Shall I create a PR?
msg372369 - (view) Author: Emmanuel Arias (eamanu) * Date: 2020-06-25 14:13
Hi,

IMO this can be marked as an easy issue.

@thatiparthy please, go ahead
msg372373 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2020-06-25 15:11
UnicodeEncodeError and UnicodeDecodeError are used to report un(en|de)codable ranges in the source object, so it wouldn't make sense to use them for errors that have nothing to do with problems in the source object. Their constructors require 5 arguments (encoding, object, start, end, reason), not just a simple message: e.g. UnicodeEncodeError("utf-8", "foo", 17, 23, "bad string").

But for reporting e.g. missing BOMs at the start it would be useful to use (0, 0) as the offending range.
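
A minimal sketch of the suggestion above, assuming a missing BOM is reported with the empty range (0, 0). The helper name `missing_bom_error` and the reason string are illustrative and are not taken from any patch attached to this issue.

```python
def missing_bom_error(data: bytes) -> UnicodeDecodeError:
    """Build a UnicodeDecodeError for a UTF-16 stream that lacks a BOM.

    Hypothetical helper: the (0, 0) range follows the suggestion above.
    """
    return UnicodeDecodeError(
        "utf-16",                          # encoding
        data,                              # object: the bytes being decoded
        0,                                 # start of the offending range
        0,                                 # end of the offending range
        "stream does not start with BOM",  # reason
    )


exc = missing_bom_error(b"a\x00b\x00")
print(exc.encoding, exc.start, exc.end, exc.reason)

# A single message is not enough for the precise subclasses:
# the constructor rejects anything other than the five arguments.
try:
    UnicodeDecodeError("UTF-16 stream does not start with BOM")
except TypeError as err:
    print("TypeError:", err)
```

Because `UnicodeDecodeError` is a subclass of `UnicodeError`, existing handlers that catch `UnicodeError` would keep working after such a change.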
msg372431 - (view) Author: Srinivas Reddy Thatiparthy(శ్రీనివాస్ రెడ్డి తాటిపర్తి) (thatiparthy) * Date: 2020-06-26 16:31
@utk You could have taken some other easy issue from https://bugs.python.org/issue?status=1&@sort=-activity&@columns=id%2Cactivity%2Ctitle%2Ccreator%2Cstatus&@dispname=Easy%20issues&@startwith=0&@group=priority&keywords=6&@action=search&@filter=&@pagesize=50 instead of copy-pasting my work.
msg372433 - (view) Author: utkarsh (utk) * Date: 2020-06-26 16:59
@thatiparthy These were the most logical changes: standard error messages that were already in the existing code. I just edited them as mentioned here. What part of your "work" do you think I copied?
I sent this PR mostly to get familiar with the process. I will close it if you feel insecure. No need to be rude.
Thanks.
History
Date                 User         Action  Args
2022-04-11 14:59:32  admin        set     github: 85287
2020-06-26 16:59:17  utk          set     messages: + msg372433
2020-06-26 16:31:42  thatiparthy  set     messages: + msg372431
2020-06-26 16:14:34  utk          set     nosy: + utk; pull_requests: + pull_request20329
2020-06-26 06:00:19  thatiparthy  set     keywords: + patch; stage: needs patch -> patch review; pull_requests: + pull_request20323
2020-06-25 16:12:39  vstinner     set     nosy: - vstinner
2020-06-25 15:11:54  doerwalter   set     nosy: + doerwalter; messages: + msg372373
2020-06-25 14:13:03  eamanu       set     nosy: + eamanu; messages: + msg372369
2020-06-25 13:11:35  thatiparthy  set     nosy: + thatiparthy; messages: + msg372368
2020-06-25 12:49:30  pitrou       create