This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: csv: undocumented UnicodeDecodeError on malformed file
Type: behavior
Stage: resolved
Components:
Versions: Python 3.8

process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: alter-bug-tracer, brett.cannon, remi.lapeyre
Priority: normal
Keywords:

Created on 2019-05-20 18:13 by alter-bug-tracer, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name      Uploaded by       Date
csv.zip        alter-bug-tracer  2019-05-20 18:13
csv_parser.py  remi.lapeyre      2019-05-21 10:15
file0.txt      remi.lapeyre      2019-05-21 10:15
file1.txt      remi.lapeyre      2019-05-21 10:16
Messages (5)
msg342939 - Author: alter-bug-tracer (alter-bug-tracer) - Date: 2019-05-20 18:13
UnicodeDecodeError is raised instead of csv.Error when parsing malformed inputs.
Examples:
1. file0
Traceback (most recent call last):
  File "csv_parser.py", line 6, in <module>
    for row in reader:
  File "/usr/local/lib/python3.8/csv.py", line 111, in __next__
    row = next(self.reader)
  File "/usr/local/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 0: invalid continuation byte
2. file1
Traceback (most recent call last):
  File "csv_parser.py", line 6, in <module>
    for row in reader:
  File "/usr/local/lib/python3.8/csv.py", line 110, in __next__
    self.fieldnames
  File "/usr/local/lib/python3.8/csv.py", line 97, in fieldnames
    self._fieldnames = next(self.reader)
  File "/usr/local/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 51: invalid start byte


(file0, file1 and csv_parser.py attached)
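The attached csv_parser.py and data files are not reproduced here. A minimal sketch of the reported behaviour follows, assuming the script opens the file in text mode with UTF-8 and reads it through csv.DictReader (the tracebacks show DictReader frames and the 'utf-8' codec). The bytes in RAW and the helper read_rows are hypothetical stand-ins, chosen so that byte 0 is 0xd5 followed by a non-continuation byte, as in the file0 message:

```python
import csv
import io

# Hypothetical stand-in for the attached file0.txt: 0xd5 begins a two-byte
# UTF-8 sequence, but the next byte, b"A" (0x41), is not a continuation byte.
RAW = b"\xd5A,b\r\n1,2\r\n"


def read_rows(raw: bytes, encoding: str = "utf-8") -> list:
    """Hypothetical helper: decode raw bytes the way open(path, encoding=...,
    newline="") would, then parse them with csv.DictReader."""
    text = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding, newline="")
    return list(csv.DictReader(text))


caught = None
try:
    read_rows(RAW)
except UnicodeDecodeError as exc:
    # Raised by the text decoder while csv asks for the next line;
    # the csv parser itself never sees the undecodable bytes.
    caught = exc

print(type(caught).__name__, caught)
```

The exception surfaces through the for loop over the reader only because csv pulls lines from the file object; the decoding step belongs to the file, not to csv.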
msg342999 - Author: Rémi Lapeyre (remi.lapeyre) - Date: 2019-05-21 10:18
I don't understand the issue here; csv can raise many different exceptions when something goes wrong:

>>> import csv
>>> csv.reader(None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: argument 1 must be an iterator

Why would UnicodeDecodeError not be appropriate here?
msg343023 - Author: alter-bug-tracer (alter-bug-tracer) - Date: 2019-05-21 12:05
Shouldn't all of them be documented? Or, alternatively, shouldn't they be converted to csv.Error? Take, for example, the C++ standard library.
msg343040 - Author: Rémi Lapeyre (remi.lapeyre) - Date: 2019-05-21 12:40
I don't think all errors can be documented: csv iterates over the object it is given but has no idea what that object is. When writing, for example, anything could happen: a socket timing out, a permission error, the underlying media being removed improperly, the media running out of space, etc.

ISTM that catching all those exceptions and hiding them behind csv.Error is bad practice and not recommended. In C++, the exceptions a function can throw may be declared as part of its signature, so this is easier to do, but in Python we have no idea what the object you pass in can raise when it is iterated over.
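The point about not knowing what the underlying object can raise can be illustrated with a small sketch: csv.reader accepts any iterator of strings, and an exception raised while that iterator produces the next line reaches the caller as-is rather than as csv.Error. The generator flaky_lines is hypothetical and simulates an I/O failure with a plain OSError:

```python
import csv


def flaky_lines():
    """Hypothetical line source: yields one valid row, then fails the way a
    dropped connection or removed drive might."""
    yield "a,b,c\r\n"
    raise OSError("simulated I/O failure")


reader = csv.reader(flaky_lines())
first_row = next(reader)  # the first line parses normally

propagated = None
try:
    next(reader)
except OSError as exc:
    # The iterator's own exception reaches the caller; csv does not wrap it.
    propagated = exc

print(first_row, repr(propagated))
```

Because the set of possible exceptions depends entirely on the iterable passed in, csv has no fixed list it could document or translate.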
msg343324 - Author: Brett Cannon (brett.cannon) (Python committer) - Date: 2019-05-23 20:41
This isn't a bug: the CSV format isn't malformed (which would warrant csv.Error); rather, the file itself isn't appropriately encoded, or the proper encoding wasn't specified, hence UnicodeDecodeError. So the exception is appropriate.

And we do not document indirect exceptions that get raised by code, only those that are explicitly raised. So everything is working as expected.
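The distinction drawn above can be sketched as follows: csv.Error signals malformed CSV structure (here an unterminated quoted field, rejected because strict=True), while UnicodeDecodeError signals that the bytes do not match the encoding used to decode them. The helper classify and the sample inputs are hypothetical; decoding with "latin-1" only illustrates that the same bytes parse once an encoding that accepts every byte is specified, and is not a claim about the attached files' real encoding:

```python
import csv
import io


def classify(raw: bytes, encoding: str = "utf-8", strict: bool = True) -> str:
    """Hypothetical helper: parse raw bytes as CSV and report which kind of
    failure, if any, occurs."""
    text = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding, newline="")
    try:
        list(csv.reader(text, strict=strict))
    except UnicodeDecodeError:
        return "encoding problem (UnicodeDecodeError)"
    except csv.Error:
        return "malformed CSV (csv.Error)"
    return "ok"


print(classify(b'a,"unterminated\r\n'))             # valid UTF-8, quote never closed
print(classify(b"\xd5A,b\r\n"))                     # plain CSV shape, invalid UTF-8
print(classify(b"\xd5A,b\r\n", encoding="latin-1"))  # same bytes, different encoding
```

Catching the two exceptions separately keeps the diagnosis distinct: csv.Error points at the data's structure, UnicodeDecodeError at the encoding passed to open().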
History
Date                 User              Action  Args
2022-04-11 14:59:15  admin             set     github: 81156
2019-05-23 20:41:11  brett.cannon      set     status: open -> closed
                                               nosy: + brett.cannon
                                               messages: + msg343324
                                               resolution: not a bug
                                               stage: resolved
2019-05-21 12:40:28  remi.lapeyre      set     messages: + msg343040
2019-05-21 12:05:22  alter-bug-tracer  set     messages: + msg343023
2019-05-21 10:18:23  remi.lapeyre      set     nosy: + remi.lapeyre
                                               messages: + msg342999
2019-05-21 10:16:01  remi.lapeyre      set     files: + file1.txt
2019-05-21 10:15:50  remi.lapeyre      set     files: + file0.txt
2019-05-21 10:15:40  remi.lapeyre      set     files: + csv_parser.py
2019-05-20 18:13:43  alter-bug-tracer  create