Title: json.load() can raise UnicodeDecodeError, but this is not documented
Created on 2021-02-27 18:24 by mattheww, last changed 2021-04-08 06:12 by rhettinger.

Author: Matthew Woodcraft (mattheww) Date: 2021-02-27 18:24
The documentation for json.load() and json.loads() says:

« If the data being deserialized is not a valid JSON document, a JSONDecodeError will be raised. »

But this is not currently entirely true: if the data is provided in bytes form and is not properly encoded in one of the three accepted encodings, UnicodeDecodeError is raised instead.
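
A minimal sketch that reproduces this, assuming CPython 3.6 or later (where json.loads() accepts bytes). The sample input is made up for illustration:

```python
import json

# 0xff can never appear in valid UTF-8, and with no BOM or NUL-byte
# pattern the json module treats this input as UTF-8.
bad_bytes = b'{"key": "\xff"}'

try:
    json.loads(bad_bytes)
    outcome = "no error"
except json.JSONDecodeError:
    outcome = "JSONDecodeError"
except UnicodeDecodeError:
    outcome = "UnicodeDecodeError"

print(outcome)  # UnicodeDecodeError
```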

(I have no opinion on whether the documentation or the behaviour should be changed.)
Author: Eric V. Smith (eric.smith) Date: 2021-02-27 23:25
As a rule, we don't try to document every exception that can be raised. I could go either way on documenting encoding errors in the json module, although it seems pretty clear that an encoding error would be possible in this case.
Author: Raymond Hettinger (rhettinger) Date: 2021-02-27 23:43
Normally, we don't (or can't) enumerate all possible exceptions. But in this case, it is worth expanding the documentation so that a person can know which of two common input errors they need to catch:

"If the data being deserialized is not valid UTF-8, a UnicodeDecodeError will be raised, and if the decoded file is not a valid JSON document, a JSONDecodeError will be raised".
Author: Serhiy Storchaka (serhiy.storchaka) Date: 2021-03-01 10:26
json.loads() also accepts data encoded with UTF-16 and UTF-32.
Author: Matthew Woodcraft (mattheww) Date: 2021-03-01 20:10
Further, "is not valid UTF-8" isn't quite true because the decoding is done with 'surrogatepass' set.
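
A short sketch of both corrections, assuming CPython 3.6 or later. The byte strings are illustrative inputs only: the first two carry a BOM so that the encoding is detected, and the last is the UTF-8-style encoding of the lone surrogate U+D800.

```python
import json

# UTF-16 and UTF-32 input (here with a BOM) is accepted as well as UTF-8.
from_utf16 = json.loads('{"a": 1}'.encode("utf-16"))
from_utf32 = json.loads('{"a": 1}'.encode("utf-32"))

# A lone surrogate encoded UTF-8-style, wrapped in JSON string quotes.
data = b'"\xed\xa0\x80"'

# Strict UTF-8 decoding rejects these bytes...
try:
    data.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# ...but json.loads() accepts them and returns the lone surrogate.
result = json.loads(data)

print(from_utf16, from_utf32, strict_ok, ascii(result))
```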

In practice I don't think many users will care which of the two exceptions they get for which inputs, but it's useful to know how broad your catch has to be if you're using load() on possibly-invalid inputs.
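
For the "how broad does the catch have to be" question, one possible sketch: both JSONDecodeError and UnicodeDecodeError are subclasses of ValueError, so catching ValueError covers both. The helper name parse_or_none is hypothetical, not part of the json module.

```python
import json


def parse_or_none(raw):
    """Hypothetical helper: return the parsed JSON, or None on invalid input."""
    try:
        return json.loads(raw)
    except ValueError:
        # Covers both json.JSONDecodeError and UnicodeDecodeError,
        # since each is a subclass of ValueError.
        return None


print(parse_or_none(b'{"a": 1}'))  # valid input
print(parse_or_none("{"))          # malformed JSON text
print(parse_or_none(b"\xff"))      # undecodable bytes
```

This does not catch the TypeError raised when the argument is neither str, bytes, nor bytearray.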
