classification
Title: Exception('No JSON object could be decoded') when parsing a valid JSON
Type: behavior Stage: resolved
Components: Documentation Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: 19307 Superseder:
Assigned To: ezio.melotti Nosy List: Anoop.Thomas.Mathew, Gallaecio, docs@python, ezio.melotti, ncoghlan, python-dev, serhiy.storchaka, vajrasky
Priority: normal Keywords: easy, patch

Created on 2013-09-07 12:33 by Gallaecio, last changed 2013-10-20 23:11 by ezio.melotti. This issue is now closed.

Files
File name | Uploaded | Description
input.json | Gallaecio, 2013-09-07 12:33 | Valid JSON code
json_BOM_signature_documentation.patch | Anoop.Thomas.Mathew, 2013-09-15 03:59 | patch for json loads BOM signature documentation
issue18958.diff | ezio.melotti, 2013-10-19 04:08 |
issue18958-2.diff | ezio.melotti, 2013-10-20 03:23 |
issue18958-2-py3k.diff | ezio.melotti, 2013-10-20 03:44 |
issue18958-3.diff | ezio.melotti, 2013-10-20 05:24 |
Messages (20)
msg197152 - Author: Adrián Chaves Fernández (Gallaecio) Date: 2013-09-07 12:33
Calling json.load() with a file object or json.loads() with a string containing the attached JSON code raises an exception with the message 'No JSON object could be decoded'.

I’ve pasted the JSON code into http://jsonlint.com/ and it reports it as valid JSON.

This JSON code comes from the 0 A.D. game (https://github.com/0ad/0ad/blob/master/binaries/data/mods/public/civs/maur.json), and the game successfully parses it as well (with whatever they use for that). Yet it fails with json.load() and json.loads().

Note also that the rest of the JSON files of the same game folder (https://github.com/0ad/0ad/tree/master/binaries/data/mods/public/civs) do work with json.load() and json.loads().
msg197155 - Author: Vajrasky Kok (vajrasky) Date: 2013-09-07 13:10
>>> a = open('/tmp/input.json')
>>> b = a.read()
>>> b[0]
'\ufeff'
>>> import json
>>> json.loads(b[1:])   # loads just fine
>>> json.loads(b)       # chokes (raises ValueError)


I am not sure whether the Python json module should handle '\ufeff' gracefully. Let me investigate.
msg197158 - Author: Vajrasky Kok (vajrasky) Date: 2013-09-07 13:15
The U+FEFF character is the Unicode byte order mark (BOM).

Reference:
http://en.wikipedia.org/wiki/Byte_Order_Mark
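A quick illustration of what the BOM looks like at the byte level (a sketch; the JSON payload is a made-up stand-in for input.json). U+FEFF encodes to three bytes in UTF-8, which the standard `codecs` module exposes as `codecs.BOM_UTF8`. Decoding with plain `utf-8` keeps the character, which is why `b[0]` above is '\ufeff', while `utf-8-sig` drops it:

```python
import codecs

# U+FEFF encoded as UTF-8 is the three-byte UTF-8 signature
bom = "\ufeff".encode("utf-8")
assert bom == codecs.BOM_UTF8 == b"\xef\xbb\xbf"

# Hypothetical file body: a UTF-8 BOM followed by a JSON object
raw = codecs.BOM_UTF8 + b'{"code": "maur"}'

plain = raw.decode("utf-8")    # BOM survives as the first character
sig = raw.decode("utf-8-sig")  # BOM is removed during decoding

print(repr(plain[0]))  # '\ufeff'
print(repr(sig[0]))    # '{'
```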
msg197160 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-09-07 13:35
Use the utf-8-sig encoding.

See also issue17909.
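A minimal sketch of that suggestion, assuming Python 3's `open()`; the temporary file and its contents are hypothetical stand-ins for input.json. The `utf-8-sig` codec strips a leading BOM if one is present and otherwise decodes exactly like `utf-8`, so it is safe for files that may or may not carry one:

```python
import json
import os
import tempfile

# Create a small JSON file that starts with a UTF-8 BOM (stand-in for input.json)
fd, path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "wb") as f:
    f.write(b'\xef\xbb\xbf{"code": "maur"}')

# utf-8-sig drops the leading BOM, so json.load() sees a clean document
with open(path, encoding="utf-8-sig") as f:
    data = json.load(f)

os.remove(path)
print(data)  # {'code': 'maur'}
```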
msg197163 - Author: Adrián Chaves Fernández (Gallaecio) Date: 2013-09-07 14:42
I’ll leave it up to you how to address this. Thanks a lot for finding out that the cause was the BOM; I’ve just removed it from the file and now everything works as expected.
msg197164 - Author: Nick Coghlan (ncoghlan) (Python committer) Date: 2013-09-07 15:01
Switching to a docs bug - this won't be fixed in 2.7, but it should probably be documented as a limitation.
msg197745 - Author: Anoop Thomas Mathew (Anoop.Thomas.Mathew) Date: 2013-09-15 03:59
Patch for BOM signature documentation in json.loads
msg200360 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2013-10-19 03:47
I'm not sure this should be documented in json.load/loads, and I'm not sure people will look there once they get this exception.
The error is raised because the wrong codec is used (either by open() before passing the file object to json.load or by json.loads), so it's a user error rather than a problem with the json module.  The error turns out to be particularly misleading because the decoding is successful even though it produces a wrong result, and the problem becomes apparent only once it reaches json.
ISTM that the documentation is already clear enough that json doesn't auto-detect encodings and uses UTF-8 by default, and that different encodings should be specified explicitly.
I think that a better and backward-compatible solution would be to detect the UTF-8 BOM and provide a better error message hinting at utf-8-sig.
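Roughly what such a check could look like from the caller's side, as a hypothetical wrapper (`loads_with_bom_hint` is not part of the json module). It only inspects the first character of an already-decoded string and otherwise defers to `json.loads()` unchanged, which is what keeps the approach backward-compatible:

```python
import json

BOM = "\ufeff"


def loads_with_bom_hint(s, **kw):
    """Hypothetical wrapper: reject a leading UTF-8 BOM with a hint about
    the right codec; any other input goes to json.loads() unchanged."""
    if isinstance(s, str) and s.startswith(BOM):
        raise ValueError("Unexpected UTF-8 BOM (decode using utf-8-sig)")
    return json.loads(s, **kw)


print(loads_with_bom_hint('{"a": 1}'))  # {'a': 1}
```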
msg200361 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2013-10-19 04:08
Here is a proof of concept that raises this error:
>>> import json; json.load(open('input.json'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/wolf/dev/py/2.7/Lib/json/__init__.py", line 290, in load
    **kw)
  File "/home/wolf/dev/py/2.7/Lib/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/home/wolf/dev/py/2.7/Lib/json/decoder.py", line 365, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/wolf/dev/py/2.7/Lib/json/decoder.py", line 381, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Unexpected UTF-8 BOM (decode using utf-8-sig): line 1 column 1 (char 0)

If the idea is OK I will add tests and implement it for the Python scanner too (and possibly tweak the error message if you have better suggestions).
msg200362 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2013-10-19 04:09
Forgot to add that the patch is for 2.7, and it also needs to be implemented in the unicode scanner.
msg200368 - Author: Nick Coghlan (ncoghlan) (Python committer) Date: 2013-10-19 04:37
I like the new error message as a low-risk immediate improvement that nudges people in the direction of utf-8-sig. It also leaves the door open to silently ignoring the BOM in the future without immediately committing to that approach.
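For comparison, the alternative of silently ignoring the BOM could be sketched as a lenient loader. `loads_ignoring_bom` is a hypothetical name and this is not what was committed here; it strips at most one leading U+FEFF and leaves any other input untouched:

```python
import json


def loads_ignoring_bom(s, **kw):
    """Hypothetical lenient loader: drop a single leading U+FEFF, then parse."""
    if s.startswith("\ufeff"):
        s = s[1:]
    return json.loads(s, **kw)


print(loads_ignoring_bom('\ufeff{"a": [1, 2]}'))  # {'a': [1, 2]}
```

The trade-off is that callers who decoded with the wrong codec never find out, which is one reason to prefer the explicit error first.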
msg200536 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2013-10-20 03:23
Here is an updated patch with tests.
msg200538 - Author: Nick Coghlan (ncoghlan) (Python committer) Date: 2013-10-20 03:32
Updated patch looks good to me.
msg200540 - Author: Nick Coghlan (ncoghlan) (Python committer) Date: 2013-10-20 03:52
As does the Py3k version :)
msg200542 - Author: Nick Coghlan (ncoghlan) (Python committer) Date: 2013-10-20 04:12
Discussing this with Ezio on IRC, we decided that it probably makes more sense to do this check outside the scanner as preliminary validation of the input passed in via the public API. That will minimise the overhead and also avoids any potential side effects if "idx==0" is ever true in cases we're not currently testing.

The tests from the current patches should be OK, though.
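One way to picture checking "outside the scanner" is to validate the whole document once at the public entry point. The sketch below assumes a hypothetical `JSONDecoder` subclass that overrides `decode()`; per-value scanning never sees the check, so it costs one `startswith()` per document, and direct `raw_decode()` calls at some index are left alone:

```python
import json


class BOMCheckingDecoder(json.JSONDecoder):
    """Hypothetical decoder: validate the document once in decode(),
    before any scanning starts, instead of inside the scanner."""

    def decode(self, s, *args, **kwargs):
        if s.startswith("\ufeff"):
            raise ValueError("Unexpected UTF-8 BOM (decode using utf-8-sig)")
        return super().decode(s, *args, **kwargs)


print(json.loads("[1, 2]", cls=BOMCheckingDecoder))  # [1, 2]
```

Since `json.loads()` accepts a `cls` argument, the subclass plugs in as shown. Recent Python versions already raise this error from `json.loads()` themselves, so there the subclass is illustrative rather than necessary.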

Ezio also found that, for Py3k, adding an explicit check for non-str input and throwing an appropriate error would also be an improvement over the status quo:

>>> import json
>>> json.loads(b'')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ncoghlan/devel/py3k/Lib/json/__init__.py", line 316, in loads
    return _default_decoder.decode(s)
  File "/home/ncoghlan/devel/py3k/Lib/json/decoder.py", line 344, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
TypeError: can't use a string pattern on a bytes-like object
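An explicit up-front type check along those lines might look like the following hypothetical wrapper (`loads_str_only` and its message wording are illustrative, not the json module's). Note that Python 3.6 and later changed `json.loads()` to accept `bytes` and `bytearray` too, so this strict str-only check reflects the 3.4-era behavior discussed here:

```python
import json


def loads_str_only(s, **kw):
    """Hypothetical wrapper: fail early with a clear TypeError for non-str
    input instead of a confusing error from deep inside the decoder."""
    if not isinstance(s, str):
        raise TypeError(
            "the JSON object must be str, not {!r}".format(type(s).__name__)
        )
    return json.loads(s, **kw)


print(loads_str_only('{"x": null}'))  # {'x': None}
```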
msg200546 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2013-10-20 05:24
I opened a new issue about improving the error message: #19307.
After further discussion on IRC, we think that both #19307 and this issue should only be applied on 3.4 (the attached patch produces an even more misleading error that would require backporting #19307).
msg200560 - Author: Nick Coghlan (ncoghlan) (Python committer) Date: 2013-10-20 10:25
The patch needs to be rebased on top of the issue 19307 patch, but I like this approach.

I say go ahead and commit it whenever you're ready :)
msg200622 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-10-20 19:41
LGTM.
msg200650 - Author: Roundup Robot (python-dev) Date: 2013-10-20 23:11
New changeset ac016cba7e64 by Ezio Melotti in branch 'default':
#18958: Improve error message for json.load(s) while passing a string that starts with a UTF-8 BOM.
http://hg.python.org/cpython/rev/ac016cba7e64
msg200651 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2013-10-20 23:11
Fixed, thanks for the feedback!
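On an interpreter that includes this change, the new message can be checked directly. The sketch below only looks for "utf-8-sig" in the message instead of matching it exactly, because later versions refined the exception type (Python 3.5 added `json.JSONDecodeError`, a `ValueError` subclass):

```python
import json

try:
    json.loads('\ufeff{"code": "maur"}')
except ValueError as exc:  # JSONDecodeError on 3.5+ is a ValueError subclass
    message = str(exc)
else:
    message = None

print(message)
```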
History
Date | User | Action | Args
2013-10-20 23:11:58 | ezio.melotti | set | status: open -> closed; messages: + msg200651; assignee: docs@python -> ezio.melotti; resolution: fixed; stage: patch review -> resolved
2013-10-20 23:11:30 | python-dev | set | nosy: + python-dev; messages: + msg200650
2013-10-20 19:41:21 | serhiy.storchaka | set | messages: + msg200622
2013-10-20 10:25:11 | ncoghlan | set | messages: + msg200560
2013-10-20 05:24:07 | ezio.melotti | set | files: + issue18958-3.diff; dependencies: + Improve TypeError message in json.loads(); messages: + msg200546; versions: + Python 3.4, - Python 2.7
2013-10-20 04:12:52 | ncoghlan | set | messages: + msg200542
2013-10-20 03:52:01 | ncoghlan | set | messages: + msg200540
2013-10-20 03:44:55 | ezio.melotti | set | files: + issue18958-2-py3k.diff
2013-10-20 03:32:26 | ncoghlan | set | messages: + msg200538
2013-10-20 03:23:28 | ezio.melotti | set | files: + issue18958-2.diff; messages: + msg200536; stage: needs patch -> patch review
2013-10-19 04:37:21 | ncoghlan | set | messages: + msg200368
2013-10-19 04:09:56 | ezio.melotti | set | messages: + msg200362
2013-10-19 04:08:43 | ezio.melotti | set | files: + issue18958.diff
2013-10-19 04:08:24 | ezio.melotti | set | messages: + msg200361
2013-10-19 03:47:01 | ezio.melotti | set | messages: + msg200360
2013-09-15 03:59:06 | Anoop.Thomas.Mathew | set | files: + json_BOM_signature_documentation.patch; nosy: + Anoop.Thomas.Mathew; messages: + msg197745; keywords: + patch
2013-09-13 20:15:15 | ezio.melotti | set | keywords: + easy; nosy: + ezio.melotti
2013-09-07 15:01:46 | ncoghlan | set | nosy: + docs@python, ncoghlan; messages: + msg197164; assignee: docs@python; components: + Documentation, - Extension Modules; stage: needs patch
2013-09-07 14:42:25 | Gallaecio | set | messages: + msg197163
2013-09-07 13:35:23 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg197160
2013-09-07 13:15:24 | vajrasky | set | messages: + msg197158
2013-09-07 13:10:19 | vajrasky | set | nosy: + vajrasky; messages: + msg197155
2013-09-07 12:33:15 | Gallaecio | create |