classification
Title: json.JSONDecoder() strict argument undocumented and potentially confusing
Type: behavior Stage:
Components: Documentation Versions: Python 3.2, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: georg.brandl Nosy List: beazley, georg.brandl, taleinat
Priority: normal Keywords: patch

Created on 2008-12-30 17:42 by beazley, last changed 2010-10-15 17:04 by georg.brandl. This issue is now closed.

Files
File name Uploaded Description
json_docs_py3k.diff taleinat, 2010-06-05 09:18 proposed documentation patch
json_docs_trunk.diff taleinat, 2010-06-05 09:19
Messages (6)
msg78550 - Author: David M. Beazley (beazley) Date: 2008-12-30 17:42
The strict parameter to JSONDecoder() is undocumented and is confusing 
because someone might assume it has something to do with the encoding 
parameter or the general handling of parsing errors (which it doesn't).

As far as I can determine by reading the source, strict determines
whether JSON strings are allowed to contain literal newlines. For
example (note: loads() passes its parameters to JSONDecoder):

>>> import json
>>> s = '{"test":"Hello\nWorld"}'
>>> print(s)
{"test":"Hello
World"}
>>> json.loads(s)
Traceback (most recent call last):
...
  File "/tmp/lib/python3.0/json/decoder.py", line 159, in JSONString
    return scanstring(match.string, match.end(), encoding, strict)
ValueError: Invalid control character at: line 1 column 14 (char 14)

>>> json.loads(s, strict=False)
{'test': 'Hello\nWorld'}

Note in this last example how the result has the literal newline
embedded in it when strict is set to False.
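
A minimal sketch of the behavior described above, assuming Python 3 and only the standard json module (the variable names are illustrative). It contrasts a literal newline inside a JSON string with the escaped form, and shows that passing strict directly to JSONDecoder appears to behave the same as passing it through loads():

```python
import json

raw = '{"test": "Hello\nWorld"}'       # literal newline inside the JSON string
escaped = '{"test": "Hello\\nWorld"}'  # JSON escape sequence, backslash + n

# With the default strict=True, the literal newline is rejected.
try:
    json.loads(raw)
    strict_rejected = False
except ValueError:
    strict_rejected = True

# strict=False accepts the literal newline and keeps it in the result.
lenient = json.loads(raw, strict=False)

# The escaped form decodes to the same value even in strict mode.
from_escape = json.loads(escaped)

# Using the decoder class directly, without going through loads().
decoder_result = json.JSONDecoder(strict=False).decode(raw)
```

Catching ValueError rather than a more specific class keeps the sketch independent of the exact exception type raised by a given Python version.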
msg107067 - Author: Tal Einat (taleinat) (Python committer) Date: 2010-06-04 15:10
This goes down into _json.scanstring. Looking at the C code for scanstring_unicode, the strict parameter controls whether control characters are allowed inside strings: "if strict is zero then literal control characters are allowed". From the code itself (current py3k head, r81032), it seems this means any character <= 0x1f. See scanstring_unicode in http://svn.python.org/view/python/branches/py3k/Modules/_json.c?revision=81032&view=markup for details.

Documentation should be updated accordingly.
msg107068 - Author: Tal Einat (taleinat) (Python committer) Date: 2010-06-04 15:13
This goes down into _json.scanstring. Looking at the C code for scanstring_unicode, strict=False allows control characters inside strings: "if strict is zero then literal control characters are allowed". From the code itself (current py3k head, r81032), it seems this means any character <= 0x1f. See scanstring_unicode in http://svn.python.org/view/python/branches/py3k/Modules/_json.c?revision=81032&view=markup for details.

Documentation should be updated accordingly.
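
As a rough check of the "any character <= 0x1f" reading, the following sketch tries each of the 32 code points inside a JSON string under both settings. It assumes Python 3 and the standard json module; the helper name accepted is made up for this illustration and is not part of the json API.

```python
import json


def accepted(text, strict):
    """Return True if json.loads parses text with the given strict setting."""
    try:
        json.loads(text, strict=strict)
    except ValueError:
        return False
    return True


# Every code point from 0x00 through 0x1f, each placed inside a JSON string.
control_chars = [chr(code) for code in range(0x20)]

rejected_when_strict = [
    c for c in control_chars if not accepted('"a' + c + 'b"', strict=True)
]
accepted_when_lenient = [
    c for c in control_chars if accepted('"a' + c + 'b"', strict=False)
]

# A space (0x20) sits just above the range and should parse in strict mode.
space_ok = accepted('"a b"', strict=True)
```

If the reading above is right, both lists should contain all 32 characters and space_ok should be True.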
msg107125 - Author: Tal Einat (taleinat) (Python committer) Date: 2010-06-05 09:18
Documentation patch attached against py3k branch.

Changes are:

* Added to documentation of JSONDecoder:

If *strict* is ``False`` (``True`` is the default), then control characters will be allowed inside strings.  Control characters in this context are those with character codes in the 0-31 range, including ``'\t'`` (tab), ``'\n'``, ``'\r'`` and ``'\0'``.

* Added clarification in documentation of json.load and json.dump that unless the cls kwarg is specified, the JSONEncoder/JSONDecoder class will be used.

* Mirrored these additions in the relevant doc-strings (JSONDecoder.__init__, json.load, json.loads, json.dump, json.dumps).

* Copied the description of the object_pairs_hook kwarg from the documentation to the relevant doc-strings (json.load, json.loads, JSONDecoder.__init__), which otherwise fully mirrored the documentation.
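
The points in the list above can be sketched as follows, assuming Python 3 and the standard json module. LenientDecoder is a hypothetical subclass written only for this illustration; it is not part of the patch or of the json module.

```python
import json
from collections import OrderedDict


class LenientDecoder(json.JSONDecoder):
    """Hypothetical subclass: strict is off unless the caller overrides it."""

    def __init__(self, **kwargs):
        kwargs.setdefault("strict", False)
        super().__init__(**kwargs)


# The "\t" below is a literal tab character inside the JSON string.
text = '{"b": "tab\there", "a": 2}'

# Without cls, json.loads uses json.JSONDecoder, so strict defaults to True.
try:
    json.loads(text)
    default_accepts = True
except ValueError:
    default_accepts = False

# Passing cls swaps in the subclass, which allows the literal tab.
via_cls = json.loads(text, cls=LenientDecoder)

# object_pairs_hook receives each object's (key, value) pairs in document order.
pairs = json.loads(text, strict=False, object_pairs_hook=list)
ordered = json.loads(text, strict=False, object_pairs_hook=OrderedDict)
```

Setting the default inside a subclass is only one option; passing strict=False at each call site works as well and keeps the choice visible to the reader.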
msg107126 - Author: Tal Einat (taleinat) (Python committer) Date: 2010-06-05 09:19
Similar patch against trunk; same changes as for the py3k branch.
msg118806 - Author: Georg Brandl (georg.brandl) (Python committer) Date: 2010-10-15 17:04
Thanks, applied in r85543 and r85544.
History
Date User Action Args
2010-10-15 17:04:56 georg.brandl set status: open -> closed; resolution: fixed; messages: + msg118806
2010-06-05 09:20:11 taleinat set versions: + Python 2.7
2010-06-05 09:19:44 taleinat set files: + json_docs_trunk.diff; messages: + msg107126
2010-06-05 09:18:23 taleinat set files: + json_docs_py3k.diff; keywords: + patch; messages: + msg107125; versions: + Python 3.2, - Python 2.6, Python 3.0
2010-06-04 15:13:47 taleinat set messages: + msg107068
2010-06-04 15:10:12 taleinat set nosy: + taleinat; messages: + msg107067
2008-12-30 17:42:13 beazley create