classification
Title: json encoder does not support JSONP/JavaScript safe escaping
Type: enhancement Stage: resolved
Components: Library (Lib) Versions:
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: Ztane, ezio.melotti, pitrou, rhettinger, serhiy.storchaka, tomchristie
Priority: normal Keywords:

Created on 2013-06-24 04:46 by Ztane, last changed 2014-12-02 15:32 by tomchristie. This issue is now closed.

Messages (8)
msg191742 - (view) Author: Antti Haapala (Ztane) * Date: 2013-06-24 04:46
JSON is not a strict subset of JavaScript (http://timelessrepo.com/json-isnt-a-javascript-subset). However, certain web technologies use JSON values as a part of JavaScript code (JSONP, inline <script> tags)... The Python json module, however, does not escape \u2028 or \u2029 when ensure_ascii is false. Furthermore, the / -> \/ escape is not supported by any switch.

Strictly speaking, the JSON specification only requires that " be escaped to \" and \ to \\ - all other escaping is optional. The whitespace escapes only exist to aid handwriting and embedding values in HTML/code. Thus it can be argued that the choice of escapes used by the json encoder is ill-advised.

In an inline HTML <script></script> tag, < cannot be escaped with HTML entities; however, only the string '</script>' (or sometimes </) is interpreted as the "end of script". Thus a non-trivial XSS attack can be made by having a JSON stream {"key":"</script><script src=''></script>"} embedded in inline JavaScript. The only correct way to escape such content in inline HTML is to escape all / into \/.
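
A minimal sketch of the problem and of the / -> \/ workaround described above, using only the standard json module. The variable names are illustrative; the replacement is applied to the encoder's output, where / can only occur inside string values.

```python
import json

payload = {"key": "</script><script src=''></script>"}

# json.dumps leaves the closing tag intact, so an HTML parser would end
# the surrounding inline <script> element at this point.
raw = json.dumps(payload)

# Escape every slash after encoding. "\/" is a valid JSON escape for "/",
# so the decoded value is unchanged.
escaped = raw.replace("/", "\\/")

print(raw)
print(escaped)
```

The escaped text still decodes to the original object with json.loads, which is why the substitution can be done after the fact.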

The \u2028, \u2029 problem is more subtle and can break not only inline JavaScript but also JSONP. Thus an incorrect value injected into the database by a malicious or unwitting user might break the entire protocol.
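
For illustration, a small sketch of how the two separators come out of the encoder with and without ensure_ascii (the sample string is made up):

```python
import json

text = "first\u2028second\u2029third"

# Default (ensure_ascii=True): every non-ASCII character is written as a
# \uXXXX escape, so the separators never appear verbatim.
ascii_only = json.dumps(text)

# ensure_ascii=False: the separators are emitted as raw characters.
verbatim = json.dumps(text, ensure_ascii=False)

print(ascii_only)
print("\u2028" in verbatim, "\u2029" in verbatim)
```

Both outputs are valid JSON and decode to the same string; only the second one carries the raw characters into whatever JavaScript context it is pasted into.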

The current solution is to re-escape everything that comes out of JSON encoder. The best solution for python would be to make these 3 escapes default in the python json module (notice again that the current set of default escapes when ensure_ascii=False is chosen arbitrarily), or if not default, then at least they could be enabled by a switch. Furthermore, documentation should be updated appropriately, to explain why such escape is needed.
msg191744 - (view) Author: Antti Haapala (Ztane) * Date: 2013-06-24 04:57
My mistake in writing: JSON of course does specify that "control characters" be escaped. It should also be pointed out that the json module does NOT currently escape \u007f-\u009f, as it perhaps strictly should.

>>> import json, unicodedata
>>> unicodedata.category('\u007f')
'Cc'
>>> json.dumps({'a': '\u007f'}, ensure_ascii=False)
'{"a": "\x7f"}'
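
For reference, the JSON RFC only requires escaping of U+0000 through U+001F, so the output above is valid JSON. If escaping of U+007F-U+009F is wanted anyway, a sketch along these lines could post-process the encoder output (escape_c1_controls is a hypothetical helper name, not part of the json module):

```python
import json
import re

# Matches DEL and the C1 control characters (U+007F through U+009F).
_C1_CONTROLS = re.compile(r"[\x7f-\x9f]")

def escape_c1_controls(encoded):
    """Rewrite raw U+007F-U+009F characters in JSON text as \\uXXXX escapes."""
    return _C1_CONTROLS.sub(lambda m: "\\u%04x" % ord(m.group()), encoded)

out = escape_c1_controls(json.dumps({"a": "\u007f"}, ensure_ascii=False))
print(out)
```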
msg194537 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-08-06 12:29
I think this is not a JSON issue. If you need escaping of some domain-specific characters, do it yourself. E.g.:

    json.dumps(...).replace('\u2028', r'\u2028').replace('\u2029', r'\u2029').replace('</', r'\u003c\u002f')
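
The chained replacements above could be wrapped in a small helper, roughly as follows. This is only a sketch: dumps_for_script is a hypothetical name, it calls json.dumps because that function returns the encoded string, and it does not claim to cover every HTML parsing edge case.

```python
import json

def dumps_for_script(obj, **kwargs):
    """Encode obj as JSON, then escape sequences that are risky in inline JS."""
    return (
        json.dumps(obj, **kwargs)
        .replace("\u2028", "\\u2028")     # LINE SEPARATOR
        .replace("\u2029", "\\u2029")     # PARAGRAPH SEPARATOR
        .replace("</", "\\u003c\\u002f")  # would close an inline <script>
    )

payload = {"k": "</script>\u2028"}
out = dumps_for_script(payload, ensure_ascii=False)
print(out)
```

Because every replacement is itself a valid JSON string escape, json.loads(out) returns the original payload.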
msg194581 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-06 21:25
On the one hand, supporting JSONP is a valid request for the json module. On the other hand, according to Wikipedia, "There have been some criticisms raised about JSONP. Cross-origin resource sharing (CORS) is a more recent method of getting data from a server in a different domain, which addresses some of those criticisms". Therefore, supporting JSONP might not really be worth it.
msg194648 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-08-08 07:27
Embedding JSON inside a <script> tag doesn't differ from embedding any string in some other format (e.g. JSON in a Python string, Python sources in HTML, or XML in a shell script). We just escape characters which have special meaning.

I propose closing this issue because embedding JSON (as any other generated code) in inline JavaScript can be done very easily with a sequence of string replaces. This has no relation to the json module.
msg231999 - (view) Author: Tom Christie (tomchristie) Date: 2014-12-02 14:33
I believe the status of this should be reassessed and that python should default to escaping '\u2028' and '\u2029'. *Strictly* speaking this isn't a bug and is per the JSON spec.

*However* this *is* a bug in the JSON spec - which *should* be a strict subset of JavaScript. Given that both escaped and unescaped are valid, ensuring that those two characters *are* always escaped would clearly be more user-friendly behavior on our part, and *would* lead to fewer bugs in, say, web frameworks that use the json module and then pass the output to a template (e.g. populating a JavaScript variable with some JSON output).
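
To make the proposal concrete: user code can already get this behavior with an encoder subclass, roughly like the sketch below. JSSafeEncoder and _escape_separators are hypothetical names, and this is not how the json module itself is implemented.

```python
import json

def _escape_separators(text):
    # Both replacements are valid JSON string escapes for the same characters.
    return text.replace("\u2028", "\\u2028").replace("\u2029", "\\u2029")

class JSSafeEncoder(json.JSONEncoder):
    """Encoder that never emits raw U+2028 or U+2029."""

    def encode(self, o):
        # Used by json.dumps.
        return _escape_separators(super().encode(o))

    def iterencode(self, o, *args, **kwargs):
        # Used by json.dump; escaping chunk by chunk is safe because each
        # separator is a single character and cannot be split across chunks.
        for chunk in super().iterencode(o, *args, **kwargs):
            yield _escape_separators(chunk)

data = {"msg": "a\u2028b\u2029c"}
out = json.dumps(data, cls=JSSafeEncoder, ensure_ascii=False)
print(out)
```

A framework could pass cls=JSSafeEncoder wherever it serializes data for templates, so application code never has to think about the two characters.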
msg232004 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-12-02 15:19
There is an explicit note in the documentation about incompatibility with JavaScript.
msg232006 - (view) Author: Tom Christie (tomchristie) Date: 2014-12-02 15:32
> There is an explicit note in the documentation about incompatibility with JavaScript.

That may be, but we're still unnecessarily making for a poorer user experience. There's no good reason why we shouldn't just treat \u2028 and \u2029 as control characters - it's only going to make things better for developers using the json module. It is an unnecessary usability bug as it stands.

Just because JSON has a bug in its spec wrt those two characters, doesn't mean we can't help our users avoid ever having to know about that or work around it in user code.
History
Date                 User              Action  Args
2014-12-02 15:32:10  tomchristie       set     messages: + msg232006
2014-12-02 15:19:46  serhiy.storchaka  set     messages: + msg232004
2014-12-02 14:33:05  tomchristie       set     nosy: + tomchristie; messages: + msg231999
2013-11-20 11:49:11  serhiy.storchaka  set     status: pending -> closed; stage: resolved
2013-08-08 07:27:16  serhiy.storchaka  set     status: open -> pending; messages: + msg194648
2013-08-06 21:25:49  pitrou            set     status: pending -> open; messages: + msg194581
2013-08-06 12:29:26  serhiy.storchaka  set     status: open -> pending; resolution: not a bug; messages: + msg194537
2013-06-24 08:08:20  serhiy.storchaka  set     nosy: + rhettinger, pitrou, ezio.melotti, serhiy.storchaka
2013-06-24 04:57:24  Ztane             set     messages: + msg191744
2013-06-24 04:46:19  Ztane             create