
Author ncoghlan
Recipients chrism, ncoghlan
Date 2013-11-30.02:30:42
Message-id <1385778645.87.0.0800898315059.issue19837@psf.upfronthosting.co.za>
Content
In the Python 3 transition, we had to make a choice regarding whether we treated the JSON module as a text transform (with load[s] reading Unicode code points and dump[s] producing them), or as a text encoding (with load[s] reading binary sequences and dump[s] producing them).

To minimise the changes to the module API, the decision was made to treat it as a text transform, with the text encoding handled externally.

This API design decision doesn't appear to have worked out that well in the web development context, since JSON is typically encountered as a UTF-8 encoded wire protocol, not as already decoded text.
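
As a minimal sketch of what "handled externally" means in practice: the caller encodes the text that json.dumps returns before it goes on the wire, and decodes received bytes before passing them to json.loads. The payload and variable names are illustrative assumptions, not taken from any particular web framework.

```python
import json

# Illustrative payload; the names here are assumptions for the example.
payload = {"greeting": "hello", "count": 3}

# json.dumps() returns str in Python 3, so the caller must encode it
# before writing it to a socket or HTTP response body.
wire_bytes = json.dumps(payload).encode("utf-8")

# On the receiving side, the caller decodes the bytes to text first,
# then hands that text to json.loads().
round_tripped = json.loads(wire_bytes.decode("utf-8"))
```

Every caller that speaks JSON over a byte-oriented transport ends up repeating this encode/decode step around the module's API.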

It also makes the module inconsistent with most of the other modules that offer "dumps" APIs, as those *are* specifically about wire protocols (Python 3.4):

>>> import json, marshal, pickle, plistlib, xmlrpc.client
>>> json.dumps('hello')
'"hello"'
>>> marshal.dumps('hello')
b'\xda\x05hello'
>>> pickle.dumps('hello')
b'\x80\x03X\x05\x00\x00\x00helloq\x00.'
>>> plistlib.dumps('hello')
b'<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">\n<plist version="1.0">\n<string>hello</string>\n</plist>\n'

The only other module whose dumps function returns a string (as the json module's does) is the XML-RPC client module:

>>> xmlrpc.client.dumps(('hello',))
'<params>\n<param>\n<value><string>hello</string></value>\n</param>\n</params>\n'

And that's nonsensical, since that XML-RPC API *accepts an encoding argument*, which it now silently ignores:

>>> xmlrpc.client.dumps(('hello',), encoding='utf-8')
'<params>\n<param>\n<value><string>hello</string></value>\n</param>\n</params>\n'
>>> xmlrpc.client.dumps(('hello',), encoding='utf-16')
'<params>\n<param>\n<value><string>hello</string></value>\n</param>\n</params>\n'

I now believe that an "encoding" parameter should have been added to the json.dump API in the Py3k transition (defaulting to UTF-8), allowing all of the dump/load APIs in the standard library to be consistently about converting to and from a binary wire protocol.
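
A rough sketch of what such a bytes-oriented API could look like, written as wrappers around the existing functions. The names dumps_bytes and loads_bytes are hypothetical and do not exist in the json module; this only illustrates the idea of an "encoding" parameter defaulting to UTF-8, not a proposed implementation.

```python
import json


def dumps_bytes(obj, encoding="utf-8", **kwargs):
    """Hypothetical helper: serialize obj to JSON text, then encode to bytes."""
    return json.dumps(obj, **kwargs).encode(encoding)


def loads_bytes(data, encoding="utf-8", **kwargs):
    """Hypothetical helper: decode bytes to text, then parse it as JSON."""
    return json.loads(data.decode(encoding), **kwargs)


# With these wrappers the result is bytes, like marshal/pickle/plistlib.
encoded = dumps_bytes("hello")  # b'"hello"'
decoded = loads_bytes(encoded)  # 'hello'
```

Wrappers like these leave the existing str-returning functions untouched, which sidesteps the backwards compatibility problem at the cost of adding new names.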

Unfortunately, I don't have a solution to offer at this point (since backwards compatibility concerns rule out the simple solution of just changing the return type). I just wanted to get it on record as a problem with the current API (and an internal inconsistency within the standard library for dump/load protocols).
History
Date                 User      Action  Args
2013-11-30 02:30:46  ncoghlan  set     recipients: + ncoghlan, chrism
2013-11-30 02:30:45  ncoghlan  set     messageid: <1385778645.87.0.0800898315059.issue19837@psf.upfronthosting.co.za>
2013-11-30 02:30:45  ncoghlan  link    issue19837 messages
2013-11-30 02:30:43  ncoghlan  create