classification
Title: json.loads() on str should return unicode, not str
Type: behavior
Stage:
Components: Documentation, Library (Lib)
Versions: Python 2.7

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: barry
Nosy List: barry, bob.ippolito, docs@python, doerwalter, fdrake, llnik, pitrou, rhettinger
Priority: normal
Keywords: patch

Created on 2010-10-06 15:36 by llnik, last changed 2010-11-02 21:03 by barry. This issue is now closed.

Files
File name Uploaded Description
json.diff doerwalter, 2010-11-02 16:15
Messages (17)
msg118069 - Author: Nik Tautenhahn (llnik) Date: 2010-10-06 15:36
Hi,

Before 2.7,

import json
json.loads('"abc"')

yielded u"abc".

In 2.7 I get
"abc" (a byte string).

I would have expected an entry in "news" or "What's new in 2.7" explaining why this change happened. In addition, all examples at http://docs.python.org/library/json are wrong for Python 2.7 if json.loads is involved.

Any insight on this?

best regards,
Nik
msg118070 - Author: Fred Drake (fdrake) (Python committer) Date: 2010-10-06 15:45
This is related to this issue from simplejson:
http://code.google.com/p/simplejson/issues/detail?id=28

This problem is why I still use simplejson 1.x; moving forward to simplejson 2.x or Python's json is unlikely.
msg118085 - Author: Nik Tautenhahn (llnik) Date: 2010-10-06 21:56
Well, then at least the documentation and the "What's changed" need to be updated. Furthermore, if such decisions are made, it would at least be nice to have some general "decode-hook" for json.JSONDecoder. The "object_hook" is only used for dict objects. Why is there no hook for strings, or a general hook that is used on any object?
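For readers who want the effect of a string hook today, one workaround is to post-process the decoded value. The sketch below is not part of the json module: `map_strings` is a hypothetical helper name, and the code is written for Python 3, where `str` is the text type (on Python 2.7 the check would be against `basestring` instead).

```python
import json


def map_strings(value, convert):
    """Recursively apply `convert` to every string in a decoded JSON value.

    Hypothetical helper, not a json API: it walks the lists and dicts that
    json.loads() produces and leaves numbers, booleans and None untouched.
    """
    if isinstance(value, str):
        return convert(value)
    if isinstance(value, list):
        return [map_strings(item, convert) for item in value]
    if isinstance(value, dict):
        # JSON object keys are always strings, so they are converted too.
        return {
            map_strings(key, convert): map_strings(item, convert)
            for key, item in value.items()
        }
    return value


decoded = map_strings(
    json.loads('{"name": "abc", "tags": ["x", "y"], "n": 1}'),
    str.upper,
)
```

Walking the result after decoding costs a second pass over the data, which is the trade-off compared with a hook inside the decoder itself.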
msg118094 - Author: Fred Drake (fdrake) (Python committer) Date: 2010-10-07 04:20
As I understand it, the decision to return str instead of unicode
values for the "simplejson" module was simply inherited by the
standard library.  As such, it still needs to be evaluated in the
context of the standard library, because of the incompatibility it
introduces.

I still maintain that it's a bug, and should be treated as such.
msg118098 - Author: Nik Tautenhahn (llnik) Date: 2010-10-07 09:23
Yep, the solution should not be "maybe it's str, maybe it's unicode" - I mean, if the decoder gives you a str if there are no fancy characters and unicode if it contains some, this might lead to some confusion... And yes, in my opinion, this is a bug, too.
msg119739 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2010-10-27 20:57
I completely agree with Fred; this is a regression and a bug in Python 2.7 and should be fixed.  I have a doctest in Mailman 3 for example that cannot pass in both Python 2.6 and 2.7 (without IMO ugly hackery).  Not only that, but json is documented as converting JSON str to unicode, which it does fine in Python 2.6, 3.1 and 3.2.  Why should Python 2.7 be different (and broken)?
msg119752 - Author: Fred Drake (fdrake) (Python committer) Date: 2010-10-28 01:55
I'll note that it seems relevant that this package is not considered
"externally maintained" by the terms of PEP 360:

    http://www.python.org/dev/peps/pep-0360/

Given the level of attention this has received from the originator of
the code, we should not hesitate to commit technically acceptable
changes to the Python repository.
msg119807 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-10-28 17:58
+1 for fixing this in-tree. We need a patch, though ;)
msg120231 - Author: Nik Tautenhahn (llnik) Date: 2010-11-02 15:33
There is even more inconsistency here.

As already mentioned, we have this:

>>> import json
>>> json.loads(json.dumps("abc"))
'abc'

If, however, I am evil and hide _json.so (the C part of the json module, used for speedup), the JSON code falls back to its Python implementation and voila:

>>> import json
>>> json.loads(json.dumps("abc"))
u'abc'

Not so neat if your fallback is not really a fallback but shows such different behaviour.
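The two code paths can also be compared without hiding _json.so. The sketch below relies on `py_scanstring` and `c_scanstring`, which are implementation-detail names inside CPython's `json.decoder` module rather than documented API, so treat them as an assumption about the interpreter in use. `scan_with_both` is a hypothetical helper name.

```python
import json.decoder as decoder


def scan_with_both(document):
    """Scan one JSON string literal with the pure-Python and the C scanner.

    `document` must start with a double quote; index 1 is the first
    character after that opening quote, which is where scanning begins.
    """
    py_result = decoder.py_scanstring(document, 1)
    # c_scanstring is None when the _json accelerator could not be imported.
    c_scan = decoder.c_scanstring
    c_result = c_scan(document, 1) if c_scan is not None else None
    return py_result, c_result


py_result, c_result = scan_with_both('"abc"')

if c_result is None:
    print("C accelerator not available; only the Python scanner ran")
else:
    # Each result is a (decoded_string, end_index) pair.
    same_type = type(py_result[0]) is type(c_result[0])
    print("same value:", py_result == c_result, "same type:", same_type)
```

A check like this, run as part of the test suite, is one way to catch the kind of divergence described in this message.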
msg120233 - Author: Fred Drake (fdrake) (Python committer) Date: 2010-11-02 15:41
Alternately, the Python implementation may be thought of as definitive
and the optimizations are broken.
msg120237 - Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2010-11-02 16:15
The following patch (against the release27-maint branch) seems to fix the problem.
msg120244 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2010-11-02 19:19
The fact that the C and Python versions are not fully tested (afaict) is not good.  I'm not sure that's worth fixing for 2.7 and it's probably worth a separate bug report for Python 3.2 on that.

In the meantime, I'll test Walter's patch and add a unit test for this case.
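A regression check for this case might look roughly like the following. This is only a sketch and not the test that was committed: `TEXT_TYPE` and `loads_returns_text` are hypothetical names, and `type(u"")` is used so that the same code means `unicode` on Python 2 and `str` on Python 3.

```python
import json

# The text type of the running interpreter: unicode on 2.x, str on 3.x.
TEXT_TYPE = type(u"")


def loads_returns_text(document):
    """Return True if decoding a JSON string literal yields the text type."""
    return type(json.loads(document)) is TEXT_TYPE


# Cover a plain ASCII string, a string with a non-ASCII escape, and a
# round trip through json.dumps(), as reported earlier in this issue.
results = [
    loads_returns_text(document)
    for document in ('"abc"', '"\\u00e4bc"', json.dumps("abc"))
]
```

Checking the exact type, rather than using isinstance(), is deliberate here: the bug is about which string type comes back, not about whether the value is string-like.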
msg120247 - Author: Fred Drake (fdrake) (Python committer) Date: 2010-11-02 19:27
The incomplete testing and C/Python implementation mismatch are covered by issue 5723 and issue 9233.
msg120255 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-11-02 20:35
To mitigate possible negative impacts from changing the return type, consider adding a parse_string hook that lets users control the return type:

   json.loads(f, parse_int=decimal.Decimal, parse_string=repr)
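For comparison, the hooks that json.loads() already accepts can be exercised as below. Note that `parse_string` in the message above is a proposal; as far as I know it is not an existing parameter, so this sketch only uses `parse_int` and `parse_float` and leaves strings at their default type. The sample document is made up for illustration.

```python
import decimal
import json

# Made-up sample input: one integer, one float and one string value.
document = '{"count": 3, "price": 1.10, "label": "abc"}'

# parse_int and parse_float each receive the number's source text as a
# string, so Decimal sees "1.10" rather than a binary float.
decoded = json.loads(
    document,
    parse_int=decimal.Decimal,
    parse_float=decimal.Decimal,
)

print(repr(decoded["count"]), repr(decoded["price"]), repr(decoded["label"]))
```

Because the hooks are called with the original text, the trailing zero in 1.10 survives the conversion.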
msg120256 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2010-11-02 20:40
Adding that argument to Python 2.7 seems like new feature territory.
msg120257 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2010-11-02 20:57
@doerwalter: patch looks good.  I've added a test and will commit momentarily.
msg120258 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2010-11-02 21:03
Committed as r86126.
History
Date User Action Args
2010-11-02 21:03:29 barry set status: open -> closed; assignee: bob.ippolito -> barry; resolution: fixed; messages: + msg120258
2010-11-02 20:57:36 barry set messages: + msg120257
2010-11-02 20:40:20 barry set messages: + msg120256
2010-11-02 20:35:45 rhettinger set nosy: + rhettinger; messages: + msg120255
2010-11-02 19:27:11 fdrake set messages: + msg120247
2010-11-02 19:19:53 barry set messages: + msg120244
2010-11-02 16:15:09 doerwalter set files: + json.diff; nosy: + doerwalter; messages: + msg120237; keywords: + patch
2010-11-02 15:41:42 fdrake set messages: + msg120233
2010-11-02 15:33:05 llnik set messages: + msg120231
2010-10-28 17:58:28 pitrou set nosy: + pitrou; messages: + msg119807
2010-10-28 04:13:01 fdrake set title: Returntype of json.loads() on strings -> json.loads() on str should return unicode, not str
2010-10-28 01:55:43 fdrake set messages: + msg119752; title: json.loads() on str should return unicode, not str -> Returntype of json.loads() on strings
2010-10-27 20:59:44 barry set title: json.loads() on str erroneously returns str. should return unicode -> json.loads() on str should return unicode, not str
2010-10-27 20:59:23 barry set title: Returntype of json.loads() on strings -> json.loads() on str erroneously returns str. should return unicode
2010-10-27 20:57:23 barry set nosy: + barry; messages: + msg119739
2010-10-27 20:43:55 pitrou link issue10216 superseder
2010-10-07 09:23:28 llnik set messages: + msg118098
2010-10-07 04:20:41 fdrake set messages: + msg118094
2010-10-06 21:56:20 llnik set messages: + msg118085
2010-10-06 15:46:02 fdrake set nosy: + fdrake; messages: + msg118070
2010-10-06 15:41:55 pitrou set assignee: docs@python -> bob.ippolito; nosy: + bob.ippolito
2010-10-06 15:36:57 llnik create