Issue22767
Created on 2014-10-30 16:35 by Tom.Christie, last changed 2022-04-11 14:58 by admin. This issue is now closed.
Messages (11) | |||
---|---|---|---|
msg230274 - (view) | Author: Tom Christie (Tom.Christie) | Date: 2014-10-30 16:35 | |
This is one of those behavioural issues that is a borderline bug. The separators argument to `json.dumps()` behaves differently across python 2 and 3.

* In python 2 it should be provided as a bytestring, and can cause a UnicodeDecodeError otherwise.
* In python 3 it should be provided as unicode, and can cause a TypeError otherwise.

Examples:

Python 2.7.2
>>> print json.dumps({'snowman': '☃'}, separators=(':', ','), ensure_ascii=False)
{"snowman","☃"}
>>> print json.dumps({'snowman': '☃'}, separators=(u':', u','), ensure_ascii=False)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)

And:

Python 3.4.0
>>> print(json.dumps({'snowman': '☃'}, separators=(':', ','), ensure_ascii=False))
{"snowman","☃"}
>>> print(json.dumps({'snowman': '☃'}, separators=(b':', b','), ensure_ascii=False))
<...>
TypeError: sequence item 2: expected str instance, bytes found

Technically this isn't out of line with the documentation: in both cases it uses `separators=(':', ',')`, which is indeed the correct type in both v2 and v3. However, it's unexpected behaviour that the required type changes between versions without being called out.

Working on a codebase with `from __future__ import unicode_literals` this is particularly unexpected, because we get a `UnicodeDecodeError` when running code that otherwise looks correct. It's also slightly awkward to fix because it's a bit of a weird branch condition.

The fix would probably be to forcibly coerce it to the correct type regardless of whether it is supplied as unicode or a bytestring, or at least to do so for python 2.7.

Possibly related to http://bugs.python.org/issue22701 but I wasn't able to understand if that ticket was in fact a different user error. |
|||
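The Python 3 half of the report above can be reproduced with a short script. This is a minimal sketch, assuming a Python 3 interpreter; the variable names are illustrative only. It passes the separators in the documented `(item_separator, key_separator)` order rather than the swapped order used in the report. The exact wording of the `TypeError` differs between Python versions, so only the exception type is checked.

```python
import json

data = {'snowman': '☃'}

# str separators, given as (item_separator, key_separator), are accepted.
compact = json.dumps(data, separators=(',', ':'), ensure_ascii=False)

# bytes separators are rejected on Python 3 with a TypeError.
try:
    json.dumps(data, separators=(b',', b':'), ensure_ascii=False)
    bytes_error = None
except TypeError as exc:
    bytes_error = exc

print(compact)
print(type(bytes_error).__name__)
```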
msg230275 - (view) | Author: Georg Brandl (georg.brandl) * | Date: 2014-10-30 17:32 | |
IMO the snowman should be a Unicode string in the second example for Python 2.7. |
|||
msg230276 - (view) | Author: Georg Brandl (georg.brandl) * | Date: 2014-10-30 17:33 | |
> in the second example

or even, in both examples. |
|||
msg230279 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2014-10-30 18:03 | |
And that works, including with the future import. I don't remember if this is a bug we've fixed since 2.7.2, but I don't think so. In Python3, json explicitly does not support bytes. |
|||
msg230289 - (view) | Author: Tom Christie (Tom.Christie) | Date: 2014-10-30 19:12 | |
Not too fussed if this is addressed or not, but I think this was closed a little prematurely. I don't think there's a problem under Python 3; that's entirely reasonable. However, under Python 2, `json.dumps()` will normally handle *either* bytes or unicode transparently for you (just altering the return type accordingly). If you happen to be using unicode separators, then the normally lax behaviour of "either unicode or bytes" stops being the case. |
|||
msg230291 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2014-10-30 19:20 | |
But only if you use non-ascii in the binary input, in which case you get an encoding error, which is a correct error. |
|||
msg230296 - (view) | Author: Tom Christie (Tom.Christie) | Date: 2014-10-30 19:38 | |
> But only if you use non-ascii in the binary input, in which case you get an encoding error, which is a correct error.

Kind of, except that this (python 2.7) works just fine:

>>> data = {'snowman': '☃'}
>>> json.dumps(data, ensure_ascii=False)
'{"snowman": "\xe2\x98\x83"}'

Whereas this raises an exception:

>>> json.dumps(data, separators=(u':', u','), ensure_ascii=False)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)

If it was the same in both cases then I wouldn't consider it a problem. As it is, introducing the `separators` parameter changes the behaviour. Anyways, I'll get off my high horse now. :) |
|||
msg230298 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2014-10-30 19:49 | |
No, it is introducing the unicode that is the problem. Your first example is entirely binary. It is only when you *mix* binary and unicode that you have encoding problems (because python doesn't know the encoding of the binary data...well, more precisely it doesn't have one). This confusion is a large part of why python3 exists :) |
|||
msg230299 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2014-10-30 20:00 | |
Or, to put it another way, we agree with you that both cases should behave the same: using binary data in a json dumps call should raise an error. And in python3 they do. But in python2 there is a confusion as to what is text and what is binary, and so sometimes things work that shouldn't. In python2 a binary string with non-ascii characters is accepted by the dumps call...it shouldn't be, since json is defined as a text protocol. But it is baked into the python2 string model that such binary does work, because in python2 it was assumed that the programmer was responsible for making sure that the encoding of all their binary strings was consistent. But to mix unicode and binary, you *must* make the encoding of the binary strings explicit, otherwise there's no way to correctly compose the binary data with the text data. So, as soon as (but only as soon as) you mix unicode with your non-ascii data, your program blows up. Thus python3. |
|||
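The point about making the encoding explicit can be illustrated on Python 3, where `json.dumps()` refuses bytes values and accepts them only once they have been decoded to text. This is a minimal sketch, assuming the bytes are known to be UTF-8; the variable names are illustrative only.

```python
import json

raw = b'\xe2\x98\x83'  # UTF-8 encoded bytes for the snowman character

# Python 3 rejects a bytes value outright instead of guessing its encoding.
try:
    json.dumps({'snowman': raw})
    rejected = False
except TypeError:
    rejected = True

# Decoding with an explicit, known encoding turns the bytes into text,
# which json can then serialize.
text = raw.decode('utf-8')
encoded = json.dumps({'snowman': text}, ensure_ascii=False)

print(rejected)
print(encoded)
```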
msg230300 - (view) | Author: Tom Christie (Tom.Christie) | Date: 2014-10-30 20:16 | |
> So, as soon as (but only as soon as) you mix unicode with your non-ascii data, your program blows up.

Indeed. For context though, in my case of running into this, the unicode literals used as separators weren't even in the same package as the non-ASCII binary strings. (JSONRenderer in Django REST framework, being exercised by some third party test code.) End result: a non-obvious exception. Anyways, okay with this resolution, although I am now using a compat branch to ensure that we use binary separators in py2 to continue to get the more lax rendering style. |
|||
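A compat branch of the kind described in the message above might look roughly like the following. This is a hypothetical sketch, not the actual Django REST framework code: the names `PY2`, `COMPACT_SEPARATORS` and `render` are assumptions made for illustration. The `b''` prefix matters on Python 2 when `from __future__ import unicode_literals` is active, since plain literals would otherwise be unicode.

```python
import json
import sys

# Hypothetical compat flag; not taken from any real project.
PY2 = sys.version_info[0] == 2

if PY2:
    # Byte-string separators keep Python 2's lax handling of
    # non-ASCII byte strings in the data being dumped.
    COMPACT_SEPARATORS = (b',', b':')
else:
    # Python 3 only accepts text (str) separators.
    COMPACT_SEPARATORS = (',', ':')


def render(data):
    """Serialize data compactly using the version-appropriate separators."""
    return json.dumps(data, separators=COMPACT_SEPARATORS, ensure_ascii=False)


print(render({'snowman': '☃'}))
```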
msg230301 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2014-10-30 20:23 | |
Yes, that third party problem is a prime example of exactly why this needed to be fixed, but it required python3 to fix it. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:58:09 | admin | set | github: 66956 |
2014-10-30 20:23:40 | r.david.murray | set | messages: + msg230301 |
2014-10-30 20:16:09 | Tom.Christie | set | messages: + msg230300 |
2014-10-30 20:00:07 | r.david.murray | set | messages: + msg230299 |
2014-10-30 19:49:58 | r.david.murray | set | messages: + msg230298 |
2014-10-30 19:38:27 | Tom.Christie | set | messages: + msg230296 |
2014-10-30 19:20:18 | r.david.murray | set | messages: + msg230291 |
2014-10-30 19:12:29 | Tom.Christie | set | messages: + msg230289 |
2014-10-30 18:03:49 | r.david.murray | set | status: open -> closed nosy: + r.david.murray messages: + msg230279 resolution: not a bug stage: resolved |
2014-10-30 17:33:16 | georg.brandl | set | messages: + msg230276 |
2014-10-30 17:32:40 | georg.brandl | set | nosy: + georg.brandl messages: + msg230275 |
2014-10-30 16:35:50 | Tom.Christie | create |