classification
Title: json.dump(ensure_ascii=False) return str instead of unicode
Type: behavior
Stage: resolved
Components: Documentation, Library (Lib)
Versions: Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: docs@python, ezio.melotti, mjpieters, petri.lehtinen, pitrou, python-dev, rhettinger, socketpair, terry.reedy
Priority: normal
Keywords: patch

Created on 2012-01-11 16:41 by socketpair, last changed 2014-10-23 17:20 by terry.reedy. This issue is now closed.

Files
File name Uploaded Description
issue13769.patch petri.lehtinen, 2012-08-28 11:07
issue13769_v2.patch petri.lehtinen, 2012-08-29 18:38
issue13769_v3.patch petri.lehtinen, 2012-08-30 18:54
Messages (12)
msg151066 - Author: Марк Коренберг (socketpair) Date: 2012-01-11 16:41
$ ipython
In [1]: import json

In [2]: type(json.dumps({'a':'b'}, ensure_ascii=False))
Out[2]: <type 'str'>

In [3]: type(json.dumps({'a':u'b'}, ensure_ascii=False))
Out[3]: <type 'unicode'>
-----------------------
Documentation:
If ensure_ascii is False, then the return value will be a unicode instance.
--------------------------------

Not applicable to Python 3.
msg151229 - Author: Terry J. Reedy (terry.reedy) (Python committer) Date: 2012-01-14 05:28
Ezio, Raymond: is it the doc that is wrong?
msg151231 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2012-01-14 05:40
The docstring says:
"""
    If ``ensure_ascii`` is false, then the return value will be a
    ``unicode`` instance subject to normal Python ``str`` to ``unicode``
    coercion rules instead of being escaped to an ASCII ``str``.
"""
msg169180 - Author: Petri Lehtinen (petri.lehtinen) (Python committer) Date: 2012-08-27 05:00
It seems to me that when ensure_ascii is False, the return value will be a unicode instance if and only if there's a unicode object anywhere in the input.

>>> json.dumps({'foo': 'bar'}, ensure_ascii=False)
'{"foo": "bar"}'

>>> json.dumps({'foo': u'bar'}, ensure_ascii=False)
u'{"foo": "bar"}'

>>> json.dumps({'foo': u'äiti'}, ensure_ascii=False)
u'{"foo": "\xe4iti"}'

>>> json.dumps({'foo': u'äiti'.encode('utf-8')}, ensure_ascii=False)
'{"foo": "\xc3\xa4iti"}'

>>> json.dumps({'foo': u'äiti'.encode('utf-16')}, ensure_ascii=False)
'{"foo": "\xff\xfe\xe4\\u0000i\\u0000t\\u0000i\\u0000"}'
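
Code that needs a consistent return type can normalize the result itself. The following is only a minimal sketch: `dumps_text` is a hypothetical helper, not part of the json module, and it assumes that any byte strings in the input are valid UTF-8 (the UTF-16 case above would not decode).

```python
import json


def dumps_text(obj, **kwargs):
    """Hypothetical helper: serialize obj and always return a text string."""
    result = json.dumps(obj, ensure_ascii=False, **kwargs)
    # On Python 2.7 the result is a byte str when the input contains no
    # unicode objects. Decode it, assuming the bytes are valid UTF-8.
    # On Python 3 the result is never bytes, so this branch is skipped.
    if isinstance(result, bytes):
        result = result.decode("utf-8")
    return result


print(repr(dumps_text({"foo": "bar"})))
print(repr(dumps_text({"foo": u"\xe4iti"})))
```

With this wrapper, both calls return a text (unicode) string regardless of which string types appear in the input.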
msg169270 - Author: Petri Lehtinen (petri.lehtinen) (Python committer) Date: 2012-08-28 10:44
It may also be unicode if the encoding parameter is used even if there are no unicode objects in the input.

>>> json.dumps([u'Ş'.encode('iso-8859-9')], encoding='iso-8859-9', ensure_ascii=False)
u'["\u015e"]'
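
An alternative that does not rely on the encoding parameter is to decode every byte string before serializing, so that the encoder only ever sees text. This is a sketch under stated assumptions: `decode_bytes` is a hypothetical helper, it only walks dicts, lists and tuples, and it assumes all byte strings share one known encoding. Note that on Python 2.7 `bytes` is an alias for `str`, so plain string literals are decoded as well.

```python
import json


def decode_bytes(obj, encoding="utf-8"):
    """Hypothetical helper: recursively decode byte strings to text."""
    if isinstance(obj, bytes):
        return obj.decode(encoding)
    if isinstance(obj, dict):
        return dict(
            (decode_bytes(key, encoding), decode_bytes(value, encoding))
            for key, value in obj.items()
        )
    if isinstance(obj, (list, tuple)):
        # Tuples become lists, which matches how json serializes them anyway.
        return [decode_bytes(item, encoding) for item in obj]
    return obj


encoded = [u"\u015e".encode("iso-8859-9")]
result = json.dumps(decode_bytes(encoded, "iso-8859-9"), ensure_ascii=False)
print(repr(result))
```

Because the input contains only text after decoding, the result is a text string with the same content as the example above.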
msg169273 - Author: Petri Lehtinen (petri.lehtinen) (Python committer) Date: 2012-08-28 11:07
Attached a patch for 2.7 that updates docs and docstrings.
msg169406 - Author: Petri Lehtinen (petri.lehtinen) (Python committer) Date: 2012-08-29 18:38
Attached an updated patch, which is more explicit on what ensure_ascii actually does.
msg169481 - Author: Petri Lehtinen (petri.lehtinen) (Python committer) Date: 2012-08-30 18:54
Attached yet another patch. It explains what input causes the result to be unicode instead of str.
msg169611 - Author: Roundup Robot (python-dev) (Python triager) Date: 2012-09-01 04:31
New changeset a1884b3027c5 by Petri Lehtinen in branch '2.7':
#13769: Enhance docs for ensure_ascii semantics in JSON decoding functions
http://hg.python.org/cpython/rev/a1884b3027c5
msg169612 - Author: Petri Lehtinen (petri.lehtinen) (Python committer) Date: 2012-09-01 04:32
Fixed, thanks.
msg229882 - Author: Martijn Pieters (mjpieters) Date: 2014-10-23 15:20
I'd say this is a bug in the library, not the documentation. The library varies the output type, making it impossible to use `json.dump()` with an `io.open()` object, because the library will *mix data types* when writing. That is *terrible* behaviour.
msg229886 - Author: Terry J. Reedy (terry.reedy) (Python committer) Date: 2014-10-23 17:20
The revised doc admits the problem: "If *ensure_ascii* is False, some chunks written to *fp* may be unicode instances.  Unless fp.write() explicitly understands unicode (as in codecs.getwriter) this is likely to cause an error."
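
The codecs.getwriter approach mentioned in that passage could look roughly like the sketch below. The `io.BytesIO` buffer is an assumption standing in for a real file opened in binary mode, and UTF-8 is an assumed target encoding. On 2.7 this should cope with a mix of ASCII-only str chunks and unicode chunks, but non-ASCII byte strings in the input would likely still fail.

```python
import codecs
import io
import json

# Stand-in for a file opened in binary mode (assumption for illustration).
buf = io.BytesIO()

# The stream writer accepts text chunks and encodes them to UTF-8 bytes.
writer = codecs.getwriter("utf-8")(buf)

json.dump({u"foo": u"\xe4iti"}, writer, ensure_ascii=False)

data = buf.getvalue()
print(repr(data))
```

The underlying stream only ever receives bytes, so the str/unicode mix stays inside the writer.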

Making text be unicode in 3.x is our attempt at a generic fix to the problems resulting from the bug-prone 2.x 'text may be bytes or unicode' design.  Since continued 2.7 support is aimed at supporting legacy code, we are very reluctant to make behavior changes that could break working code.
History
Date User Action Args
2014-10-23 17:20:08 terry.reedy set messages: + msg229886
2014-10-23 15:20:20 mjpieters set nosy: + mjpieters; messages: + msg229882
2012-09-01 04:32:57 petri.lehtinen set status: open -> closed; messages: + msg169612; keywords: - needs review; resolution: fixed; stage: patch review -> resolved
2012-09-01 04:31:52 python-dev set nosy: + python-dev; messages: + msg169611
2012-08-30 18:54:56 petri.lehtinen set files: + issue13769_v3.patch; messages: + msg169481
2012-08-29 18:38:59 petri.lehtinen set files: + issue13769_v2.patch; messages: + msg169406
2012-08-28 11:08:39 petri.lehtinen set nosy: + pitrou
2012-08-28 11:07:53 petri.lehtinen set keywords: + needs review, patch; files: + issue13769.patch; messages: + msg169273; stage: needs patch -> patch review
2012-08-28 10:51:25 petri.lehtinen link issue14042 superseder
2012-08-28 10:44:38 petri.lehtinen set messages: + msg169270
2012-08-27 05:00:14 petri.lehtinen set nosy: + petri.lehtinen; messages: + msg169180
2012-01-14 05:40:19 ezio.melotti set messages: + msg151231
2012-01-14 05:28:42 terry.reedy set nosy: + rhettinger, ezio.melotti, terry.reedy; messages: + msg151229; stage: needs patch
2012-01-11 19:22:01 loewis set versions: - Python 2.6
2012-01-11 16:41:19 socketpair create