Issue 13769: json.dump(ensure_ascii=False) return str instead of unicode


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/57978

classification

Title:	json.dump(ensure_ascii=False) return str instead of unicode
Type:	behavior	Stage:	resolved
Components:	Documentation, Library (Lib)	Versions:	Python 2.7

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	docs@python	Nosy List:	docs@python, ezio.melotti, mjpieters, petri.lehtinen, pitrou, python-dev, rhettinger, socketpair, terry.reedy
Priority:	normal	Keywords:	patch

Created on 2012-01-11 16:41 by socketpair, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
issue13769.patch	petri.lehtinen, 2012-08-28 11:07
issue13769_v2.patch	petri.lehtinen, 2012-08-29 18:38
issue13769_v3.patch	petri.lehtinen, 2012-08-30 18:54

Messages (12)
msg151066 - (view)	Author: Марк Коренберг (socketpair) *	Date: 2012-01-11 16:41
$ ipython
In [1]: type(json.dumps({'a':'b'}, ensure_ascii=False))
Out[1]: <type 'str'>
In [2]: type(json.dumps({'a':u'b'}, ensure_ascii=False))
Out[2]: <type 'unicode'>
-----------------------
Documentation: If ensure_ascii is False, then the return value will be a unicode instance.
--------------------------------
Not applicable to python3
msg151229 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2012-01-14 05:28
Ezio, Raymond: is it the doc that is wrong?
msg151231 - (view)	Author: Ezio Melotti (ezio.melotti) *	Date: 2012-01-14 05:40
The docstring says:
"""
If ``ensure_ascii`` is false, then the return value will be a ``unicode`` instance subject to normal Python ``str`` to ``unicode`` coercion rules instead of being escaped to an ASCII ``str``.
"""
msg169180 - (view)	Author: Petri Lehtinen (petri.lehtinen) *	Date: 2012-08-27 05:00
It seems to me that when ensure_ascii is False, the return value will be a unicode instance if and only if there's a unicode object anywhere in the input.

>>> json.dumps({'foo': 'bar'}, ensure_ascii=False)
'{"foo": "bar"}'
>>> json.dumps({'foo': u'bar'}, ensure_ascii=False)
u'{"foo": "bar"}'
>>> json.dumps({'foo': u'äiti'}, ensure_ascii=False)
u'{"foo": "\xe4iti"}'
>>> json.dumps({'foo': u'äiti'.encode('utf-8')}, ensure_ascii=False)
'{"foo": "\xc3\xa4iti"}'
>>> json.dumps({'foo': u'äiti'.encode('utf-16')}, ensure_ascii=False)
'{"foo": "\xff\xfe\xe4\\u0000i\\u0000t\\u0000i\\u0000"}'
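The observation above can be turned into a rough check. The sketch below is not part of the json module: contains_text is a hypothetical helper that walks dicts, lists and tuples and reports whether a text (unicode) string appears anywhere. Under the rule described above, and with the default encoding argument, that is when Python 2.7 would be expected to return unicode from json.dumps(..., ensure_ascii=False). It handles only the basic container types.

```python
TEXT_TYPE = type(u"")  # unicode on Python 2, str on Python 3


def contains_text(obj):
    """Hypothetical helper: True if obj holds a text string anywhere."""
    if isinstance(obj, TEXT_TYPE):
        return True
    if isinstance(obj, dict):
        # Both keys and values can carry text strings.
        return any(contains_text(k) or contains_text(v)
                   for k, v in obj.items())
    if isinstance(obj, (list, tuple)):
        return any(contains_text(item) for item in obj)
    # Numbers, None, booleans and byte strings are not text.
    return False
```

On Python 3 every str literal is already text, so the check is only informative for Python 2.7 input.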
msg169270 - (view)	Author: Petri Lehtinen (petri.lehtinen) *	Date: 2012-08-28 10:44
It may also be unicode if the encoding parameter is used even if there are no unicode objects in the input.

>>> json.dumps([u'Ş'.encode('iso-8859-9')], encoding='iso-8859-9', ensure_ascii=False)
u'["\u015e"]'
msg169273 - (view)	Author: Petri Lehtinen (petri.lehtinen) *	Date: 2012-08-28 11:07
Attached a patch for 2.7 that updates docs and docstrings.
msg169406 - (view)	Author: Petri Lehtinen (petri.lehtinen) *	Date: 2012-08-29 18:38
Attached an updated patch, which is more explicit on what ensure_ascii actually does.
msg169481 - (view)	Author: Petri Lehtinen (petri.lehtinen) *	Date: 2012-08-30 18:54
Attached yet another patch. It explains what input causes the result to be unicode instead of str.
msg169611 - (view)	Author: Roundup Robot (python-dev)	Date: 2012-09-01 04:31
New changeset a1884b3027c5 by Petri Lehtinen in branch '2.7':
#13769: Enhance docs for ensure_ascii semantics in JSON decoding functions
http://hg.python.org/cpython/rev/a1884b3027c5
msg169612 - (view)	Author: Petri Lehtinen (petri.lehtinen) *	Date: 2012-09-01 04:32
Fixed, thanks.
msg229882 - (view)	Author: Martijn Pieters (mjpieters) *	Date: 2014-10-23 15:20
I'd say this is a bug in the library, not the documentation. The library varies the output type, making it impossible to use `json.dump()` with an `io.open()` object, as the library will mix data types when writing. That is terrible behaviour.
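One possible way around the mixed output types, sketched here as an assumption rather than a fix anyone in this thread endorsed, is to serialize with json.dumps() first and coerce the result to text before writing it to a text-mode file. dumps_text is a hypothetical helper name, and decoding as UTF-8 assumes that any byte strings in the input were UTF-8 encoded.

```python
import io
import json


def dumps_text(obj, **kwargs):
    """Hypothetical helper: always return the JSON document as text."""
    result = json.dumps(obj, ensure_ascii=False, **kwargs)
    if isinstance(result, bytes):
        # Only reachable on Python 2.7, where bytes is an alias of str.
        # Assumption: byte strings in obj were UTF-8 encoded.
        result = result.decode("utf-8")
    return result


# io.StringIO stands in for a file object returned by io.open(..., "w").
stream = io.StringIO()
stream.write(dumps_text({u"foo": u"\xe4iti"}))
written = stream.getvalue()
```

Because the whole document is built in memory before it is written, this trades the streaming behaviour of json.dump() for a predictable output type.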
msg229886 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2014-10-23 17:20
The revised doc admits the problem: "If ensure_ascii is False, some chunks written to fp may be unicode instances. Unless fp.write() explicitly understands unicode (as in codecs.getwriter) this is likely to cause an error." Making text be unicode in 3.x is our attempt at a generic fix to the problems resulting from the bug-prone 2.x 'text may be bytes or unicode' design. Since continued 2.7 support is aimed at supporting legacy code, we are very reluctant to make behavior changes that could break working code.
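The codecs.getwriter approach mentioned in the quoted doc text might look like the sketch below. dump_utf8 is a hypothetical helper name, and UTF-8 is an assumed target encoding. The wrapper encodes each chunk that json.dump() writes, so the underlying binary stream only ever receives bytes.

```python
import codecs
import io
import json


def dump_utf8(obj, raw):
    """Hypothetical helper: write obj as UTF-8 encoded JSON to raw."""
    # The StreamWriter accepts text chunks and encodes them on write.
    writer = codecs.getwriter("utf-8")(raw)
    json.dump(obj, writer, ensure_ascii=False)


# io.BytesIO stands in for a file opened in binary mode.
buf = io.BytesIO()
dump_utf8({u"foo": u"\xe4iti"}, buf)
data = buf.getvalue()
```

On Python 2.7, byte strings containing non-ASCII data in the input may still raise an error inside the writer, so this sketch is safest when the input strings are unicode objects.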

History
Date	User	Action	Args
2022-04-11 14:57:25	admin	set	github: 57978
2014-10-23 17:20:08	terry.reedy	set	messages: + msg229886
2014-10-23 15:20:20	mjpieters	set	nosy: + mjpieters messages: + msg229882
2012-09-01 04:32:57	petri.lehtinen	set	status: open -> closed messages: + msg169612 keywords: - needs review resolution: fixed stage: patch review -> resolved
2012-09-01 04:31:52	python-dev	set	nosy: + python-dev messages: + msg169611
2012-08-30 18:54:56	petri.lehtinen	set	files: + issue13769_v3.patch messages: + msg169481
2012-08-29 18:38:59	petri.lehtinen	set	files: + issue13769_v2.patch messages: + msg169406
2012-08-28 11:08:39	petri.lehtinen	set	nosy: + pitrou
2012-08-28 11:07:53	petri.lehtinen	set	keywords: + needs review, patch files: + issue13769.patch messages: + msg169273 stage: needs patch -> patch review
2012-08-28 10:51:25	petri.lehtinen	link	issue14042 superseder
2012-08-28 10:44:38	petri.lehtinen	set	messages: + msg169270
2012-08-27 05:00:14	petri.lehtinen	set	nosy: + petri.lehtinen messages: + msg169180
2012-01-14 05:40:19	ezio.melotti	set	messages: + msg151231
2012-01-14 05:28:42	terry.reedy	set	nosy: + rhettinger, ezio.melotti, terry.reedy messages: + msg151229 stage: needs patch
2012-01-11 19:22:01	loewis	set	versions: - Python 2.6
2012-01-11 16:41:19	socketpair	create