Issue 22701: Write unescaped unicode characters (Japanese, Chinese, etc) in JSON module when "ensure_ascii=False"


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/66890

classification

Title:	Write unescaped unicode characters (Japanese, Chinese, etc) in JSON module when "ensure_ascii=False"
Type:	enhancement	Stage:
Components:	Extension Modules, Unicode	Versions:	Python 3.3, Python 2.7

process

Status:	closed	Resolution:	works for me
Dependencies:		Superseder:
Assigned To:		Nosy List:	Michael.Kuss, ezio.melotti, r.david.murray, serhiy.storchaka, vstinner
Priority:	normal	Keywords:

Created on 2014-10-22 18:55 by Michael.Kuss, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (6)
msg229830 - Author: Michael Kuss (Michael.Kuss)	Date: 2014-10-22 18:55
When running the following:

    >>> json.dump(['name': "港区"], myfile.json, indent=4, separators=(',', ': '), ensure_ascii=False)

the function escapes the unicode, even though I have explicitly asked it not to force ASCII:

    \u6E2F\u533A

By changing "__init__.py" so that the fp.write call encodes the text as UTF-8, the output JSON file displays the human-readable text required (see below).

OLD (starting line 167):

    if (not skipkeys and ensure_ascii and
        check_circular and allow_nan and
        cls is None and indent is None and separators is None and
        encoding == 'utf-8' and default is None and not kw):
        iterable = _default_encoder.iterencode(obj)
    else:
        if cls is None:
            cls = JSONEncoder
        iterable = cls(skipkeys=skipkeys, ensure_ascii=ensure_ascii,
            check_circular=check_circular, allow_nan=allow_nan, indent=indent,
            separators=separators, encoding=encoding,
            default=default, **kw).iterencode(obj)
    for chunk in iterable:
        fp.write(chunk)

NEW:

    if (not skipkeys and ensure_ascii and
        check_circular and allow_nan and
        cls is None and indent is None and separators is None and
        encoding == 'utf-8' and default is None and not kw):
        iterable = _default_encoder.iterencode(obj)
        for chunk in iterable:
            fp.write(chunk)
    else:
        if cls is None:
            cls = JSONEncoder
        iterable = cls(skipkeys=skipkeys, ensure_ascii=ensure_ascii,
            check_circular=check_circular, allow_nan=allow_nan, indent=indent,
            separators=separators, encoding=encoding,
            default=default, **kw).iterencode(obj)
        for chunk in iterable:
            fp.write(chunk.encode('utf-8'))
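The escaping described in this report is what `json` produces when `ensure_ascii` is left at its default of `True`; with `ensure_ascii=False` the non-ASCII characters are kept as-is in the output text. A minimal Python 3 sketch contrasting the two with `json.dumps` (the helper name `encode_both` is hypothetical, chosen for illustration):

```python
import json


def encode_both(obj):
    """Return (escaped, unescaped) JSON text for obj (hypothetical helper)."""
    # Default ensure_ascii=True: every non-ASCII character becomes a \uXXXX escape.
    escaped = json.dumps(obj)
    # ensure_ascii=False: characters are left unchanged in the returned str.
    unescaped = json.dumps(obj, ensure_ascii=False)
    return escaped, unescaped
```

For `{'name': '港区'}` the first string contains the literal escapes `\u6e2f\u533a` (the code points of 港 and 区), while the second contains the two characters themselves.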
msg229834 - Author: R. David Murray (r.david.murray) *	Date: 2014-10-22 19:39
If I fix your example so it runs:

    json.dump({'name': "港区"}, open('myfile.json', 'w'), indent=4, separators=(',', ': '), ensure_ascii=False)

I get the expected output:

    rdmurray@pydev:~/python/p34>cat myfile.json
    {
        "name": "港区"
    }

That example won't work in python2, of course, so you'd have to show us your actual code there.
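A self-contained Python 3 sketch of the corrected call. It writes into a temporary directory instead of the working directory, and it passes `encoding='utf-8'` to `open()` explicitly. That encoding is an assumption added here, because the default text encoding is platform-dependent (on Windows it is often not UTF-8). The helpers `dump_to_file` and `read_back` are hypothetical:

```python
import json
import os
import tempfile


def dump_to_file(obj, path):
    """Write obj as indented JSON, keeping non-ASCII text unescaped."""
    # Text mode ('w') with an explicit encoding: json.dump writes str chunks,
    # and the file object encodes them to UTF-8 bytes on disk.
    with open(path, 'w', encoding='utf-8') as fp:
        json.dump(obj, fp, indent=4, separators=(',', ': '), ensure_ascii=False)


def read_back(path):
    """Read the file back as text using the same encoding."""
    with open(path, encoding='utf-8') as fp:
        return fp.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'myfile.json')
        dump_to_file({'name': '港区'}, path)
        # Print the raw bytes (ASCII-safe repr) to show UTF-8, not \u escapes.
        with open(path, 'rb') as fp:
            print(fp.read())
```

Keeping the encoding on the file object, rather than encoding each chunk inside `json/__init__.py` as in the original report, leaves the standard library unmodified.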
msg230365 - Author: Ezio Melotti (ezio.melotti) *	Date: 2014-10-31 18:29
The example works for me with both python 2 and 3. I'm going to close this in a while if OP doesn't reply.

    $ python2 -c "import json; json.dump({'name': '港区'}, open('py2.json', 'w'), indent=4, separators=(',', ': '), ensure_ascii=False)" && cat py2.json
    {
        "name": "港区"
    }
    $ python3 -c "import json; json.dump({'name': '港区'}, open('py3.json', 'w'), indent=4, separators=(',', ': '), ensure_ascii=False)" && cat py3.json
    {
        "name": "港区"
    }
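Both commands pass `separators=(',', ': ')` together with `indent=4`. On Python 2.7 and 3.3 the default item separator is `', '` even when `indent` is given, which leaves a trailing space at the end of each line except the last item; Python 3.4 changed the default to `(',', ': ')` whenever `indent` is not `None`. A small sketch of the difference (the second key in `DATA` is hypothetical, added only so that an item separator appears):

```python
import json

# Hypothetical two-key object so that at least one item separator is emitted.
DATA = {'name': '港区', 'id': 1}


def dump_lines(obj, separators):
    """Pretty-print obj and split the result into lines."""
    text = json.dumps(obj, indent=4, ensure_ascii=False, separators=separators)
    return text.split('\n')


def has_trailing_space(lines):
    """True if any line ends in whitespace."""
    return any(line != line.rstrip() for line in lines)
```

With `(', ', ': ')` the `"name"` line ends in `", "`; with `(',', ': ')` it ends in a bare comma.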
msg230417 - Author: Michael Kuss (Michael.Kuss)	Date: 2014-11-01 00:50
Pardon the delay. This json dump function is embedded in a much larger script, so it took some untangling to get it running on Python 3.3 and to scrub some personally identifying info from it. This script also does not work in Python 3.3:

      File "C:/Users/mkuss/PycharmProjects/TestJSON\dump_list_to_json_file.py", line 319, in dump_list_to_json_file
        json.dump(addresses, outfile, indent=4, separators=(',', ': '))
      File "C:\Python33\lib\json\__init__.py", line 184, in dump
        fp.write(chunk)
    TypeError: 'str' does not support the buffer interface

In Python 2.7, I still get escaped unicode when I try writing this dictionary using json.dump, so the workaround that I pasted originally is how I'm choosing to accomplish the task for now. If you'd like, I can spend more time debugging the issue I'm running into when running the script in Python 3.3, but it may be next week before I have sufficient time to solve it.

THANKS
--mike
msg230421 - Author: R. David Murray (r.david.murray) *	Date: 2014-11-01 01:47
That error message indicates you've opened the output file in binary mode instead of text mode.
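In Python 3, `json.dump` writes `str` chunks, so the target must be a text stream. A minimal sketch that reproduces the error with an in-memory `io.BytesIO` standing in for a file opened with `'wb'`, and then avoids it by wrapping the binary stream in `io.TextIOWrapper` (the equivalent of opening the file in text mode with an encoding). The helper names are hypothetical. On Python 3.3 the message is `'str' does not support the buffer interface`; newer versions word it differently.

```python
import io
import json

DATA = {'name': '港区'}


def dump_to_binary_stream():
    """Reproduce the error: json.dump writes str chunks to a bytes-only stream."""
    raw = io.BytesIO()  # stands in for open(path, 'wb')
    try:
        json.dump(DATA, raw, ensure_ascii=False)
    except TypeError as exc:
        return exc  # the first fp.write(chunk) call fails
    return None


def dump_through_text_wrapper():
    """Fix: hand json.dump a text stream that encodes to UTF-8 bytes."""
    raw = io.BytesIO()
    # Stands in for open(path, 'w', encoding='utf-8').
    text = io.TextIOWrapper(raw, encoding='utf-8')
    json.dump(DATA, text, ensure_ascii=False)
    text.flush()  # push buffered text down into the BytesIO
    data = raw.getvalue()
    text.detach()  # release raw without closing it
    return data
```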
msg231994 - Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2014-12-02 13:19
It looks like either you have opened a file with the backslashreplace error handler, or you ran Python with PYTHONIOENCODING set to use the backslashreplace error handler.
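The point of this diagnosis is that the escapes can come from the output stream rather than from `json`: with `ensure_ascii=False` the encoder emits the characters unchanged, but a stream whose codec cannot represent them and whose error handler is `backslashreplace` writes them out as `\uXXXX` sequences. A minimal Python 3 sketch of the first scenario, assuming an ASCII stream encoding so that the handler is forced to fire (the helper name is hypothetical):

```python
import io
import json


def dump_with_backslashreplace(obj):
    """Write JSON through an ASCII text stream that uses backslashreplace.

    Assumption for illustration: the stream encoding is ASCII, so every
    non-ASCII character is passed to the error handler.
    """
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding='ascii', errors='backslashreplace')
    # json itself emits the characters unescaped here...
    json.dump(obj, text, ensure_ascii=False)
    text.flush()
    data = raw.getvalue()
    text.detach()  # release raw without closing it
    # ...but the stream's error handler has rewritten them as \uXXXX escapes.
    return data
```

For `{'name': '港区'}` the resulting bytes are identical to `json.dumps` output with the default `ensure_ascii=True`, which is why the two causes are easy to confuse.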

History
Date	User	Action	Args
2022-04-11 14:58:09	admin	set	github: 66890
2015-02-10 08:43:39	serhiy.storchaka	set	status: pending -> closed
2014-12-02 13:19:23	serhiy.storchaka	set	status: open -> pending nosy: + serhiy.storchaka messages: + msg231994
2014-11-01 01:47:03	r.david.murray	set	messages: + msg230421
2014-11-01 00:50:19	Michael.Kuss	set	status: pending -> open messages: + msg230417
2014-10-31 18:29:07	ezio.melotti	set	status: open -> pending resolution: works for me messages: + msg230365
2014-10-22 19:39:34	r.david.murray	set	nosy: + r.david.murray messages: + msg229834
2014-10-22 18:55:23	Michael.Kuss	create