classification
Title: Write unescaped unicode characters (Japanese, Chinese, etc) in JSON module when "ensure_ascii=False"
Type: enhancement Stage:
Components: Extension Modules, Unicode Versions: Python 3.3, Python 2.7
process
Status: closed Resolution: works for me
Dependencies: Superseder:
Assigned To: Nosy List: Michael.Kuss, ezio.melotti, r.david.murray, serhiy.storchaka, vstinner
Priority: normal Keywords:

Created on 2014-10-22 18:55 by Michael.Kuss, last changed 2015-02-10 08:43 by serhiy.storchaka. This issue is now closed.

Messages (6)
msg229830 - (view) Author: Michael Kuss (Michael.Kuss) Date: 2014-10-22 18:55
When running the following:

>> json.dump(['name': "港区"], myfile.json, indent=4, separators=(',', ': '), ensure_ascii=False)

the function escapes the Unicode characters, even though I have explicitly asked it not to force ASCII output:
\u6E2F\u533A

By changing json/__init__.py so that the fp.write call encodes each chunk as UTF-8, the output JSON file contains the required human-readable text (see below).


OLD (starting line 167):

if (not skipkeys and ensure_ascii and
        check_circular and allow_nan and
        cls is None and indent is None and separators is None and
        encoding == 'utf-8' and default is None and not kw):
    iterable = _default_encoder.iterencode(obj)
else:
    if cls is None:
        cls = JSONEncoder
    iterable = cls(skipkeys=skipkeys, ensure_ascii=ensure_ascii,
        check_circular=check_circular, allow_nan=allow_nan, indent=indent,
        separators=separators, encoding=encoding,
        default=default, **kw).iterencode(obj)
for chunk in iterable:
    fp.write(chunk)


NEW:

if (not skipkeys and ensure_ascii and
        check_circular and allow_nan and
        cls is None and indent is None and separators is None and
        encoding == 'utf-8' and default is None and not kw):
    iterable = _default_encoder.iterencode(obj)
    for chunk in iterable:
        fp.write(chunk)
else:
    if cls is None:
        cls = JSONEncoder
    iterable = cls(skipkeys=skipkeys, ensure_ascii=ensure_ascii,
        check_circular=check_circular, allow_nan=allow_nan, indent=indent,
        separators=separators, encoding=encoding,
        default=default, **kw).iterencode(obj)
    for chunk in iterable:
        fp.write(chunk.encode('utf-8'))
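For comparison, here is a minimal sketch, assuming Python 3, of producing a UTF-8 file with unescaped characters without patching json/__init__.py: the file is opened in text mode with an explicit encoding, so the file object, not json.dump, does the encoding. The temporary path and sample dictionary are illustrative only.

```python
import json
import os
import tempfile

# Illustrative sample data, mirroring the dictionary from the report.
data = {"name": "港区"}

# Assumption: Python 3. Opening the file in text mode with encoding="utf-8"
# lets the file object encode the str chunks that json.dump writes.
path = os.path.join(tempfile.mkdtemp(), "myfile.json")
with open(path, "w", encoding="utf-8") as outfile:
    json.dump(data, outfile, indent=4, separators=(",", ": "), ensure_ascii=False)

# Read the raw bytes back to check that the characters were stored as
# UTF-8 byte sequences rather than as \uXXXX escapes.
with open(path, "rb") as infile:
    raw = infile.read()

print(raw)  # bytes repr: the two characters appear as UTF-8 byte escapes
```

Note that in Python 3 the proposed chunk.encode('utf-8') change would break this usage, because a text-mode file rejects bytes.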
msg229834 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-10-22 19:39
If I fix your example so it runs:

json.dump({'name': "港区"}, open('myfile.json', 'w'), indent=4, separators=(',', ': '), ensure_ascii=False)

I get the expected output:

rdmurray@pydev:~/python/p34>cat myfile.json 
{
    "name": "港区"
}

That example won't work in python2, of course, so you'd have to show us your actual code there.
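A minimal sketch, assuming Python 3, that isolates what ensure_ascii controls: it decides whether json.dumps escapes non-ASCII characters as \uXXXX in the returned str, and has no say over how that str is later encoded when it is written to a file.

```python
import json

data = {"name": "港区"}

# Default ensure_ascii=True: non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(data)

# ensure_ascii=False: the characters are kept verbatim in the returned str.
unescaped = json.dumps(data, ensure_ascii=False)

print(escaped)                          # {"name": "\u6e2f\u533a"}
print(unescaped == '{"name": "港区"}')  # True

# Both forms decode back to the same Python object.
assert json.loads(escaped) == json.loads(unescaped) == data
```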
msg230365 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2014-10-31 18:29
The example works for me with both python 2 and 3.  I'm going to close this in a while if OP doesn't reply.

$ python2 -c "import json; json.dump({'name': '港区'}, open('py2.json', 'w'), indent=4, separators=(',', ': '), ensure_ascii=False)" && cat py2.json
{
    "name": "港区"
}
$ python3 -c "import json; json.dump({'name': '港区'}, open('py3.json', 'w'), indent=4, separators=(',', ': '), ensure_ascii=False)" && cat py3.json
{
    "name": "港区"
}
msg230417 - (view) Author: Michael Kuss (Michael.Kuss) Date: 2014-11-01 00:50
Pardon the delay - this json.dump call is embedded in a much larger script, so it took some untangling to get it running on Python 3.3 and to scrub some personally identifying info from it. The script does not work in Python 3.3 either:


  File "C:/Users/mkuss/PycharmProjects/TestJSON\dump_list_to_json_file.py", line 319, in dump_list_to_json_file
    json.dump(addresses, outfile, indent=4, separators=(',', ': '))
  File "C:\Python33\lib\json\__init__.py", line 184, in dump
    fp.write(chunk)
TypeError: 'str' does not support the buffer interface



In Python 2.7, I still get escaped Unicode when I write this dictionary with json.dump, so for now I'm using the workaround I posted originally.

If you'd like, I can spend more time debugging the problem I'm hitting when running the script under Python 3.3, but it may be next week before I have enough time to work on it. THANKS  --mike
msg230421 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-11-01 01:47
That error message indicates you've opened the output file in binary mode instead of text mode.
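A minimal sketch, assuming Python 3, that reproduces this class of error with in-memory streams: io.BytesIO stands in for a file opened in binary mode and io.StringIO for one opened in text mode. json.dump writes str chunks, so only the text stream accepts them.

```python
import io
import json

data = {"name": "港区"}

# A binary stream behaves like a file opened with mode "wb": it accepts
# only bytes, but json.dump writes str chunks, so the first write fails.
binary_stream = io.BytesIO()
error = None
try:
    json.dump(data, binary_stream, ensure_ascii=False)
except TypeError as exc:
    # The exact message text depends on the Python version.
    error = exc

# A text stream behaves like a file opened with mode "w": it accepts str.
text_stream = io.StringIO()
json.dump(data, text_stream, ensure_ascii=False)

print(type(error).__name__)  # TypeError
```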
msg231994 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-12-02 13:19
It looks like either you opened the file with the backslashreplace error handler, or you ran Python with a PYTHONIOENCODING setting that selects the backslashreplace error handler.
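A minimal sketch, assuming Python 3, of how such an error handler can make the output look escaped even with ensure_ascii=False. An io.TextIOWrapper over an in-memory buffer stands in for a file opened with encoding="ascii" and errors="backslashreplace"; those settings are chosen for illustration and are not taken from the reporter's script.

```python
import io
import json

data = {"name": "港区"}

# Simulate a file opened as, e.g.,
#   open(path, "w", encoding="ascii", errors="backslashreplace")
# by wrapping an in-memory byte buffer in a text layer with those settings.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="ascii", errors="backslashreplace")

# json.dump emits the literal characters because ensure_ascii=False ...
json.dump(data, stream, ensure_ascii=False)
stream.flush()

# ... but the stream's error handler turns each character it cannot
# encode as ASCII into a \uXXXX escape on the way to bytes.
print(raw.getvalue())  # b'{"name": "\\u6e2f\\u533a"}'
```

The same effect applies to sys.stdout when PYTHONIOENCODING specifies an encoding and error handler such as ascii:backslashreplace.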
History
Date                 User              Action  Args
2015-02-10 08:43:39  serhiy.storchaka  set     status: pending -> closed
2014-12-02 13:19:23  serhiy.storchaka  set     status: open -> pending
                                               nosy: + serhiy.storchaka
                                               messages: + msg231994
2014-11-01 01:47:03  r.david.murray    set     messages: + msg230421
2014-11-01 00:50:19  Michael.Kuss      set     status: pending -> open
                                               messages: + msg230417
2014-10-31 18:29:07  ezio.melotti      set     status: open -> pending
                                               resolution: works for me
                                               messages: + msg230365
2014-10-22 19:39:34  r.david.murray    set     nosy: + r.david.murray
                                               messages: + msg229834
2014-10-22 18:55:23  Michael.Kuss      create