classification
Title: Add an option to json.tool to bypass non-ASCII characters.
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Wei-Cheng.Pan, berker.peksag, r.david.murray, rhettinger, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2016-06-29 11:49 by Wei-Cheng.Pan, last changed 2017-07-15 19:00 by dhimmel.

Files
File name Uploaded Description
json-add-an-option-to-bypass-non-ascii-characters-v4.patch Wei-Cheng.Pan, 2016-08-28 08:45
Pull Requests
URL Status Linked
PR 201 open dhimmel, 2017-02-23 20:10
PR 2720 closed dhimmel, 2017-07-15 19:00
Messages (12)
msg269479 - (view) Author: Wei-Cheng Pan (Wei-Cheng.Pan) * Date: 2016-06-29 11:49
This patch adds a command line option "--no-escape" that allows json.tool to display non-ASCII characters.

e.g.:

$ echo '"測試"' | python -m json.tool
"\u6e2c\u8a66"

$ echo '"測試"' | python -m json.tool --no-escape
"測試"
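
The same difference can be reproduced from Python with the existing `ensure_ascii` parameter of `json.dumps`. This is only a minimal sketch of the library behaviour the proposed option would expose, not code from the patch:

```python
import json

# The two characters from the shell example, written as escapes so this
# source file stays pure ASCII.
text = "\u6e2c\u8a66"

escaped = json.dumps(text)                      # ensure_ascii defaults to True
literal = json.dumps(text, ensure_ascii=False)  # non-ASCII characters kept as is

print(escaped)                                    # "\u6e2c\u8a66"
print(json.loads(escaped) == json.loads(literal))  # True: same data either way
```

Both forms decode to the same string; only the serialized representation differs.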
msg269498 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-06-29 14:15
Maybe name it --no-insure-ascii?  Or --insure-ascii=no/yes. (or true/false).
msg269559 - (view) Author: Wei-Cheng Pan (Wei-Cheng.Pan) * Date: 2016-06-30 03:30
If the arguments should be aligned with those in dump/load, then maybe "--no-ensure-ascii" is an option?
msg269584 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-06-30 15:25
Sorry, yes, that's what I meant.  I think it will make it easier to understand and remember if the option uses the same terminology as the function.
msg269641 - (view) Author: Wei-Cheng Pan (Wei-Cheng.Pan) * Date: 2016-07-01 03:52
Updated the patch to use "--no-ensure-ascii" instead.
msg269648 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2016-07-01 07:36
The patch needs tests and documentation.

> +    parser.add_argument('--no-ensure-ascii', action='store_true', default=False,

I'd go with ``action='store_false', default=True``.
msg269667 - (view) Author: Wei-Cheng Pan (Wei-Cheng.Pan) * Date: 2016-07-01 13:45
> The patch needs tests and documentation.

Ok, I'll update it later.

>> +    parser.add_argument('--no-ensure-ascii', action='store_true', default=False,
> I'd go with ``action='store_false', default=True``.

If I'm not misreading your comment, this will change the original behavior, right? (because options.no_ensure_ascii will be set to True by default)
msg273803 - (view) Author: Wei-Cheng Pan (Wei-Cheng.Pan) * Date: 2016-08-28 03:21
Added documentation and a test.
msg273814 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-08-28 08:07
The test fails in a non-UTF-8 locale.

$ LC_ALL=en_US ./python -m test.regrtest -v -m test_no_ensure_ascii_flag test_json
...
======================================================================
FAIL: test_no_ensure_ascii_flag (test.test_json.test_tool.TestTool)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/serhiy/py/cpython/Lib/test/test_json/test_tool.py", line 115, in test_no_ensure_ascii_flag
    self.assertEqual(out.splitlines(), b'"\\u6e2c\\u8a66"\n'.splitlines())
AssertionError: Lists differ: [b'"\\u00e6\\u00b8\\u00ac\\u00e8\\u00a9\\u00a6"'] != [b'"\\u6e2c\\u8a66"']

First differing element 0:
b'"\\u00e6\\u00b8\\u00ac\\u00e8\\u00a9\\u00a6"'
b'"\\u6e2c\\u8a66"'

- [b'"\\u00e6\\u00b8\\u00ac\\u00e8\\u00a9\\u00a6"']
+ [b'"\\u6e2c\\u8a66"']

----------------------------------------------------------------------
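
The six `\u00XX` escapes in the failure are consistent with the UTF-8 input bytes being decoded one byte per character. The sketch below reproduces that output, assuming the `en_US` locale resolves to ISO-8859-1 (Latin-1); that mapping is an assumption here and may differ between systems:

```python
import json

original = "\u6e2c\u8a66"          # the test string, two characters
raw = original.encode("utf-8")     # six bytes, as written to stdin by the test

# Assumption: a non-UTF-8 locale such as en_US decodes stdin as Latin-1,
# which turns every byte into its own character.
misread = raw.decode("latin-1")

print(len(raw), len(misread))      # 6 6
print(json.dumps(misread))         # "\u00e6\u00b8\u00ac\u00e8\u00a9\u00a6"
```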
msg273817 - (view) Author: Wei-Cheng Pan (Wei-Cheng.Pan) * Date: 2016-08-28 08:45
1. Replaced non-ASCII literals with \uXXXX escapes
2. Removed the failing assertion

Tests pass with LC_ALL=en_US and LC_ALL=en_US.UTF-8.

I tried to use locale.getdefaultlocale(), but it seems the output string varies between locales.
msg273819 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2016-08-28 10:23
> If I'm not misreading your comment, this will change the original behavior, right?

Assuming you also change

   ensure_ascii = not options.no_ensure_ascii

to

   ensure_ascii = options.no_ensure_ascii

no, it won't change the original behavior. That way you won't need to invert the value of ``options.no_ensure_ascii`` in line 37.
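
The suggestion can be checked in isolation with a small argparse sketch. The first parser mirrors the option as discussed in this thread; the second parser, with `dest='ensure_ascii'`, is a hypothetical variation that is not part of the patch and is shown only because it avoids the negated attribute name:

```python
import argparse

# As suggested: the attribute is True by default and becomes False
# when --no-ensure-ascii is given, so no inversion is needed.
parser = argparse.ArgumentParser()
parser.add_argument('--no-ensure-ascii', action='store_false', default=True)

default_opts = parser.parse_args([])
flag_opts = parser.parse_args(['--no-ensure-ascii'])
print(default_opts.no_ensure_ascii)  # True
print(flag_opts.no_ensure_ascii)     # False

# Hypothetical variation (not in the patch): store under a positive name.
alt_parser = argparse.ArgumentParser()
alt_parser.add_argument('--no-ensure-ascii', dest='ensure_ascii',
                        action='store_false', default=True)
alt_opts = alt_parser.parse_args(['--no-ensure-ascii'])
print(alt_opts.ensure_ascii)         # False
```

In both cases the default behaviour (escaping non-ASCII characters) is unchanged when the flag is absent.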
msg273911 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-08-30 11:36
The last change just sweeps the problem under the rug.

For now, json.tool never fails on valid data. But with the --no-ensure-ascii option it can fail when it outputs a string that is not encodable with the locale encoding. Everything works in common cases in a common UTF-8 environment, but it can fail unexpectedly in a nonstandard environment. It would be better to output encodable characters as is and represent unencodable characters with \uXXXX escapes.
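
One way to read this proposal is sketched below. The helper name `dumps_for_encoding` and its signature are hypothetical and exist only for illustration; this is not the patch's implementation. It relies on the fact that `json.dumps` only ever emits non-ASCII characters inside string literals, so escaping them individually keeps the output valid JSON:

```python
import json

def dumps_for_encoding(obj, encoding):
    """Serialize obj, keeping characters that `encoding` can represent and
    escaping the rest as \\uXXXX (hypothetical helper, illustration only)."""
    text = json.dumps(obj, ensure_ascii=False)
    pieces = []
    for ch in text:
        try:
            ch.encode(encoding)
            pieces.append(ch)
        except UnicodeEncodeError:
            code = ord(ch)
            if code > 0xFFFF:
                # Outside the BMP: JSON requires a UTF-16 surrogate pair.
                code -= 0x10000
                high = 0xD800 | (code >> 10)
                low = 0xDC00 | (code & 0x3FF)
                pieces.append('\\u%04x\\u%04x' % (high, low))
            else:
                pieces.append('\\u%04x' % code)
    return ''.join(pieces)

sample = "\u6e2c\u8a66 caf\u00e9"
# Latin-1 can encode the accented e but not the two CJK characters.
result = dumps_for_encoding(sample, "latin-1")
print(result.encode("latin-1"))          # encodable without error
print(json.loads(result) == sample)      # True
```

With an ASCII target this degrades to the current escaped output, and with a UTF-8 target nothing is escaped.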
History
Date User Action Args
2017-07-15 19:00:48 dhimmel set pull_requests: + pull_request2779
2017-02-23 20:10:42 dhimmel set pull_requests: + pull_request232
2016-08-30 11:36:21 serhiy.storchaka set messages: + msg273911
2016-08-28 10:23:47 berker.peksag set messages: + msg273819
2016-08-28 09:09:22 rhettinger set messages: - msg273813
2016-08-28 08:49:42 rhettinger set assignee: rhettinger ->
2016-08-28 08:45:51 Wei-Cheng.Pan set files: - json-add-an-option-to-bypass-non-ascii-characters-v3.patch
2016-08-28 08:45:43 Wei-Cheng.Pan set files: + json-add-an-option-to-bypass-non-ascii-characters-v4.patch; messages: + msg273817
2016-08-28 08:07:30 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg273814
2016-08-28 07:34:28 rhettinger set assignee: rhettinger; messages: + msg273813; nosy: + rhettinger
2016-08-28 03:21:30 Wei-Cheng.Pan set files: - json-add-an-option-to-bypass-non-ascii-characters.patch
2016-08-28 03:21:13 Wei-Cheng.Pan set files: + json-add-an-option-to-bypass-non-ascii-characters-v3.patch; messages: + msg273803
2016-07-01 13:45:39 Wei-Cheng.Pan set messages: + msg269667
2016-07-01 07:36:21 berker.peksag set nosy: + berker.peksag; messages: + msg269648; stage: patch review
2016-07-01 03:52:49 Wei-Cheng.Pan set files: + json-add-an-option-to-bypass-non-ascii-characters.patch; messages: + msg269641
2016-07-01 03:51:22 Wei-Cheng.Pan set files: - json-add-an-option-to-bypass-non-ascii-characters.patch
2016-06-30 15:25:08 r.david.murray set messages: + msg269584
2016-06-30 03:30:32 Wei-Cheng.Pan set messages: + msg269559
2016-06-29 14:15:50 r.david.murray set nosy: + r.david.murray; messages: + msg269498
2016-06-29 11:49:29 Wei-Cheng.Pan create