Add an option to json.tool to bypass non-ASCII characters. #71600

Closed
legnaleurc mannequin opened this issue Jun 29, 2016 · 13 comments
Labels
3.9 (only security fixes) · stdlib (Python modules in the Lib dir) · type-feature (A feature request or enhancement)

Comments

@legnaleurc
Mannequin

legnaleurc mannequin commented Jun 29, 2016

BPO 27413
Nosy @rhettinger, @bitdancer, @methane, @berkerpeksag, @serhiy-storchaka, @legnaleurc, @qingyunha
PRs
  • bpo-27413: add --no-ensure-ascii argument to json.tool #201
  • bpo-30971: Improve code readability of json.tool #2720
  • bpo-27413: json.tool: Add --no-ensure-ascii option. #17472
Files
  • json-add-an-option-to-bypass-non-ascii-characters-v4.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-12-06.06:44:23.432>
    created_at = <Date 2016-06-29.11:49:29.439>
    labels = ['type-feature', 'library', '3.9']
    title = 'Add an option to json.tool to bypass non-ASCII characters.'
    updated_at = <Date 2019-12-06.06:44:23.427>
    user = 'https://github.com/legnaleurc'

    bugs.python.org fields:

    activity = <Date 2019-12-06.06:44:23.427>
    actor = 'methane'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-12-06.06:44:23.432>
    closer = 'methane'
    components = ['Library (Lib)']
    creation = <Date 2016-06-29.11:49:29.439>
    creator = 'legnaleurc'
    dependencies = []
    files = ['44246']
    hgrepos = []
    issue_num = 27413
    keywords = ['patch']
    message_count = 13.0
    messages = ['269479', '269498', '269559', '269584', '269641', '269648', '269667', '273803', '273814', '273817', '273819', '273911', '357902']
    nosy_count = 7.0
    nosy_names = ['rhettinger', 'r.david.murray', 'methane', 'berker.peksag', 'serhiy.storchaka', 'legnaleurc', 'qingyunha']
    pr_nums = ['201', '2720', '17472']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue27413'
    versions = ['Python 3.9']

    @legnaleurc
    Mannequin Author

    legnaleurc mannequin commented Jun 29, 2016

    This patch adds a command-line option "--no-escape" that lets json.tool output non-ASCII characters as-is instead of escaping them.

    e.g.:

    $ echo '"測試"' | python -m json.tool
    "\u6e2c\u8a66"
    
    $ echo '"測試"' | python -m json.tool --no-escape
    "測試"

    @legnaleurc legnaleurc mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Jun 29, 2016
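
The two outputs in the report above correspond to the `ensure_ascii` parameter of `json.dumps`. A minimal sketch of the difference, using the library call directly rather than the command-line tool (the variable names are illustrative only):

```python
import json

# The string from the report, written with escapes so the source stays ASCII.
text = "\u6e2c\u8a66"  # 測試

# Default behaviour (ensure_ascii=True): non-ASCII characters are escaped.
escaped = json.dumps(text)

# ensure_ascii=False: the characters are emitted unchanged.
literal = json.dumps(text, ensure_ascii=False)

print(escaped)  # "\u6e2c\u8a66"

# Both forms are valid JSON and decode to the same string.
print(json.loads(escaped) == json.loads(literal))  # True
```

Either form round-trips to the same value; the option only changes how the output looks.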
    @bitdancer
    Member

    Maybe name it --no-insure-ascii? Or --insure-ascii=no/yes. (or true/false).

    @legnaleurc
    Mannequin Author

    legnaleurc mannequin commented Jun 30, 2016

    If the arguments should be aligned with those in dump/load, then maybe "--no-ensure-ascii" is an option?

    @bitdancer
    Member

    Sorry, yes, that's what I meant. I think it will make it easier to understand and remember if the option uses the same terminology as the function.

    @legnaleurc
    Mannequin Author

    legnaleurc mannequin commented Jul 1, 2016

    Use "--no-ensure-ascii" instead.

    @berkerpeksag
    Member

    The patch needs tests and documentation.

    + parser.add_argument('--no-ensure-ascii', action='store_true', default=False,

    I'd go with action='store_false', default=True.

    @legnaleurc
    Mannequin Author

    legnaleurc mannequin commented Jul 1, 2016

    > The patch needs tests and documentation.

    Ok, I'll update it later.

    > + parser.add_argument('--no-ensure-ascii', action='store_true', default=False,
    > I'd go with action='store_false', default=True.

    If I'm not misreading your comment, this will change the original behavior, right? (because options.no_ensure_ascii will be set to True by default)

    @legnaleurc
    Mannequin Author

    legnaleurc mannequin commented Aug 28, 2016

    Added doc and test.

    @rhettinger rhettinger self-assigned this Aug 28, 2016
    @serhiy-storchaka
    Member

    The test fails on a non-UTF-8 locale.

    $ LC_ALL=en_US ./python -m test.regrtest -v -m test_no_ensure_ascii_flag test_json
    ...

    ======================================================================
    FAIL: test_no_ensure_ascii_flag (test.test_json.test_tool.TestTool)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/serhiy/py/cpython/Lib/test/test_json/test_tool.py", line 115, in test_no_ensure_ascii_flag
        self.assertEqual(out.splitlines(), b'"\\u6e2c\\u8a66"\n'.splitlines())
    AssertionError: Lists differ: [b'"\\u00e6\\u00b8\\u00ac\\u00e8\\u00a9\\u00a6"'] != [b'"\\u6e2c\\u8a66"']

    First differing element 0:
    b'"\\u00e6\\u00b8\\u00ac\\u00e8\\u00a9\\u00a6"'
    b'"\\u6e2c\\u8a66"'

    - [b'"\\u00e6\\u00b8\\u00ac\\u00e8\\u00a9\\u00a6"']
    + [b'"\\u6e2c\\u8a66"']

    ----------------------------------------------------------------------
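
A likely reading of the failure above: the six `\u00XX` escapes are the UTF-8 bytes of the two characters, each decoded as a separate character. The sketch below reproduces that output, assuming the `en_US` locale uses ISO-8859-1 (Latin-1) as its encoding; that assumption is not stated in the traceback.

```python
import json

original = "\u6e2c\u8a66"  # 測試, the test input

# The input reaches the tool as UTF-8 bytes.
utf8_bytes = original.encode("utf-8")

# Assumption: the tool decodes stdin with the locale encoding, here Latin-1,
# so every byte turns into one character.
misdecoded = utf8_bytes.decode("latin-1")

print(json.dumps(misdecoded))  # "\u00e6\u00b8\u00ac\u00e8\u00a9\u00a6"
print(json.dumps(original))    # "\u6e2c\u8a66"
```

The first line matches the actual value in the assertion error, the second the expected one.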

    @legnaleurc
    Mannequin Author

    legnaleurc mannequin commented Aug 28, 2016

    1. Replaced non-ASCII literals with \uXXXX escapes
    2. Removed the failing assertion

    The test passes with both LC_ALL=en_US and LC_ALL=en_US.UTF-8.

    I tried using locale.getdefaultlocale(), but it seems the output string varies between locales.

    @rhettinger rhettinger removed their assignment Aug 28, 2016
    @berkerpeksag
    Member

    > If I'm not misreading your comment, this will change the original behavior, right?

    Assuming you also change

       ensure_ascii = not options.no_ensure_ascii

    to

       ensure_ascii = options.no_ensure_ascii

    no, it won't change the original behavior. That way you won't need to invert the value of options.no_ensure_ascii in line 37.
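
A minimal sketch of the `store_false` approach described above. The `dest='ensure_ascii'` argument and the helper names `build_parser` and `render` are assumptions added for illustration; they are not taken from the patch, which keeps the attribute name `no_ensure_ascii`.

```python
import argparse
import json


def build_parser():
    """Build a parser with only the flag under discussion (illustrative)."""
    parser = argparse.ArgumentParser()
    # store_false with default=True: the stored value is True unless the flag
    # is given. dest='ensure_ascii' (an assumption, not from the patch) names
    # the attribute after what it means, so no inversion is needed later.
    parser.add_argument('--no-ensure-ascii', dest='ensure_ascii',
                        action='store_false', default=True,
                        help='disable escaping of non-ASCII characters')
    return parser


def render(argv, obj):
    """Serialize obj according to the parsed command-line options."""
    options = build_parser().parse_args(argv)
    return json.dumps(obj, ensure_ascii=options.ensure_ascii)


print(render([], "\u6e2c\u8a66"))  # "\u6e2c\u8a66"
```

Without the flag the default behaviour is unchanged; passing `['--no-ensure-ascii']` to `render` makes it return the unescaped string.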

    @serhiy-storchaka
    Member

    The last change just sweeps the problem under the rug.

    For now, json.tool never fails on valid data. With the --no-ensure-ascii option, however, it can fail when it outputs a string that is not encodable with the locale encoding. Everything works for common cases in a common UTF-8 environment, but can fail unexpectedly in a nonstandard environment. It would be better to output encodable characters as is and represent unencodable characters with \uXXXX escapes.
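
One way the suggestion above could be sketched: serialize without escaping, then escape only the characters the target encoding cannot represent. The function name `dumps_for_encoding` is hypothetical, and this is not presented as what was eventually merged.

```python
import json


def dumps_for_encoding(obj, encoding):
    """Serialize obj, escaping only characters that `encoding` cannot encode.

    Hypothetical helper for illustration; in a command-line tool the encoding
    would presumably come from the output stream.
    """
    text = json.dumps(obj, ensure_ascii=False)
    pieces = []
    for ch in text:
        try:
            ch.encode(encoding)
            pieces.append(ch)
        except UnicodeEncodeError:
            # json.dumps with its default ensure_ascii=True produces a valid
            # JSON escape for the character; drop the surrounding quotes.
            pieces.append(json.dumps(ch)[1:-1])
    return "".join(pieces)


# U+00E9 is encodable in Latin-1, U+6E2C is not.
sample = "\u00e9\u6e2c"
result = dumps_for_encoding(sample, "latin-1")
print(json.loads(result) == sample)  # True
```

The result always encodes cleanly with the chosen encoding and still decodes to the original value. With `"ascii"` it matches the default `json.dumps` output, and with `"utf-8"` nothing is escaped for ordinary text.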

    @bitdancer bitdancer added the 3.7 (EOL) end of life label Aug 16, 2017
    @methane
    Member

    methane commented Dec 6, 2019

    New changeset efefe25 by Inada Naoki (wim glenn) in branch 'master':
    bpo-27413: json.tool: Add --no-ensure-ascii option. (GH-17472)
    efefe25

    @methane methane added 3.9 only security fixes and removed 3.7 (EOL) end of life labels Dec 6, 2019
    @methane methane closed this as completed Dec 6, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022