Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trailing whitespace in json dump when using indent #60537

Closed
zachmathew mannequin opened this issue Oct 26, 2012 · 35 comments
Closed

Trailing whitespace in json dump when using indent #60537

zachmathew mannequin opened this issue Oct 26, 2012 · 35 comments
Assignees
Labels
docs Documentation in the Doc dir stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@zachmathew
Copy link
Mannequin

zachmathew mannequin commented Oct 26, 2012

BPO 16333
Nosy @rhettinger, @jcea, @pitrou, @ezio-melotti, @merwok, @bitdancer, @serhiy-storchaka
Files
  • fix_json_trailing_space_and_tests.patch
  • json_indent_separators_default.patch: Use appropriate separators by default
  • json_indent_separators_suggestion.patch: Suggest to use appropriate separators
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ezio-melotti'
    closed_at = <Date 2012-11-28.22:47:48.106>
    created_at = <Date 2012-10-26.21:25:12.582>
    labels = ['type-feature', 'library', 'docs']
    title = 'Trailing whitespace in json dump when using indent'
    updated_at = <Date 2014-03-10.22:11:20.816>
    user = 'https://bugs.python.org/zachmathew'

    bugs.python.org fields:

    activity = <Date 2014-03-10.22:11:20.816>
    actor = 'python-dev'
    assignee = 'ezio.melotti'
    closed = True
    closed_date = <Date 2012-11-28.22:47:48.106>
    closer = 'ezio.melotti'
    components = ['Documentation', 'Library (Lib)']
    creation = <Date 2012-10-26.21:25:12.582>
    creator = 'zach.mathew'
    dependencies = []
    files = ['27735', '28128', '28148']
    hgrepos = []
    issue_num = 16333
    keywords = ['patch']
    message_count = 35.0
    messages = ['173891', '173897', '174097', '174099', '174100', '176323', '176348', '176351', '176353', '176362', '176373', '176405', '176410', '176411', '176412', '176440', '176450', '176451', '176487', '176541', '176543', '176549', '176550', '176570', '176571', '176572', '176574', '176583', '176584', '176585', '176597', '176600', '176601', '176606', '213096']
    nosy_count = 10.0
    nosy_names = ['rhettinger', 'jcea', 'pitrou', 'techtonik', 'ezio.melotti', 'eric.araujo', 'r.david.murray', 'python-dev', 'serhiy.storchaka', 'zach.mathew']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue16333'
    versions = ['Python 2.7', 'Python 3.2', 'Python 3.3', 'Python 3.4']

    @zachmathew
    Copy link
    Mannequin Author

    zachmathew mannequin commented Oct 26, 2012

    When using the indent option in json.JSONEncoder, extra trailing whitespace (preceding the newline) is added to list and dict items.

    For example:

    >>> import json
    >>> json.dumps(['foo', 'bar'], indent=1)
    '[\n "foo", \n "bar"\n]'

    Notice the blank space between "foo", and \n

    EXPECTED OUTPUT:

    '[\n "foo",\n "bar"\n]'

    @zachmathew zachmathew mannequin added type-bug An unexpected behavior, bug, or error stdlib Python modules in the Lib dir labels Oct 26, 2012
    @bitdancer
    Copy link
    Member

    Thanks for the report and patch.

    I think we'll need a fix for the C version, too (unless it doesn't have the bug).

    @merwok
    Copy link
    Member

    merwok commented Oct 29, 2012

    I think the patch adds tests to a mixin class used to test the Python and the C code, so if it passes then no fix is needed in C.

    @ezio-melotti
    Copy link
    Member

    It does, and the C accelerator doesn't seem to be used for this.
    The patch looks good to me, however:

    1. it now strips spaces even when explicitly specified in the separator (this might be ok though -- I don't see why someone would need them);
    2. I'm not sure if this can be considered a bug fix and applied to all branches or if it should go on 3.4 only;

    @zachmathew
    Copy link
    Mannequin Author

    zachmathew mannequin commented Oct 29, 2012

    To Ezio's first point ... yes, I was wondering if trailing whitespace should be left alone in the case of an explicitly defined separator. However, this behavior seems a little odd to me as I don't see a use case for it (happy to change the patch if there are differing opinions on this).

    Keep in mind that the patch does retain the *leading* white space (for both default and explicit separators) - only trailing whitespace is removed prior to newlines.

    @techtonik
    Copy link
    Mannequin

    techtonik mannequin commented Nov 24, 2012

    Nice conflict case for explaining why perfect API is not here.

    Because ', ' is now used as an item separator inline and separator for items that span multiple lines, we get trailing whitespace. If ',' is used, items will collide. To resolve this conflict democratically, a new option should be added, but..

    If it is a JSON module then what separator are we talking about in the first place? It is not CSV, other separators are not in specification, so why is it there?

    The only reason is to get the most compact representation. I doubt anybody uses any value except (',', ':'), so the compact=True should be sufficient.

    As for this patch - there should be no doubt that it should be applied to all branches. The logic works only when indent is in force and that already breaks custom separators by inserting indents and newlines. Other argument that nobody will do JSON parsing with whitespace analysis, and even if they do - they'll still need to implement workaround for newlines.

    @serhiy-storchaka
    Copy link
    Member

    I don't think the code should fixed. You can use separators=(',', ': ') with indentation. Perhaps the documentation should contains such suggestion.

    @techtonik
    Copy link
    Mannequin

    techtonik mannequin commented Nov 25, 2012

    Would you mind providing some counter-arguments?

    @techtonik
    Copy link
    Mannequin

    techtonik mannequin commented Nov 25, 2012

    Trailing whitespace produce visual warnings in diff comparison tools. If you have to read docs on how fix every piece of code that produces readable JSON to avoid this warnings, it is a very-very bad user experience.

    The argument that by default the output with indent option should be human-friendly. Don't you agree with that?

    @serhiy-storchaka
    Copy link
    Member

    What you think about default separators will be (', ', ': ') if indent is None and (',', ': ') if indent is not None? This will allow indent options be human-friendly and keep the flexibility to specify arbitrary separators if they needed.

    @serhiy-storchaka
    Copy link
    Member

    The proposed patch changes default *separators* value to (',', ': ') if *indent* is not None.

    @techtonik
    Copy link
    Mannequin

    techtonik mannequin commented Nov 26, 2012

    ',' makes lists less readable, directly the opposite of what the *indent* option is for. The *separators* variable is a insufficient solution, because it was not designed to work with indents.

    Therefore the original solution to strip trailing space when indent is active is a perfect intuitive default.

    @serhiy-storchaka
    Copy link
    Member

    Patch updated. Changing note about YAML is not needed, JSON lefts YAML-compatible even with identation.

    @serhiy-storchaka
    Copy link
    Member

    ',' makes lists less readable, directly the opposite of what the *indent* option is for.

    ',' used by default only when indentation used. It increases readability.

    @serhiy-storchaka
    Copy link
    Member

    Of course, this is a new feature and should be only in 3.4.

    @serhiy-storchaka serhiy-storchaka added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Nov 26, 2012
    @techtonik
    Copy link
    Mannequin

    techtonik mannequin commented Nov 26, 2012

    ',' used by default only when indentation used. It increases readability.

    Do you mean that when indentation is used, the separator only appears on line ends? Otherwise I can see how,is,that,more readable, than, that.

    Of course, this is a new feature and should be only in 3.4.

    No of course. =) Trailing whitespace is a usability bug. It doesn't affect anything. Produced JSON stays JSON.

    @pitrou
    Copy link
    Member

    pitrou commented Nov 26, 2012

    I agree with Serhiy, this should be going in 3.4 only. The reason is that people might rely on exact output and it's not nice to break their code in a bugfix release.

    @ezio-melotti ezio-melotti self-assigned this Nov 26, 2012
    @serhiy-storchaka
    Copy link
    Member

    For older Python we need add in in the documentation the suggestion to use "separators=(',', ': ')" when indentation used.

    @ezio-melotti
    Copy link
    Member

    Do you mean that when indentation is used, the separator only appears on line ends?

    Apparently: ./python -c 'from json import dumps; print(dumps([[[1,2,3]]*3]*3, indent=2))'

    @techtonik
    Copy link
    Mannequin

    techtonik mannequin commented Nov 28, 2012

    The reason is that people might rely on exact output and it's not nice to break their code in a bugfix release.

    This code has a very bad smell. Between supporting people who did that and teaching them a lesson I choose the latter, but I bet the situation like you describe either didn't exist or easily fixable on their side. If it's not fixable, then there is always a virtualenv and previous versions. That's about sympathy.

    Now about technical side of conservative development. There is no promise of binary compatibility for pretty-printed data. Python never made pretty-prints a serialization format. If you insist that people rely on this behavior, let's document it, because at least on this tracker there are already two people who have questions about that.

    @bitdancer
    Copy link
    Member

    "Code smell" and "Easily fixable on their side" are not an arguments that apply here. We have a strong backward compatibility policy that strives not to break working code in bug fix releases. And yes, this means that there are sometimes bugs that we only fix in feature releases.

    @techtonik
    Copy link
    Mannequin

    techtonik mannequin commented Nov 28, 2012

    The problem with policy and 'common sense' is that not everybody can feel that 'common sense', especially when there is no time to go deep into the issue. Policy is a quick and good 42 no matter that the matter is. The problem that it is also a filter, which puts a barrier in front of all reasonable arguments. You have either to transform yourself to live behind the barrier or to leave.

    @serhiy-storchaka
    Copy link
    Member

    Here is a patch which adds a suggestion to use appropriate separators with indent. It also use they in Lib/json/tool.py.

    I suggest this patch only for old Python, up to 3.3. For 3.4 this is not needed if my previous suggestion will be accepted.

    @ezio-melotti
    Copy link
    Member

    There is no promise of binary compatibility for pretty-printed data.

    This was the reason that made me consider backporting this on the other branches. After all I expect this feature to be used from the terminal, while printing JSON in a human-friendly format and in other situations were trailing spaces would have little or no importance.
    OTOH people might be used pretty-printed JSON in tests to get a better a diff for example, and changing that in a debug release might be annoying.

    IOW the annoyance of having trailing spaces if it doesn't get fixed evens out the annoyance of having a possibly unwanted change of behavior if it does get fixed.

    Serhiy patches look good to me (modulo a couple of minor typos), so I will probably apply them as soon as I get a chance.

    @bitdancer
    Copy link
    Member

    Yes, I think the risk of breaking doctests (and breaking them in a very mysterious way...trailing spaces) is high enough that we shouldn't backport the fix, unfortunately.

    @merwok
    Copy link
    Member

    merwok commented Nov 28, 2012

    I agree with RDM.

    @serhiy-storchaka
    Copy link
    Member

    Please left 2.7, 3.2 and 3.3 for documentation changes (the second patch).

    @merwok merwok added the docs Documentation in the Doc dir label Nov 28, 2012
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Nov 28, 2012

    New changeset 78bad589f205 by Ezio Melotti in branch '2.7':
    bpo-16333: document a way to get rid of trailing whitespace when indent is used.
    http://hg.python.org/cpython/rev/78bad589f205

    New changeset 2a5b183ac3cd by Ezio Melotti in branch '3.2':
    bpo-16333: document a way to get rid of trailing whitespace when indent is used.
    http://hg.python.org/cpython/rev/2a5b183ac3cd

    New changeset 9d6706b6b482 by Ezio Melotti in branch '3.3':
    bpo-16333: merge with 3.2.
    http://hg.python.org/cpython/rev/9d6706b6b482

    New changeset 8b30a764b58d by Ezio Melotti in branch 'default':
    bpo-16333: null merge with 3.3.
    http://hg.python.org/cpython/rev/8b30a764b58d

    New changeset e63ac05ccfa8 by Ezio Melotti in branch 'default':
    bpo-16333: use (",", ": ") as default separator when indent is specified to avoid trailing whitespace. Patch by Serhiy Storchaka.
    http://hg.python.org/cpython/rev/e63ac05ccfa8

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Nov 28, 2012

    New changeset e7919cf9b5e5 by Ezio Melotti in branch 'default':
    bpo-16333: fix example in docstring.
    http://hg.python.org/cpython/rev/e7919cf9b5e5

    @ezio-melotti
    Copy link
    Member

    I committed the patches leaving out the json.tool changes.
    I will commit those as part of bpo-16476.
    Thanks Serhiy for the patches!

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Nov 29, 2012

    New changeset 1e3b01c52aee by Ezio Melotti in branch 'default':
    bpo-16333: add Misc/NEWS entry for e63ac05ccfa8.
    http://hg.python.org/cpython/rev/1e3b01c52aee

    @merwok
    Copy link
    Member

    merwok commented Nov 29, 2012

    Ezio, I noticed in the check-in mail that you added the note before the description of the separators argument (specifically, the descrition of what it does when it’s a tuple). Maybe it would flow best to move the note after that part?

    Also, I don’t think this is important enought that it warrants a note directive. Regular text would have looked good to me.

    Kudos for the patch otherwise; I love it when we can at least give workarounds and recipes in stable versions’ docs.

    @ezio-melotti
    Copy link
    Member

    I noticed in the check-in mail that you added the note before
    the description of the separators argument

    The note is just after the description of the indent argument, because it's relevant only when a indent value is specified.

    Regular text would have looked good to me.

    I considered this, but I think it's ok. I built the doc and the note is not too bad. The other two alternatives were: 1) use a normal paragraph, but that would have broken the flow, since each paragraph describes a different arg; 2) write it in the same paragraph of "indent", but that would have made it less visible.

    FTR I also closed the related issues bpo-16476 and bpo-16549.

    @merwok
    Copy link
    Member

    merwok commented Nov 29, 2012

    All right. :)

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 10, 2014

    New changeset bb43e8e05a7c by R David Murray in branch 'default':
    whatsnew: json dump-with-indent whitespace change (bpo-16333).
    http://hg.python.org/cpython/rev/bb43e8e05a7c

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    docs Documentation in the Doc dir stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants