classification
Title: Trailing whitespace in json dump when using indent
Type: enhancement Stage: resolved
Components: Documentation, Library (Lib) Versions: Python 3.4, Python 3.3, Python 3.2, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: ezio.melotti Nosy List: eric.araujo, ezio.melotti, jcea, pitrou, python-dev, r.david.murray, rhettinger, serhiy.storchaka, techtonik, zach.mathew
Priority: normal Keywords: patch

Created on 2012-10-26 21:25 by zach.mathew, last changed 2014-03-10 22:11 by python-dev. This issue is now closed.

Files
File name Uploaded Description Edit
fix_json_trailing_space_and_tests.patch zach.mathew, 2012-10-26 21:25 review
json_indent_separators_default.patch serhiy.storchaka, 2012-11-26 12:58 Use appropriate separators by default review
json_indent_separators_suggestion.patch serhiy.storchaka, 2012-11-28 15:58 Suggest to use appropriate separators review
Messages (35)
msg173891 - (view) Author: Zach Mathew (zach.mathew) Date: 2012-10-26 21:25
When using the indent option in json.JSONEncoder, extra trailing whitespace (preceding the newline) is added to list and dict items.

For example:

>>> import json
>>> json.dumps(['foo', 'bar'], indent=1)
'[\n "foo", \n "bar"\n]'

Notice the blank space between "foo", and \n

EXPECTED OUTPUT:

'[\n "foo",\n "bar"\n]'
msg173897 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-10-26 22:13
Thanks for the report and patch.

I think we'll need a fix for the C version, too (unless it doesn't have the bug).
msg174097 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-10-29 02:08
I think the patch adds tests to a mixin class used to test the Python and the C code, so if it passes then no fix is needed in C.
msg174099 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-10-29 02:55
It does, and the C accelerator doesn't seem to be used for this.
The patch looks good to me, however:
1) it now strips spaces even when explicitly specified in the separator (this might be ok though -- I don't see why someone would need them);
2) I'm not sure if this can be considered a bug fix and applied to all branches or if it should go on 3.4 only;
msg174100 - (view) Author: Zach Mathew (zach.mathew) Date: 2012-10-29 03:16
To Ezio's first point ... yes, I was wondering if trailing whitespace should be left alone in the case of an explicitly defined separator. However, this behavior seems a little odd to me as I don't see a use case for it (happy to change the patch if there are differing opinions on this).

Keep in mind that the patch does retain the *leading* white space (for both default and explicit separators) - only trailing whitespace is removed prior to newlines.
msg176323 - (view) Author: anatoly techtonik (techtonik) Date: 2012-11-24 23:12
Nice conflict case for explaining why perfect API is not here.

Because ', ' is now used as an item separator inline and separator for items that span multiple lines, we get trailing whitespace. If ',' is used, items will collide. To resolve this conflict democratically, a new option should be added, but..

If it is a JSON module then what separator are we talking about in the first place? It is not CSV, other separators are not in specification, so why is it there?

The only reason is to get the most compact representation. I doubt anybody uses any value except (',', ':'), so the `compact=True` should be sufficient.


As for this patch - there should be no doubt that it should be applied to all branches. The logic works only when indent is in force and that already breaks custom separators by inserting indents and newlines. Other argument that nobody will do JSON parsing with whitespace analysis, and even if they do - they'll still need to implement workaround for newlines.
msg176348 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-25 13:22
I don't think the code should fixed. You can use separators=(',', ': ') with indentation. Perhaps the documentation should contains such suggestion.
msg176351 - (view) Author: anatoly techtonik (techtonik) Date: 2012-11-25 13:32
Would you mind providing some counter-arguments?
msg176353 - (view) Author: anatoly techtonik (techtonik) Date: 2012-11-25 13:36
Trailing whitespace produce visual warnings in diff comparison tools. If you have to read docs on how fix every piece of code that produces readable JSON to avoid this warnings, it is a very-very bad user experience.

The argument that by default the output with indent option should be human-friendly. Don't you agree with that?
msg176362 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-25 15:15
What you think about default separators will be (', ', ': ') if indent is None and (',', ': ') if indent is not None? This will allow indent options be human-friendly and keep the flexibility to specify arbitrary separators if they needed.
msg176373 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-25 17:02
The proposed patch changes default *separators* value to (',', ': ') if *indent* is not None.
msg176405 - (view) Author: anatoly techtonik (techtonik) Date: 2012-11-26 09:25
',' makes lists less readable, directly the opposite of what the *indent* option is for. The *separators* variable is a insufficient solution, because it was not designed to work with indents.

Therefore the original solution to strip trailing space when indent is active is a perfect intuitive default.
msg176410 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-26 12:58
Patch updated. Changing note about YAML is not needed, JSON lefts YAML-compatible even with identation.
msg176411 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-26 13:01
> ',' makes lists less readable, directly the opposite of what the *indent* option is for.

',' used by default only when indentation used. It increases readability.
msg176412 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-26 13:02
Of course, this is a new feature and should be only in 3.4.
msg176440 - (view) Author: anatoly techtonik (techtonik) Date: 2012-11-26 19:08
> ',' used by default only when indentation used. It increases readability.

Do you mean that when indentation is used, the separator only appears on line ends? Otherwise I can see how,is,that,more readable, than, that.

> Of course, this is a new feature and should be only in 3.4.

No of course. =) Trailing whitespace is a usability bug. It doesn't affect anything. Produced JSON stays JSON.
msg176450 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-11-26 20:03
I agree with Serhiy, this should be going in 3.4 only. The reason is that people might rely on exact output and it's not nice to break their code in a bugfix release.
msg176451 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-26 20:10
For older Python we need add in in the documentation the suggestion to use "separators=(',', ': ')" when indentation used.
msg176487 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-11-27 19:50
> Do you mean that when indentation is used, the separator only appears on line ends?

Apparently: ./python -c 'from json import dumps; print(dumps([[[1,2,3]]*3]*3, indent=2))'
msg176541 - (view) Author: anatoly techtonik (techtonik) Date: 2012-11-28 12:53
> The reason is that people might rely on exact output and it's not nice to break their code in a bugfix release.

This code has a very bad smell. Between supporting people who did that and teaching them a lesson I choose the latter, but I bet the situation like you describe either didn't exist or easily fixable on their side. If it's not fixable, then there is always a virtualenv and previous versions. That's about sympathy.

Now about technical side of conservative development. There is no promise of binary compatibility for pretty-printed data. Python never made pretty-prints a serialization format. If you insist that people rely on this behavior, let's document it, because at least on this tracker there are already two people who have questions about that.
msg176543 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-11-28 14:09
"Code smell" and "Easily fixable on their side" are not an arguments that apply here.  We have a strong backward compatibility policy that strives not to break working code in bug fix releases.  And yes, this means that there are sometimes bugs that we only fix in feature releases.
msg176549 - (view) Author: anatoly techtonik (techtonik) Date: 2012-11-28 15:11
The problem with policy and 'common sense' is that not everybody can feel that 'common sense', especially when there is no time to go deep into the issue. Policy is a quick and good 42 no matter that the matter is. The problem that it is also a filter, which puts a barrier in front of all reasonable arguments. You have either to transform yourself to live behind the barrier or to leave.
msg176550 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-28 15:58
Here is a patch which adds a suggestion to use appropriate separators with indent. It also use they in Lib/json/tool.py.

I suggest this patch only for old Python, up to 3.3. For 3.4 this is not needed if my previous suggestion will be accepted.
msg176570 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-11-28 19:49
> There is no promise of binary compatibility for pretty-printed data.

This was the reason that made me consider backporting this on the other branches.  After all I expect this feature to be used from the terminal, while printing JSON in a human-friendly format and in other situations were trailing spaces would have little or no importance.
OTOH people might be used pretty-printed JSON in tests to get a better a diff for example, and changing that in a debug release might be annoying.

IOW the annoyance of having trailing spaces if it doesn't get fixed evens out the annoyance of having a possibly unwanted change of behavior if it does get fixed.

Serhiy patches look good to me (modulo a couple of minor typos), so I will probably apply them as soon as I get a chance.
msg176571 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-11-28 19:56
Yes, I think the risk of breaking doctests (and breaking them in a very mysterious way...trailing spaces) is high enough that we shouldn't backport the fix, unfortunately.
msg176572 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-11-28 20:35
I agree with RDM.
msg176574 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-28 21:12
Please left 2.7, 3.2 and 3.3 for documentation changes (the second patch).
msg176583 - (view) Author: Roundup Robot (python-dev) Date: 2012-11-28 22:43
New changeset 78bad589f205 by Ezio Melotti in branch '2.7':
#16333: document a way to get rid of trailing whitespace when indent is used.
http://hg.python.org/cpython/rev/78bad589f205

New changeset 2a5b183ac3cd by Ezio Melotti in branch '3.2':
#16333: document a way to get rid of trailing whitespace when indent is used.
http://hg.python.org/cpython/rev/2a5b183ac3cd

New changeset 9d6706b6b482 by Ezio Melotti in branch '3.3':
#16333: merge with 3.2.
http://hg.python.org/cpython/rev/9d6706b6b482

New changeset 8b30a764b58d by Ezio Melotti in branch 'default':
#16333: null merge with 3.3.
http://hg.python.org/cpython/rev/8b30a764b58d

New changeset e63ac05ccfa8 by Ezio Melotti in branch 'default':
#16333: use (",", ": ") as default separator when indent is specified to avoid trailing whitespace.  Patch by Serhiy Storchaka.
http://hg.python.org/cpython/rev/e63ac05ccfa8
msg176584 - (view) Author: Roundup Robot (python-dev) Date: 2012-11-28 22:45
New changeset e7919cf9b5e5 by Ezio Melotti in branch 'default':
#16333: fix example in docstring.
http://hg.python.org/cpython/rev/e7919cf9b5e5
msg176585 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-11-28 22:47
I committed the patches leaving out the json.tool changes.
I will commit those as part of #16476.
Thanks Serhiy for the patches!
msg176597 - (view) Author: Roundup Robot (python-dev) Date: 2012-11-29 00:33
New changeset 1e3b01c52aee by Ezio Melotti in branch 'default':
#16333: add Misc/NEWS entry for e63ac05ccfa8.
http://hg.python.org/cpython/rev/1e3b01c52aee
msg176600 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-11-29 01:05
Ezio, I noticed in the check-in mail that you added the note before the description of the separators argument (specifically, the descrition of what it does when it’s a tuple).  Maybe it would flow best to move the note after that part?

Also, I don’t think this is important enought that it warrants a note directive.  Regular text would have looked good to me.

Kudos for the patch otherwise; I love it when we can at least give workarounds and recipes in stable versions’ docs.
msg176601 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-11-29 01:15
> I noticed in the check-in mail that you added the note before
> the description of the separators argument

The note is just after the description of the indent argument, because it's relevant only when a indent value is specified.

> Regular text would have looked good to me.

I considered this, but I think it's ok.  I built the doc and the note is not too bad.  The other two alternatives were: 1) use a normal paragraph, but that would have broken the flow, since each paragraph describes a different arg; 2) write it in the same paragraph of "indent", but that would have made it less visible.

FTR I also closed the related issues #16476 and #16549.
msg176606 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-11-29 02:16
All right. :)
msg213096 - (view) Author: Roundup Robot (python-dev) Date: 2014-03-10 22:11
New changeset bb43e8e05a7c by R David Murray in branch 'default':
whatsnew: json dump-with-indent whitespace change (#16333).
http://hg.python.org/cpython/rev/bb43e8e05a7c
History
Date User Action Args
2014-03-10 22:11:20python-devsetmessages: + msg213096
2012-12-15 02:55:15jceasetnosy: + jcea
2012-11-29 02:16:00eric.araujosetmessages: + msg176606
2012-11-29 01:15:28ezio.melottisetmessages: + msg176601
2012-11-29 01:05:52eric.araujosetmessages: + msg176600
2012-11-29 00:33:38python-devsetmessages: + msg176597
2012-11-28 22:47:48ezio.melottisetstatus: open -> closed
resolution: fixed
messages: + msg176585

stage: patch review -> resolved
2012-11-28 22:45:35python-devsetmessages: + msg176584
2012-11-28 22:43:13python-devsetnosy: + python-dev
messages: + msg176583
2012-11-28 21:13:31eric.araujosetcomponents: + Documentation
versions: + Python 2.7, Python 3.2, Python 3.3
2012-11-28 21:12:56serhiy.storchakasetmessages: + msg176574
2012-11-28 20:35:44eric.araujosetmessages: + msg176572
versions: - Python 2.7, Python 3.2, Python 3.3
2012-11-28 19:56:25r.david.murraysetmessages: + msg176571
2012-11-28 19:49:54ezio.melottisetmessages: + msg176570
2012-11-28 15:58:50serhiy.storchakasetfiles: + json_indent_separators_suggestion.patch

messages: + msg176550
2012-11-28 15:11:08techtoniksetmessages: + msg176549
2012-11-28 14:09:37r.david.murraysetmessages: + msg176543
2012-11-28 12:53:27techtoniksetmessages: + msg176541
2012-11-27 19:50:43ezio.melottisetmessages: + msg176487
2012-11-26 20:10:00serhiy.storchakasetmessages: + msg176451
2012-11-26 20:04:48ezio.melottisetassignee: ezio.melotti
2012-11-26 20:03:23pitrousetmessages: + msg176450
2012-11-26 19:08:52techtoniksetmessages: + msg176440
versions: + Python 2.7, Python 3.2, Python 3.3
2012-11-26 13:02:51serhiy.storchakasetversions: - Python 2.7, Python 3.2, Python 3.3
nosy: + rhettinger, pitrou

messages: + msg176412

type: behavior -> enhancement
2012-11-26 13:01:33serhiy.storchakasetmessages: + msg176411
2012-11-26 12:58:09serhiy.storchakasetfiles: + json_indent_separators_default.patch

messages: + msg176410
2012-11-26 12:56:42serhiy.storchakasetfiles: - json_indent_separators_default.patch
2012-11-26 09:25:44techtoniksetmessages: + msg176405
2012-11-25 17:02:32serhiy.storchakasetfiles: + json_indent_separators_default.patch

messages: + msg176373
2012-11-25 15:25:08serhiy.storchakalinkissue16549 dependencies
2012-11-25 15:15:09serhiy.storchakasetmessages: + msg176362
2012-11-25 13:36:57techtoniksetmessages: + msg176353
2012-11-25 13:32:57techtoniksetmessages: + msg176351
2012-11-25 13:22:05serhiy.storchakasetmessages: + msg176348
2012-11-24 23:12:39techtoniksetnosy: + techtonik
messages: + msg176323
2012-11-15 11:25:35ezio.melottilinkissue16476 superseder
2012-10-29 03:16:47zach.mathewsetmessages: + msg174100
2012-10-29 02:55:58ezio.melottisetmessages: + msg174099
2012-10-29 02:08:34eric.araujosetnosy: + eric.araujo
messages: + msg174097
2012-10-26 23:22:27ezio.melottisetnosy: + ezio.melotti
2012-10-26 22:58:09serhiy.storchakasetnosy: + serhiy.storchaka
2012-10-26 22:13:49r.david.murraysetversions: - Python 2.6, Python 3.1, Python 3.5
nosy: + r.david.murray

messages: + msg173897

stage: patch review
2012-10-26 21:25:12zach.mathewcreate