Issue 17530: pprint could use line continuation for long bytes literals

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/61732

classification

Title:	pprint could use line continuation for long bytes literals
Type:	enhancement	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.5

process

Status:	closed	Resolution:	fixed
Dependencies:	19104 19105	Superseder:
Assigned To:	serhiy.storchaka	Nosy List:	Pam.McANulty, fdrake, pitrou, python-dev, serhiy.storchaka
Priority:	low	Keywords:	patch

Created on 2013-03-23 19:39 by pitrou, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
bytes_pprint.patch	Pam.McANulty, 2013-04-15 13:52		review
bytes_pprint2.patch	Pam.McANulty, 2013-04-18 01:47		review
pprint_bytes.patch	serhiy.storchaka, 2015-02-14 13:10		review

Messages (10)
msg185078 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2013-03-23 19:39
Same as issue #17150:

>>> pprint.pprint({"a": b"\x00\xff" * 20})
{'a': b'\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff'}

... could be better formatted as:

>>> pprint.pprint({"a": b"\x00\xff" * 20})
{'a': b'\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00'
      b'\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff'
      b'\x00\xff\x00\xff\x00\xff'}
msg186988 - (view)	Author: Pam McA'Nulty (Pam.McANulty) *	Date: 2013-04-15 13:52
Here's a patch. I needed to handle the fact that the repr of a single byte can be 1, 2 or 4 characters long and did not want to wrap in the middle of a byte representation. Note also that bytes literals require a continuation character. In the pathological case where the wrap size is smaller than the representation of a single byte, I chose to always print at least one byte per line. As an aside, I also replaced the str wrapping code's calls to len with the cached _len used in the rest of pprint.py
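The variable width of a byte's representation can be checked directly. The following is a minimal sketch, not code from the patch; `byte_repr_len` is a hypothetical helper name introduced here for illustration.

```python
def byte_repr_len(value: int) -> int:
    """Return how many characters one byte occupies inside a bytes repr."""
    # repr(bytes([value])) has the form b'...' (or b"..." for a lone quote),
    # so the prefix and the two quote characters account for 3 characters.
    return len(repr(bytes([value]))) - 3


# Collect the distinct widths over every possible byte value.
lengths = {byte_repr_len(v) for v in range(256)}

plain = byte_repr_len(ord("A"))      # printable ASCII such as A
short_escape = byte_repr_len(0x0A)   # short escapes such as \n
hex_escape = byte_repr_len(0x00)     # hex escapes such as \x00
```

A wrapper that splits the finished repr string at an arbitrary column could cut an escape such as `\x00` in half, which is why the patch measures whole bytes.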
msg186989 - (view)	Author: Pam McA'Nulty (Pam.McANulty) *	Date: 2013-04-15 13:55
oops, forgot to add some samples: >>> pprint.pprint(b"\n\n\n\n\n\n", width=5) b'\n'\ b'\n'\ b'\n'\ b'\n'\ b'\n'\ b'\n' >>> pprint.pprint({"a": b"\x00\xff" * 20}) {'a': b'\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00'\ b'\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff'\ b'\x00\xff\x00\xff\x00\xff'} >>> pprint.pprint({"a": b"\x00\xff" * 20}, width=20) {'a': b'\x00\xff'\ b'\x00\xff'\ b'\x00\xff'\ b'\x00\xff'\ b'\x00\xff'\ b'\x00\xff'\ b'\x00\xff'\ b'\x00\xff'\ b'\x00\xff'\ b'\x00\xff'\ b'\x00\xff'\ b'\x00\xff'\ b'\x00\xff'\ b'\x00\xff'\ b'\x00\xff'\ b'\x00\xff'\ b'\x00\xff'\ b'\x00\xff'\ b'\x00\xff'\ b'\x00\xff'} >>> pprint.pprint(b'a\x00\n\\x00', width=20) b'a\x00\n\\x00'
msg186993 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2013-04-15 14:11
I don't understand why you say that "bytes literals require a continuation character":

>>> (b"x"
... b"y")
b'xy'
>>> [b"x"
... b"y"]
[b'xy']

I think the "len caching" is a misoptimization, it's useless here (most CPU time will be spent creating and wrapping the representation).

Also perhaps it would be nice to refactor things a bit, since we have both _str_parts and _bytes_parts used in exactly the same way (but that can also be done later).

As for the doc, the example would probably deserve to be a bit more "meaningful" :-)
msg186995 - (view)	Author: Pam McA'Nulty (Pam.McANulty) *	Date: 2013-04-15 14:28
- eval expects bytes to have a continuation character and test_str_wrap did an eval check so I figured test_bytes_wrap should as well:

# repr some bytes:
>>> b = b"\x00\xff" * 5
>>> b
b'\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff'
>>> r = repr(b)
>>> r
"b'\\x00\\xff\\x00\\xff\\x00\\xff\\x00\\xff\\x00\\xff'"
>>> eval(r)
b'\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff'

# hand-wrap it without the continuation character and it fails to eval (stuck the
>>> s = "b'\\x00\\xff\\x00\\xff\\x00'\nb'\\xff\\x00\\xff\\x00\\xff'"
>>> print(s)
b'\x00\xff\x00\xff\x00'
b'\xff\x00\xff\x00\xff'
>>> eval(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 2
    b'\xff\x00\xff\x00\xff'
                          ^
SyntaxError: invalid syntax

# stick the continuation character in, and it evals properly
>>> s = s.replace("\n", "\\\n")
>>> print(s)
b'\x00\xff\x00\xff\x00'\
b'\xff\x00\xff\x00\xff'
>>> eval(s)
b'\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff'

- I agree about the len v _len, but figured this wasn't all that foolish a consistency issue (i.e. the rest of pprint.py used _len)

- I also wanted to refactor _str_parts and _bytes_parts, but couldn't decide on the best course. I was favoring a helper function to run the common loop since the two "if issubclass..." calls were so different and parameterizing the differences felt like it would obfuscate things too much.

- I also agree on the doc. I figured I'd see if there weren't any hidden surprises with the patch before I worked on better doc.
msg186998 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2013-04-15 15:05
Well, but eval works if you put parentheses as required by the grammar:

>>> s = "(b'xy'\nb'za')"
>>> eval(s)
b'xyza'

Yes, _str_parts and _bytes_parts should probably remain separate. It's the higher-level routine that would deserve sharing.

Also, perhaps the other wrapping routines (for dict, list...) could get the same treatment.
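The three ways of spreading adjacent bytes literals over two lines discussed in this exchange can be compared side by side. This is an illustrative sketch only; the variable names are made up for the example.

```python
# Two adjacent bytes literals separated by a bare newline.
wrapped = "b'\\x00\\xff\\x00'\nb'\\xff\\x00\\xff'"

# 1. Bare newline: the expression ends at the first line, so eval rejects it.
try:
    eval(wrapped)
    bare_newline_ok = True
except SyntaxError:
    bare_newline_ok = False

# 2. Backslash continuation, as produced by the first patch.
continued = eval(wrapped.replace("\n", "\\\n"))

# 3. Enclosing parentheses, as suggested in the review: newlines inside
#    brackets are joined implicitly, so no backslash is needed.
parenthesized = eval("(" + wrapped + ")")
```

Variants 2 and 3 evaluate to the same six-byte value; only the bare-newline form fails.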
msg187218 - (view)	Author: Pam McA'Nulty (Pam.McANulty) *	Date: 2013-04-18 01:47
Here's a new version. It looks more like the str_parts patch and uses parentheses instead of continuation as suggested.

Sample output:

>>> pprint.pprint(b"\n\na\x00", width=1)
(b'\n'
 b'\n'
 b'a'
 b'\x00')
msg187456 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2013-04-20 20:37
Ok, I hadn't noticed that pretty-printing a single string didn't add the parentheses as desired:

>>> pprint.pprint("abcd " * 6, width=15)
'abcd abcd '
'abcd abcd '
'abcd abcd '

On the other hand, the added parentheses aren't needed when inside a container (line continuations will work without them).

(and of course the same rules should apply to bytes objects :-))
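The rule described here (parentheses at the top level, none inside a container) can be observed with a round trip through `ast.literal_eval`. This sketch assumes Python 3.5 or later, where wrapped top-level strings are parenthesized; the output shown in the message above predates that behaviour.

```python
import ast
import pprint

text = "abcd " * 6

# Top level: the wrapped pieces need enclosing parentheses to form
# a single valid expression.
top_level = pprint.pformat(text, width=15)

# Inside a container: the dict's own braces already allow the literal
# to continue across lines, so no extra parentheses are required.
nested = pprint.pformat({"a": text}, width=20)

top_level_value = ast.literal_eval(top_level)
nested_value = ast.literal_eval(nested)
```

Both results parse back to the original objects, which is the property the thread is concerned with.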
msg235968 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-02-14 13:10
Here is a patch based on current str formatting code (i.e. parentheses are added only if needed, the space at the right is used more efficiently). It adds pprint support for bytes and bytearrays. Bytes are broken only at positions divisible by 4, so packed 32-bit ints are never broken.

Examples:

>>> pprint.pprint(bytes(range(128)))
(b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13'
 b'\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./01234567'
 b'89:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{'
 b'|}~\x7f')
>>> pprint.pprint({'abcdefgh': bytes(range(128))})
{'abcdefgh': b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f'
             b'\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f'
             b' !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ['
             b'\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f'}
>>> pprint.pprint(bytearray(range(128)))
bytearray(b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f'
          b'\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f'
          b' !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_'
          b'`abcdefghijklmnopqrstuvwxyz{|}~\x7f')
>>> pprint.pprint({'abcdefgh': bytearray(range(128))})
{'abcdefgh': bytearray(b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b'
                       b'\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17'
                       b'\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123'
                       b'456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefg'
                       b'hijklmnopqrstuvwxyz{|}~\x7f')}
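The idea of breaking only at positions divisible by 4 can be sketched as a greedy loop over 4-byte groups. This is a simplified illustration and not the code from pprint_bytes.patch: `wrap_bytes` is a hypothetical name, and the sketch ignores indentation and the trailing allowance that a real pretty-printer has to account for.

```python
from typing import List


def wrap_bytes(data: bytes, width: int) -> List[str]:
    """Split data into bytes literals, each at most `width` characters
    when a single 4-byte group fits, breaking only at multiples of 4."""
    lines = []
    current = b""
    for start in range(0, len(data), 4):
        group = data[start:start + 4]
        candidate = current + group
        # Measure the repr of the whole candidate, so escapes such as
        # \x00 are counted at their real width and never split.
        if current and len(repr(candidate)) > width:
            lines.append(repr(current))
            current = group
        else:
            current = candidate
    if current:
        lines.append(repr(current))
    return lines


data = bytes(range(128))
lines = wrap_bytes(data, 40)
# Parentheses make the joined literals a single valid expression.
rebuilt = eval("(" + "\n".join(lines) + ")")
```

Because each group is appended whole, a value packed as a 32-bit integer stays on one line, at the cost of occasionally leaving a few columns unused.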
msg239159 - (view)	Author: Roundup Robot (python-dev)	Date: 2015-03-24 17:23
New changeset 976de10bf731 by Serhiy Storchaka in branch 'default':
Issue #17530: pprint now wraps long bytes objects and bytearrays.
https://hg.python.org/cpython/rev/976de10bf731
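On an interpreter that includes this change (Python 3.5 or later), the wrapping can be checked with `pprint.pformat`. The sketch below deliberately makes no claim about where the line breaks fall, since that depends on the interpreter version; it only checks that the output is wrapped and still evaluates to the original object.

```python
import ast
import pprint

payload = bytes(range(128))

# A long bytes object at the top level is wrapped over several lines.
bytes_text = pprint.pformat(payload)

# A bytearray nested in a dict is wrapped inside the bytearray(...) call.
array_text = pprint.pformat({"abcdefgh": bytearray(payload)})

bytes_value = ast.literal_eval(bytes_text)
# literal_eval does not accept calls such as bytearray(...), so plain
# eval is used here on output we generated ourselves.
array_value = eval(array_text)
```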

History
Date	User	Action	Args
2022-04-11 14:57:43	admin	set	github: 61732
2015-03-24 17:23:48	serhiy.storchaka	set	status: open -> closed resolution: fixed stage: patch review -> resolved
2015-03-24 17:23:30	python-dev	set	nosy: + python-dev messages: + msg239159
2015-03-24 17:10:21	serhiy.storchaka	set	assignee: serhiy.storchaka
2015-02-14 13:10:45	serhiy.storchaka	set	files: + pprint_bytes.patch nosy: + serhiy.storchaka messages: + msg235968
2014-12-20 18:50:23	serhiy.storchaka	set	dependencies: + pprint produces invalid output for long strings, pprint doesn't use all width versions: + Python 3.5, - Python 3.4
2013-09-27 15:09:22	serhiy.storchaka	set	stage: patch review
2013-04-20 20:37:55	pitrou	set	messages: + msg187456
2013-04-18 01:47:55	Pam.McANulty	set	files: + bytes_pprint2.patch messages: + msg187218
2013-04-15 15:05:57	pitrou	set	messages: + msg186998
2013-04-15 14:28:53	Pam.McANulty	set	messages: + msg186995
2013-04-15 14:11:03	pitrou	set	messages: + msg186993
2013-04-15 13:55:09	Pam.McANulty	set	messages: + msg186989
2013-04-15 13:52:23	Pam.McANulty	set	files: + bytes_pprint.patch nosy: + Pam.McANulty messages: + msg186988 keywords: + patch
2013-03-23 19:39:37	pitrou	create