pprint could use line continuation for long bytes literals #61732
Same as issue bpo-17150:
>>> pprint.pprint({"a": b"\x00\xff" * 20})
{'a': b'\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff'}
... could be better formatted as:
>>> pprint.pprint({"a": b"\x00\xff" * 20})
{'a': b'\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00'
b'\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff'
b'\x00\xff\x00\xff\x00\xff'} |
Here's a patch. I needed to handle the fact that the repr of a single byte can be 1, 2 or 4 characters long, and I did not want to wrap in the middle of a byte representation. Note also that bytes literals require a continuation character. In the pathological case where the wrap size is smaller than the representation of a single byte, I chose to always print at least one byte per line. As an aside, I also replaced the str wrapping code's calls to len with the cached _len used in the rest of pprint.py. |
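For readers following along, here is a minimal sketch of the wrapping idea described above. It is not the code from the attached patch; the helper name `wrap_bytes_repr` is hypothetical. It measures the repr of whole chunks, so a byte's escape sequence is never split, and it always emits at least one byte per line even when `width` is too small.

```python
def wrap_bytes_repr(data, width):
    """Split `data` into bytes literals whose repr fits in `width` columns.

    Hypothetical helper for illustration only, not the patch's code.
    """
    lines = []
    current = b""
    for i in range(len(data)):
        candidate = current + data[i:i + 1]
        # The repr of one byte may be 1, 2 or 4 characters long, so measure
        # the repr of the whole candidate chunk instead of counting bytes.
        if current and len(repr(candidate)) > width:
            lines.append(repr(current))
            current = data[i:i + 1]
        else:
            # Also taken when `current` is empty, which guarantees at least
            # one byte per line in the pathological narrow-width case.
            current = candidate
    if current or not lines:
        lines.append(repr(current))
    return lines


for line in wrap_bytes_repr(b"\n" * 6, width=5):
    print(line)
```

The sketch only produces the individual literals; how they are joined (continuation character, parentheses, or an enclosing container) is a separate decision.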
oops, forgot to add some samples:
>>> pprint.pprint(b"\n\n\n\n\n\n", width=5)
b'\n'\
b'\n'\
b'\n'\
b'\n'\
b'\n'\
b'\n'
>>> pprint.pprint({"a": b"\x00\xff" * 20})
{'a': b'\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00'\
b'\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff'\
b'\x00\xff\x00\xff\x00\xff'}
>>> pprint.pprint({"a": b"\x00\xff" * 20}, width=20)
{'a': b'\x00\xff'\
b'\x00\xff'\
b'\x00\xff'\
b'\x00\xff'\
b'\x00\xff'\
b'\x00\xff'\
b'\x00\xff'\
b'\x00\xff'\
b'\x00\xff'\
b'\x00\xff'\
b'\x00\xff'\
b'\x00\xff'\
b'\x00\xff'\
b'\x00\xff'\
b'\x00\xff'\
b'\x00\xff'\
b'\x00\xff'\
b'\x00\xff'\
b'\x00\xff'\
b'\x00\xff'}
>>> pprint.pprint(b'a\x00\n\\x00', width=20)
b'a\x00\n\\x00' |
I don't understand why you say that "bytes literals require a continuation character":
>>> (b"x"
... b"y")
b'xy'
>>> [b"x"
... b"y"]
[b'xy']
I think the "len caching" is a misoptimization; it's useless here (most CPU time will be spent creating and wrapping the representation). As for the doc, the example would probably deserve to be a bit more "meaningful" :-) |
# repr some bytes:
>>> b = b"\x00\xff" * 5
>>> b
b'\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff'
>>> r = repr(b)
>>> r
"b'\\x00\\xff\\x00\\xff\\x00\\xff\\x00\\xff\\x00\\xff'"
>>> eval(r)
b'\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff'
# hand-wrap it without the continuation character and it fails to eval (stuck the
>>> s = "b'\\x00\\xff\\x00\\xff\\x00'\nb'\\xff\\x00\\xff\\x00\\xff'"
>>> print(s)
b'\x00\xff\x00\xff\x00'
b'\xff\x00\xff\x00\xff'
>>> eval(s)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 2
b'\xff\x00\xff\x00\xff'
^
SyntaxError: invalid syntax
# stick the continuation character in, and it evals properly
>>> s = s.replace("\n", "\\\n")
>>> print(s)
b'\x00\xff\x00\xff\x00'\
b'\xff\x00\xff\x00\xff'
>>> eval(s)
b'\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff'
|
Well, but eval works if you put parentheses as required by the grammar:
>>> s = "(b'xy'\nb'za')"
>>> eval(s)
b'xyza'
Yes, _str_parts and _bytes_parts should probably remain separate. It's the higher-level routine that would deserve sharing. |
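To summarize the three spellings discussed so far, the following sketch checks each one with `ast.literal_eval`. The helper `evaluates` and the sample string are made up for illustration. Adjacent bytes literals are joined across a newline only when the newline sits inside brackets or follows a backslash.

```python
import ast

# Two adjacent bytes literals separated by a bare newline.
wrapped = "b'\\x00\\xff\\x00'\nb'\\xff\\x00\\xff'"


def evaluates(source):
    """Return True if `source` parses as a single literal expression."""
    try:
        ast.literal_eval(source)
    except SyntaxError:
        return False
    return True


# A bare newline ends the expression, so this form is rejected.
bare_ok = evaluates(wrapped)

# Inside parentheses the newline is ignored and the literals are joined.
paren_value = ast.literal_eval("(" + wrapped + ")")

# A backslash continuation joins the two physical lines as well.
backslash_value = ast.literal_eval(wrapped.replace("\n", "\\\n"))

print(bare_ok, paren_value, backslash_value)
```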
Here's a new version. It looks more like the str_parts patch and uses parentheses instead of continuation as suggested. Sample output:
>>> pprint.pprint(b"\n\na\x00", width=1)
(b'\n'
b'\n'
b'a'
b'\x00') |
Ok, I hadn't noticed that pretty-printing a single string didn't add the parentheses as desired:
>>> pprint.pprint("abcd " * 6, width=15)
'abcd abcd '
'abcd abcd '
'abcd abcd '
On the other hand, the added parentheses aren't needed when inside a container (line continuations will work without them). (and of course the same rules should apply to bytes objects :-)) |
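A quick way to check the rule stated above is to round-trip the formatted text through `ast.literal_eval`. This is a sketch that assumes a Python version in which `pprint` already parenthesizes a wrapped top-level string (3.5 or later, to my knowledge); on an interpreter that produces the unparenthesized output quoted above, the first round trip would fail.

```python
import ast
import pprint

text = "abcd " * 6

# Top level: the wrapped literals need enclosing parentheses to be valid.
top_level = pprint.pformat(text, width=15)

# Inside a container: the list brackets already allow line continuation.
nested = pprint.pformat([text], width=15)

print(top_level)
print(nested)

# Both forms should evaluate back to the original objects.
assert ast.literal_eval(top_level) == text
assert ast.literal_eval(nested) == [text]
```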
Here is a patch based on the current str formatting code (i.e. parentheses are added only if needed, and the space at the right is used more efficiently). It adds pprint support for bytes and bytearrays. Bytes are broken only at positions divisible by 4, so packed 32-bit ints are never broken. Examples:
>>> pprint.pprint(bytes(range(128)))
(b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13'
b'\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./01234567'
b'89:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{'
b'|}~\x7f')
>>> pprint.pprint({'abcdefgh': bytes(range(128))})
{'abcdefgh': b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f'
b'\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f'
b' !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ['
b'\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f'}
>>> pprint.pprint(bytearray(range(128)))
bytearray(b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f'
b'\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f'
b' !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_'
b'`abcdefghijklmnopqrstuvwxyz{|}~\x7f')
>>> pprint.pprint({'abcdefgh': bytearray(range(128))})
{'abcdefgh': bytearray(b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b'
b'\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17'
b'\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123'
b'456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefg'
b'hijklmnopqrstuvwxyz{|}~\x7f')} |
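The "break only at positions divisible by 4" rule can be sketched as follows. This is an illustration under my own assumptions, not the committed code: the generator name `wrap_bytes_4` is hypothetical, and it ignores the indentation and closing-bracket allowance that a real pprint integration has to account for.

```python
import ast


def wrap_bytes_4(data, width):
    """Yield reprs of chunks of `data`, breaking only at multiples of 4.

    Hypothetical sketch; a packed 32-bit value is never split across lines.
    """
    current = b""
    for i in range(0, len(data), 4):
        group = data[i:i + 4]
        candidate = current + group
        # Start a new line when adding the next 4-byte group would overflow,
        # but never emit an empty chunk.
        if current and len(repr(candidate)) > width:
            yield repr(current)
            current = group
        else:
            current = candidate
    if current:
        yield repr(current)


data = bytes(range(128))
parts = list(wrap_bytes_4(data, 40))
# Evaluate each literal separately to recover the chunk boundaries.
chunks = [ast.literal_eval(part) for part in parts]
print("\n".join(parts))
```

Every chunk except possibly the last has a length that is a multiple of 4, and concatenating the chunks gives back the original data.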
New changeset 976de10bf731 by Serhiy Storchaka in branch 'default': |