Issue 17868: pprint long non-printable bytes as hexdump

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/62068

classification

Title:	pprint long non-printable bytes as hexdump
Type:	behavior	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.4

process

Status:	closed	Resolution:	rejected
Dependencies:		Superseder:
Assigned To:		Nosy List:	ezio.melotti, pitrou, serhiy.storchaka, techtonik
Priority:	normal	Keywords:	patch

Created on 2013-04-29 18:19 by serhiy.storchaka, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
pprint_bytes_hex.patch	serhiy.storchaka, 2013-04-29 18:19		review
pprint_bytes_hex_2.patch	serhiy.storchaka, 2013-05-04 11:53	Output a hexdump as a comment	review

Messages (14)
msg188081 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2013-04-29 18:19
Here is a patch that makes pprint format long bytes objects containing non-ASCII or non-printable bytes as a hexdump. Inspired by Antoine's wish (http://permalink.gmane.org/gmane.comp.python.ideas/20329).
msg188084 - (view)	Author: Ezio Melotti (ezio.melotti) *	Date: 2013-04-29 18:34
A couple of comments:
1) A separate function might be better. I think this kind of output would be more useful while inspecting individual bytes objects, rather than having it for arbitrary bytes objects (that might be inside other containers).
2) I don't know if the output of pprint is supposed to be eval()uable, but I don't much like the base64.b16decode(...).replace(' ', '') in the output (especially if the bytes objects are short). If a separate function is used as suggested in 1), this won't be a problem. Using the hex_codec might be another option.
msg188086 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2013-04-29 18:55
Yes, I think a separate function would be better. There's another issue for pprint() of bytes with line continuations: http://bugs.python.org/issue17530
msg188144 - (view)	Author: anatoly techtonik (techtonik)	Date: 2013-04-30 08:51
Some issues:
1. The hex conversion logic doesn't belong in the base64 module; there is no chance that a person without StackOverflow access can find it.
2. I'd put issue17862 first as a dependency for this one, because the proposed itertools.chunks() can be further optimized (chunking endless sequences with constant memory overhead, PyPy-specific speedups, etc.).

I like that this is not over-engineered. In my hexdump module I got too involved with the problems of parsing/producing full dumps in a way compatible with Python 2/3. So I have to postpone my own user story until finally I run out of time. Probably hexdump.dump() returning a string will make it a useful API for the primary user story. hexdump.dumpgen() as a line generator with 16 hexadecimal bytes delimited by spaces should cover all other use cases.
msg188346 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2013-05-04 09:42
> 1) A separate function might be better. I think this kind of output would be more useful while inspecting individual byte objects, rather than having it for arbitrary byte objects (that might be inside other containers).

I don't think a general hexdump() function is worth including in the stdlib. It would need too many options (How many bytes to display per line? How to group hex digits? Which replacement character for non-printables? Whether or not to display addresses? Whether or not to display chars? What are the delimiters between hex digits and chars, and between address and hex digits? What are the line prefix and suffix? How to display the last incomplete line? How to display the first incomplete line?), and this makes it complicated. An application which outputs a hexdump to a richer device (an HTML file or an ANSI-colored terminal) needs advanced options. However, simple specialized code can be used for special purposes, i.e. internally in the pprint module. I don't see how it can be reused, and I'm not interested in a general function.

> 2) I don't know if the output of pprint is supposed to be eval()uable, but I don't like too much the base64.b16decode(...).replace(' ', '') in the output (especially if the byte objects are short). If a separate function is used as suggested in 1) this won't be a problem. Using the hex_codec might be another option.

An alternative option is to first output the bytes object as is (perhaps splitting it across multiple lines as in issue17530) and then output a hexdump as a comment.

[b'\x7fELF\x01\x01\x01\x00\x00\n'
 b'\x00\x00\x00\x00\x00\x00\x02\x00\x03\x00\x01'
 # 7F 45 4C 46 01 01 01 00 00 0A 00 00 00 00 00 00 | .ELF............
 # 02 00 03 00 01 | .....
]
msg188348 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2013-05-04 10:27
Here is an alternative patch which outputs a bytes literal and a hexdump as a comment.
msg188352 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2013-05-04 11:17
On Saturday 04 May 2013 at 09:42 +0000, Serhiy Storchaka wrote:

> However, simple specialized code can be used for special purposes,
> i.e. internally in the pprint module. I don't see how it can be reused,
> and I'm not interested in a general function.

I don't understand how it would be useful in the pprint module if it can't be useful as a general function. The general intent is the same: print something in a "nice" way. It's just that the nice way differs depending on the situation: if my bytes object is simply a bunch of HTTP headers, I don't want to have a hexdump.

> An alternative option is to first output the bytes object as is (perhaps
> splitting it across multiple lines as in issue17530) and then output a
> hexdump as a comment.
>
> [b'\x7fELF\x01\x01\x01\x00\x00\n'
>  b'\x00\x00\x00\x00\x00\x00\x02\x00\x03\x00\x01'
>  # 7F 45 4C 46 01 01 01 00 00 0A 00 00 00 00 00 00 | .ELF............
>  # 02 00 03 00 01 | .....
> ]

This won't work very nicely in smaller display widths. You'll need too many lines to represent a bytes object.
msg188354 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2013-05-04 12:05
> I don't understand how it would be useful in the pprint module if it
> can't be useful as a general function.

How can it be used besides in the pprint/pformat functions?

> Just the nice way is different
> depending on the situations: if my bytes object is simply a bunch of
> HTTP headers, I don't want to have a hexdump.

Then perhaps a new parameter for pprint/pformat is needed (hex=True?). I think printing integers in hexadecimal can sometimes be useful too.

> This won't work very nicely in smaller display widths. You'll need too
> many lines to represent a bytes object.

This is the nature of hexdumps. Every byte requires 4+ characters (or 3+ if the hex digits are grouped more tightly).
msg188355 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2013-05-04 12:14
> > I don't understand how it would be useful in the pprint module if it
> > can't be useful as a general function.
>
> How can it be used besides in the pprint/pformat functions?

I don't understand your question. Do you never print some data at the command-line prompt? Or even as part of small test programs?

> Then perhaps a new parameter for pprint/pformat is needed (hex=True?). I
> think printing integers in hexadecimal can sometimes be useful too.

Passing type-specific parameters to pprint/pformat sounds like a bad idea to me. And I don't think you'd want to print all integers as hex.

> > This won't work very nicely in smaller display widths. You'll need too
> > many lines to represent a bytes object.
>
> This is the nature of hexdumps. Every byte requires 4+ characters (or 3+
> if the hex digits are grouped more tightly).

Which is why the proposal doesn't fit well with pprint/pformat.
msg188360 - (view)	Author: Ezio Melotti (ezio.melotti) *	Date: 2013-05-04 12:46
The idea is that the output of pprint should be something like (once #17530 is applied):

>>> pprint.pprint(b'\x7fELF\x01\x01\x01\x00\x00\n\x00\x00\x00\x00\x00\x00\x02\x00\x03\x00\x01')
(b'\x7fELF\x01\x01\x01\x00\x00\n\x00\x00'
 b'\x00\x00\x00\x00\x02\x00\x03\x00\x01')

whereas the output of hexdump can be something like:

pprint.hexdump(b'\x7fELF\x01\x01\x01\x00\x00\n\x00\x00\x00\x00\x00\x00\x02\x00\x03\x00\x01')
7F 45 4C 46 01 01 01 00 00 0A 00 00 00 00 00 00 | .ELF............
02 00 03 00 01 | .....

hexdump() could accept some additional args too if required, but otherwise I don't think the details are so important as long as it produces something readable for a human.
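
For readers who want to try the layout discussed in this message, here is a minimal sketch of a standalone helper. The function name `hexdump`, its `width` parameter, and the padding of the last line are assumptions made for illustration; no `pprint.hexdump()` exists in the standard library, and this is not the code from either attached patch.

```python
def hexdump(data, width=16):
    """Return a hexdump of `data` as a string (hypothetical helper, not a stdlib API)."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Two uppercase hex digits per byte, separated by single spaces.
        hexpart = ' '.join('%02X' % b for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text = ''.join(chr(b) if 32 <= b < 127 else '.' for b in chunk)
        # Pad the hex column so the text column lines up on a short last line.
        lines.append('%-*s | %s' % (width * 3 - 1, hexpart, text))
    return '\n'.join(lines)


data = b'\x7fELF\x01\x01\x01\x00\x00\n\x00\x00\x00\x00\x00\x00\x02\x00\x03\x00\x01'
print(hexdump(data))
```

Even this small version already fixes several of the choices listed earlier in the thread (bytes per line, replacement character, delimiter), which illustrates why a general-purpose function would attract many options.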
msg188371 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2013-05-04 15:41
> I don't understand your question. Do you never print some data at the
> command-line prompt? Or even as part of small test programs?

To be honest, I very rarely even use pprint. I'm too lazy to import it. If I want to quickly get a hexdump, I use something like `' '.join('%02X'%i for i in data)`. It is shorter than `import pprint; pprint.hexdump(data)`. For a small program, the standard hexdump() will most likely not be enough.

> Passing type-specific parameters to pprint/pformat sounds like a bad
> idea to me.

Agreed. Of course it would be better to automatically determine a "nice" display (use a hexdump only for large non-printable bytes).

> And I don't think you'd want to print all integers as hex.

If you want to print bytes in hex, why not ints and floats? ;) In fact I don't want to print data as hex, so shut up.

> Which is why the proposal doesn't fit well with pprint/pformat.

Perhaps I misunderstood your wish. I'm not against treating pprint as a black box which does all the good magic inside by default. The use of this feature does not require anything from the users and does not impose obligations on the maintainers. But I'm not interested in a separate function.
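
The one-liner quoted in this message can be tried as follows. This is only a sketch: the sample `data` value is made up for illustration, and the `bytes.hex(' ')` form assumes Python 3.8 or later, which postdates this discussion.

```python
data = b'\x7fELF\x01\x01\x01\x00\x00\n'

# The one-liner from the message: two uppercase hex digits per byte.
one_liner = ' '.join('%02X' % i for i in data)
print(one_liner)

# Assumption: Python 3.8+, where bytes.hex() accepts a separator argument.
assert data.hex(' ').upper() == one_liner
```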
msg188372 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2013-05-04 15:46
On Saturday 04 May 2013 at 15:41 +0000, Serhiy Storchaka wrote:

> > Which is why the proposal doesn't fit well with pprint/pformat.
>
> Perhaps I misunderstood your wish. I'm not against treating pprint as
> a black box which does all the good magic inside by default. The use of
> this feature does not require anything from the users and does not
> impose obligations on the maintainers. But I'm not interested in a
> separate function.

The problem is that the "good magic" will depend on the situation. Really, I don't want a hexdump of an HTTP message :-) Which is why there should be a separate function for those who want a hexdump.
msg189683 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2013-05-20 18:20
Oh, I forgot about bytes.fromhex(). This of course looks better than base64.b16decode((...).replace(' ', '')).
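
As a quick check of the comparison in this message, the sketch below decodes the same space-separated hex text both ways. The sample string is made up for illustration; both `bytes.fromhex()` and `base64.b16decode()` are standard library calls.

```python
import base64

hex_text = '7F 45 4C 46 01 01 01 00'

# bytes.fromhex() skips the spaces between byte pairs on its own.
via_fromhex = bytes.fromhex(hex_text)

# base64.b16decode() rejects spaces, so they have to be stripped first.
via_b16 = base64.b16decode(hex_text.replace(' ', ''))

assert via_fromhex == via_b16
print(via_fromhex)
```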
msg256763 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-12-20 11:39
Withdrawn in favor of issue17530.

History
Date	User	Action	Args
2022-04-11 14:57:45	admin	set	github: 62068
2015-12-20 11:41:15	serhiy.storchaka	set	resolution: fixed -> rejected
2015-12-20 11:39:21	serhiy.storchaka	set	status: open -> closed resolution: fixed messages: + msg256763 stage: patch review -> resolved
2013-05-20 18:20:04	serhiy.storchaka	set	messages: + msg189683
2013-05-04 15:46:40	pitrou	set	messages: + msg188372
2013-05-04 15:41:30	serhiy.storchaka	set	messages: + msg188371
2013-05-04 12:46:43	ezio.melotti	set	messages: + msg188360
2013-05-04 12:14:42	pitrou	set	messages: + msg188355
2013-05-04 12:05:21	serhiy.storchaka	set	messages: + msg188354
2013-05-04 11:53:23	serhiy.storchaka	set	files: + pprint_bytes_hex_2.patch
2013-05-04 11:52:59	serhiy.storchaka	set	files: - pprint_bytes_hex_2.patch
2013-05-04 11:17:24	pitrou	set	messages: + msg188352
2013-05-04 10:27:44	serhiy.storchaka	set	files: + pprint_bytes_hex_2.patch messages: + msg188348
2013-05-04 09:42:58	serhiy.storchaka	set	messages: + msg188346
2013-04-30 08:51:11	techtonik	set	messages: + msg188144
2013-04-29 18:55:48	pitrou	set	messages: + msg188086
2013-04-29 18:35:13	fdrake	set	nosy: - fdrake
2013-04-29 18:34:55	ezio.melotti	set	nosy: + ezio.melotti messages: + msg188084
2013-04-29 18:19:29	serhiy.storchaka	create