classification
Title: pprint long non-printable bytes as hexdump
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.4
process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To:
Nosy List: ezio.melotti, pitrou, serhiy.storchaka, techtonik
Priority: normal
Keywords: patch

Created on 2013-04-29 18:19 by serhiy.storchaka, last changed 2015-12-20 11:41 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description
pprint_bytes_hex.patch serhiy.storchaka, 2013-04-29 18:19
pprint_bytes_hex_2.patch serhiy.storchaka, 2013-05-04 11:53 Output a hexdump as a comment
Messages (14)
msg188081 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-04-29 18:19
Here is a patch that makes pprint format long bytes objects containing non-ASCII or non-printable bytes as a hexdump.

Inspired by Antoine's wish (http://permalink.gmane.org/gmane.comp.python.ideas/20329).
msg188084 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2013-04-29 18:34
A couple of comments:
 1) A separate function might be better.  I think this kind of output would be more useful while inspecting individual byte objects, rather than having it for arbitrary byte objects (that might be inside other containers).
 2) I don't know if the output of pprint is supposed to be eval()uable, but I don't like too much the base64.b16decode(...).replace(' ', '') in the output (especially if the byte objects are short).  If a separate function is used as suggested in 1) this won't be a problem.  Using the hex_codec might be another option.
msg188086 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-04-29 18:55
Yes, I think a separate function would be better. There's another issue for pprint() of bytes with line continuations:

http://bugs.python.org/issue17530
msg188144 - (view) Author: anatoly techtonik (techtonik) Date: 2013-04-30 08:51
Some issues:

1. The hex conversion logic doesn't belong in the base64 module - there is no chance a person without StackOverflow access can find it there.
2. I'd put issue17862 first as a dependency for this one, because the proposed itertools.chunks() can be further optimized (chunking endless sequences with constant memory overhead, PyPy-specific speedups, etc.).

I like that this is not over-engineered. In my hexdump module I got too involved with problems of parsing/producing full dumps in a way compatible with Python 2/3. So I have to postpone my own user story until finally I run out of time.

Having hexdump.dump() return a string would probably make it a useful API for the primary user story.
hexdump.dumpgen(), as a line generator yielding 16 hexadecimal bytes per line delimited by spaces, should cover all other use cases.
msg188346 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-05-04 09:42
>  1) A separate function might be better.  I think this kind of output would be more useful while inspecting individual byte objects, rather than having it for arbitrary byte objects (that might be inside other containers).

I don't think a general hexdump() function is worth including in the stdlib. It would need too many options (How many bytes to display per line? How to group hex digits? Which replacement character to use for non-printables? Whether or not to display addresses? Whether or not to display chars? What are the delimiters between hex digits and chars, and between address and hex digits? What are the line prefix and suffix? How to display the last incomplete line? How to display the first incomplete line?) and this makes it complicated. An application which outputs a hexdump on a richer device (an HTML file or an ANSI-colored terminal) needs advanced options.

However, simple specialized code can be used for special purposes, i.e. internally in the pprint module. I don't see how it can be reused and am not interested in a general function.

>  2) I don't know if the output of pprint is supposed to be eval()uable, but I don't like too much the base64.b16decode(...).replace(' ', '') in the output (especially if the byte objects are short).  If a separate function is used as suggested in 1) this won't be a problem.  Using the hex_codec might be another option.

An alternative is to first output the bytes object as is (perhaps split across multiple lines as in issue17530) and then output a hexdump as a comment.

[b'\x7fELF\x01\x01\x01\x00\x00\n'
 b'\x00\x00\x00\x00\x00\x00\x02\x00\x03\x00\x01'
 # 7F 45 4C 46 01 01 01 00 00 0A 00 00 00 00 00 00 | .ELF............
 # 02 00 03 00 01                                  | .....
 ]
msg188348 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-05-04 10:27
Here is an alternative patch which outputs a bytes literal and a hexdump as a comment.
msg188352 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-04 11:17
On Saturday 04 May 2013 at 09:42 +0000, Serhiy Storchaka wrote:
> 
> However, simple specialized code can be used for special purposes,
> i.e. internally in the pprint module. I don't see how it can be reused
> and am not interested in a general function.

I don't understand how it would be useful in the pprint module if it
can't be useful as a general function. The general intent is the same:
print something in a "nice" way. Just the nice way is different
depending on the situations: if my bytes object is simply a bunch of
HTTP headers, I don't want to have a hexdump.

> An alternative is to first output the bytes object as is (perhaps
> split across multiple lines as in issue17530) and then output a
> hexdump as a comment.
> 
> [b'\x7fELF\x01\x01\x01\x00\x00\n'
>  b'\x00\x00\x00\x00\x00\x00\x02\x00\x03\x00\x01'
>  # 7F 45 4C 46 01 01 01 00 00 0A 00 00 00 00 00 00 | .ELF............
>  # 02 00 03 00 01                                  | .....
>  ]

This won't work very nicely in smaller display widths. You'll need too
many lines to represent a bytes object.
msg188354 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-05-04 12:05
> I don't understand how it would be useful in the pprint module if it
> can't be useful as a general function.

How can it be used besides pprint/pformat functions?

>  Just the nice way is different
> depending on the situations: if my bytes object is simply a bunch of
> HTTP headers, I don't want to have a hexdump.

Then perhaps a new parameter for pprint/pformat is needed (hex=True?). I think printing integers in hexadecimal can sometimes be useful too.

> This won't work very nicely in smaller display widths. You'll need too
> many lines to represent a bytes object.

This is the nature of hexdumps. Every byte requires 4+ characters (or 3+ if the hex digits are grouped more tightly).
msg188355 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-04 12:14
> > I don't understand how it would be useful in the pprint module if it
> > can't be useful as a general function.
> 
> How can it be used besides pprint/pformat functions?

I don't understand your question. Do you never print some data at the
command-line prompt? Or even as part of small test programs?

> Then perhaps a new parameter for pprint/pformat is needed (hex=True?). I
> think printing integers in hexadecimal can sometimes be useful too.

Passing type-specific parameters to pprint/pformat sounds like a bad
idea to me. And I don't think you'd want to print *all* integers as hex.

> > This won't work very nicely in smaller display widths. You'll need too
> > many lines to represent a bytes object.
> 
> This is the nature of hexdumps. Every byte requires 4+ characters (or 3+
> if the hex digits are grouped more tightly).

Which is why the proposal doesn't fit well with pprint/pformat.
msg188360 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2013-05-04 12:46
The idea is that the output of pprint should be something like (once #17530 is applied):
>>> pprint.pprint(b'\x7fELF\x01\x01\x01\x00\x00\n\x00\x00\x00\x00\x00\x00\x02\x00\x03\x00\x01')
(b'\x7fELF\x01\x01\x01\x00\x00\n\x00\x00'
 b'\x00\x00\x00\x00\x02\x00\x03\x00\x01')

whereas the output of hexdump can be something like:
>>> pprint.hexdump(b'\x7fELF\x01\x01\x01\x00\x00\n\x00\x00\x00\x00\x00\x00\x02\x00\x03\x00\x01')
7F 45 4C 46 01 01 01 00 00 0A 00 00 00 00 00 00 | .ELF............
02 00 03 00 01                                  | .....

hexdump() could accept some additional args too if required, but otherwise I don't think the details are so important as long as it produces something readable for a human.
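
A minimal sketch of what such a function might look like, assuming a fixed layout with 16 bytes per line by default and '.' standing in for non-printable bytes. The names hexdump_lines and hexdump are hypothetical; pprint has no such function, and this is not the code from either attached patch.

```python
def hexdump_lines(data, width=16):
    """Yield one formatted line per `width` bytes of `data` (hypothetical helper)."""
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = ' '.join('%02X' % b for b in chunk)
        # Printable ASCII is shown as-is; every other byte becomes '.'.
        text_part = ''.join(chr(b) if 32 <= b < 127 else '.' for b in chunk)
        # Pad the hex column so the '|' separator lines up on a short last line.
        yield '%-*s | %s' % (width * 3 - 1, hex_part, text_part)


def hexdump(data, width=16):
    """Return the whole dump as a single string."""
    return '\n'.join(hexdump_lines(data, width))


if __name__ == '__main__':
    sample = b'\x7fELF\x01\x01\x01\x00\x00\n\x00\x00\x00\x00\x00\x00\x02\x00\x03\x00\x01'
    print(hexdump(sample))
```

The only option exposed here is the line width; addresses, grouping and custom delimiters are deliberately left out to keep the sketch small.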
msg188371 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-05-04 15:41
> I don't understand your question. Do you never print some data at the
> command-line prompt? Or even as part of small test programs?

To be honest, I very rarely even use pprint. I'm too lazy to import it. If I want to quickly get a hexdump, I use something like `' '.join('%02X'%i for i in data)`. It is shorter than `import pprint; pprint.hexdump(data)`. For a small program most likely the standard hexdump() will not be enough.
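
For reference, the one-liner above can be tried as follows. The sample value is made up for illustration; the second form relies on bytes.hex() accepting a separator, which as far as I know requires Python 3.8 or later and so was not available when this was written.

```python
data = b'\x7fELF\x01\x00'  # illustrative sample, not from the patch

# Iterating over a bytes object in Python 3 yields integers.
quick = ' '.join('%02X' % i for i in data)
print(quick)

# Assumed to need Python 3.8+: bytes.hex() with a separator argument.
newer = data.hex(' ').upper()
print(newer)
```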

> Passing type-specific parameters to pprint/pformat sounds like a bad
> idea to me.

Agree. Of course it would be better to automatically determine a "nice" display (use hexdump only for large non-printable bytes).

> And I don't think you'd want to print *all* integers as hex.

If you want to print bytes in hex, why not ints and floats? ;)  In fact I don't want to print data as hex, so shut up.

> Which is why the proposal doesn't fit well with pprint/pformat.

Perhaps I misunderstood your wish. I'm not against considering pprint a black box, which does all the good magic inside by default. The use of this feature does not require anything from the users and does not impose obligations on the maintainers. But I'm not interested in a separate function.
msg188372 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-04 15:46
On Saturday 04 May 2013 at 15:41 +0000, Serhiy Storchaka wrote:
> > Which is why the proposal doesn't fit well with pprint/pformat.
> 
> Perhaps I misunderstood your wish. I'm not against considering pprint
> a black box, which does all the good magic inside by default. The use of
> this feature does not require anything from the users and does not
> impose obligations on the maintainers. But I'm not interested in a
> separate function.

The problem is the "good magic" will depend on the situation. Really, I
don't want a hexdump of a HTTP message :-)
Which is why there should be a separate function for those *wishing* a
hexdump.
msg189683 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-05-20 18:20
Oh, I forgot about bytes.fromhex(). This of course looks better than base64.b16decode((...).replace(' ', '')).
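
A short comparison of the two spellings, as a sketch on a made-up hex string rather than the actual output of the patch. bytes.fromhex() skips the spaces between byte pairs on its own, while base64.b16decode() needs them stripped first.

```python
import base64

dump = '7F 45 4C 46 01 01 01 00'  # illustrative hex text with space-separated bytes

# bytes.fromhex() ignores the spaces between byte pairs.
via_fromhex = bytes.fromhex(dump)

# base64.b16decode() rejects spaces, so they have to be removed first.
via_b16 = base64.b16decode(dump.replace(' ', ''))

print(via_fromhex == via_b16)
```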
msg256763 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-12-20 11:39
Withdrawn in favor of issue17530.
History
Date User Action Args
2015-12-20 11:41:15 serhiy.storchaka set resolution: fixed -> rejected
2015-12-20 11:39:21 serhiy.storchaka set status: open -> closed; resolution: fixed; messages: + msg256763; stage: patch review -> resolved
2013-05-20 18:20:04 serhiy.storchaka set messages: + msg189683
2013-05-04 15:46:40 pitrou set messages: + msg188372
2013-05-04 15:41:30 serhiy.storchaka set messages: + msg188371
2013-05-04 12:46:43 ezio.melotti set messages: + msg188360
2013-05-04 12:14:42 pitrou set messages: + msg188355
2013-05-04 12:05:21 serhiy.storchaka set messages: + msg188354
2013-05-04 11:53:23 serhiy.storchaka set files: + pprint_bytes_hex_2.patch
2013-05-04 11:52:59 serhiy.storchaka set files: - pprint_bytes_hex_2.patch
2013-05-04 11:17:24 pitrou set messages: + msg188352
2013-05-04 10:27:44 serhiy.storchaka set files: + pprint_bytes_hex_2.patch; messages: + msg188348
2013-05-04 09:42:58 serhiy.storchaka set messages: + msg188346
2013-04-30 08:51:11 techtonik set messages: + msg188144
2013-04-29 18:55:48 pitrou set messages: + msg188086
2013-04-29 18:35:13 fdrake set nosy: - fdrake
2013-04-29 18:34:55 ezio.melotti set nosy: + ezio.melotti; messages: + msg188084
2013-04-29 18:19:29 serhiy.storchaka create