Define a binary output formatting mini-language for *.hex() #66579

ncoghlan · 2014-09-10T23:55:01Z

BPO	22385
Nosy	@warsaw, @gpshead, @ncoghlan, @abalkin, @ericvsmith, @mrh1997, @tirkarthi
PRs	bpo-22385: Support output separators in hex methods. #13578
Dependencies	bpo-9951: introduce bytes.hex method (also for bytearray and memoryview)

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2019-10-21.03:21:09.364>
created_at = <Date 2014-09-10.23:55:01.218>
labels = ['interpreter-core', 'type-feature', '3.9']
title = 'Define a binary output formatting mini-language for *.hex()'
updated_at = <Date 2019-10-21.03:21:09.352>
user = 'https://github.com/ncoghlan'

bugs.python.org fields:

activity = <Date 2019-10-21.03:21:09.352>
actor = 'gregory.p.smith'
assignee = 'none'
closed = True
closed_date = <Date 2019-10-21.03:21:09.364>
closer = 'gregory.p.smith'
components = ['Interpreter Core']
creation = <Date 2014-09-10.23:55:01.218>
creator = 'ncoghlan'
dependencies = ['9951']
files = []
hgrepos = []
issue_num = 22385
keywords = ['patch']
message_count = 18.0
messages = ['226733', '226746', '226748', '226749', '226992', '226993', '242941', '292663', '292671', '292699', '292710', '292871', '292900', '342888', '343527', '343910', '343966', '343987']
nosy_count = 10.0
nosy_names = ['barry', 'gregory.p.smith', 'ncoghlan', 'belopolsky', 'eric.smith', 'gotgenes', 'Arfrever', 'Christian H', 'mrh1997', 'xtreak']
pr_nums = ['13578']
priority = 'normal'
resolution = 'fixed'
stage = 'commit review'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue22385'
versions = ['Python 3.9']

ncoghlan · 2014-09-10T23:55:01Z

Inspired by the discussion in bpo-9951, I believe it would be appropriate to extend the default handling of the "x" and "X" format characters to accept arbitrary bytes-like objects. The processing of these characters would be as follows:

"x": display a-f as lowercase digits
"X": display A-F as uppercase digits
"#": includes 0x prefix
".precision": chunks output, placing a space after every <precision> bytes
",": uses a comma as the separator, rather than a space

Output order would match binascii.hexlify()

Examples:

format(b"xyz", "x") -> '78797a'
format(b"xyz", "X") -> '78797A'
format(b"xyz", "#x") -> '0x78797a'

format(b"xyz", ".1x") -> '78 79 7a'
format(b"abcdwxyz", ".4x") -> '61626364 7778797a'
format(b"abcdwxyz", "#.4x") -> '0x61626364 0x7778797a'

format(b"xyz", ",.1x") -> '78,79,7a'
format(b"abcdwxyz", ",.4x") -> '61626364,7778797a'
format(b"abcdwxyz", "#,.4x") -> '0x61626364,0x7778797a'

This approach makes it easy to inspect binary data, with the ability to inject regular spaces or commas to improve readability. Those are the basic features needed to support debugging.

Anything more complicated than that, and we're starting to want something more like the struct module.
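
For illustration only, the proposed behaviour can be emulated with a small helper. This is a sketch under stated assumptions, not anything that exists in CPython: the name `format_hex` and the regular expression are hypothetical, and bytes objects do not accept these format specs.

```python
import re

# Hypothetical grammar for the proposed spec: [#][,][.N](x|X)
_SPEC = re.compile(r"(?P<prefix>#?)(?P<comma>,?)(?:\.(?P<chunk>\d+))?(?P<case>[xX])")


def format_hex(data, spec):
    """Emulate the proposed "x"/"X" handling for a bytes-like object."""
    m = _SPEC.fullmatch(spec)
    if m is None:
        raise ValueError(f"unsupported format spec: {spec!r}")
    digits = bytes(data).hex()          # same byte order as binascii.hexlify()
    if m["case"] == "X":
        digits = digits.upper()
    prefix = "0x" if m["prefix"] else ""
    chunk = int(m["chunk"]) if m["chunk"] else 0
    if chunk == 0:
        # No ".precision" given (or ".0"): emit one unbroken run of digits.
        return prefix + digits
    step = 2 * chunk                    # two hex digits per byte
    groups = [digits[i:i + step] for i in range(0, len(digits), step)]
    sep = "," if m["comma"] else " "
    return sep.join(prefix + group for group in groups)


print(format_hex(b"abcdwxyz", "#,.4x"))
```

The helper reproduces the examples listed above, which makes it a convenient way to experiment with the spec before committing to any syntax.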

ericvsmith · 2014-09-11T07:23:31Z

I think this would need to be implemented by adding bytes.__format__. I can't think of a way to make it work on bytes-like objects in general.

vstinner · 2014-09-11T07:25:21Z

".precision": chunks output, placing a space after every <precision> bytes

I dislike this option. There is already "%.<precision>s" in Python 2 and Python 3 (and printf of the C language) which truncates the string.

If you need such special output, please write your own function.

ericvsmith · 2014-09-11T07:34:35Z

I'm not particularly wild about the .precision syntax either, but I think the feature is generally useful.

Adding bytes.__format__ is exactly what "special output for bytes" _is_, as far as format() is concerned.

Another option would be to invent a new format specification for bytes. There's no reason it needs to follow the same syntax as for str, int, etc., except for ease of remembering the syntax, and some code reuse.

For example, although it's insane, you could do:

format(b'abcdwxyz', 'use_spaces,grouping=4,add_prefix')
  -> '0x61626364 0x7778797a'

ncoghlan · 2014-09-17T10:56:55Z

Retitled the issue and made it depend on bpo-9951.

I now think it's better to tackle this more like strftime and have a method that accepts a particular custom formatting mini-language (in this case, the new hex() methods proposed in bpo-9951), and then also support that mini-language in the __format__() method.

It would likely need a PEP to decide on the exact details of the formatting.

ncoghlan · 2014-09-17T11:01:06Z

python-ideas post with a sketch of a possible mini-language: https://mail.python.org/pipermail/python-ideas/2014-September/029352.html

ncoghlan · 2015-05-12T05:00:00Z

Reviewing the items I had flagged as dependencies of bpo-22555 for personal tracking purposes, I suggest we defer further consideration of this idea to 3.6 after folks have had a chance to get some experience with the basic bytes.hex() method.

ncoghlan · 2017-05-01T13:34:12Z

Copying the amended proposal from that python-ideas thread into here:

Start with a leading base format character (chosen to be orthogonal to the default format characters):

"h": lowercase hex
"H": uppercase hex
"A": ASCII (using "." for unprintable & extended ASCII)

format(b"xyz", "A") -> 'xyz'
format(b"xyz", "h") -> '78797a'
format(b"xyz", "H") -> '78797A'

Followed by a separator and "chunk size":

format(b"xyz", "h 1") -> '78 79 7a'
format(b"abcdwxyz", "h 4") -> '61626364 7778797a'

format(b"xyz", "h,1") -> '78,79,7a'
format(b"abcdwxyz", "h,4") -> '61626364,7778797a'

format(b"xyz", "h:1") -> '78:79:7a'
format(b"abcdwxyz", "h:4") -> '61626364:7778797a'

In the "h" and "H" cases, allow requesting a preceding "0x" on the chunks:

format(b"xyz", "h#") -> '0x78797a'
format(b"xyz", "h# 1") -> '0x78 0x79 0x7a'
format(b"abcdwxyz", "h# 4") -> '0x61626364 0x7778797a'

In the thread, I suggested the section before the format character would use the standard string formatting rules (alignment, fill character, width, precision), but I now think that would be ambiguous and confusing, and would be better left as a post-processing step on the rendered text.
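
A rough emulation of the amended mini-language, for experimentation only. The function name `format_bytes` is hypothetical, and restricting the separator to space, comma, or colon is an assumption drawn from the examples above rather than part of the proposal.

```python
def format_bytes(data, spec):
    """Hypothetical emulation of the "h" / "H" / "A" proposal."""
    data = bytes(data)
    base, rest = spec[:1], spec[1:]
    if base == "A":
        if rest:
            raise ValueError("'A' takes no further options in this sketch")
        # Printable ASCII is kept as-is; everything else becomes ".".
        return "".join(chr(b) if 32 <= b < 127 else "." for b in data)
    if base not in ("h", "H"):
        raise ValueError(f"unknown base format character in {spec!r}")
    digits = data.hex().upper() if base == "H" else data.hex()
    prefix = ""
    if rest.startswith("#"):
        prefix, rest = "0x", rest[1:]
    if not rest:
        return prefix + digits
    sep, size = rest[0], rest[1:]
    # Assumption: only the separators shown in the examples are accepted.
    if sep not in " ,:" or not size.isdecimal() or int(size) == 0:
        raise ValueError(f"bad separator or chunk size in {spec!r}")
    step = 2 * int(size)                # chunk size is counted in bytes
    groups = [digits[i:i + step] for i in range(0, len(digits), step)]
    return sep.join(prefix + group for group in groups)


print(format_bytes(b"abcdwxyz", "h# 4"))
```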

gpshead · 2017-05-01T16:27:11Z

Based on the ideas thread it isn't obvious that chunk size means "per byte". I suggest either specifying the number of base digits per delimiter, or using a name other than chunk that indicates it means bytes.

If we're going to do this, it should also be done for octal formatting (the 'o' code) for consistency.

Also, per the python-ideas thread, this could be exposed via parameters to the .hex() method on bytes/bytearray/memoryview.

I'm inclined to leave 'A' printable-ascii formatting out. Or at least consider that it could also work on unicode str.

ericvsmith · 2017-05-01T20:07:05Z

The Unix "od" command pretty much has all of the possibilities covered.

https://linuxconfig.org/od-1-manual-page

Although "named characters" might be going a bit far. Float, too.

ncoghlan · 2017-05-02T02:10:58Z

Minimalist proposal:

    def hex(self, *, bytes_per_group=None, delimiter=" "):
        """B.hex() -> string of hex digits
        B.hex(bytes_per_group=N) -> hex digits in groups separated by *delimiter*
    
        Create a string of hexadecimal numbers from a bytes object::

        >>> b'\xb9\x01\xef'.hex()
        'b901ef'
        >>> b'\xb9\x01\xef'.hex(bytes_per_group=1)
        'b9 01 ef'
        """

Alternatively, the grouping could be by digit rather than by byte:

    def hex(self, *, group_digits=None, delimiter=" "):
        """B.hex() -> string of hex digits
        B.hex(group_digits=N) -> hex digits in groups separated by *delimiter*
    
        Create a string of hexadecimal numbers from a bytes object::

        >>> b'\xb9\x01\xef'.hex()
        'b901ef'
        >>> b'\xb9\x01\xef'.hex(group_digits=2)
        'b9 01 ef'
        """

One potential advantage of the group_digits approach is that it could be fairly readily adapted to the hex/oct/bin builtins (although if we did that, it would make the lack of a "dec" builtin for decimal formatting a bit weird)
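
To make the two signatures concrete, here is a minimal sketch written as free functions. The names `hex_by_digits` and `hex_by_bytes` are hypothetical stand-ins for the proposed method; the point is only that grouping by N bytes is the same as grouping by 2N digits.

```python
def hex_by_digits(data, *, group_digits=None, delimiter=" "):
    """Hypothetical stand-in for the proposed hex(group_digits=...)."""
    digits = bytes(data).hex()
    if group_digits is None:
        return digits
    if group_digits <= 0:
        raise ValueError("group_digits must be positive")
    return delimiter.join(
        digits[i:i + group_digits] for i in range(0, len(digits), group_digits)
    )


def hex_by_bytes(data, *, bytes_per_group=None, delimiter=" "):
    """Hypothetical stand-in for the proposed hex(bytes_per_group=...)."""
    # Each byte renders as two hex digits, so delegate with doubled width.
    group_digits = None if bytes_per_group is None else 2 * bytes_per_group
    return hex_by_digits(data, group_digits=group_digits, delimiter=delimiter)


print(hex_by_bytes(b"\xb9\x01\xef", bytes_per_group=1))
print(hex_by_digits(b"\xb9\x01\xef", group_digits=2))
```

The digit-based form can also split in the middle of a byte (for example `group_digits=3`), which the byte-based form cannot express.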

mrh1997 · 2017-05-03T09:54:27Z

regarding the proposal for mini format languages for bytes (msg292663):
Wouldn't it be more consistent if the format specifiers were identical to those of int (see https://docs.python.org/3/library/string.html#format-specification-mini-language)?

I.e. "X" / "x" for hex, "o" for octal, "d" for decimal, "b" for binary, "c" for character (=default). Only 'A' needs to be added for printing only ASCII characters.

Furthermore, I cannot see how the format spec in http://bugs.python.org/issue22385#msg292663 ("h#,1") is more intuitive than the one in http://bugs.python.org/issue22385#msg226733 ("#,.4x"), which looks like the existing minilang.

Why does Python need a new format mini-language if the existing one meets most of the requirements? As a developer it is already hard to memorize the details of the existing minilang. Ideally I would not need to learn a similar but different one for bytes...

ncoghlan · 2017-05-03T13:19:05Z

Re-using an existing minilanguage to mean something completely different wouldn't be a good idea.

Whether or not we should add any bytes-specific features for this at all is also still an open question, as one of the points raised in the latest python-ideas thread is that this may be better handled as a general-purpose string splitting method that breaks the string up into fixed-size units, which can then be rejoined with an arbitrary delimiter. For example:

    >>> digit_groups = b'\xb9\x01\xef'.hex().splitgroups(2)
    >>> ' '.join(digit_groups)
    'b9 01 ef'
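
Since str has no splitgroups() method, the idea can be tried out with a free function. The following is a minimal sketch; the name `splitgroups` simply mirrors the hypothetical method above.

```python
def splitgroups(text, size):
    """Split *text* into consecutive pieces of *size* characters.

    The final piece may be shorter if len(text) is not a multiple of size.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


digit_groups = splitgroups(b"\xb9\x01\xef".hex(), 2)
print(" ".join(digit_groups))
```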

gpshead · 2019-05-20T06:31:34Z

FYI - micropython added an optional 'sep' second argument to binascii.hexlify() that is a single character separator to insert between every two hex digits.

Given the bpo-9951 .hex() methods we have everywhere (and the corresponding .fromhex), binascii.hexlify is almost a legacy API (but MicroPython doesn't have those methods yet). One key difference: hexlify returns the hex value as bytes rather than a str.

Just adding a couple of parameters to the hex() method seems fine: a separator string and the number of bytes between separators.

Yet another mini-language would be overkill, and confusing given the existing numeric formatting mini-language's ability to insert , or _ separators between groups of digits, à la f'{value:_x}'.
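
For reference, a short illustration of the existing numeric grouping options being referred to. The sample values are arbitrary; only the standard format-spec behaviour for int is used.

```python
value = 0xDEADBEEF

# "_" groups hex digits in fours; "," groups decimal digits in threes.
hex_grouped = format(value, "_x")
dec_grouped = f"{1234567:,d}"
hex_prefixed = f"{value:#_x}"

print(hex_grouped)    # dead_beef
print(dec_grouped)    # 1,234,567
print(hex_prefixed)   # 0xdead_beef
```

These options apply to integers, not to bytes objects, which is why a separate mechanism for bytes was under discussion at all.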

gpshead · 2019-05-26T00:57:41Z

Given that we have f-strings, I don't think a format mini-language makes as much sense. My PR adds support for separators to the .hex() methods (and to binascii.hexlify) via a parameter. This extends beyond what MicroPython already does in its binascii implementation (a single sep parameter).
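
A brief usage sketch of the separator parameters as they appear in the documentation for Python 3.8 and later (`sep` and `bytes_per_sep`). The sample data is arbitrary; the details of the PR itself are not reproduced here.

```python
import binascii

data = b"\xb9\x01\xef"

plain = data.hex()                  # no separator
spaced = data.hex(" ")              # separator between every byte
from_right = data.hex("_", 2)       # positive count: groups formed from the right
from_left = data.hex(" ", -2)       # negative count: groups formed from the left
hexlified = binascii.hexlify(data, "-")   # returns bytes, not str

print(plain, spaced, from_right, from_left, hexlified)
print(bytearray(data).hex(":"), memoryview(data).hex(":"))
```

Note that `binascii.hexlify` keeps returning bytes, so the choice between it and `.hex()` still depends on which type the caller needs.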

gpshead · 2019-05-29T18:47:04Z

New changeset 0c2f930 by Gregory P. Smith in branch 'master':
bpo-22385: Support output separators in hex methods. (#13578)
0c2f930

tirkarthi · 2019-05-30T10:32:55Z

This change seems to have created some compile-time warnings: https://buildbot.python.org/all/#/builders/103/builds/2544/steps/3/logs/warnings__6_

Python/pystrhex.c:18:45: warning: passing argument 1 of ‘PyObject_Size’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
Python/pystrhex.c:60:27: warning: comparison of integer expressions of different signedness: ‘unsigned int’ and ‘Py_ssize_t’ {aka ‘const int’} [-Wsign-compare]
Python/pystrhex.c:90:29: warning: ‘sep_char’ may be used uninitialized in this function [-Wmaybe-uninitialized]
Python/pystrhex.c:90:29: warning: ‘sep_char’ may be used uninitialized in this function [-Wmaybe-uninitialized]

gpshead · 2019-05-30T17:48:10Z

thanks, i'll take care of them.

ncoghlan added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Sep 10, 2014

ncoghlan changed the title ~~Allow 'x' and 'X' to accept bytes objects in string formatting~~ Allow 'x' and 'X' to accept bytes-like objects in string formatting Sep 10, 2014

ncoghlan changed the title ~~Allow 'x' and 'X' to accept bytes-like objects in string formatting~~ Define a binary output formatting mini-language for *.hex() Sep 17, 2014

gpshead added the 3.7 (EOL) end of life label May 1, 2017

gpshead added the 3.8 (only security fixes) and type-feature (a feature request or enhancement) labels and removed the 3.7 (EOL) label May 20, 2019

gpshead added the 3.9 (only security fixes) label and removed the 3.8 (only security fixes) label Oct 21, 2019

gpshead closed this as completed Oct 21, 2019

ezio-melotti transferred this issue from another repository Apr 10, 2022
