Message 283504 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	nneonneo
Recipients	martin.panter, ncoghlan, nneonneo, terry.reedy
Date	2016-12-17.18:24:09
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1481999049.76.0.0901294904638.issue28927@psf.upfronthosting.co.za>
In-reply-to

Content

I see your point, Nick. Can I offer a counterpoint?

Most string parsers operate only on relatively short inputs, such as numbers. Numbers in particular are rarely written with internal spaces, so it makes sense for those parsers not to ignore internal whitespace.

On the other hand, hexadecimal data can be very long, and is often formatted with spaces and newlines. For example, the default output of `xxd -p`, a format quite suitable for copy-paste, looks like this:

cffaedfe07000001030000800200000015000000d8080000858021000000
000019000000480000005f5f504147455a45524f00000000000000000000
000000000000000001000000000000000000000000000000000000000000
000000000000000000000000000019000000180300005f5f544558540000
0000000000000000000000000100000000909d0100000000000000000000
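For reference, the layout above can be reproduced in Python with a minimal sketch like the one below. It assumes 30 bytes (60 hex digits) per line, which matches the sample and `xxd -p`'s default line width. `xxd_plain` is a hypothetical helper name, not part of any library.

```python
def xxd_plain(data: bytes, bytes_per_line: int = 30) -> str:
    """Format data as lowercase hex, bytes_per_line bytes per line (xxd -p style).

    Hypothetical helper for illustration only.
    """
    hex_str = data.hex()
    width = 2 * bytes_per_line  # two hex digits per byte
    # Slice the flat hex string into fixed-width lines.
    return "\n".join(hex_str[i:i + width] for i in range(0, len(hex_str), width))


dump = xxd_plain(bytes(range(40)))
```

With 40 input bytes, `dump` has one full 60-digit line followed by a 20-digit line.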

It would be desirable to write something like

blob = bytes.fromhex('''
cffaedfe07000001030000800200000015000000d8080000858021000000
000019000000480000005f5f504147455a45524f00000000000000000000
000000000000000001000000000000000000000000000000000000000000
000000000000000000000000000019000000180300005f5f544558540000
0000000000000000000000000100000000909d0100000000000000000000
''')

and not have to worry about adding an explicit whitespace remover, like this:

blob = bytes.fromhex(''.join('''
cffaedfe07000001030000800200000015000000d8080000858021000000
000019000000480000005f5f504147455a45524f00000000000000000000
000000000000000001000000000000000000000000000000000000000000
000000000000000000000000000019000000180300005f5f544558540000
0000000000000000000000000100000000909d0100000000000000000000
'''.split()))

or about removing the newlines from the source code, which hurts readability.
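In the meantime, the whitespace-stripping workaround can be wrapped in a small helper. The following is a minimal sketch, and `fromhex_lenient` is a hypothetical name. `str.split()` with no arguments splits on runs of any whitespace (spaces, tabs, newlines) and drops leading and trailing whitespace, so joining the pieces leaves only the hex digits for `bytes.fromhex`.

```python
def fromhex_lenient(text: str) -> bytes:
    """Decode hex text, ignoring all whitespace (hypothetical helper)."""
    # split() with no separator removes every run of whitespace,
    # so ''.join(...) hands bytes.fromhex a string of bare hex digits.
    return bytes.fromhex("".join(text.split()))


blob = fromhex_lenient('''
cffaedfe07000001
03000080 02000000
''')
```

One design consequence: because the digits are joined before decoding, this helper also accepts whitespace in the middle of a byte (for example "a b" is read as "ab"). That is more permissive than skipping whitespace only between bytes.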

Similarly whitespace-formatted hex (with spaces between octets, words, or dwords; tabs between 8- to 16-byte groups; newlines between groups; and so on) is common in the wild, including in the "hex" clipboard output of various applications.
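The following sketch illustrates that a single "remove all whitespace" step covers each of these layouts. It renders the same bytes with several groupings and separators and decodes each one back. The `group_hex` helper and its parameters are illustrative assumptions, not the output format of any particular tool.

```python
def group_hex(data: bytes, group_bytes: int, sep: str, groups_per_line: int) -> str:
    """Render data as hex in groups of group_bytes bytes joined by sep,
    starting a new line after every groups_per_line groups (hypothetical helper)."""
    hex_str = data.hex()
    width = 2 * group_bytes
    groups = [hex_str[i:i + width] for i in range(0, len(hex_str), width)]
    lines = [sep.join(groups[i:i + groups_per_line])
             for i in range(0, len(groups), groups_per_line)]
    return "\n".join(lines)


data = bytes(range(64))
layouts = [
    group_hex(data, 1, " ", 16),   # spaces between octets
    group_hex(data, 2, " ", 8),    # spaces between 16-bit words
    group_hex(data, 4, " ", 4),    # spaces between 32-bit dwords
    group_hex(data, 8, "\t", 2),   # tabs between 8-byte groups
]
# Strip every whitespace character, then decode each layout.
decoded = [bytes.fromhex("".join(text.split())) for text in layouts]
```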

We can already include newlines and other whitespace with base64, which is quite similar in principle:

blob = base64.b64decode('''
z/rt/gcAAAEDAACAAgAAABUAAADYCAAAhYAhAAAAAAAZAAAASAAAAF9fUEFHRVpFUk8AAAAAAAAAAAAA
AAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAZAAAAGAMAAF9fVEVYVAAA
AAAAAAAAAAAAAAAAAQAAAACQnQEAAAAAAAAAAAAAAAAAkJ0BAAAAAAcAAAAFAAAACQAAAAAAAABfX3Rl
eHQAAAAAAAAAAAAAX19URVhUAAAAAAAAAAAAAAALAAABAAAARCF5AQAAAAAACwAACAAAAAAAAAAAAAAA
''')

so I think it makes sense for fromhex to accept other whitespace characters as well. I'm happy to reconsider if there's a strong argument against adding this convenience.
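For comparison, here is a short sketch of the base64 behavior mentioned above. With its default `validate=False`, `base64.b64decode` discards characters outside the base64 alphabet, including newlines and spaces, before decoding. With `validate=True` it rejects them with `binascii.Error`. The payload is arbitrary sample data.

```python
import base64
import binascii

payload = bytes(range(100))
# encodebytes() inserts a newline after every 76 output characters.
wrapped = base64.encodebytes(payload).decode("ascii")
# Add indentation as well, as in a triple-quoted literal in source code.
indented = "\n    " + wrapped.replace("\n", "\n    ")

lenient = base64.b64decode(indented)  # non-alphabet characters are discarded

try:
    base64.b64decode(indented, validate=True)
    strict_rejected = False
except binascii.Error:
    strict_rejected = True  # strict mode refuses the whitespace
```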

History
Date	User	Action	Args
2016-12-17 18:24:09	nneonneo	set	recipients: + nneonneo, terry.reedy, ncoghlan, martin.panter
2016-12-17 18:24:09	nneonneo	set	messageid: <1481999049.76.0.0901294904638.issue28927@psf.upfronthosting.co.za>
2016-12-17 18:24:09	nneonneo	link	issue28927 messages
2016-12-17 18:24:09	nneonneo	create