Author ncoghlan
Recipients martin.panter, ncoghlan, nneonneo, terry.reedy
Date 2016-12-17.15:19:51
Message-id <1481987993.03.0.205883345943.issue28927@psf.upfronthosting.co.za>
Content
My recollection is that fromhex() ignores spaces to accommodate some particularly common ways of formatting hex numbers as space-separated groups:

    "CAFE F00D"
    "CAFEF00D CAFEF00D CAFEF00D"
    "CA FE F0 0D"
    etc

Those show up even in structured hexadecimal data (like hex editor output and memory dumps in log files).

That recollection is supported by PEP 358 (https://www.python.org/dev/peps/pep-0358/), whose example and specification explicitly allow "[0-9a-fA-F ]", and which guided Georg Brandl's initial implementation in http://bugs.python.org/issue1669379
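
As a rough illustration of that behaviour (a minimal sketch, assuming a Python 3 interpreter; the variable names are mine, and only plain spaces between byte pairs are used):

```python
# Three common ways of grouping the same four bytes.
samples = ["CAFEF00D", "CAFE F00D", "CA FE F0 0D"]

# bytes.fromhex() skips the spaces between byte pairs in each case.
decoded = [bytes.fromhex(s) for s in samples]

# All three spellings should decode to the same bytes object.
same = all(d == decoded[0] for d in decoded)
print(same)
```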

Generally speaking, the type-level string parsers *aren't* permissive when it comes to whitespace handling - if they allow whitespace at all, it's usually only at the beginning or end, where it gets ignored (hence PEP 515 for 3.6, which allows underscores in both numeric literals and the numeric constructors).

=====================
>>> float(" 1.0")
1.0
>>> float("1.0 ")
1.0
>>> float("1 0")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: '1 0'
>>> float("1. 0")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: '1. 0'
>>> float("1 .0")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: '1 .0'
=====================
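
For comparison, int() appears to follow the same pattern. This is only a sketch: `try_int` is a hypothetical helper I'm using to keep the output short, and the underscore case assumes Python 3.6 or later.

```python
def try_int(text):
    """Return int(text), or None if the constructor rejects the string."""
    try:
        return int(text)
    except ValueError:
        return None

print(try_int(" 42 "))   # leading/trailing whitespace is ignored
print(try_int("4 2"))    # interior whitespace is rejected
print(try_int("1_000"))  # underscores are accepted on Python 3.6+ (PEP 515)
```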

The general technique to strip whitespace from *any* input data before handing it to a constructor relies on str.translate:

=====================
>>> import string
>>> _del_whitespace = str.maketrans('', '', string.whitespace)
>>> def clean_whitespace(text):
...     return text.translate(_del_whitespace)
... 
>>> clean_whitespace('CA FE\nF0\t0D')
'CAFEF00D'
=====================

(http://stackoverflow.com/questions/3739909/how-to-strip-all-whitespace-from-string also points out `"".join(text.split())` as a clever one-liner for a similar outcome)
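
The outcome is "similar" rather than identical because, as far as I know, str.split() with no arguments splits on any Unicode whitespace, while a translation table built from string.whitespace only covers the ASCII whitespace characters. A small sketch of the difference (the function names are illustrative only):

```python
import string

# Translation table that deletes only the ASCII whitespace characters.
_del_ascii_whitespace = str.maketrans('', '', string.whitespace)

def clean_ascii(text):
    """Remove ASCII whitespace via str.translate."""
    return text.translate(_del_ascii_whitespace)

def clean_split(text):
    """Remove whitespace via the split/join one-liner."""
    return "".join(text.split())

ascii_only = 'CA FE\nF0\t0D'
with_em_space = 'CA FE\u2003F0'   # U+2003 EM SPACE is not in string.whitespace

print(clean_ascii(ascii_only) == clean_split(ascii_only))
print(clean_ascii(with_em_space) == clean_split(with_em_space))
```

For input that is known to be ASCII, such as typical hex dumps, the two spellings should be interchangeable.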

So I'm inclined to advise *against* making any changes here - the apparent benefit is more a symptom of the fact that there isn't an immediately obvious spelling of "strip all whitespace, including that between other characters, from this string" that can be readily applied as an initial filter on the incoming data.

(However, I do sometimes wonder if it would make sense to offer a "str.stripall()" that defaulted to removing all whitespace, rather than treating this purely as a particular use case for str.translate)
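
To make that idea concrete, here is one way such a helper could behave if written as a plain function today. This is purely a hypothetical sketch: no `str.stripall()` method exists, and the `chars` parameter and its semantics are my own assumption, loosely modelled on str.strip().

```python
def stripall(text, chars=None):
    """Hypothetical stand-in for the str.stripall() idea.

    With no chars given, remove all whitespace anywhere in text;
    otherwise remove every occurrence of the characters in chars.
    """
    if chars is None:
        # split() with no arguments drops every run of whitespace.
        return "".join(text.split())
    # Delete exactly the requested characters, wherever they appear.
    return text.translate(str.maketrans('', '', chars))

# Used as an initial filter on incoming data before bytes.fromhex().
data = bytes.fromhex(stripall('CA FE\nF0\t0D'))
print(data == bytes.fromhex('CAFEF00D'))
```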