Issue 21957: ASCII Formfeed (FF) & ASCII Vertical Tab (VT) Have Hexadecimal Representation

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/66156

classification

Title:	ASCII Formfeed (FF) & ASCII Vertical Tab (VT) Have Hexadecimal Representation
Type:	enhancement	Stage:	resolved
Components:		Versions:	Python 3.5

process

Status:	closed	Resolution:	rejected
Dependencies:		Superseder:
Assigned To:		Nosy List:	ned.deily
Priority:	normal	Keywords:

Created on 2014-07-11 14:29 by Zero, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (2)
msg222749 - (view)	Author: Stephen Paul Chappell (Zero)	Date: 2014-07-11 14:29
In the string module, the definition of whitespace is ' \t\n\r\v\f'. However, the representation of string.whitespace is ' \t\n\r\x0b\x0c'. Would it be terribly inconvenient to change the representation of '\x0b\x0c' to '\v\f'? The documentation at https://docs.python.org/3.4/reference/lexical_analysis.html#string-and-bytes-literals lists the recognized escape sequences, but string representations seem to diverge slightly from what is recognized. The same "problem" exists with the representation of bytes.
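
A minimal sketch of the behavior described in the report, assuming a current CPython 3 interpreter. The variable names are illustrative only. It shows that the two spellings denote the same characters and that only the repr differs, for both str and bytes:

```python
import string

# '\v' and '\f' are accepted in literals and are the same characters
# as '\x0b' and '\x0c'.
assert '\v\f' == '\x0b\x0c'
assert string.whitespace == ' \t\n\r\v\f'

# repr() uses the short escapes for TAB, LF and CR, but hex escapes
# for VT and FF.
ws_repr = repr(string.whitespace)
print(ws_repr)        # ' \t\n\r\x0b\x0c'

# The bytes repr behaves the same way.
bytes_repr = repr(b'\v\f')
print(bytes_repr)     # b'\x0b\x0c'
```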
msg222788 - (view)	Author: Ned Deily (ned.deily) *	Date: 2014-07-11 19:59
I am not sure why the string reprs for FF and VT are not special-cased to \f and \v, but they are not alone: \a (BEL) and \b (BS) are not special-cased either. My guess is that it was for performance reasons, but perhaps someone with a longer memory can comment. As now implemented, supporting these special cases would add a check for each of them to every character being encoded, which is a critical path for many applications, and in most cases CR, LF, and HT are much more likely to be encountered. I also don't see this behavior documented anywhere, but it goes back a long way, probably to the earliest days of Python, and, in general, Python makes no promises about which of the valid representations of a particular character will be used. While, on the one hand, the change would make some string reprs look cleaner, on the other hand there are downsides: the risk of breaking existing code that might depend on the long-standing behavior, the potential performance impact that would need to be measured and mitigated, and the cost to develop and test. On balance, I think the risks outweigh any benefit, so I think we should not pursue this change. If others feel differently, feel free to reopen. In any case, thanks for the suggestion.
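
Since the built-in repr was left unchanged, code that wants the short escapes can post-process the repr itself. The following is only a sketch under stated assumptions: `short_escape_repr` is a hypothetical helper, not part of the standard library, and it handles str only. It does not address the performance concern above, because it runs on top of the existing repr rather than inside it.

```python
import ast
import re

# Hypothetical mapping from the hex escapes repr() emits to the short
# escapes that string literals also accept.
_SHORT = {r'\x07': r'\a', r'\x08': r'\b', r'\x0b': r'\v', r'\x0c': r'\f'}

# Match an escaped backslash first so that the repr of a literal
# backslash followed by "x0b" is left alone.
_PATTERN = re.compile(r'\\\\|\\x0[78bc]')


def short_escape_repr(s: str) -> str:
    """Return repr(s) with BEL, BS, VT and FF written as \\a, \\b, \\v, \\f."""
    return _PATTERN.sub(lambda m: _SHORT.get(m.group(), m.group()), repr(s))


print(short_escape_repr(' \t\n\r\v\f'))   # ' \t\n\r\v\f'

# The result is still a valid literal for the same string.
assert ast.literal_eval(short_escape_repr('\a\b\v\f')) == '\a\b\v\f'
```

Matching the escaped backslash before the hex escapes is the design choice that keeps the output round-trippable: a string containing a real backslash followed by the text x0b keeps its original repr.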

History
Date	User	Action	Args
2022-04-11 14:58:05	admin	set	github: 66156
2019-10-31 14:22:57	Zero	set	nosy: - Zero
2014-07-11 19:59:49	ned.deily	set	status: open -> closed versions: - Python 3.1, Python 2.7, Python 3.2, Python 3.3, Python 3.4 nosy: + ned.deily messages: + msg222788 resolution: rejected stage: resolved
2014-07-11 14:29:40	Zero	create