classification
Title: print adding extra bytes in hex above x7F
Type: behavior
Stage: resolved
Components:
Versions: Python 3.7
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: Arfrever, Artificial, ammar2, brett.cannon
Priority: normal
Keywords:

Created on 2019-10-03 01:34 by Artificial, last changed 2019-10-03 16:55 by brett.cannon. This issue is now closed.

Files
File name                                 Uploaded                       Description
Screenshot from 2019-10-02 20-31-50.png   Artificial, 2019-10-03 01:34   Proof
Messages (6)
msg353796 - Author: Artificial (Artificial) Date: 2019-10-03 01:34
Any hex escape in a str with a value above \x7F causes an extra byte to be printed.
msg353799 - Author: Artificial (Artificial) Date: 2019-10-03 01:37
python3 -c "print('\x7F')" > test.txt && xxd test.txt
00000000: 7f0a                                     ..                               

python3 -c "print('\x80')" > test.txt && xxd test.txt
00000000: c280 0a                                  ...
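The two shell commands above can be reproduced from Python as a quick check. This is a minimal sketch: the helper name `dump_print_output` is hypothetical, and it assumes that forcing `PYTHONIOENCODING=utf-8` on the child interpreter matches the reporter's UTF-8 terminal, so the result does not depend on the locale.

```python
import os
import subprocess
import sys


def dump_print_output(literal: str) -> bytes:
    """Run print(<literal>) in a child interpreter and return its raw stdout bytes.

    `dump_print_output` is a hypothetical helper, not a standard API.
    """
    # Assumption: force UTF-8 on the child's stdout so the result is locale-independent.
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    result = subprocess.run(
        [sys.executable, "-c", "print({})".format(literal)],
        stdout=subprocess.PIPE,
        env=env,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    for literal in (r"'\x7F'", r"'\x80'"):
        # On a POSIX system this mirrors xxd: 7f0a for '\x7F', c2800a for '\x80'.
        print(literal, dump_print_output(literal).hex())
```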
msg353800 - Author: Ammar Askar (ammar2) * (Python committer) Date: 2019-10-03 01:42
If you're trying to get raw bytes, you need to use

    print(b'\x80')

What's happening right now is that '\x80' is treated as a Unicode code point (see https://docs.python.org/3/howto/unicode.html#the-string-type), and when Python prints it, it gets encoded to the raw underlying bytes. In the default encoding, UTF-8, that code point requires the extra byte.

>>> '\x80'.encode()
b'\xc2\x80'
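To make this concrete: UTF-8 (RFC 3629) stores U+0000 to U+007F in one byte and U+0080 to U+07FF in two, which is exactly where the "extra byte" starts. The sketch below checks a hand-written length rule against `str.encode`; `utf8_length` is an illustrative helper, not a library function. It then contrasts UTF-8 with Latin-1, which maps U+0000 to U+00FF one-to-one onto single bytes.

```python
def utf8_length(code_point: int) -> int:
    """Bytes UTF-8 needs for one code point (illustrative helper, per RFC 3629)."""
    if code_point < 0x80:      # U+0000..U+007F: 1 byte (plain ASCII)
        return 1
    if code_point < 0x800:     # U+0080..U+07FF: 2 bytes
        return 2
    if code_point < 0x10000:   # U+0800..U+FFFF: 3 bytes
        return 3
    return 4                   # U+10000..U+10FFFF: 4 bytes


for cp in (0x7F, 0x80, 0xFF, 0x7FF, 0x800, 0x1F600):
    encoded = chr(cp).encode("utf-8")
    assert len(encoded) == utf8_length(cp)
    print("U+{:04X} -> {!r}".format(cp, encoded))

# Latin-1 encodes U+0080 as the single byte 0x80, without the leading 0xC2.
print(repr("\x80".encode("latin-1")))  # b'\x80'
```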
msg353803 - Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * (Python triager) Date: 2019-10-03 01:53
However, print() called with a non-str argument first calls str() on it, which is most likely not what the reporter wanted:

>>> print(b'\x80')
b'\x80'
>>> str(b"\x80")
"b'\\x80'"
>>> print(str(b"\x80"))
b'\x80'

$ python -c "print(b'\x80')" > test.txt
$ xxd test.txt
00000000: 6227 5c78 3830 270a                      b'\x80'.
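The same result can be checked without a terminal by capturing what `print()` writes. A minimal sketch using `contextlib.redirect_stdout` with an `io.StringIO`; the `printed_text` helper name is hypothetical. For `b'\x80'` the written text is its seven-character repr plus a newline, not the byte 0x80.

```python
import contextlib
import io


def printed_text(obj) -> str:
    """Return exactly what print(obj) writes to sys.stdout (hypothetical helper)."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        print(obj)
    return buf.getvalue()


text = printed_text(b"\x80")
# print() wrote str(b"\x80") followed by "\n": the characters b'\x80' plus a newline.
assert text == str(b"\x80") + "\n"
# Same bytes as the xxd dump above: 62 27 5c 78 38 30 27 0a.
print(text.encode("utf-8").hex())  # 62275c783830270a
```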


The proper solution is to write to a file opened in binary mode, which for stdout and stderr means using sys.stdout.buffer and sys.stderr.buffer:

>>> import sys
>>> sys.stdout.buffer.write(b"\x80")
�1

$ python -c "import sys; sys.stdout.buffer.write(b'\x80')" > test.txt
$ xxd test.txt
00000000: 80                                       .
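The difference between the text layer and the binary buffer can be shown with an in-memory stand-in for `sys.stdout`: an `io.TextIOWrapper` over an `io.BytesIO`, the same text-over-binary layering that `sys.stdout` and `sys.stdout.buffer` have. The helper `capture_stdout_bytes` is hypothetical and assumes a UTF-8 text layer; it is a sketch, not how CPython builds its real stdout.

```python
import io


def capture_stdout_bytes(write) -> bytes:
    """Call write(stream) on a UTF-8 text stream and return the raw bytes produced.

    `capture_stdout_bytes` is a hypothetical helper; `stream` stands in for sys.stdout.
    """
    raw = io.BytesIO()
    # Text layer over a binary buffer; newline="\n" disables newline translation.
    stream = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")
    write(stream)
    stream.flush()
    data = raw.getvalue()
    stream.detach()  # stop the wrapper from closing `raw` when it is collected
    return data


def mixed(stream) -> None:
    # When mixing text and binary writes, flush the text layer first so the
    # pending text reaches the buffer before the raw bytes do.
    print("A", end="", file=stream)
    stream.flush()
    stream.buffer.write(b"\x80")


text_path = capture_stdout_bytes(lambda s: print("\x80", file=s))
binary_path = capture_stdout_bytes(lambda s: s.buffer.write(b"\x80\n"))
print(text_path.hex(), binary_path.hex(), capture_stdout_bytes(mixed).hex())  # c2800a 800a 4180
```

In a real script the binary path is simply sys.stdout.buffer.write(b"\x80"), as in the example above.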
msg353805 - Author: Artificial (Artificial) Date: 2019-10-03 02:05
Thanks, Arfrever.

It seems unnecessarily complicated for what worked in Python 2.
msg353866 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2019-10-03 16:55
@Artificial, please see the various blog posts and explanations of why the clear separation between bytes and text came to be, and thus why this change isn't "unnecessarily complicated".
History
Date                  User           Action   Args
2019-10-03 16:55:47   brett.cannon   set      nosy: + brett.cannon
                                              messages: + msg353866
2019-10-03 02:05:25   Artificial     set      messages: + msg353805
2019-10-03 01:53:47   Arfrever       set      nosy: + Arfrever
                                              messages: + msg353803
2019-10-03 01:42:31   ammar2         set      status: open -> closed
                                              nosy: + ammar2
                                              messages: + msg353800
                                              resolution: not a bug
                                              stage: resolved
2019-10-03 01:37:47   Artificial     set      messages: + msg353799
2019-10-03 01:34:08   Artificial     create