This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: print statement using \x results in improper and extra bytes
Type: behavior
Stage: resolved
Components:
Versions: Python 3.7, Python 3.6, Python 3.4, Python 3.5
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: Nathan Benson, steven.daprano
Priority: normal
Keywords:

Created on 2018-08-19 23:24 by Nathan Benson, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (2)
msg323773 - Author: Nathan Benson (Nathan Benson) Date: 2018-08-19 23:24
While writing some shellcode, I uncovered an unusual bug where Python 3 seems to print incorrect (and extra) hex bytes when using print() with \x escapes. Needless to say, I was pulling my hair out trying to figure out why my shellcode wasn't working. Python 2 behaves as expected.

I haven't tested the latest version of Python 3, but all the versions prior to that seem to have the bug. I've been able to reproduce it on Ubuntu Linux and on my Mac.

For example, when printing "\xfd\x84\x04\x08", I expect to get back "fd 84 04 08", but Python 3 seems to insert bytes beginning with c2 and c3 and throw in seemingly random bytes.

For the purpose of these demonstrations:

  Akame:~ jfa$ python2 --version
  Python 2.7.15

  Akame:~ jfa$ python3 --version
  Python 3.7.0


Here is Python 2 operating as expected:

Akame:~ jfa$ python2 -c 'print("\xfd\x84\x04\x08")' | hexdump -C
00000000  fd 84 04 08 0a                                    |.....|
00000005


Here is Python 3 with the exact same print statement:

Akame:~ jfa$ python3 -c 'print("\xfd\x84\x04\x08")' | hexdump -C
00000000  c3 bd c2 84 04 08 0a                              |.......|
00000007

There are 6 bytes, not 4, and where did the c3, bd, and c2 come from?

Playing around with it a little more, it seems the problem arises when printing bytes whose first hex digit is a-f, 8, or 9:

Here is a-f:

Akame:~ jfa$ for b in {a..f}; do echo "\x${b}0"; python3 -c "print(\"\x${b}0\")" | hexdump -C; done
\xa0
00000000  c2 a0 0a                                          |...|
00000003
\xb0
00000000  c2 b0 0a                                          |...|
00000003
\xc0
00000000  c3 80 0a                                          |...|
00000003
\xd0
00000000  c3 90 0a                                          |...|
00000003
\xe0
00000000  c3 a0 0a                                          |...|
00000003
\xf0
00000000  c3 b0 0a                                          |...|
00000003


Here is 0-9 (notice everything is fine until 8):

Akame:~ jfa$ for b in {0..9}; do echo "\x${b}0"; python3 -c "print(\"\x${b}0\")" | hexdump -C; done
\x00
00000000  00 0a                                             |..|
00000002
\x10
00000000  10 0a                                             |..|
00000002
\x20
00000000  20 0a                                             | .|
00000002
\x30
00000000  30 0a                                             |0.|
00000002
\x40
00000000  40 0a                                             |@.|
00000002
\x50
00000000  50 0a                                             |P.|
00000002
\x60
00000000  60 0a                                             |`.|
00000002
\x70
00000000  70 0a                                             |p.|
00000002
\x80
00000000  c2 80 0a                                          |...|
00000003
\x90
00000000  c2 90 0a                                          |...|
00000003



Here are the same tests with Python 2:

Akame:~ jfa$ for b in {a..f}; do echo "\x${b}0"; python2 -c "print(\"\x${b}0\")" | hexdump -C; done
\xa0
00000000  a0 0a                                             |..|
00000002
\xb0
00000000  b0 0a                                             |..|
00000002
\xc0
00000000  c0 0a                                             |..|
00000002
\xd0
00000000  d0 0a                                             |..|
00000002
\xe0
00000000  e0 0a                                             |..|
00000002
\xf0
00000000  f0 0a                                             |..|
00000002


Akame:~ jfa$ for b in {0..9}; do echo "\x${b}0"; python2 -c "print(\"\x${b}0\")" | hexdump -C; done
\x00
00000000  00 0a                                             |..|
00000002
\x10
00000000  10 0a                                             |..|
00000002
\x20
00000000  20 0a                                             | .|
00000002
\x30
00000000  30 0a                                             |0.|
00000002
\x40
00000000  40 0a                                             |@.|
00000002
\x50
00000000  50 0a                                             |P.|
00000002
\x60
00000000  60 0a                                             |`.|
00000002
\x70
00000000  70 0a                                             |p.|
00000002
\x80
00000000  80 0a                                             |..|
00000002
\x90
00000000  90 0a                                             |..|
00000002


As you can see, Python 2 works as expected, while Python 3, when printing \x[89a-f] escapes, seems to replace the byte with c2 or c3 followed by another byte of data.
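The pattern in both shell loops can be reproduced from inside Python 3 without a shell. This sketch assumes stdout's encoding is UTF-8 (consistent with the hexdumps above) and uses Latin-1 encoding only as a model of Python 2 writing one raw byte per character; `hexbytes` and `compare_escapes` are hypothetical helper names.

```python
def hexbytes(data):
    """Format bytes as space-separated hex pairs, like hexdump -C."""
    return " ".join(f"{b:02x}" for b in data)


def compare_escapes(digits="0123456789abcdef"):
    """For each value 0xN0, pair Python 3's UTF-8 bytes with the raw byte."""
    rows = []
    for d in digits:
        ch = chr(int(d + "0", 16))   # the one-character str that the escape \xN0 denotes
        utf8 = ch.encode("utf-8")    # what print() sends to a UTF-8 stdout (assumed)
        raw = ch.encode("latin-1")   # one byte per code point, modelling Python 2
        rows.append(("\\x" + d + "0", hexbytes(utf8), hexbytes(raw)))
    return rows


for escape, utf8, raw in compare_escapes():
    print(f"{escape}  utf-8: {utf8:<6}  raw: {raw}")
```

Rows \x00 through \x70 show identical bytes in both columns; from \x80 onward the UTF-8 column gains a c2 or c3 lead byte, matching the hexdumps.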
msg323774 - Author: Steven D'Aprano (steven.daprano) (Python committer) Date: 2018-08-20 00:20
You wrote:

> There are 6 bytes not 4 and where did the c3, bd, and c2 come from?

In Python 2, strings are byte strings; in Python 3, strings are Unicode text strings by default. print() writes the text through a text-mode stdout, which encodes it (here as UTF-8), so you are seeing the UTF-8 representation of the text string.

py> "\xfd\x84\x04\x08".encode('utf-8')
b'\xc3\xbd\xc2\x84\x04\x08'

So the behaviour in Python 3 is correct and not a bug, it has just changed (intentionally) from Python 2.
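The c2/c3 prefixes follow directly from UTF-8's two-byte form, which covers code points U+0080 to U+07FF. A sketch of that rule, checked against Python's own encoder (the helper name `utf8_two_byte` is hypothetical):

```python
def utf8_two_byte(cp):
    """Encode a code point in U+0080..U+07FF using UTF-8's two-byte form."""
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("two-byte form covers U+0080..U+07FF only")
    lead = 0xC0 | (cp >> 6)     # 110xxxxx: top 5 of the 11 payload bits
    cont = 0x80 | (cp & 0x3F)   # 10xxxxxx: low 6 payload bits
    return bytes([lead, cont])


# U+00FD and U+0084 are the two characters from the report that grew to two bytes.
for cp in (0xFD, 0x84):
    manual = utf8_two_byte(cp)
    assert manual == chr(cp).encode("utf-8")
    print(f"U+{cp:04X} -> {manual.hex()}")   # U+00FD -> c3bd, U+0084 -> c284
```

For every code point from 0x80 to 0xFF, `cp >> 6` is 2 or 3, so the lead byte is always c2 or c3. Code points below 0x80 are encoded as a single byte equal to the code point, which is why \x00 through \x70, \x04, and \x08 came through unchanged.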

Googling may help you find more about this:

https://duckduckgo.com/?q=python3+write+bytes+to+stdout
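Following that pointer, a minimal sketch of the usual Python 3 approach: keep the payload as `bytes` and write it to the binary buffer underneath `sys.stdout` instead of calling print(). `write_raw` is a hypothetical helper, and an `io.BytesIO` stands in for stdout here so the result can be inspected.

```python
import io

# The b prefix makes this a bytes literal, so each \xNN is exactly one byte
# rather than a Unicode code point.
payload = b"\xfd\x84\x04\x08"


def write_raw(data, stream):
    """Write bytes to a binary stream, bypassing any text encoding."""
    stream.write(data)
    stream.flush()


# In a script, pass sys.stdout.buffer (the binary layer under sys.stdout):
#     write_raw(payload + b"\n", sys.stdout.buffer)
buf = io.BytesIO()
write_raw(payload + b"\n", buf)
print(buf.getvalue().hex())   # fd8404080a

# If the data already exists as a str whose code points are all below 0x100,
# Latin-1 maps each character back to the byte with the same value.
assert "\xfd\x84\x04\x08".encode("latin-1") == payload
```

From the shell, the same idea fits in a one-liner: `python3 -c 'import sys; sys.stdout.buffer.write(b"\xfd\x84\x04\x08\n")' | hexdump -C` should write the same five bytes (fd 84 04 08 0a) that the Python 2 command did.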
History
Date                 User            Action  Args
2022-04-11 14:59:04  admin           set     github: 78618
2018-08-20 00:20:07  steven.daprano  set     status: open -> closed
                                             nosy: + steven.daprano
                                             messages: + msg323774
                                             resolution: not a bug
                                             stage: resolved
2018-08-19 23:24:40  Nathan Benson   create