This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: socket.recv(size, MSG_TRUNC) returns more than size bytes
Type: behavior
Stage: needs patch
Components: Extension Modules
Versions: Python 3.7, Python 3.6, Python 3.5, Python 2.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Andrey Wagin, benjamin.peterson, berker.peksag, christian.heimes, martin.panter
Priority: high
Keywords:

Created on 2015-08-25 11:28 by Andrey Wagin, last changed 2022-04-11 14:58 by admin.

Messages (7)
msg249114 - Author: Andrey Wagin (Andrey Wagin) Date: 2015-08-25 11:28
In [1]: import socket

In [2]: sks = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)

In [3]: sks[1].send("asdfasdfsadfasdfsdfsadfsdfasdfsdfasdfsadfa")
Out[3]: 42

In [4]: sks[0].recv(1, socket.MSG_PEEK | socket.MSG_TRUNC)
Out[4]: 'a\x00\x00\x00\xc0\xbf8\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

recv() returns a buffer whose size equals the size of the transferred data, but only the first byte is initialized. What is the rationale for this behavior?

In C, recv(sk, NULL, 0, MSG_PEEK | MSG_TRUNC) is the usual way to get a message size. What is the right way to get a message size in Python?
msg249121 - Author: Andrey Wagin (Andrey Wagin) Date: 2015-08-25 13:21
sendto(4, "asdfasdfsadfasdfsdfsadfsdfasdfsd"..., 42, 0, NULL, 0) = 42
recvfrom(3, "a\0n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5\0\0\0\0\0\0\0\2\0\0\0"..., 1, MSG_TRUNC, NULL, NULL) = 42

I think the return value is interpreted incorrectly. In this case it is not equal to the number of bytes received. Python then copies that many bytes from the smaller buffer, so it may access memory that is unallocated or owned by something else.

Valgrind detects this type of error:
[avagin@localhost ~]$ cat sock.py 
import socket, os, sys

sks = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
pid = os.fork()
if pid == 0:
	sks[1].send("\0" * 4096)
	sys.exit(0)
sk = sks[0]
print sk.recv(1, socket.MSG_TRUNC )

[avagin@localhost ~]$ valgrind python sock.py
==25511== Memcheck, a memory error detector
==25511== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==25511== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==25511== Command: python sock.py
==25511== 
==25511== Syscall param write(buf) points to uninitialised byte(s)
==25511==    at 0x320B4F0940: __write_nocancel (in /usr/lib64/libc-2.20.so)
==25511==    by 0x320B478D2C: _IO_file_write@@GLIBC_2.2.5 (in /usr/lib64/libc-2.20.so)
==25511==    by 0x320B4794EE: _IO_file_xsputn@@GLIBC_2.2.5 (in /usr/lib64/libc-2.20.so)
==25511==    by 0x320B46EE68: fwrite (in /usr/lib64/libc-2.20.so)
==25511==    by 0x369CC90210: ??? (in /usr/lib64/libpython2.7.so.1.0)
==25511==    by 0x369CC85EAE: ??? (in /usr/lib64/libpython2.7.so.1.0)
==25511==    by 0x369CC681AB: PyFile_WriteObject (in /usr/lib64/libpython2.7.so.1.0)
==25511==    by 0x369CCE08F9: PyEval_EvalFrameEx (in /usr/lib64/libpython2.7.so.1.0)
==25511==    by 0x369CCE340F: PyEval_EvalCodeEx (in /usr/lib64/libpython2.7.so.1.0)
==25511==    by 0x369CCE3508: PyEval_EvalCode (in /usr/lib64/libpython2.7.so.1.0)
==25511==    by 0x369CCFC91E: ??? (in /usr/lib64/libpython2.7.so.1.0)
==25511==    by 0x369CCFDB41: PyRun_FileExFlags (in /usr/lib64/libpython2.7.so.1.0)
msg249127 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2015-08-25 15:22
Evidently, the recv code doesn't know anything about MSG_TRUNC, which causes it to do incorrect things when the output length is greater than the buffer length.
msg249214 - Author: Andrey Wagin (Andrey Wagin) Date: 2015-08-26 20:22
The same behavior occurs in Python 3.4:
>>> sks[1].send(b"asdfasdfsadfasdfsdfsadfsdfasdfsdfasdfsadfa")
42
>>> sks[0].recv(1, socket.MSG_PEEK | socket.MSG_TRUNC)
b'a\x00Nx\x94\x7f\x00\x00sadfasdfsdfsadfsdfasdfsdfasdfsadfa'
>>>
msg264343 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-04-27 03:38
As far as I know, passing MSG_TRUNC into recv() is Linux-specific. I guess the “right” portable way to get a message size is to know it in advance, or guess and expand the buffer if MSG_PEEK cannot return the whole message.

Andrey: I don’t think we are accessing _unallocated_ memory (which could crash Python). If you look at _PyBytes_Resize(), I think it correctly allocates the memory, and just leaves it uninitialized.

Some options:

* Document that arbitrary flags like Linux’s MSG_TRUNC are not supported
* Limit the returned buffer to the original buffer size
* Raise an exception or warning if recv() returns more than the original buffer size
* Reject unsupported flags like MSG_TRUNC
* Initialize the expanded buffer with zeros
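
The portable approach mentioned above (guess a size, then expand the buffer if MSG_PEEK cannot return the whole message) could look roughly like the sketch below. The function name `recv_datagram_portable` and the parameters `initial_size` and `max_size` are hypothetical and chosen only for illustration; the sketch assumes a datagram socket and never passes MSG_TRUNC.

```python
import socket

def recv_datagram_portable(sock, initial_size=64, max_size=65536):
    """Hypothetical helper: peek with a growing buffer, then consume."""
    size = initial_size
    while True:
        # MSG_PEEK leaves the datagram queued, so it can be peeked again.
        peeked = sock.recv(size, socket.MSG_PEEK)
        # A peek shorter than the buffer means the whole datagram fit.
        # Stop at max_size as well, accepting possible truncation.
        if len(peeked) < size or size >= max_size:
            break
        size = min(size * 2, max_size)
    # Consume the datagram for real with the size that was large enough.
    return sock.recv(size)

a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
a.send(b'x' * 200)
payload = recv_datagram_portable(b)
print(len(payload))
```

The trade-off is extra system calls for large messages; a peek that exactly fills the buffer is treated as possibly truncated, so the loop grows the buffer once more in that case.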
msg277427 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2016-09-26 14:59
MSG_TRUNC literally causes a buffer overflow. In the example, sock_recv() and friends only allocate a buffer of size 1 on the heap. With MSG_TRUNC, recv() ignores the maximum size and writes beyond the buffer. We cannot recover from a buffer overflow because the overflow might have damaged other data structures. Instead, Python should detect the problem and forcefully abort() the process with Py_FatalError().
msg277429 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2016-09-26 15:31
Ah, I misunderstood MSG_TRUNC. It's not a buffer overflow. MSG_TRUNC does not write beyond the end of the buffer. In this example, the libc function recv() writes two bytes into the buffer but returns a value larger than 2.

---
import socket
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
a.send(b'abcdefgh')
result = b.recv(2, socket.MSG_TRUNC)
print(len(result), result)
---
stdout: 2 b'ab'

To fix the wrong result of recv() with MSG_TRUNC, only resize when outlen < recvlen (line 3089).

To get the size of the message, you have to use recv_into() with a buffer.

---
import socket
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
a.send(b'abcdefgh')
msg = bytearray(2)
result = b.recv_into(msg, flags=socket.MSG_TRUNC)
print(result, msg)
---
stdout: 8 bytearray(b'ab')
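
Building on the recv_into() example, the size query from the original report can be sketched as a two-step read: peek with MSG_PEEK | MSG_TRUNC to learn the datagram length, then receive into a buffer of that size. This is a minimal sketch, assuming Linux semantics for MSG_TRUNC; the helper name `recv_whole_datagram` is hypothetical.

```python
import socket

def recv_whole_datagram(sock):
    """Hypothetical helper: return the next datagram in full (Linux only)."""
    # With MSG_TRUNC the return value is the real datagram length, even
    # though only len(probe) bytes are copied. MSG_PEEK keeps it queued.
    probe = bytearray(1)
    size = sock.recv_into(probe, 1, socket.MSG_PEEK | socket.MSG_TRUNC)
    # Consume the datagram with a buffer that is large enough.
    # max(size, 1) avoids a zero-length buffer for empty datagrams.
    data = bytearray(max(size, 1))
    received = sock.recv_into(data, len(data))
    return bytes(data[:received])

a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
a.send(b'abcdefgh')
message = recv_whole_datagram(b)
print(len(message), message)
a.close()
b.close()
```

Going through recv_into() rather than recv() means the caller owns the buffer, so the length reported under MSG_TRUNC is never used to resize a bytes object.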
History
Date | User | Action | Args
2022-04-11 14:58:20 | admin | set | github: 69121
2016-09-26 15:31:46 | christian.heimes | set | priority: critical -> high; type: security -> behavior; messages: + msg277429; versions: - Python 3.4
2016-09-26 14:59:57 | christian.heimes | set | priority: normal -> critical; messages: + msg277427; versions: + Python 3.7
2016-09-09 00:17:37 | christian.heimes | set | nosy: + christian.heimes
2016-04-27 03:38:19 | martin.panter | set | nosy: + martin.panter; messages: + msg264343; components: + Extension Modules, - Library (Lib)
2016-04-27 01:02:06 | berker.peksag | set | nosy: + berker.peksag; stage: needs patch; versions: + Python 3.5, Python 3.6
2015-08-26 20:24:33 | Andrey Wagin | set | type: security
2015-08-26 20:22:45 | Andrey Wagin | set | messages: + msg249214; versions: + Python 3.4
2015-08-25 15:22:18 | benjamin.peterson | set | nosy: + benjamin.peterson; messages: + msg249127
2015-08-25 13:21:46 | Andrey Wagin | set | messages: + msg249121
2015-08-25 11:28:15 | Andrey Wagin | create |