classification
Title: for line in socket.makefile() speed degradation
Type: performance Stage:
Components: IO, Library (Lib) Versions: Python 3.7, Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: christian.heimes, giampaolo.rodola, sbt, serhiy.storchaka, socketpair, vstinner
Priority: normal Keywords:

Created on 2013-06-29 21:25 by socketpair, last changed 2016-09-09 10:53 by socketpair.

Files
File name Uploaded Description Edit
qwe.py socketpair, 2013-06-29 22:09
perf.txt christian.heimes, 2013-07-04 10:37
Messages (21)
msg192044 - (view) Author: Марк Коренберг (socketpair) * Date: 2013-06-29 21:25
Results or running attached program:
$ python2.7 qwe.py 
TCP  mode, makefile method. 198807.2 lines per second (189.6 MB/s). Delay is 5.03 seconds
TCP  mode,   fdopen method. 1041666.7 lines per second (993.4 MB/s). Delay is 0.96 seconds
UNIX mode, makefile method. 2040816.3 lines per second (1946.3 MB/s). Delay is 0.49 seconds
UNIX mode,   fdopen method. 1923076.9 lines per second (1834.0 MB/s). Delay is 0.52 seconds

$ python3.2 qwe.py 
TCP  mode, makefile method. 275482.1 lines per second (262.7 MB/s). Delay is 3.63 seconds
TCP  mode,   fdopen method. 909090.9 lines per second (867.0 MB/s). Delay is 1.10 seconds
UNIX mode, makefile method. 323624.6 lines per second (308.6 MB/s). Delay is 3.09 seconds
UNIX mode,   fdopen method. 1694915.3 lines per second (1616.4 MB/s). Delay is 0.59 seconds

----------------------------------------------
1. in every case, socket.makefile() is MUCH slower than os.fdopen() when used as "for line in fileobject"
2. compare speeds between python 2.7 and python 3.2 in same operation. Especially, socketpair+makefile
3. Why not to return os.fdopen() for sockets when socket.makefile() called on unix systems?
msg192046 - (view) Author: Марк Коренберг (socketpair) * Date: 2013-06-29 21:36
Yes, results are repeatable, and for python 2.7 I have roughly same timings for UNIX socket.

Also, I have straced all variants and see that in all 4 cases (and for both python versions) IO is done using 8192 blocks in size, so buffering is not cause of problems.

I have linux 3.5.0, Intel(R) Core(TM) i5-2410M CPU @ 2.30GHz
msg192047 - (view) Author: Марк Коренберг (socketpair) * Date: 2013-06-29 21:43
Well, python 3.3 is slightly faster:

$ python3.3 qwe.py 
TCP  mode, makefile method. 380228.1 lines per second (362.6 MB/s). Delay is 2.63 seconds
TCP  mode,   fdopen method. 877193.0 lines per second (836.6 MB/s). Delay is 1.14 seconds
UNIX mode, makefile method. 469483.6 lines per second (447.7 MB/s). Delay is 2.13 seconds
UNIX mode,   fdopen method. 1562500.0 lines per second (1490.1 MB/s). Delay is 0.64 seconds

But problem still exists
msg192050 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-06-29 21:53
Try to wrap socket.makefile() with io.BufferedReader().
msg192051 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-06-29 21:59
I think in Python 3 makefile() returns a TextIOWrapper object by default. To force the use of binary you need to specfiy the mode:

    fileobj = ss.makefile(mode='rb')
msg192052 - (view) Author: Марк Коренберг (socketpair) * Date: 2013-06-29 22:09
Eliminate unicode conversion for python3, but results still the same

$ python2.7 qwe.py 
TCP  mode, makefile method. 211416.5 lines per second (201.6 MB/s). Delay is 4.73 seconds
TCP  mode,   fdopen method. 1041666.7 lines per second (993.4 MB/s). Delay is 0.96 seconds
UNIX mode, makefile method. 2040816.3 lines per second (1946.3 MB/s). Delay is 0.49 seconds
UNIX mode,   fdopen method. 1886792.5 lines per second (1799.4 MB/s). Delay is 0.53 seconds

$ python3.2 qwe.py 
TCP  mode, makefile method. 487804.9 lines per second (465.2 MB/s). Delay is 2.05 seconds
TCP  mode,   fdopen method. 900900.9 lines per second (859.2 MB/s). Delay is 1.11 seconds
UNIX mode, makefile method. 625000.0 lines per second (596.0 MB/s). Delay is 1.60 seconds
UNIX mode,   fdopen method. 1492537.3 lines per second (1423.4 MB/s). Delay is 0.67 seconds

$ python3.3 qwe.py 
TCP  mode, makefile method. 512820.5 lines per second (489.1 MB/s). Delay is 1.95 seconds
TCP  mode,   fdopen method. 884955.8 lines per second (844.0 MB/s). Delay is 1.13 seconds
UNIX mode, makefile method. 680272.1 lines per second (648.8 MB/s). Delay is 1.47 seconds
UNIX mode,   fdopen method. 1587301.6 lines per second (1513.8 MB/s). Delay is 0.63 seconds

Also, io.BufferedReader() is not my case. I understand that intermediate buffer will increase performance. Problem in implementation of socket.makefile(). That is I want to be fixed.
msg192054 - (view) Author: Марк Коренберг (socketpair) * Date: 2013-06-29 22:14
Can anyone test in python 3.4 ?
msg192063 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-06-30 08:09
3.4 have same performance as 3.3.
msg192283 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-07-04 10:37
I first suspected that file based IO involves too many syscalls and context switches. However ``strace -c`` showed only minor differences in the amount of syscalls. The difference between read() and recvfrom() should not make any difference.

perf revealed that the makefile() test uses much more CPU cycles. A quick check with python3.3 -m profile points to Python method call overhead. Five methods are called more than 122,000 times each and a couple of attributes are checked, too. Perhaps we can optimize away _checkReadable() and _checkClosed()?

I think I know what's going on here. For socket IO readline() uses a readahead buffer size of 1. For each char a cascade of Python functions is called. os.fdopen() uses a default buffering of 4096 (or more) so it's not affected.
msg192284 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-07-04 10:39
Summary:

makefile: 611990 primitive calls
fdopen:     1629 primitive calls
msg192296 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-07-04 15:20
> I think I know what's going on here. For socket IO readline() uses a 
> readahead buffer size of 1.

Why is that?  I think that makefile(mode='rb') and fdopen() both create BufferedReader objects with the same buffer size.

It looks to me like there are the same number of reads for both cases (about 120,000 ~ data_size/buffer_size).  But with SocketIO, there are 5 function calls for each read into the buffer.
msg192297 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-07-04 16:06
Using

        while True:
            if not fileobj.read(8192):
                break

instead of

        for line in fileobj:
            pass

results in higher throughput, but a similar slowdown with makefile().  So this is not a problem specific to readline().
msg192298 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-07-04 16:06
Err, you are right. It's buffered. I misread one function call and thought it's not buffered at all.

>>> COUNT = 1000000
>>> LINELEN = 1000
>>> int(LINELEN * COUNT / 8192)
122070

The number looks familiar, doesn't it? :)

I still stand by my analysis that the slow down is caused by Python's method call overhead and the extra function call to _checkReadable(),  readable and _checkClosed(). A C implementation of the raw SocketIO class would reduce the overhead.

The implementation of SocketIO assumes that performance is dominated by IO (e.g. slow internet connection). A local socket pair is faster than 10GBit/sec ethernet or 8 GBit/sec fibre channel.
msg192301 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-07-04 18:30
The only real reason for implementing SocketIO in pure Python is because read() and write() do not work on Windows with sockets.  (I think there are also a few complications involving SSL sockets and the close() method.)

On Windows I have implemented a file object type in C which works with pipe handles.  I hope to use it in multiprocessing at some point.  It would not be too difficult to support sockets as well and use that instead of SocketIO.  For Unix, FileIO can be used instead of SocketIO.
msg192302 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-07-04 18:54
There is little bit more to it. A comment in the class' body describes the reason for SockteIO:

    # One might wonder why not let FileIO do the job instead.  There are two
    # main reasons why FileIO is not adapted:
    # - it wouldn't work under Windows (where you can't used read() and
    #   write() on a socket handle)
    # - it wouldn't work with socket timeouts (FileIO would ignore the
    #   timeout and consider the socket non-blocking)
msg192303 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-07-04 19:02
Ah.  I had not thought of socket timeouts.
msg192309 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-07-04 19:39
On Python 2, socket.makefile("rb") duplicates the file descriptor and creates a new file object: fileobj.read() calls the C function read().

On Python 3, socket.makefile("rb") creates a SocketIO wrapped in BufferedReader: fileobj.read() calls the C function recv() (recvfrom() syscall).

For a UNIX socket, read() is maybe more efficient than recv().
msg192313 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-07-04 19:57
My results of the benchmark on Linux 3.9, first with a loop calling fileobj.read(8192) ("read"), then with the "for line in fileobj: pass" ("readline").

$ python2.7 bench_socket_read.py
TCP  mode, makefile method. 1775085.8 lines per second (1692.9 MB/s). Delay is 0.56 seconds
TCP  mode,   fdopen method. 2767162.2 lines per second (2639.0 MB/s). Delay is 0.36 seconds
UNIX mode, makefile method. 5238015.2 lines per second (4995.4 MB/s). Delay is 0.19 seconds
UNIX mode,   fdopen method. 6218906.2 lines per second (5930.8 MB/s). Delay is 0.16 seconds

$ python3.4 bench_socket_read.py
TCP  mode, makefile method. 1403075.1 lines per second (1338.1 MB/s). Delay is 0.71 seconds
TCP  mode,   fdopen method. 2324254.2 lines per second (2216.6 MB/s). Delay is 0.43 seconds
UNIX mode, makefile method. 1770625.0 lines per second (1688.6 MB/s). Delay is 0.56 seconds
UNIX mode,   fdopen method. 4953624.6 lines per second (4724.1 MB/s). Delay is 0.20 seconds

$ python2.7 bench_socket_readline.py
TCP  mode, makefile method. 246268.2 lines per second (234.9 MB/s). Delay is 4.06 seconds
TCP  mode,   fdopen method. 1980998.2 lines per second (1889.2 MB/s). Delay is 0.50 seconds
UNIX mode, makefile method. 3378778.4 lines per second (3222.3 MB/s). Delay is 0.30 seconds
UNIX mode,   fdopen method. 3271267.0 lines per second (3119.7 MB/s). Delay is 0.31 seconds

$ python3.4 bench_socket_readline.py
TCP  mode, makefile method. 826646.7 lines per second (788.4 MB/s). Delay is 1.21 seconds
TCP  mode,   fdopen method. 1664917.8 lines per second (1587.8 MB/s). Delay is 0.60 seconds
UNIX mode, makefile method. 942409.0 lines per second (898.8 MB/s). Delay is 1.06 seconds
UNIX mode,   fdopen method. 2619285.2 lines per second (2497.9 MB/s). Delay is 0.38 seconds

The results of the "read" benchmark for "UNIX mode, makefile method" case are very different on Python 2.7 (call read()) and Python 3.4 (call recv()): 4995.4 MB/s vs 1688.6 MB/s (2.95x slower).

Readline on TCP is much faster on Python 3.4 than 2.7: 788.4 MB/s vs 234.9 MB/s (3.4x faster). I don't know why, but I don't want to investigate (it is not a regression) :-)

The slowdown on other readline tests are less interesting: Python 3.4 is 16% slower for "TCP  mode, fdopen method" and 20% slower for "UNIX mode, fdopen method".

It would be interesting to have the same benchmark implemented in C to compare "raw" performances, so it would be possible to measure the overhead of Python.
msg192316 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-07-04 20:45
I find that by adding the lines

            fileobj.raw.readinto = ss.recv_into
            fileobj.raw.read = ss.recv

the speed with makefile() is about 30% slower than with fdopen().
msg275196 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2016-09-08 22:44
Is this still a problem in 3.5 and newer?
msg275302 - (view) Author: Марк Коренберг (socketpair) * Date: 2016-09-09 10:53
mmarkk@mmwork:~$ python2.7 --version
Python 2.7.12

mmarkk@mmwork:~$ python2.7 ./qwe.py
TCP  mode, makefile method. 292731.3 lines per second (279.2 MB/s). Delay is 3.42 seconds
TCP  mode,   fdopen method. 2410875.9 lines per second (2299.2 MB/s). Delay is 0.41 seconds
UNIX mode, makefile method. 3247881.6 lines per second (3097.4 MB/s). Delay is 0.31 seconds
UNIX mode,   fdopen method. 3406528.3 lines per second (3248.7 MB/s). Delay is 0.29 seconds

mmarkk@mmwork:~$ python3.5 --version
Python 3.5.2

mmarkk@mmwork:~$ python3.5 ./qwe.py 
TCP  mode, makefile method. 822754.8 lines per second (784.6 MB/s). Delay is 1.22 seconds
TCP  mode,   fdopen method. 1916700.2 lines per second (1827.9 MB/s). Delay is 0.52 seconds
UNIX mode, makefile method. 999149.7 lines per second (952.9 MB/s). Delay is 1.00 seconds
UNIX mode,   fdopen method. 2639650.7 lines per second (2517.4 MB/s). Delay is 0.38 seconds
History
Date User Action Args
2016-09-09 10:53:29socketpairsetstatus: pending -> open

messages: + msg275302
2016-09-08 22:44:21christian.heimessetstatus: open -> pending

messages: + msg275196
versions: + Python 3.6, Python 3.7, - Python 3.2, Python 3.3, Python 3.4, Python 3.5
2013-07-06 18:44:12giampaolo.rodolasetnosy: + giampaolo.rodola
2013-07-04 20:45:27sbtsetmessages: + msg192316
2013-07-04 19:57:05vstinnersetmessages: + msg192313
2013-07-04 19:39:50vstinnersetnosy: + vstinner
messages: + msg192309
2013-07-04 19:02:07sbtsetmessages: + msg192303
2013-07-04 18:54:03christian.heimessetmessages: + msg192302
2013-07-04 18:30:10sbtsetmessages: + msg192301
2013-07-04 16:06:42christian.heimessetmessages: + msg192298
2013-07-04 16:06:32sbtsetmessages: + msg192297
2013-07-04 15:20:52sbtsetmessages: + msg192296
2013-07-04 10:39:07christian.heimessetmessages: + msg192284
2013-07-04 10:37:05christian.heimessetfiles: + perf.txt

messages: + msg192283
2013-07-04 09:21:07christian.heimessetnosy: + christian.heimes
2013-06-30 08:09:42serhiy.storchakasetmessages: + msg192063
2013-06-29 22:14:34socketpairsetmessages: + msg192054
2013-06-29 22:09:36socketpairsetfiles: - qwe.py
2013-06-29 22:09:07socketpairsetfiles: + qwe.py

messages: + msg192052
2013-06-29 21:59:24sbtsetnosy: + sbt
messages: + msg192051
2013-06-29 21:53:37serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg192050
2013-06-29 21:43:26socketpairsetmessages: + msg192047
2013-06-29 21:36:16socketpairsetmessages: + msg192046
2013-06-29 21:25:58socketpaircreate