classification
Title: socket read() can cause MemoryError in Windows
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.1, Python 3.2, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: BreamoreBoy, amaury.forgeotdarc, anglocelt, elachuni, marcio, mcicogni, neologix, pitrou, python-dev, r.david.murray, taukki, trogdorsey, vila
Priority: normal Keywords: patch

Created on 2006-03-02 08:27 by taukki, last changed 2011-05-25 05:54 by neologix. This issue is now closed.

Files
File name Uploaded Description Edit
imaplib_recv_py3k.diff neologix, 2011-05-22 10:45 patch for py3k review
imaplib_recv_27.diff neologix, 2011-05-22 10:46 patch for 2.7 review
Messages (22)
msg27650 - (view) Author: taukki (taukki) Date: 2006-03-02 08:27
When a big mail (>10MB) is fetched MemoryError may
result (reproduced in Windows XP/2003, Python 2.41):

Traceback (most recent call last):
  File "t5.py", line 9, in ?
    data = f.read(26872159)
  File "c:\python24\lib\socket.py", line 307, in read
    data = self._sock.recv(recv_size)
MemoryError

The problem is not actually in imaplib (though it
requests the whole mail in one read), but probably in
combination of socket core code and the implementation
of read() function (he way it buffers the received data
in a list?). I can easily reproduce the problem e.g
with these snippets:

------------CLIENT----------------->
import socket
import time

s = socket.socket()

s.connect(("127.0.0.1", 8037))
f = s.makefile("r", 200000)
while 1:
  data = f.read(26872159) # Using smaller value here  
helps, 100000 seems to work forever...
  #data = s.recv(26872159) # THIS WORKS OK
  print len(data)

---------SERVER--------------------->
import SocketServer
import time

PORT = 8037
time.sleep(0.2)

s = "X" * 1823
class
TimeRequestHandler(SocketServer.StreamRequestHandler):
    def handle(self):
      for x in range(200000):
        self.wfile.write(s)
        print x


server = SocketServer.TCPServer(("", PORT),
TimeRequestHandler)
print "listening on port", PORT
server.serve_forever()



msg27651 - (view) Author: Mauro Cicognini (mcicogni) Date: 2007-05-21 17:14
Consistently reproducible using Python 2.5.1 on Windows XP.

Another bug on the same subject was closed WONTFIX since it all appears to be Microsoft's fault (although asking a socket directly for more than 10MB of data at once is arguably unreasonable).

Since it appears that it's just Imaplib implementation to be somewhat broken, we could possibly patch *that* to work around this problem.
msg72356 - (view) Author: Iain MacKay (anglocelt) Date: 2008-09-02 17:52
Knowing that the large read provokes the problem enables one to write
this simple workaround by subclassing IMAP4 without patching the library:

maxRead = 1000000
class MySSL (imaplib.IMAP4_SSL):
  def read (self, n):
    #print "..Attempting to read %d bytes" % n
      if n <= maxRead:
        return imaplib.IMAP4_SSL.read (self, n)
      else:
        soFar = 0
        result = ""
        while soFar < n:
          thisFragmentSize = min(maxRead, n-soFar)
          #print "..Reading fragment size %s" % thisFragmentSize
          fragment =\
            imaplib.IMAP4_SSL.read (self, thisFragmentSize)
          result += fragment
          soFar += thisFragmentSize # only a few, so not a tragic o/head
          return result				

and this works just fine.
msg72906 - (view) Author: Anthony Lenton (elachuni) * Date: 2008-09-09 20:36
I confirm that the bug occurs with Python 2.5.1 on Windows XP.  Also,
anglocelt's fix worked fine for me.
msg73071 - (view) Author: Anthony Lenton (elachuni) * Date: 2008-09-12 01:18
It's probably just a typo from copying from an editor, but there is a
bug in the workaround.  It should be:

maxRead = 1000000
class MySSL (imaplib.IMAP4_SSL):
  def read (self, n):
    #print "..Attempting to read %d bytes" % n
      if n <= maxRead:
        return imaplib.IMAP4_SSL.read (self, n)
      else:
        soFar = 0
        result = ""
        while soFar < n:
          thisFragmentSize = min(maxRead, n-soFar)
          #print "..Reading fragment size %s" % thisFragmentSize
          fragment =\
            imaplib.IMAP4_SSL.read (self, thisFragmentSize)
          result += fragment
          soFar += thisFragmentSize # only a few, so not a tragic o/head
        return result

(With one less level of indentation in the last line).
Apart from that, the fix works wonderfully.
msg100011 - (view) Author: Márcio (marcio) Date: 2010-02-24 09:02
I've been also receiving the same error while using "imaplib.IMAP4_SSL" to download large e-mails.

A simple change in the "read" and "readline" functions to use instead a "cStringIO.StringIO" buffer object works for me.
msg100388 - (view) Author: (trogdorsey) Date: 2010-03-04 14:18
I see this issue on a 32-bit Fedora 10 system using Python 2.6.4 as well.
msg102261 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2010-04-03 13:39
$ cat /tmp/test.py
import socket

SIZE = 1000000000L

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

try:
    s.recv(SIZE)
finally:
    s.close()

$ python /tmp/test.py
Traceback (most recent call last):
  File "/tmp/test.py", line 8, in <module>
    s.recv(SIZE)
MemoryError

$ strace python /tmp/test.py
[...]
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3
mmap2(NULL, 1000001536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
brk(0x4440c000)                         = 0x8a56000
mmap2(NULL, 1000132608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0xb79b1000
munmap(0xb79b1000, 323584)              = 0
munmap(0xb7b00000, 724992)              = 0
mprotect(0xb7a00000, 135168, PROT_READ|PROT_WRITE) = 0
mmap2(NULL, 1000001536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
[...]

imaplib is simply requesting too much at a time: the memory error is probably raised in Modules/socketmodule.c:sock_recv
        /* Allocate a new string. */
        buf = PyString_FromStringAndSize((char *) 0, recvlen);
        if (buf == NULL)
                return NULL;

Requesting a 10M read is indeed quite questionable, and will fail no matter what system is used (the limit will depend on the heap usage and OS, though).
The fix should be made at imaplib level, rather than requiring users to redefine IMAP4 read method. A patch is attached (imaplib_read.diff).
msg111087 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-07-21 16:49
I fail to see *why* the patch fixes the issue. You still have to allocate the big string when joining the parts.
msg111089 - (view) Author: Márcio (marcio) Date: 2010-07-21 16:54
Speaking for myself, I'm not using the attached patch, I'm using the simple fix I included in my previous reply which works perfectly to avoid getting a MemoryError exception thrown.
msg111126 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-07-21 22:03
Which module did you change, and which precise version of python are you using? I wonder if this was fixed with issue2632.
msg111366 - (view) Author: Márcio (marcio) Date: 2010-07-23 17:09
I got that error on Windows 7 Professional 64 bits, using Python 2.6.4 64 bits.

I just changed the "imaplib.IMAP4_SSL" class, by making the "read" and "readline" functions use "cStringIO.StringIO()" instead of an array of strings for "chunks = []".
msg114637 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-08-22 01:51
At least three solutions to this have been suggested.  The simplest is that provided by Márcio but is this acceptable?
msg136427 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-05-21 10:17
It's actually an obvious case of heap fragmentation due to long-lived chunks being realloc()ed to a smaller size. Some malloc implementations can choke on this (e.g. OS-X's malloc is known to not shrink blocks when realloc() is called with a smaller size).
The solution is simply to use a StringIO to avoid holding references to those blocks for too long.
Patch attached.
msg136451 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-05-21 15:31
Patch looks ok. Is 3.x also affected? The I/O stack changed quite a bit in 3.x.
msg136457 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-05-21 16:47
> Patch looks ok. Is 3.x also affected? The I/O stack changed quite a bit in
> 3.x.

I think it's not affected, but I can't reproduce this behaviour with
glibc/eglibc, so don't just take my word for it.
The reason is that in py3k, imaplib uses a buffered reader to read
from the socket (obtained through sock.makefile()), and IIUC, reading
from a buffered reader n bytes will always return n bytes (unless EOF
is encountered or the underlying object is set to non-blocking),
contrarily to reading directly from a socket. So the resizing is done
from the buffered io module, which only grows the result buffer,
freeing the buffers returned by socket's recv() one at a time.

Note that if I'm right, another way to solve this would be to wrap the
SSL socket with a call to makefile, like what's already done for the
non-SSL case (and in py3k). This would also lead to cleaner and
simpler code, because we wouldn't need to handle partial reads
ourself.
That's what the patch attached does.
Note that if I'm correct, then the current py3k imaplib read() method
could also be simplified:

    def read(self, size):
        """Read 'size' bytes from remote."""
        chunks = []
        read = 0
        while read < size:
            data = self.file.read(min(size-read, 4096))
            if not data:
                break
            read += len(data)
            chunks.append(data)
        return b''.join(chunks)

It would also be a bit faster.
But since I can't reproduce this issue, I'd rather make sure it's
correct before making a such change.
msg136458 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-05-21 17:16
In the buffered reader case, the result buffer is actually pre-allocated with the total size, making fragmentation even less likely.
msg136510 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-05-22 10:45
Digging a little deeper, here's the conclusion:
- with py3k, fragmentation is less likely: the buffered reader returned by makefile() ensures that we can allocate only one result buffer for the total number of bytes read() (thanks to socket's readinto()). It's also faster. Finally, since buffered read() guarantees to return exactly the number of bytes asked, there's no need to call it repeatedly with min(size-read, 4096): it's slower, and actually makes fragmentation more likely.
So I've attached a patch removing the looping from py3k IMAP4 read() method, which leads to simpler, faster and fragmentation-less code.
- for 2.7, it's also better to wrap the SSL socket with makefile(): the leads to simpler code (no need to loop), and also I noticed that socket's _fileobject's (returned by makefile()) read() and readline() methods already use a StringIO to avoid fragmentation.
So I've attached a second patch wrapping 2.7 IMAP4_SSL socket with makefile.

While I can't reproduce this issue on my system, I've analyzed malloc/realloc/free calls using ltrace, and I'm pretty confident this should solve fragmentation issues for both 2.7 and py3k.

By the way, nice work with the I/O stack in py3k, the new layer allows much more efficient code (reduced allocations and copies)!
msg136541 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-05-22 15:39
Both patches look ok to me.
msg136793 - (view) Author: Roundup Robot (python-dev) Date: 2011-05-24 21:12
New changeset 0dee36595699 by Charles-François Natali in branch '2.7':
Issue #1441530: In imaplib, use makefile() to wrap the SSL socket to avoid
http://hg.python.org/cpython/rev/0dee36595699
msg136797 - (view) Author: Roundup Robot (python-dev) Date: 2011-05-24 21:48
New changeset 14bb95a8d7ee by Charles-François Natali in branch 'default':
Issue #1441530: In imaplib, read the data in one chunk to speed up large
http://hg.python.org/cpython/rev/14bb95a8d7ee
msg136827 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-05-25 05:54
I've committed the patch to 2.7, and also to default (and only to default since for py3k it's more of an optimization than a bug fix).
Closing.
History
Date User Action Args
2011-05-25 05:54:13neologixsetstatus: open -> closed
resolution: fixed
messages: + msg136827

stage: patch review -> resolved
2011-05-24 21:48:46python-devsetmessages: + msg136797
2011-05-24 21:12:19python-devsetnosy: + python-dev
messages: + msg136793
2011-05-22 15:39:12pitrousetmessages: + msg136541
versions: + Python 3.3
2011-05-22 10:46:55neologixsetfiles: - imaplib_ssl_makefile.diff
2011-05-22 10:46:34neologixsetfiles: - imaplib_read.diff
2011-05-22 10:46:14neologixsetfiles: + imaplib_recv_27.diff
2011-05-22 10:45:41neologixsetfiles: + imaplib_recv_py3k.diff

messages: + msg136510
2011-05-21 17:16:33neologixsetmessages: + msg136458
2011-05-21 16:47:57neologixsetfiles: + imaplib_ssl_makefile.diff

messages: + msg136457
2011-05-21 15:31:09pitrousetnosy: + pitrou
messages: + msg136451
2011-05-21 10:17:38neologixsetfiles: - imaplib_read.diff
2011-05-21 10:17:18neologixsetfiles: + imaplib_read.diff

messages: + msg136427
2010-08-22 01:51:34BreamoreBoysetversions: + Python 3.1, - Python 2.6
nosy: + BreamoreBoy

messages: + msg114637

type: performance -> behavior
stage: patch review
2010-07-23 17:09:36marciosetmessages: + msg111366
2010-07-21 22:03:05amaury.forgeotdarcsetmessages: + msg111126
2010-07-21 16:54:43marciosetmessages: + msg111089
2010-07-21 16:49:05amaury.forgeotdarcsetnosy: + amaury.forgeotdarc
messages: + msg111087
2010-04-03 13:39:15neologixsetfiles: + imaplib_read.diff

nosy: + neologix
messages: + msg102261

keywords: + patch
2010-03-04 14:18:06trogdorseysetnosy: + trogdorsey

messages: + msg100388
versions: + Python 2.6
2010-02-24 17:37:00pitrousetnosy: + r.david.murray

type: performance
versions: + Python 2.7, Python 3.2, - Python 2.5
2010-02-24 09:02:20marciosetnosy: + marcio
messages: + msg100011
2008-09-12 01:18:04elachunisetmessages: + msg73071
2008-09-09 20:36:33elachunisetnosy: + elachuni
messages: + msg72906
2008-09-02 17:52:37angloceltsetnosy: + anglocelt
messages: + msg72356
versions: + Python 2.5, - Python 2.4
2008-01-05 13:16:20vilasetnosy: + vila
2006-03-02 08:27:59taukkicreate