msg27650 - (view) |
Author: taukki (taukki) |
Date: 2006-03-02 08:27 |
When a big mail (>10MB) is fetched MemoryError may
result (reproduced in Windows XP/2003, Python 2.41):
Traceback (most recent call last):
File "t5.py", line 9, in ?
data = f.read(26872159)
File "c:\python24\lib\socket.py", line 307, in read
data = self._sock.recv(recv_size)
MemoryError
The problem is not actually in imaplib (though it
requests the whole mail in one read), but probably in
combination of socket core code and the implementation
of read() function (he way it buffers the received data
in a list?). I can easily reproduce the problem e.g
with these snippets:
------------CLIENT----------------->
import socket
import time
s = socket.socket()
s.connect(("127.0.0.1", 8037))
f = s.makefile("r", 200000)
while 1:
data = f.read(26872159) # Using smaller value here
helps, 100000 seems to work forever...
#data = s.recv(26872159) # THIS WORKS OK
print len(data)
---------SERVER--------------------->
import SocketServer
import time
PORT = 8037
time.sleep(0.2)
s = "X" * 1823
class
TimeRequestHandler(SocketServer.StreamRequestHandler):
def handle(self):
for x in range(200000):
self.wfile.write(s)
print x
server = SocketServer.TCPServer(("", PORT),
TimeRequestHandler)
print "listening on port", PORT
server.serve_forever()
|
msg27651 - (view) |
Author: Mauro Cicognini (mcicogni) |
Date: 2007-05-21 17:14 |
Consistently reproducible using Python 2.5.1 on Windows XP.
Another bug on the same subject was closed WONTFIX since it all appears to be Microsoft's fault (although asking a socket directly for more than 10MB of data at once is arguably unreasonable).
Since it appears that it's just Imaplib implementation to be somewhat broken, we could possibly patch *that* to work around this problem.
|
msg72356 - (view) |
Author: Iain MacKay (anglocelt) |
Date: 2008-09-02 17:52 |
Knowing that the large read provokes the problem enables one to write
this simple workaround by subclassing IMAP4 without patching the library:
maxRead = 1000000
class MySSL (imaplib.IMAP4_SSL):
def read (self, n):
#print "..Attempting to read %d bytes" % n
if n <= maxRead:
return imaplib.IMAP4_SSL.read (self, n)
else:
soFar = 0
result = ""
while soFar < n:
thisFragmentSize = min(maxRead, n-soFar)
#print "..Reading fragment size %s" % thisFragmentSize
fragment =\
imaplib.IMAP4_SSL.read (self, thisFragmentSize)
result += fragment
soFar += thisFragmentSize # only a few, so not a tragic o/head
return result
and this works just fine.
|
msg72906 - (view) |
Author: Anthony Lenton (elachuni) * |
Date: 2008-09-09 20:36 |
I confirm that the bug occurs with Python 2.5.1 on Windows XP. Also,
anglocelt's fix worked fine for me.
|
msg73071 - (view) |
Author: Anthony Lenton (elachuni) * |
Date: 2008-09-12 01:18 |
It's probably just a typo from copying from an editor, but there is a
bug in the workaround. It should be:
maxRead = 1000000
class MySSL (imaplib.IMAP4_SSL):
def read (self, n):
#print "..Attempting to read %d bytes" % n
if n <= maxRead:
return imaplib.IMAP4_SSL.read (self, n)
else:
soFar = 0
result = ""
while soFar < n:
thisFragmentSize = min(maxRead, n-soFar)
#print "..Reading fragment size %s" % thisFragmentSize
fragment =\
imaplib.IMAP4_SSL.read (self, thisFragmentSize)
result += fragment
soFar += thisFragmentSize # only a few, so not a tragic o/head
return result
(With one less level of indentation in the last line).
Apart from that, the fix works wonderfully.
|
msg100011 - (view) |
Author: Márcio (marcio) |
Date: 2010-02-24 09:02 |
I've been also receiving the same error while using "imaplib.IMAP4_SSL" to download large e-mails.
A simple change in the "read" and "readline" functions to use instead a "cStringIO.StringIO" buffer object works for me.
|
msg100388 - (view) |
Author: (trogdorsey) |
Date: 2010-03-04 14:18 |
I see this issue on a 32-bit Fedora 10 system using Python 2.6.4 as well.
|
msg102261 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2010-04-03 13:39 |
$ cat /tmp/test.py
import socket
SIZE = 1000000000L
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
try:
s.recv(SIZE)
finally:
s.close()
$ python /tmp/test.py
Traceback (most recent call last):
File "/tmp/test.py", line 8, in <module>
s.recv(SIZE)
MemoryError
$ strace python /tmp/test.py
[...]
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3
mmap2(NULL, 1000001536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
brk(0x4440c000) = 0x8a56000
mmap2(NULL, 1000132608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0xb79b1000
munmap(0xb79b1000, 323584) = 0
munmap(0xb7b00000, 724992) = 0
mprotect(0xb7a00000, 135168, PROT_READ|PROT_WRITE) = 0
mmap2(NULL, 1000001536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
[...]
imaplib is simply requesting too much at a time: the memory error is probably raised in Modules/socketmodule.c:sock_recv
/* Allocate a new string. */
buf = PyString_FromStringAndSize((char *) 0, recvlen);
if (buf == NULL)
return NULL;
Requesting a 10M read is indeed quite questionable, and will fail no matter what system is used (the limit will depend on the heap usage and OS, though).
The fix should be made at imaplib level, rather than requiring users to redefine IMAP4 read method. A patch is attached (imaplib_read.diff).
|
msg111087 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * |
Date: 2010-07-21 16:49 |
I fail to see *why* the patch fixes the issue. You still have to allocate the big string when joining the parts.
|
msg111089 - (view) |
Author: Márcio (marcio) |
Date: 2010-07-21 16:54 |
Speaking for myself, I'm not using the attached patch, I'm using the simple fix I included in my previous reply which works perfectly to avoid getting a MemoryError exception thrown.
|
msg111126 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * |
Date: 2010-07-21 22:03 |
Which module did you change, and which precise version of python are you using? I wonder if this was fixed with issue2632.
|
msg111366 - (view) |
Author: Márcio (marcio) |
Date: 2010-07-23 17:09 |
I got that error on Windows 7 Professional 64 bits, using Python 2.6.4 64 bits.
I just changed the "imaplib.IMAP4_SSL" class, by making the "read" and "readline" functions use "cStringIO.StringIO()" instead of an array of strings for "chunks = []".
|
msg114637 - (view) |
Author: Mark Lawrence (BreamoreBoy) * |
Date: 2010-08-22 01:51 |
At least three solutions to this have been suggested. The simplest is that provided by Márcio but is this acceptable?
|
msg136427 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2011-05-21 10:17 |
It's actually an obvious case of heap fragmentation due to long-lived chunks being realloc()ed to a smaller size. Some malloc implementations can choke on this (e.g. OS-X's malloc is known to not shrink blocks when realloc() is called with a smaller size).
The solution is simply to use a StringIO to avoid holding references to those blocks for too long.
Patch attached.
|
msg136451 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2011-05-21 15:31 |
Patch looks ok. Is 3.x also affected? The I/O stack changed quite a bit in 3.x.
|
msg136457 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2011-05-21 16:47 |
> Patch looks ok. Is 3.x also affected? The I/O stack changed quite a bit in
> 3.x.
I think it's not affected, but I can't reproduce this behaviour with
glibc/eglibc, so don't just take my word for it.
The reason is that in py3k, imaplib uses a buffered reader to read
from the socket (obtained through sock.makefile()), and IIUC, reading
from a buffered reader n bytes will always return n bytes (unless EOF
is encountered or the underlying object is set to non-blocking),
contrarily to reading directly from a socket. So the resizing is done
from the buffered io module, which only grows the result buffer,
freeing the buffers returned by socket's recv() one at a time.
Note that if I'm right, another way to solve this would be to wrap the
SSL socket with a call to makefile, like what's already done for the
non-SSL case (and in py3k). This would also lead to cleaner and
simpler code, because we wouldn't need to handle partial reads
ourself.
That's what the patch attached does.
Note that if I'm correct, then the current py3k imaplib read() method
could also be simplified:
def read(self, size):
"""Read 'size' bytes from remote."""
chunks = []
read = 0
while read < size:
data = self.file.read(min(size-read, 4096))
if not data:
break
read += len(data)
chunks.append(data)
return b''.join(chunks)
It would also be a bit faster.
But since I can't reproduce this issue, I'd rather make sure it's
correct before making a such change.
|
msg136458 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2011-05-21 17:16 |
In the buffered reader case, the result buffer is actually pre-allocated with the total size, making fragmentation even less likely.
|
msg136510 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2011-05-22 10:45 |
Digging a little deeper, here's the conclusion:
- with py3k, fragmentation is less likely: the buffered reader returned by makefile() ensures that we can allocate only one result buffer for the total number of bytes read() (thanks to socket's readinto()). It's also faster. Finally, since buffered read() guarantees to return exactly the number of bytes asked, there's no need to call it repeatedly with min(size-read, 4096): it's slower, and actually makes fragmentation more likely.
So I've attached a patch removing the looping from py3k IMAP4 read() method, which leads to simpler, faster and fragmentation-less code.
- for 2.7, it's also better to wrap the SSL socket with makefile(): the leads to simpler code (no need to loop), and also I noticed that socket's _fileobject's (returned by makefile()) read() and readline() methods already use a StringIO to avoid fragmentation.
So I've attached a second patch wrapping 2.7 IMAP4_SSL socket with makefile.
While I can't reproduce this issue on my system, I've analyzed malloc/realloc/free calls using ltrace, and I'm pretty confident this should solve fragmentation issues for both 2.7 and py3k.
By the way, nice work with the I/O stack in py3k, the new layer allows much more efficient code (reduced allocations and copies)!
|
msg136541 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2011-05-22 15:39 |
Both patches look ok to me.
|
msg136793 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2011-05-24 21:12 |
New changeset 0dee36595699 by Charles-François Natali in branch '2.7':
Issue #1441530: In imaplib, use makefile() to wrap the SSL socket to avoid
http://hg.python.org/cpython/rev/0dee36595699
|
msg136797 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2011-05-24 21:48 |
New changeset 14bb95a8d7ee by Charles-François Natali in branch 'default':
Issue #1441530: In imaplib, read the data in one chunk to speed up large
http://hg.python.org/cpython/rev/14bb95a8d7ee
|
msg136827 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2011-05-25 05:54 |
I've committed the patch to 2.7, and also to default (and only to default since for py3k it's more of an optimization than a bug fix).
Closing.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:56:15 | admin | set | github: 42966 |
2011-05-25 05:54:13 | neologix | set | status: open -> closed resolution: fixed messages:
+ msg136827
stage: patch review -> resolved |
2011-05-24 21:48:46 | python-dev | set | messages:
+ msg136797 |
2011-05-24 21:12:19 | python-dev | set | nosy:
+ python-dev messages:
+ msg136793
|
2011-05-22 15:39:12 | pitrou | set | messages:
+ msg136541 versions:
+ Python 3.3 |
2011-05-22 10:46:55 | neologix | set | files:
- imaplib_ssl_makefile.diff |
2011-05-22 10:46:34 | neologix | set | files:
- imaplib_read.diff |
2011-05-22 10:46:14 | neologix | set | files:
+ imaplib_recv_27.diff |
2011-05-22 10:45:41 | neologix | set | files:
+ imaplib_recv_py3k.diff
messages:
+ msg136510 |
2011-05-21 17:16:33 | neologix | set | messages:
+ msg136458 |
2011-05-21 16:47:57 | neologix | set | files:
+ imaplib_ssl_makefile.diff
messages:
+ msg136457 |
2011-05-21 15:31:09 | pitrou | set | nosy:
+ pitrou messages:
+ msg136451
|
2011-05-21 10:17:38 | neologix | set | files:
- imaplib_read.diff |
2011-05-21 10:17:18 | neologix | set | files:
+ imaplib_read.diff
messages:
+ msg136427 |
2010-08-22 01:51:34 | BreamoreBoy | set | versions:
+ Python 3.1, - Python 2.6 nosy:
+ BreamoreBoy
messages:
+ msg114637
type: performance -> behavior stage: patch review |
2010-07-23 17:09:36 | marcio | set | messages:
+ msg111366 |
2010-07-21 22:03:05 | amaury.forgeotdarc | set | messages:
+ msg111126 |
2010-07-21 16:54:43 | marcio | set | messages:
+ msg111089 |
2010-07-21 16:49:05 | amaury.forgeotdarc | set | nosy:
+ amaury.forgeotdarc messages:
+ msg111087
|
2010-04-03 13:39:15 | neologix | set | files:
+ imaplib_read.diff
nosy:
+ neologix messages:
+ msg102261
keywords:
+ patch |
2010-03-04 14:18:06 | trogdorsey | set | nosy:
+ trogdorsey
messages:
+ msg100388 versions:
+ Python 2.6 |
2010-02-24 17:37:00 | pitrou | set | nosy:
+ r.david.murray
type: performance versions:
+ Python 2.7, Python 3.2, - Python 2.5 |
2010-02-24 09:02:20 | marcio | set | nosy:
+ marcio messages:
+ msg100011
|
2008-09-12 01:18:04 | elachuni | set | messages:
+ msg73071 |
2008-09-09 20:36:33 | elachuni | set | nosy:
+ elachuni messages:
+ msg72906 |
2008-09-02 17:52:37 | anglocelt | set | nosy:
+ anglocelt messages:
+ msg72356 versions:
+ Python 2.5, - Python 2.4 |
2008-01-05 13:16:20 | vila | set | nosy:
+ vila |
2006-03-02 08:27:59 | taukki | create | |