classification
Title: imaplib causes excessive fragmentation for large documents
Type: behavior Stage:
Components: Library (Lib) Versions: Python 2.6, Python 2.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: akuchling Nosy List: akuchling, christian.heimes, effbot, rich, vila
Priority: high Keywords: easy, patch

Created on 2005-12-23 18:11 by effbot, last changed 2008-03-07 10:24 by vila. This issue is now closed.

Messages (6)
msg60857 - (view) Author: Fredrik Lundh (effbot) * (Python committer) Date: 2005-12-23 18:11
When fetching large documents via SSL, imaplib
attempts to read the whole document in one chunk, but
the SSL socket layer only returns ~16k at a time.

The result is that Python will end up allocating, say,
a 15 megabyte block, shrinking it to a few kilobytes,
occasionally allocating a medium-sized block (to hold
the list of chunks), and repeating this again and again
and again.  Not all malloc implementations can reuse
the (15 megabytes minus a few kilobytes) block when
allocating the next 15 megabyte block.  In a worst
case scenario, you'll need some 13 gigabytes of
virtual memory to read a 15 megabyte message...

A simple solution is to change

    data = self.sslobj.read(size-read)

to

    data = self.sslobj.read(min(size-read, 16384))

For more on this, see this thread:

http://groups.google.com/group/comp.lang.python/browse_frm/thread/3737500bac287575/d715bf614a86e786

</F>
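
The capped read suggested above can be sketched as a small loop. This is only an illustration, not the code that was committed: `FakeSSLObject` is a hypothetical stand-in for an SSL object whose `read(n)` returns at most about 16k per call, and `read_capped` and `CHUNK` are names made up for this sketch. It is written for current Python (bytes literals, `io.BytesIO`).

```python
import io

CHUNK = 16384  # cap taken from the suggested fix


class FakeSSLObject:
    """Hypothetical stand-in: read(n) never returns more than max_record bytes."""

    def __init__(self, payload, max_record=16384):
        self._buf = io.BytesIO(payload)
        self._max_record = max_record
        self.requests = []  # sizes the caller asked for, kept for inspection

    def read(self, n):
        self.requests.append(n)
        return self._buf.read(min(n, self._max_record))


def read_capped(sslobj, size, chunk=CHUNK):
    """Read up to `size` bytes, never requesting more than `chunk` at once."""
    chunks = []
    read = 0
    while read < size:
        # Ask only for what one call can plausibly return, so the
        # underlying layer never has to allocate a huge block and shrink it.
        data = sslobj.read(min(size - read, chunk))
        if not data:
            break  # peer closed the connection early
        read += len(data)
        chunks.append(data)
    return b"".join(chunks)
```

Inspecting `requests` on the fake object after a call shows that no single request exceeds the cap, which is the property the one-line change is meant to guarantee.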
msg60858 - (view) Author: Fredrik Lundh (effbot) * (Python committer) Date: 2005-12-25 10:57
As noted in that thread, the same problem applies to non-SSL
accesses.  The problematic line is:

    data = self._sock.recv(recv_size)
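
For the plain-socket case, a different way to avoid the allocate-and-shrink pattern is to receive into a single preallocated buffer. This is not the change proposed or committed for this issue; it is a minimal sketch built on the standard `socket.recv_into` call, and `recv_into_buffer` and its `chunk` parameter are hypothetical names chosen for illustration.

```python
import socket


def recv_into_buffer(sock, size, chunk=16384):
    """Receive up to `size` bytes into one preallocated bytearray."""
    buf = bytearray(size)      # the only large allocation
    view = memoryview(buf)     # lets us write at an offset without copying
    received = 0
    while received < size:
        # recv_into writes directly into the buffer and returns the
        # number of bytes written; cap each request at `chunk` bytes.
        n = sock.recv_into(view[received:], min(size - received, chunk))
        if n == 0:
            break  # peer closed the connection early
        received += n
    # Return the full buffer, or a trimmed copy after a short read.
    return buf if received == size else buf[:received]
```

Because the data lands directly in its final location, there is no list of chunks to join and no large temporary block to shrink.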
msg61302 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-01-20 14:40
I read your posting on the python general list.

Eeeeh! That's awful.
msg61651 - (view) Author: Richard Cooper (rich) Date: 2008-01-24 19:25
I think I was just bitten by the non-SSL version of this bug on Python 2.5.1 (r251:54863) on
Mac OS X 10.5. It manifested itself as a "malloc error: can't allocate region" while
downloading a message using imaplib.

As suggested by effbot I changed "data = self._sock.recv(recv_size)" to
"data = self._sock.recv(min(recv_size, 16384))" in both places that line appears in socket.py.
Making that change fixed the problem for me.

Note that http://bugs.python.org/issue1092502 seems to be a duplicate of this issue. That issue 
contains a slightly different fix proposed by a_lauer, which I've not tried.
msg62795 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2008-02-23 19:07
Fredrik's suggested fix for SSL IMAP committed as rev. 61006, and to
2.5-maint in rev. 61007.

There still seems to be a problem with the non-SSL version.  I'm looking
into that next.
msg62798 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2008-02-23 19:32
Andreas Lauer's suggested fix on #1092502 is correct, and fixes the
problem for the non-SSL IMAP class.  Applied to 2.6 trunk in rev. 61008
and to 2.5-maint in rev. 61009.
History
Date                 User              Action  Args
2008-03-07 10:24:19  vila              set     nosy: + vila
2008-02-23 19:32:11  akuchling         set     status: open -> closed
                                               resolution: fixed
                                               messages: + msg62798
2008-02-23 19:07:45  akuchling         set     messages: + msg62795
2008-02-23 18:52:09  akuchling         set     assignee: akuchling
                                               nosy: + akuchling
2008-01-28 20:57:24  akuchling         set     keywords: + patch
2008-01-24 19:25:38  rich              set     nosy: + rich
                                               messages: + msg61651
2008-01-20 14:41:00  christian.heimes  set     versions: + Python 2.6, Python 2.5, - Python 2.4
                                               nosy: + christian.heimes
                                               messages: + msg61302
                                               priority: normal -> high
                                               keywords: + easy
                                               type: behavior
2005-12-23 18:11:10  effbot            create