classification
Title: Maildir iterator leaks file descriptors by default
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: gruszczy, moyix, python-dev, r.david.murray
Priority: normal Keywords: patch

Created on 2011-04-04 23:11 by moyix, last changed 2013-03-13 21:20 by ezio.melotti.

Files
File name Uploaded Description Edit
11767.patch r.david.murray, 2011-04-07 00:47 Reattaching Filip's patch after accidental delete review
11767_2.patch gruszczy, 2011-04-07 20:03 review
11767_3.patch gruszczy, 2011-04-07 21:14 review
mailbox_fd_loss.patch r.david.murray, 2011-06-18 02:39
Messages (15)
msg132988 - (view) Author: Brendan Dolan-Gavitt (moyix) Date: 2011-04-04 23:11
The default constructor for Maildir is rfc822.Message. This means that when iterating over a Maildir mailbox instantiated with default settings, the Mailbox class will return self._factory(self.get_file(key)), leaking the file descriptor returned by get_file(). Thus, iterating over any reasonably sized maildir mailbox will cause file descriptors to be leaked, and if a non-refcounting GC is used (as in PyPy), it is very likely that the process will run out of file descriptors.

I see that the default has changed to None, for Py3k, but this still means that using a message factory unavoidably leaks file descriptors. Ideally, I'd like the Mailbox class to close the file descriptor passed into the factory; this would mean that file descriptors are never leaked during iteration.
msg133159 - (view) Author: Filip Gruszczyński (gruszczy) Date: 2011-04-06 19:52
Shouldn't this be your responsibility to close this descriptor in the object used as self._factory? When self._factory is called it should read from the file like object and then simply close it, just as get_message (when self._factory is not provided) does.
msg133160 - (view) Author: Brendan Dolan-Gavitt (moyix) Date: 2011-04-06 19:59
The _factory object didn't open the file, so why should it close it?

It's also worth noting that things one would naturally consider using as factories (like email.message_from_file) don't close the fd when they're done. I will grant that you could easily create a wrapper factory that does close the descriptor in this case, but that's not the obvious thing to do.
msg133162 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-04-06 20:18
It is not clear from the docs what the expected behavior of factory is.  It is certainly the case that the custom classes used by mailbox by default do not close the file they are passed.  So either those classes should close the input file and factory should be so documented, or all the methods that create a message object should close the input file when they create one.  Since the existing get_message methods take care to close any opened input files, I favor the latter (ie: the OP's suggestion).

Anyone care to propose a patch?
msg133163 - (view) Author: Filip Gruszczyński (gruszczy) Date: 2011-04-06 20:31
How about this? Though I have no idea, how a test could be created for this. If someone could advise me on that, I'll be happy to write tests/work further on the patch.
msg133186 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-04-07 00:47
get_file's promise is that what is returned is a file like object, so it not having a close() method would be an error.  So I don't think you need the try/except.  What I would suggest is to use the 'closing' context manager around the return statement.

As for tests, you could create a subclass with a custom get_file method that returns a mock object you can test to make sure the close method gets called, since Mailbox is the only place __getitem__ is defined.

Gah, I hit the wrong key an deleted your patch.  Reattaching.
msg133250 - (view) Author: Filip Gruszczyński (gruszczy) Date: 2011-04-07 20:03
Here is a new patch, that uses with in __getitem__. I wonder, if we shouldn't check for an AttributeError in case object returned by get_file doesn't have __exit__ method, but does have close and use close instead of with. But it's for you to decide, as I am very fresh to with statement.
msg133256 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-04-07 20:49
Ah, that's exactly why I suggested using the 'closing' context manager from contextlib.  That context manager returns the object passed to it, and then when its __exit__ method is called, calls the close method of that object that was passed to it.
msg133260 - (view) Author: Filip Gruszczyński (gruszczy) Date: 2011-04-07 21:14
I am sorry. This is the first time I see contextlib and haven't understood, that I should use a function from it. Here is a version with mock object having close method and __getitem__ using contextlib.closing.
msg133287 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-04-08 02:12
I shouldn't have assumed that you knew about contextlib, and should have mentioned it by name the first time.

The patch looks good to me.  Not sure it is necessary to loop through ten fake mailboxes, but it doesn't hurt, either.  (Other potential reviewers note: 'looks' is the operative word, I haven't tested it myself yet.)
msg138525 - (view) Author: Roundup Robot (python-dev) Date: 2011-06-17 17:10
New changeset fea1920ae75f by R David Murray in branch '3.2':
#11767: use context manager to close file in __getitem__ to prevent FD leak
http://hg.python.org/cpython/rev/fea1920ae75f

New changeset 1d7a91358517 by R David Murray in branch 'default':
merge #11767: use context manager to close file in __getitem__ to prevent FD leak
http://hg.python.org/cpython/rev/1d7a91358517
msg138526 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-06-17 17:21
The patch doesn't work on 2.7.  The failures are related to #11700, although that would seem to indicate that something *is* trying to close the file.  I have no idea what.
msg138534 - (view) Author: Filip Gruszczyński (gruszczy) Date: 2011-06-17 17:44
Should I try to do something about this right now or should I wait until #1170 is finished&closed and only then try to fix this issue too?
msg138537 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-06-17 18:00
Well, if you want to you could investigate further, and try the patch from #11700 and see what is left to do here after it has been applied.  I'll try to get 11700 in soon, though.
msg138566 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-06-18 02:39
Well, it turns out I was totally wrong about the 11700 dependency.  I misread the errors that were produced by the test suite.  Even after fixing 11700 they are still there: the tests are reading from the closed files.  So something changed in the logic between 2.7 and 3.x.  It is possible that this is related to the addition of context manager support, and is almost certainly related to the general cleanup of unclosed files that was done after Antoine's ResourceWarning patch.

Filip, do you want to investigate this?  We always have the option of ignoring the problem in 2.7, but since you reported it against 2.7 I'm guessing you'd like to see it fixed there.

I'm attaching a version of the patch that applies cleanly to 2.7 tip.
History
Date User Action Args
2013-03-13 21:20:45ezio.melottisetstage: patch review
2011-06-18 02:39:54r.david.murraysetfiles: + mailbox_fd_loss.patch

dependencies: - mailbox.py proxy close method cannot be called multiple times
messages: + msg138566
2011-06-17 18:00:25r.david.murraysetmessages: + msg138537
2011-06-17 17:44:58gruszczysetmessages: + msg138534
2011-06-17 17:21:05r.david.murraysetdependencies: + mailbox.py proxy close method cannot be called multiple times
messages: + msg138526
2011-06-17 17:10:31python-devsetnosy: + python-dev
messages: + msg138525
2011-06-01 06:31:01terry.reedysetversions: - Python 2.6, Python 2.5
2011-04-08 02:12:16r.david.murraysetmessages: + msg133287
2011-04-07 21:14:27gruszczysetfiles: + 11767_3.patch

messages: + msg133260
2011-04-07 20:49:38r.david.murraysetmessages: + msg133256
2011-04-07 20:03:22gruszczysetfiles: + 11767_2.patch

messages: + msg133250
2011-04-07 00:47:44r.david.murraysetfiles: + 11767.patch

messages: + msg133186
2011-04-07 00:43:59r.david.murraysetfiles: - 11767.patch
2011-04-06 20:31:34gruszczysetfiles: + 11767.patch
keywords: + patch
messages: + msg133163
2011-04-06 20:18:51r.david.murraysetmessages: + msg133162
2011-04-06 19:59:44moyixsetmessages: + msg133160
2011-04-06 19:52:42gruszczysetnosy: + gruszczy
messages: + msg133159
2011-04-05 14:07:52pitrousetnosy: + r.david.murray
2011-04-04 23:11:08moyixcreate