Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Maildir iterator leaks file descriptors by default #55976

Closed
moyix mannequin opened this issue Apr 4, 2011 · 16 comments
Closed

Maildir iterator leaks file descriptors by default #55976

moyix mannequin opened this issue Apr 4, 2011 · 16 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@moyix
Copy link
Mannequin

moyix mannequin commented Apr 4, 2011

BPO 11767
Nosy @bitdancer, @serhiy-storchaka
Files
  • 11767.patch: Reattaching Filip's patch after accidental delete
  • 11767_2.patch
  • 11767_3.patch
  • mailbox_fd_loss.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-05-31.12:55:07.316>
    created_at = <Date 2011-04-04.23:11:08.802>
    labels = ['type-bug', 'library']
    title = 'Maildir iterator leaks file descriptors by default'
    updated_at = <Date 2020-05-31.12:55:07.316>
    user = 'https://bugs.python.org/moyix'

    bugs.python.org fields:

    activity = <Date 2020-05-31.12:55:07.316>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-05-31.12:55:07.316>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2011-04-04.23:11:08.802>
    creator = 'moyix'
    dependencies = []
    files = ['21557', '21569', '21572', '22400']
    hgrepos = []
    issue_num = 11767
    keywords = ['patch']
    message_count = 16.0
    messages = ['132988', '133159', '133160', '133162', '133163', '133186', '133250', '133256', '133260', '133287', '138525', '138526', '138534', '138537', '138566', '370434']
    nosy_count = 5.0
    nosy_names = ['r.david.murray', 'gruszczy', 'python-dev', 'moyix', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue11767'
    versions = ['Python 2.7']

    @moyix
    Copy link
    Mannequin Author

    moyix mannequin commented Apr 4, 2011

    The default constructor for Maildir is rfc822.Message. This means that when iterating over a Maildir mailbox instantiated with default settings, the Mailbox class will return self._factory(self.get_file(key)), leaking the file descriptor returned by get_file(). Thus, iterating over any reasonably sized maildir mailbox will cause file descriptors to be leaked, and if a non-refcounting GC is used (as in PyPy), it is very likely that the process will run out of file descriptors.

    I see that the default has changed to None, for Py3k, but this still means that using a message factory unavoidably leaks file descriptors. Ideally, I'd like the Mailbox class to close the file descriptor passed into the factory; this would mean that file descriptors are never leaked during iteration.

    @moyix moyix mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Apr 4, 2011
    @gruszczy
    Copy link
    Mannequin

    gruszczy mannequin commented Apr 6, 2011

    Shouldn't this be your responsibility to close this descriptor in the object used as self._factory? When self._factory is called it should read from the file like object and then simply close it, just as get_message (when self._factory is not provided) does.

    @moyix
    Copy link
    Mannequin Author

    moyix mannequin commented Apr 6, 2011

    The _factory object didn't open the file, so why should it close it?

    It's also worth noting that things one would naturally consider using as factories (like email.message_from_file) don't close the fd when they're done. I will grant that you could easily create a wrapper factory that does close the descriptor in this case, but that's not the obvious thing to do.

    @bitdancer
    Copy link
    Member

    It is not clear from the docs what the expected behavior of factory is. It is certainly the case that the custom classes used by mailbox by default do not close the file they are passed. So either those classes should close the input file and factory should be so documented, or all the methods that create a message object should close the input file when they create one. Since the existing get_message methods take care to close any opened input files, I favor the latter (ie: the OP's suggestion).

    Anyone care to propose a patch?

    @gruszczy
    Copy link
    Mannequin

    gruszczy mannequin commented Apr 6, 2011

    How about this? Though I have no idea, how a test could be created for this. If someone could advise me on that, I'll be happy to write tests/work further on the patch.

    @bitdancer
    Copy link
    Member

    get_file's promise is that what is returned is a file like object, so it not having a close() method would be an error. So I don't think you need the try/except. What I would suggest is to use the 'closing' context manager around the return statement.

    As for tests, you could create a subclass with a custom get_file method that returns a mock object you can test to make sure the close method gets called, since Mailbox is the only place __getitem__ is defined.

    Gah, I hit the wrong key an deleted your patch. Reattaching.

    @gruszczy
    Copy link
    Mannequin

    gruszczy mannequin commented Apr 7, 2011

    Here is a new patch, that uses with in __getitem__. I wonder, if we shouldn't check for an AttributeError in case object returned by get_file doesn't have __exit__ method, but does have close and use close instead of with. But it's for you to decide, as I am very fresh to with statement.

    @bitdancer
    Copy link
    Member

    Ah, that's exactly why I suggested using the 'closing' context manager from contextlib. That context manager returns the object passed to it, and then when its __exit__ method is called, calls the close method of that object that was passed to it.

    @gruszczy
    Copy link
    Mannequin

    gruszczy mannequin commented Apr 7, 2011

    I am sorry. This is the first time I see contextlib and haven't understood, that I should use a function from it. Here is a version with mock object having close method and __getitem__ using contextlib.closing.

    @bitdancer
    Copy link
    Member

    I shouldn't have assumed that you knew about contextlib, and should have mentioned it by name the first time.

    The patch looks good to me. Not sure it is necessary to loop through ten fake mailboxes, but it doesn't hurt, either. (Other potential reviewers note: 'looks' is the operative word, I haven't tested it myself yet.)

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jun 17, 2011

    New changeset fea1920ae75f by R David Murray in branch '3.2':
    bpo-11767: use context manager to close file in __getitem__ to prevent FD leak
    http://hg.python.org/cpython/rev/fea1920ae75f

    New changeset 1d7a91358517 by R David Murray in branch 'default':
    merge bpo-11767: use context manager to close file in __getitem__ to prevent FD leak
    http://hg.python.org/cpython/rev/1d7a91358517

    @bitdancer
    Copy link
    Member

    The patch doesn't work on 2.7. The failures are related to bpo-11700, although that would seem to indicate that something *is* trying to close the file. I have no idea what.

    @gruszczy
    Copy link
    Mannequin

    gruszczy mannequin commented Jun 17, 2011

    Should I try to do something about this right now or should I wait until bpo-1170 is finished&closed and only then try to fix this issue too?

    @bitdancer
    Copy link
    Member

    Well, if you want to you could investigate further, and try the patch from bpo-11700 and see what is left to do here after it has been applied. I'll try to get 11700 in soon, though.

    @bitdancer
    Copy link
    Member

    Well, it turns out I was totally wrong about the 11700 dependency. I misread the errors that were produced by the test suite. Even after fixing 11700 they are still there: the tests are reading from the closed files. So something changed in the logic between 2.7 and 3.x. It is possible that this is related to the addition of context manager support, and is almost certainly related to the general cleanup of unclosed files that was done after Antoine's ResourceWarning patch.

    Filip, do you want to investigate this? We always have the option of ignoring the problem in 2.7, but since you reported it against 2.7 I'm guessing you'd like to see it fixed there.

    I'm attaching a version of the patch that applies cleanly to 2.7 tip.

    @serhiy-storchaka
    Copy link
    Member

    Python 2.7 is no longer supported.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants