Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SSL tests leak memory #45810

Closed
janssen mannequin opened this issue Nov 19, 2007 · 41 comments
Closed

SSL tests leak memory #45810

janssen mannequin opened this issue Nov 19, 2007 · 41 comments
Labels
stdlib Python modules in the Lib dir

Comments

@janssen
Copy link
Mannequin

janssen mannequin commented Nov 19, 2007

BPO 1469
Nosy @gvanrossum, @amauryfa, @giampaolo, @tiran
Files
  • ssl_gc.patch
  • patch-4
  • unnamed
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2007-12-14.22:10:13.586>
    created_at = <Date 2007-11-19.21:16:14.075>
    labels = ['library']
    title = 'SSL tests leak memory'
    updated_at = <Date 2008-01-06.22:29:45.066>
    user = 'https://bugs.python.org/janssen'

    bugs.python.org fields:

    activity = <Date 2008-01-06.22:29:45.066>
    actor = 'admin'
    assignee = 'janssen'
    closed = True
    closed_date = <Date 2007-12-14.22:10:13.586>
    closer = 'gvanrossum'
    components = ['Library (Lib)']
    creation = <Date 2007-11-19.21:16:14.075>
    creator = 'janssen'
    dependencies = []
    files = ['8886', '8955', '8956']
    hgrepos = []
    issue_num = 1469
    keywords = []
    message_count = 41.0
    messages = ['57667', '57730', '57743', '57748', '58065', '58070', '58240', '58242', '58243', '58244', '58245', '58246', '58247', '58249', '58250', '58252', '58253', '58255', '58286', '58287', '58372', '58378', '58379', '58382', '58388', '58389', '58390', '58396', '58397', '58402', '58404', '58412', '58580', '58590', '58608', '58612', '58613', '58629', '58630', '58632', '58638']
    nosy_count = 5.0
    nosy_names = ['gvanrossum', 'amaury.forgeotdarc', 'janssen', 'giampaolo.rodola', 'christian.heimes']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue1469'
    versions = ['Python 3.0']

    @janssen
    Copy link
    Mannequin Author

    janssen mannequin commented Nov 19, 2007

    I'm seeing leaks in the test_ssl run, after various socket.py changes.
    I'm looking into it.

    @janssen janssen mannequin self-assigned this Nov 19, 2007
    @janssen janssen mannequin added the stdlib Python modules in the Lib dir label Nov 19, 2007
    @tiran
    Copy link
    Member

    tiran commented Nov 21, 2007

    I don't see leaks when I run the tests with

    $ ./python Lib/test/regrtest.py -R:: test_ssl.py
    test_ssl
    beginning 9 repetitions
    123456789
    .........
    1 test OK.
    [80034 refs]

    @janssen
    Copy link
    Mannequin Author

    janssen mannequin commented Nov 21, 2007

    What platform?

    Bill

    On Nov 20, 2007 11:21 PM, Christian Heimes <report@bugs.python.org> wrote:

    Christian Heimes added the comment:

    I don't see leaks when I run the tests with

    $ ./python Lib/test/regrtest.py -R:: test_ssl.py
    test_ssl
    beginning 9 repetitions
    123456789
    .........
    1 test OK.
    [80034 refs]

    ----------
    nosy: +tiran
    priority: -> normal


    Tracker <report@bugs.python.org>
    <http://bugs.python.org/issue1469\>


    @tiran
    Copy link
    Member

    tiran commented Nov 22, 2007

    Ubuntu Linux x86

    @tiran
    Copy link
    Member

    tiran commented Dec 1, 2007

    Are you sure that the SSL tests are still leaking memory? The tests
    aren't leaking references on Windows nor on Linux.

    @janssen
    Copy link
    Mannequin Author

    janssen mannequin commented Dec 1, 2007

    I'm seeing the leaks on my Leopard machine. I haven't had a chance to look
    into it yet.

    On Dec 1, 2007 11:20 AM, Christian Heimes <report@bugs.python.org> wrote:

    Christian Heimes added the comment:

    Are you sure that the SSL tests are still leaking memory? The tests
    aren't leaking references on Windows nor on Linux.


    Tracker <report@bugs.python.org>
    <http://bugs.python.org/issue1469\>


    @tiran
    Copy link
    Member

    tiran commented Dec 6, 2007

    I can reproduce the problem when I run the ssl tests with -unetwork.

    I've used PYTHONDUMREFS and combinerefs.py to find the leaking objects.
    I'm seeing lots of

    0x9fd3e4c [1] SSLSocket
    0x9fd2928 [1] ssl.SSLContext <ssl.SSLContext object at 0x9fd2928>
    0x9fd5b74 [1] dict {'certfile': None, '_sslobj': <ssl.SSLContext object
    at 0x9fd2928>, 'do_handshake_on_con
    nect': True, 'ssl_version': 1, '_base': None, 'cert_reqs': 2,
    'ca_certs': '/home/heimes/dev/python/py3k/Lib
    /test/https_svn_python_org_root.pem', 'keyfile': None,
    'suppress_ragged_eofs': True}

    in the output.

    I guess the problem is connected to::

        _ssl.sslwrap(self, server_side,
            keyfile, certfile,
            cert_reqs, ssl_version, ca_certs)

    in Lib/ssl.py. I'm digging in

    @tiran tiran assigned tiran and unassigned janssen Dec 6, 2007
    @tiran
    Copy link
    Member

    tiran commented Dec 6, 2007

    I may have found the error. The PySSL_Type doesn't support GC but it
    should in order to clean up self->Socket.

    I've started to fix it but I don't have time to work on the problem 'til
    tonight. The patch causes a seg fault. I may have overlooked or
    misunderstood something.

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread -1210324800 (LWP 31486)]
    0x08127d03 in gc_list_remove (node=0x86d1cac) at Modules/gcmodule.c:158
    158 node->gc.gc_prev->gc.gc_next = node->gc.gc_next;

    @tiran
    Copy link
    Member

    tiran commented Dec 6, 2007

    I found the problem! I've to use PyObject_GC_New instead of PyObject_New

    @gvanrossum
    Copy link
    Member

    I get a segfault when I run it like this:

    $ ./python.exe Lib/test/regrtest.py -uall test_ssl
    test_ssl
    Segmentation fault
    $ 

    (Without -uall it passes.)

    @tiran
    Copy link
    Member

    tiran commented Dec 6, 2007

    New patch

    The new patch doesn't cause a seg fault but it doesn't help. It
    *increases* the number of reference leaks.

    test_ssl leaked [1610, 1610] references, sum=3220

    @tiran
    Copy link
    Member

    tiran commented Dec 6, 2007

    The new patch works and fixes the ref leak. I had to move the call
    self._real_close() from __del__ to PySSL_dealloc(). I don't know if you
    are going to like it. :]

    @gvanrossum
    Copy link
    Member

    I will look at this in an hour or so, after I bring Orlijn to school
    and drive to work.

    @tiran
    Copy link
    Member

    tiran commented Dec 6, 2007

    Guido van Rossum wrote:

    I will look at this in an hour or so, after I bring Orlijn to school
    and drive to work.

    I found two problems with the _real_close() call in PySSL_dealloc. I'm
    digging into it now.

    o = PyObject_CallMethod((PyObject*)self->Socket, "_real_close", "");

    Christian

    @gvanrossum
    Copy link
    Member

    Oops, I just committed revision 59394.
    Please advise.

    @gvanrossum
    Copy link
    Member

    I just reverted it. I put a bit of debugging in the call to
    _real_close(), and even with Christian's corrections it fails miserably.
    I prefer leaking.

    @gvanrossum
    Copy link
    Member

    I'm giving up on this for now. There are other weird bugs in the code,
    e.g. this simple piece of code fails:

    >>> x = urllib.urlopen("https://mail.google.com").read()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/google/home/guido/python/py3kd/Lib/io.py", line 463,
    in read
        return self.readall()
      File "/usr/local/google/home/guido/python/py3kd/Lib/io.py", line 473,
    in readall
        data = self.read(DEFAULT_BUFFER_SIZE)
      File "/usr/local/google/home/guido/python/py3kd/Lib/io.py", line 466,
    in read
        del b[n:]
    TypeError: slice indices must be integers or None or have an __index__
    method
    [64813 refs]
    >>>

    The debugger tells me that 'n' contains a bytes object instead of an
    int. This appears due to the confused signature of _sslobj.read()
    (which can behave as either read() or readinto()). Its docstring is
    wrong too. We'll deal with this after 3.0a2 is released.

    @gvanrossum gvanrossum assigned janssen and unassigned tiran Dec 6, 2007
    @tiran
    Copy link
    Member

    tiran commented Dec 6, 2007

    Guido van Rossum wrote:

    I just reverted it. I put a bit of debugging in the call to
    _real_close(), and even with Christian's corrections it fails miserably.
    I prefer leaking.

    I agree! I've not enough time to work on the patch. A working patch
    requires some redesign of the ssl interface. It's definitely too late now.

    Christian

    @janssen
    Copy link
    Mannequin Author

    janssen mannequin commented Dec 7, 2007

    I'm still not sure what the problem is. This code was working fine when I
    committed it, no leaks, so something else must have changed under the
    covers. I don't believe that adding GC to the C code is the answer; it
    should be automatically GC'd. So I'd back that patch out.

    I intend to wolf-fence the tests in the test module until I find a single
    test that causes the leaks. Sorry I haven't had time to look at this
    earlier, but we're in the middle of a big presentation afternoon here at
    work.

    On Dec 6, 2007 11:45 AM, Christian Heimes <report@bugs.python.org> wrote:

    Christian Heimes added the comment:

    Guido van Rossum wrote:
    > I just reverted it. I put a bit of debugging in the call to
    > _real_close(), and even with Christian's corrections it fails miserably.
    > I prefer leaking.

    I agree! I've not enough time to work on the patch. A working patch
    requires some redesign of the ssl interface. It's definitely too late now.

    Christian


    Tracker <report@bugs.python.org>
    <http://bugs.python.org/issue1469\>


    @gvanrossum
    Copy link
    Member

    Don't worry, I did back it out before releasing 3.0a2.

    I believe I hacked your code a bit before checking it in (or after you
    checked it in, can't remember).

    Did you see my bug report with a TypeError?

    @janssen
    Copy link
    Mannequin Author

    janssen mannequin commented Dec 10, 2007

    I think I've figured it out. My initial patch to socket.py and ssl.py
    had an extra method defined on socket.socket, _real_close(), which did'
    the cleanup work of deallocating the underlying socket, and in the SSL
    subclass, of releasing the SSL context. Guido must have removed that
    method, and folded what it does into the socket.close() call. But that
    means that subclasses can't hook onto that code, to get cleaned up
    when the socket is truly closed. So the SSL context gets leaked.

    Adding _real_close() back to the socket module fixes most of the leak,
    but not all of it. I'm looking for the other problem...

    @janssen
    Copy link
    Mannequin Author

    janssen mannequin commented Dec 10, 2007

    So, what's the final status of __del__ in py3K? The other bit of leak
    is due to _real_close() not being called when a socket is dropped on the
    floor (say, you try to connect, fail, and raise the exception back to
    the caller, without ever explicitly calling close() on the opened
    socket). It looks like the __del__ method in SSLSocket isn't being called.

    @amauryfa
    Copy link
    Member

    Is it possible that you have the same problem as in
    http://bugs.python.org/issue1540 ?

    To be sure, check the gc.garbage variable at the end of the test. It may
    contain the references you are looking for...

    @gvanrossum
    Copy link
    Member

    So, what's the final status of __del__ in py3K? The other bit of leak
    is due to _real_close() not being called when a socket is dropped on the
    floor (say, you try to connect, fail, and raise the exception back to
    the caller, without ever explicitly calling close() on the opened
    socket). It looks like the __del__ method in SSLSocket isn't being called.

    There are no current plans to change __del__ in Py3k.

    @janssen
    Copy link
    Mannequin Author

    janssen mannequin commented Dec 10, 2007

    The other leak comes from this code:

       s = ssl.wrap_socket(socket.socket(), ...)
       s.connect((SOME BOGUS ADDRESS OR SERVER))

    The connect() fails, and the SSLSocket "s"gets dropped on the floor,
    but never seems to be GC'd, (or the GC never seems to call the __del__
    method). Any idea why?

    Could this be because the base class is a C class?

    @janssen
    Copy link
    Mannequin Author

    janssen mannequin commented Dec 10, 2007

    gc.garbage is always empty.

    @gvanrossum
    Copy link
    Member

    I wonder if Christian Heimes was correct that the ssl object needs GC
    support? This was part of his patch (which I checked in and then
    reverted because the other part of it didn't work as advertised).

    Alternatively, if 's' is involved in a cycle *and* any of the objects
    in the cycle has a __del__ method, the GC won't call __del__. This
    seems counter-intuitive but is actually correct since __del__ can only
    be called when there are no references to an object left.

    --Guido

    On Dec 10, 2007 3:30 PM, Bill Janssen <report@bugs.python.org> wrote:

    Bill Janssen added the comment:

    The other leak comes from this code:

    s = ssl.wrap_socket(socket.socket(), ...)
    s.connect((SOME BOGUS ADDRESS OR SERVER))

    The connect() fails, and the SSLSocket "s"gets dropped on the floor,
    but never seems to be GC'd, (or the GC never seems to call the __del__
    method). Any idea why?

    Could this be because the base class is a C class?


    Tracker <report@bugs.python.org>
    <http://bugs.python.org/issue1469\>


    @janssen
    Copy link
    Mannequin Author

    janssen mannequin commented Dec 11, 2007

    Ah, I see what's going on. The revision of the socket code (nice job,
    by the way) removed the distinction between the C socket object and the
    Python socket object. The C SSLContext keeps a pointer to the C socket
    object, which is now the Python socket object, or in the SSL case, the
    SSLSocket object. So there's a circular reference.

    The right fix is to make the ref in the SSL C code a weakref.

    @tiran
    Copy link
    Member

    tiran commented Dec 11, 2007

    Guido van Rossum wrote:

    Guido van Rossum added the comment:

    I wonder if Christian Heimes was correct that the ssl object needs GC
    support? This was part of his patch (which I checked in and then
    reverted because the other part of it didn't work as advertised).

    Alternatively, if 's' is involved in a cycle *and* any of the objects
    in the cycle has a __del__ method, the GC won't call __del__. This
    seems counter-intuitive but is actually correct since __del__ can only
    be called when there are no references to an object left.

    A combination of GC support and the removal of __del__ from SSLSocket
    has fixed the reference leak problem for me. I tried to move the
    _real_clean() call to PySSL_dealloc but it wasn't as easy as I thought.

    The code contains a reference cycle. ssl.SSLSocket() contains a
    reference to PySSLObject and PySSLObject->Socket is a reference to the
    same ssl.SSLSocket instance.

    Christian

    @janssen
    Copy link
    Mannequin Author

    janssen mannequin commented Dec 11, 2007

    I think Christian analysis is right, in that it takes a bit of GC
    support, but not perhaps in the specifics of his approach. I've done
    two things to fix this:

    1. Put _real_close() back in socket.py, and then override it in
      ssl.SSLSocket to release the SSLContext, and

    2. Change the pointer to the SSLSocket in _ssl.c to a weakref.

    Now SSLSockets get cleaned up properly by the GC system.

    One question: _real_close() is what Java calls a "protected" method. Do
    we have any way to mark this with annotations or metaclass stuff?

    @janssen
    Copy link
    Mannequin Author

    janssen mannequin commented Dec 11, 2007

    Here's a patch -- take a look and let me know.

    I also added a real asyncore server test.

    @janssen
    Copy link
    Mannequin Author

    janssen mannequin commented Dec 11, 2007

    Now, about the "confused signature" of SSLSocket.read().

    I'm not sure how confused it is. It's sui generis; SSLSocket doesn't
    inherit from some other class with a different read() method with a
    different signature. But we could change the name, perhaps to
    "readbytes", or something similar? I wrote it the way I did to keep
    some kind of compatibility with pre-3.0 uses of the SSL module.

    Anyway, the error is coming from a bug in the implementation of
    'suppress_ragged_eofs'. The code in SSLSocket.read() that handles the
    EOF error needs to return the right kind of value. I've added it to the
    patch.

    @gvanrossum
    Copy link
    Member

    Hm, when I run the full test_ssl test suite with

    ./python Lib/test/regrtest.py -v -uall test_ssl

    it always hangs in testAsyncoreServer after printing this:

    testAsyncoreServer (test.test_ssl.ThreadedTests) ...
    server: new connection from 127.0.0.1:59781
    client: sending 'FOO\n'...
    server: read 4 from client

    @tiran
    Copy link
    Member

    tiran commented Dec 13, 2007

    It hangs on my Linux box, too. I've not yet tried on Windows.

    @janssen
    Copy link
    Mannequin Author

    janssen mannequin commented Dec 14, 2007

    Good catch. I flipped a bit cleaning up the C code.

    Here's a fixed patch.

    @gvanrossum
    Copy link
    Member

    Good catch. I flipped a bit cleaning up the C code.

    Here's a fixed patch.

    Added file: http://bugs.python.org/file8949/patch-3

    Great! Go ahead and check it in. Sorry for deleting the _real_close()
    earlier BTW.

    Enjoy your time away!

    @gvanrossum
    Copy link
    Member

    I spoke too soon. In a debug build, this hangs forever during the
    second iteration:

    ./python.exe Lib/test/regrtest.py -uall -R1:1 test_ssl

    Adding -v, it looks like two iterations are carried out perfectly (one
    must be a trial run, one the warm-up run), but the third run goes
    beserk; the output ends like this:

    testAsyncoreServer (test.test_ssl.ThreadedTests) ...
    server: read b'over\n' from client
    server: closed connection <ssl.SSLSocket object, fd=6, family=2,
    type=1, proto=0>
    server: read b'' from client
    server: closed connection <ssl.SSLSocket object, fd=10, family=2,
    type=1, proto=0>
    server: read b'' from client
    server: closed connection <ssl.SSLSocket object, fd=10, family=2,
    type=1, proto=0>
    .
    . (the last two lines repeated forever)
    .

    @janssen
    Copy link
    Mannequin Author

    janssen mannequin commented Dec 14, 2007

    The server isn't handling the close event properly. I'll fix that.

    On Dec 13, 2007 9:06 PM, Guido van Rossum <report@bugs.python.org> wrote:

    Guido van Rossum added the comment:

    I spoke too soon. In a debug build, this hangs forever during the
    second iteration:

    ./python.exe Lib/test/regrtest.py -uall -R1:1 test_ssl

    Adding -v, it looks like two iterations are carried out perfectly (one
    must be a trial run, one the warm-up run), but the third run goes
    beserk; the output ends like this:

    testAsyncoreServer (test.test_ssl.ThreadedTests) ...
    server: read b'over\n' from client
    server: closed connection <ssl.SSLSocket object, fd=6, family=2,
    type=1, proto=0>
    server: read b'' from client
    server: closed connection <ssl.SSLSocket object, fd=10, family=2,
    type=1, proto=0>
    server: read b'' from client
    server: closed connection <ssl.SSLSocket object, fd=10, family=2,
    type=1, proto=0>
    .
    . (the last two lines repeated forever)
    .


    Tracker <report@bugs.python.org>
    <http://bugs.python.org/issue1469\>


    @janssen
    Copy link
    Mannequin Author

    janssen mannequin commented Dec 14, 2007

    Here's an update version where the asyncore test server properly handles
    the EOF race condition.

    @gvanrossum
    Copy link
    Member

    Here's an update version where the asyncore test server properly handles
    the EOF race condition.

    Perfect! Check it in.

    @janssen
    Copy link
    Mannequin Author

    janssen mannequin commented Dec 14, 2007

    Done.

    On Dec 14, 2007 9:44 AM, Guido van Rossum <report@bugs.python.org> wrote:

    Guido van Rossum added the comment:

    > Here's an update version where the asyncore test server properly handles
    > the EOF race condition.

    Perfect! Check it in.


    Tracker <report@bugs.python.org>
    <http://bugs.python.org/issue1469\>


    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants