classification
Title: SSL tests leak memory
Type: Stage:
Components: Library (Lib) Versions: Python 3.0
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: janssen Nosy List: amaury.forgeotdarc, christian.heimes, giampaolo.rodola, gvanrossum, janssen
Priority: normal Keywords:

Created on 2007-11-19 21:16 by janssen, last changed 2008-01-06 22:29 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
ssl_gc.patch christian.heimes, 2007-12-06 16:03
patch-4 janssen, 2007-12-14 16:19
unnamed janssen, 2007-12-14 22:09
Messages (41)
msg57667 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2007-11-19 21:16
I'm seeing leaks in the test_ssl run, after various socket.py changes. 
I'm looking into it.
msg57730 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-11-21 07:21
I don't see leaks when I run the tests with

$ ./python Lib/test/regrtest.py -R:: test_ssl.py
test_ssl
beginning 9 repetitions
123456789
.........
1 test OK.
[80034 refs]
msg57743 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2007-11-21 23:06
What platform?

Bill

On Nov 20, 2007 11:21 PM, Christian Heimes <report@bugs.python.org> wrote:
>
> Christian Heimes added the comment:
>
> I don't see leaks when I run the tests with
>
> $ ./python Lib/test/regrtest.py -R:: test_ssl.py
> test_ssl
> beginning 9 repetitions
> 123456789
> .........
> 1 test OK.
> [80034 refs]
>
> ----------
> nosy: +tiran
> priority:  -> normal
>
> __________________________________
> Tracker <report@bugs.python.org>
> <http://bugs.python.org/issue1469>
> __________________________________
>
msg57748 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-11-22 04:18
Ubuntu Linux x86
msg58065 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-12-01 19:20
Are you sure that the SSL tests are still leaking memory? The tests
aren't leaking references on Windows nor on Linux.
msg58070 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2007-12-01 21:59
I'm seeing the leaks on my Leopard machine.  I haven't had a chance to look
into it yet.

On Dec 1, 2007 11:20 AM, Christian Heimes <report@bugs.python.org> wrote:

>
> Christian Heimes added the comment:
>
> Are you sure that the SSL tests are still leaking memory? The tests
> aren't leaking references on Windows nor on Linux.
>
> __________________________________
> Tracker <report@bugs.python.org>
> <http://bugs.python.org/issue1469>
> __________________________________
>
msg58240 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-12-06 14:06
I can reproduce the problem when I run the ssl tests with -unetwork.

I've used PYTHONDUMREFS and combinerefs.py to find the leaking objects.
I'm seeing lots of

0x9fd3e4c [1] SSLSocket
0x9fd2928 [1] ssl.SSLContext <ssl.SSLContext object at 0x9fd2928>
0x9fd5b74 [1] dict {'certfile': None, '_sslobj': <ssl.SSLContext object
at 0x9fd2928>, 'do_handshake_on_con
nect': True, 'ssl_version': 1, '_base': None, 'cert_reqs': 2,
'ca_certs': '/home/heimes/dev/python/py3k/Lib
/test/https_svn_python_org_root.pem', 'keyfile': None,
'suppress_ragged_eofs': True}

in the output.

I guess the problem is connected to::

    _ssl.sslwrap(self, server_side,
        keyfile, certfile,
        cert_reqs, ssl_version, ca_certs)

in Lib/ssl.py. I'm digging in
msg58242 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-12-06 14:55
I may have found the error. The PySSL_Type doesn't support GC but it
should in order to clean up self->Socket. 

I've started to fix it but I don't have time to work on the problem 'til
tonight. The patch causes a seg fault. I may have overlooked or
misunderstood something.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1210324800 (LWP 31486)]
0x08127d03 in gc_list_remove (node=0x86d1cac) at Modules/gcmodule.c:158
158             node->gc.gc_prev->gc.gc_next = node->gc.gc_next;
msg58243 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-12-06 15:04
I found the problem! I've to use PyObject_GC_New instead of PyObject_New
msg58244 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-06 15:21
I get a segfault when I run it like this:

$ ./python.exe Lib/test/regrtest.py -uall test_ssl
test_ssl
Segmentation fault
$ 

(Without -uall it passes.)
msg58245 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-12-06 15:26
New patch

The new patch doesn't cause a seg fault but it doesn't help. It
*increases* the number of reference leaks.

test_ssl leaked [1610, 1610] references, sum=3220
msg58246 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-12-06 16:03
The new patch works and fixes the ref leak. I had to move the call
self._real_close() from __del__ to PySSL_dealloc(). I don't know if you
are going to like it. :]
msg58247 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-06 16:06
I will look at this in an hour or so, after I bring Orlijn to school
and drive to work.
msg58249 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-12-06 18:36
Guido van Rossum wrote:
> I will look at this in an hour or so, after I bring Orlijn to school
> and drive to work.

I found two problems with the _real_close() call in PySSL_dealloc. I'm
digging into it now.

o = PyObject_CallMethod((PyObject*)self->Socket, "_real_close", "");

Christian
msg58250 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-06 18:41
Oops, I just committed revision 59394.
Please advise.
msg58252 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-06 19:08
I just reverted it. I put a bit of debugging in the call to
_real_close(), and even with Christian's corrections it fails miserably.
I prefer leaking.
msg58253 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-06 19:17
I'm giving up on this for now.  There are other weird bugs in the code,
e.g. this simple piece of code fails:

>>> x = urllib.urlopen("https://mail.google.com").read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/google/home/guido/python/py3kd/Lib/io.py", line 463,
in read
    return self.readall()
  File "/usr/local/google/home/guido/python/py3kd/Lib/io.py", line 473,
in readall
    data = self.read(DEFAULT_BUFFER_SIZE)
  File "/usr/local/google/home/guido/python/py3kd/Lib/io.py", line 466,
in read
    del b[n:]
TypeError: slice indices must be integers or None or have an __index__
method
[64813 refs]
>>>

The debugger tells me that 'n' contains a bytes object instead of an
int.  This appears due to the confused signature of _sslobj.read()
(which can behave as either read() or readinto()).  Its docstring is
wrong too.  We'll deal with this after 3.0a2 is released.
msg58255 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-12-06 19:45
Guido van Rossum wrote:
> I just reverted it. I put a bit of debugging in the call to
> _real_close(), and even with Christian's corrections it fails miserably.
> I prefer leaking.

I agree! I've not enough time to work on the patch. A working patch
requires some redesign of the ssl interface. It's definitely too late now.

Christian
msg58286 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2007-12-07 23:52
I'm still not sure what the problem is.  This code was working fine when I
committed it, no leaks, so something else must have changed under the
covers.  I don't believe that adding GC to the C code is the answer; it
should be automatically GC'd.  So I'd back that patch out.

I intend to wolf-fence the tests in the test module until I find a single
test that causes the leaks.  Sorry I haven't had time to look at this
earlier, but we're in the middle of a big presentation afternoon here at
work.

On Dec 6, 2007 11:45 AM, Christian Heimes <report@bugs.python.org> wrote:

>
> Christian Heimes added the comment:
>
> Guido van Rossum wrote:
> > I just reverted it. I put a bit of debugging in the call to
> > _real_close(), and even with Christian's corrections it fails miserably.
> > I prefer leaking.
>
> I agree! I've not enough time to work on the patch. A working patch
> requires some redesign of the ssl interface. It's definitely too late now.
>
> Christian
>
> __________________________________
> Tracker <report@bugs.python.org>
> <http://bugs.python.org/issue1469>
> __________________________________
>
msg58287 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-08 00:00
Don't worry, I did back it out before releasing 3.0a2.

I believe I hacked your code a bit before checking it in (or after you
checked it in, can't remember).

Did you see my bug report with a TypeError?
msg58372 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2007-12-10 22:05
I think I've figured it out.  My initial patch to socket.py and ssl.py
had an extra method defined on socket.socket, _real_close(), which did'
the cleanup work of deallocating the underlying socket, and in the SSL
subclass, of releasing the SSL context.  Guido must have removed that
method, and folded what it does into the socket.close() call.  But that
means that subclasses can't hook onto that code, to get cleaned up
when the socket is truly closed.  So the SSL context gets leaked.

Adding _real_close() back to the socket module fixes most of the leak,
but not all of it.  I'm looking for the other problem...
msg58378 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2007-12-10 22:37
So, what's the final status of __del__ in py3K?  The other bit of leak
is due to _real_close() not being called when a socket is dropped on the
floor (say, you try to connect, fail, and raise the exception back to
the caller, without ever explicitly calling close() on the opened
socket).  It looks like the __del__ method in SSLSocket isn't being called.
msg58379 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2007-12-10 22:51
Is it possible that you have the same problem as in
http://bugs.python.org/issue1540 ?

To be sure, check the gc.garbage variable at the end of the test. It may
contain the references you are looking for...
msg58382 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-10 23:17
> So, what's the final status of __del__ in py3K?  The other bit of leak
> is due to _real_close() not being called when a socket is dropped on the
> floor (say, you try to connect, fail, and raise the exception back to
> the caller, without ever explicitly calling close() on the opened
> socket).  It looks like the __del__ method in SSLSocket isn't being called.

There are no current plans to change __del__ in Py3k.
msg58388 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2007-12-10 23:30
The other leak comes from this code:

   s = ssl.wrap_socket(socket.socket(), ...)
   s.connect((SOME BOGUS ADDRESS OR SERVER))

The connect() fails, and the SSLSocket "s"gets dropped on the floor,
but never seems to be GC'd, (or the GC never seems to call the __del__
method).  Any idea why?

Could this be because the base class is a C class?
msg58389 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2007-12-10 23:35
gc.garbage is always empty.
msg58390 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-10 23:37
I wonder if Christian Heimes was correct that the ssl object needs GC
support? This was part of his patch (which I checked in and then
reverted because the other part of it didn't work as advertised).

Alternatively, if 's' is involved in a cycle *and* any of the objects
in the cycle has a __del__ method, the GC won't call __del__. This
seems counter-intuitive but is actually correct since __del__ can only
be called when there are no references to an object left.

--Guido

On Dec 10, 2007 3:30 PM, Bill Janssen <report@bugs.python.org> wrote:
>
> Bill Janssen added the comment:
>
> The other leak comes from this code:
>
>    s = ssl.wrap_socket(socket.socket(), ...)
>    s.connect((SOME BOGUS ADDRESS OR SERVER))
>
> The connect() fails, and the SSLSocket "s"gets dropped on the floor,
> but never seems to be GC'd, (or the GC never seems to call the __del__
> method).  Any idea why?
>
> Could this be because the base class is a C class?
>
>
> __________________________________
> Tracker <report@bugs.python.org>
> <http://bugs.python.org/issue1469>
> __________________________________
>
msg58396 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2007-12-11 00:38
Ah, I see what's going on.  The revision of the socket code (nice job,
by the way) removed the distinction between the C socket object and the
Python socket object.  The C SSLContext keeps a pointer to the C socket
object, which is now the Python socket object, or in the SSL case, the
SSLSocket object.  So there's a circular reference.

The right fix is to make the ref in the SSL C code a weakref.
msg58397 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-12-11 00:42
Guido van Rossum wrote:
> Guido van Rossum added the comment:
> 
> I wonder if Christian Heimes was correct that the ssl object needs GC
> support? This was part of his patch (which I checked in and then
> reverted because the other part of it didn't work as advertised).
> 
> Alternatively, if 's' is involved in a cycle *and* any of the objects
> in the cycle has a __del__ method, the GC won't call __del__. This
> seems counter-intuitive but is actually correct since __del__ can only
> be called when there are no references to an object left.

A combination of GC support and the removal of __del__ from SSLSocket
has fixed the reference leak problem for me. I tried to move the
_real_clean() call to PySSL_dealloc but it wasn't as easy as I thought.

The code contains a reference cycle. ssl.SSLSocket() contains a
reference to PySSLObject and PySSLObject->Socket is a reference to the
same ssl.SSLSocket instance.

Christian
msg58402 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2007-12-11 01:26
I think Christian analysis is right, in that it takes a bit of GC
support, but not perhaps in the specifics of his approach.  I've done
two things to fix this:

1)  Put _real_close() back in socket.py, and then override it in
ssl.SSLSocket to release the SSLContext, and

2)  Change the pointer to the SSLSocket in _ssl.c to a weakref.

Now SSLSockets get cleaned up properly by the GC system.

One question: _real_close() is what Java calls a "protected" method.  Do
we have any way to mark this with annotations or metaclass stuff?
msg58404 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2007-12-11 01:58
Here's a patch -- take a look and let me know.

I also added a real asyncore server test.
msg58412 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2007-12-11 03:51
Now, about the "confused signature" of SSLSocket.read().

I'm not sure how confused it is.  It's sui generis; SSLSocket doesn't
inherit from some other class with a different read() method with a
different signature.  But we could change the name, perhaps to
"readbytes", or something similar?  I wrote it the way I did to keep
some kind of compatibility with pre-3.0 uses of the SSL module.

Anyway, the error is coming from a bug in the implementation of
'suppress_ragged_eofs'.  The code in SSLSocket.read() that handles the
EOF error needs to return the right kind of value.  I've added it to the
patch.
msg58580 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-13 20:13
Hm, when I run the full test_ssl test suite with

./python Lib/test/regrtest.py -v -uall test_ssl

it always hangs in testAsyncoreServer after printing this:

testAsyncoreServer (test.test_ssl.ThreadedTests) ...
 server:  new connection from 127.0.0.1:59781
 client:  sending 'FOO\n'...
 server:  read 4 from client
msg58590 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-12-13 21:19
It hangs on my Linux box, too. I've not yet tried on Windows.
msg58608 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2007-12-14 02:08
Good catch.  I flipped a bit cleaning up the C code.

Here's a fixed patch.
msg58612 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-14 05:02
> Good catch.  I flipped a bit cleaning up the C code.
>
> Here's a fixed patch.
>
> Added file: http://bugs.python.org/file8949/patch-3

Great! Go ahead and check it in. Sorry for deleting the _real_close()
earlier BTW.

Enjoy your time away!
msg58613 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-14 05:06
I spoke too soon. In a debug build, this hangs forever during the
second iteration:

./python.exe Lib/test/regrtest.py -uall -R1:1 test_ssl

Adding -v, it looks like two iterations are carried out perfectly (one
must be a trial run, one the warm-up run), but the third run goes
beserk; the output ends like this:

testAsyncoreServer (test.test_ssl.ThreadedTests) ...
 server:  read b'over\n' from client
 server:  closed connection <ssl.SSLSocket object, fd=6, family=2,
type=1, proto=0>
 server:  read b'' from client
 server:  closed connection <ssl.SSLSocket object, fd=10, family=2,
type=1, proto=0>
 server:  read b'' from client
 server:  closed connection <ssl.SSLSocket object, fd=10, family=2,
type=1, proto=0>
.
. (the last two lines repeated forever)
.
msg58629 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2007-12-14 16:15
The server isn't handling the close event properly.  I'll fix that.

On Dec 13, 2007 9:06 PM, Guido van Rossum <report@bugs.python.org> wrote:

>
> Guido van Rossum added the comment:
>
> I spoke too soon. In a debug build, this hangs forever during the
> second iteration:
>
> ./python.exe Lib/test/regrtest.py -uall -R1:1 test_ssl
>
> Adding -v, it looks like two iterations are carried out perfectly (one
> must be a trial run, one the warm-up run), but the third run goes
> beserk; the output ends like this:
>
> testAsyncoreServer (test.test_ssl.ThreadedTests) ...
>  server:  read b'over\n' from client
>  server:  closed connection <ssl.SSLSocket object, fd=6, family=2,
> type=1, proto=0>
>  server:  read b'' from client
>  server:  closed connection <ssl.SSLSocket object, fd=10, family=2,
> type=1, proto=0>
>  server:  read b'' from client
>  server:  closed connection <ssl.SSLSocket object, fd=10, family=2,
> type=1, proto=0>
> .
> . (the last two lines repeated forever)
> .
>
> __________________________________
> Tracker <report@bugs.python.org>
> <http://bugs.python.org/issue1469>
> __________________________________
>
msg58630 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2007-12-14 16:19
Here's an update version where the asyncore test server properly handles
the EOF race condition.
msg58632 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-12-14 17:44
> Here's an update version where the asyncore test server properly handles
> the EOF race condition.

Perfect! Check it in.
msg58638 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2007-12-14 22:09
Done.

On Dec 14, 2007 9:44 AM, Guido van Rossum <report@bugs.python.org> wrote:

>
> Guido van Rossum added the comment:
>
> > Here's an update version where the asyncore test server properly handles
> > the EOF race condition.
>
> Perfect! Check it in.
>
> __________________________________
> Tracker <report@bugs.python.org>
> <http://bugs.python.org/issue1469>
> __________________________________
>
History
Date User Action Args
2008-01-06 22:29:45adminsetkeywords: - py3k
versions: Python 3.0
2007-12-14 22:10:13gvanrossumsetstatus: open -> closed
resolution: fixed
2007-12-14 22:09:20janssensetfiles: + unnamed
messages: + msg58638
2007-12-14 17:44:48gvanrossumsetmessages: + msg58632
2007-12-14 16:19:56janssensetfiles: - unnamed
2007-12-14 16:19:35janssensetfiles: - patch-3
2007-12-14 16:19:15janssensetfiles: + patch-4
messages: + msg58630
2007-12-14 16:15:15janssensetfiles: + unnamed
messages: + msg58629
2007-12-14 05:06:04gvanrossumsetmessages: + msg58613
2007-12-14 05:02:23gvanrossumsetmessages: + msg58612
2007-12-14 02:08:48janssensetfiles: - patch-2
2007-12-14 02:08:16janssensetfiles: + patch-3
messages: + msg58608
2007-12-13 21:19:14christian.heimessetmessages: + msg58590
2007-12-13 20:13:10gvanrossumsetmessages: + msg58580
2007-12-13 01:30:09giampaolo.rodolasetnosy: + giampaolo.rodola
2007-12-11 03:52:05janssensetfiles: - a
2007-12-11 03:51:58janssensetfiles: + patch-2
messages: + msg58412
2007-12-11 01:58:08janssensetfiles: + a
messages: + msg58404
2007-12-11 01:26:55janssensetmessages: + msg58402
2007-12-11 00:42:04christian.heimessetmessages: + msg58397
2007-12-11 00:38:19janssensetmessages: + msg58396
2007-12-10 23:37:15gvanrossumsetmessages: + msg58390
2007-12-10 23:35:42janssensetmessages: + msg58389
2007-12-10 23:30:55janssensetmessages: + msg58388
2007-12-10 23:17:09gvanrossumsetmessages: + msg58382
2007-12-10 22:51:19amaury.forgeotdarcsetnosy: + amaury.forgeotdarc
messages: + msg58379
2007-12-10 22:37:46janssensetmessages: + msg58378
2007-12-10 22:05:25janssensetmessages: + msg58372
2007-12-08 00:00:44gvanrossumsetmessages: + msg58287
2007-12-07 23:59:21gvanrossumsetfiles: - unnamed
2007-12-07 23:59:18gvanrossumsetfiles: - unnamed
2007-12-07 23:52:36janssensetfiles: + unnamed
messages: + msg58286
2007-12-06 19:45:00christian.heimessetmessages: + msg58255
2007-12-06 19:17:19gvanrossumsetpriority: release blocker -> normal
assignee: christian.heimes -> janssen
messages: + msg58253
2007-12-06 19:08:00gvanrossumsetmessages: + msg58252
2007-12-06 18:41:15gvanrossumsetpriority: high -> release blocker
messages: + msg58250
2007-12-06 18:36:38christian.heimessetmessages: + msg58249
2007-12-06 16:06:03gvanrossumsetmessages: + msg58247
2007-12-06 16:03:16christian.heimessetfiles: + ssl_gc.patch
messages: + msg58246
2007-12-06 16:01:37christian.heimessetfiles: - ssl_gc.patch
2007-12-06 15:26:02christian.heimessetfiles: + ssl_gc.patch
messages: + msg58245
2007-12-06 15:24:10christian.heimessetfiles: - ssl_gc.patch
2007-12-06 15:21:10gvanrossumsetmessages: + msg58244
2007-12-06 15:04:42christian.heimessetmessages: + msg58243
2007-12-06 14:55:29christian.heimessetfiles: + ssl_gc.patch
messages: + msg58242
2007-12-06 14:06:53christian.heimessetpriority: normal -> high
assignee: janssen -> christian.heimes
messages: + msg58240
nosy: + gvanrossum
2007-12-01 21:59:34janssensetfiles: + unnamed
messages: + msg58070
2007-12-01 19:20:25christian.heimessetmessages: + msg58065
2007-11-22 04:18:35christian.heimessetmessages: + msg57748
2007-11-21 23:06:24janssensetmessages: + msg57743
2007-11-21 07:21:35christian.heimessetpriority: normal
nosy: + christian.heimes
messages: + msg57730
2007-11-19 21:16:14janssencreate