msg93570 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2009-10-05 01:40 |
test_multiprocessing is producing tracebacks and hanging on py3k trunk.
I think this started happening fairly recently, as it seemed to work in
one not-too-old checkout until I did an svn up and then it started
failing. 3.1 seems fine. I've attached the tracebacks, including the
tracebacks I get when I ctrl-C to break the hang. My system is Gentoo
Linux.
|
msg93571 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2009-10-05 02:14 |
r75241 seems to be the cause. Tests passes on r75240, but gives the
dict error and hang on r75241.
|
msg93601 - (view) |
Author: Jesse Noller (jnoller) * |
Date: 2009-10-05 13:18 |
Adding gps - I haven't had a chance to grok the changes, and me getting to
a fix is going to take a few.
|
msg93605 - (view) |
Author: Jesse Noller (jnoller) * |
Date: 2009-10-05 15:16 |
It's passing for me on trunk/py3k at least on OS/X - I'm going to need to
dredge up a linuxvm
|
msg93815 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2009-10-10 00:55 |
Benjamin tried reverting the weakref patch, but that didn't fix it.
Turns out the problem is the other patch in that merge, that adds saving
of the exception to AssertRaises. Adding Kristjan as that was his patch.
(The tests pass with the weakref patch applied and the unittest patch
reverted, by the way.)
|
msg93825 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2009-10-10 11:04 |
Interesting. I don't have a linux machine to debug this. My patch
appears harmless enough. We are only storing an exception _object_, not
any tracebacks or such.
If this were happening on my windows machine I would put in breakpoints
where the ignored assertion is being raised and find out what's going on.
|
msg93854 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-10-11 12:03 |
It was a bad idea to revert the change, because we may now forget about
the problem rather than fix it.
It is likely that the "dictionary changed size during iteration" occurs
because a GC collection gets triggered during the iteration on the
weakdict. A GC collection can destroy some objects in that case because
keeping an exception around (which the unittest change does) usually
creates reference cycles through the frame's locals.
It seems that WeakValueDictionary.items() isn't very robust in py3k ;)
We probably need to add list-returning variants, e.g. a listitems()
method (and, probably, listkeys() and listvalues() too)?
|
msg93855 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2009-10-11 12:06 |
The unittest only keeps an exception _object_ around, not the associated
traceback. There should be no frames and thus reference cycles associated
with this.
I agree that the problem should be tackled, rather than swept under the
carpet like a revert does.
|
msg93856 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-10-11 12:14 |
> The unittest only keeps an exception _object_ around, not the associated
> traceback.
In py3k, the traceback is attached to the exception (on the
__traceback__ attribute) ;)
|
msg93857 - (view) |
Author: Benjamin Peterson (benjamin.peterson) * |
Date: 2009-10-11 12:32 |
I disagree about reverting it. We have this bug report, and that change
is sitting on the merge queue again...
|
msg93859 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2009-10-11 14:08 |
I didn't realize that the traceback was attached to the exception object
in py3k. This makes the use of such objects more dangerous because of the
circular references. The recommended practice of
exc_type, exc_obj = sys.exc_info()[:2]
is precisely to avoid any "accidents" with the tracebacks.
See also http://mail.python.org/pipermail/python-dev/2005-
August/055251.html
So, can one just clear the __traceback__ member? That would make sense
for the unittests, but it would make this manifestation of a GC bug go
into hiding again.
|
msg94413 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-10-24 12:59 |
Kristján:
> So, can one just clear the __traceback__ member?
Yes, or use `exc_value.with_traceback(None)`.
|
msg95173 - (view) |
Author: Benjamin Peterson (benjamin.peterson) * |
Date: 2009-11-13 02:37 |
If this is not ported by the time we get to a 2.7 or 3.2 release, I'd
like to back it out.
|
msg95175 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2009-11-13 09:16 |
I can easily port it again, and remove the traceback from the exception.
But won't we be sweeping that GC problem under the carpet then?
|
msg95178 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-11-13 15:04 |
Please note there's a patch in issue7105 to allow safer iteration of
weak dicts.
|
msg95182 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2009-11-13 16:10 |
Ok, I ported the change again in revision 76238, removing the traceback
from the stored exception object.
|
msg95183 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-11-13 16:12 |
> Kristján Valur Jónsson <kristjan@ccpgames.com> added the comment:
>
> Ok, I ported the change again in revision 76238, removing the traceback
> from the stored exception object.
"versionchanged" should be 3.2, not 3.1.
Thanks!
|
msg95184 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2009-11-13 16:16 |
Thanks. Fixed in revision 76239.
|
msg95188 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-11-13 17:04 |
I don't know if it's your last change, but test_multiprocessing is now
hanging on all buildbots.
|
msg95194 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2009-11-13 17:35 |
It is. TestZZZNumberOfObjects hangs at r76238, and does not hang at
r76237 on my Gentoo Linux box.
|
msg95195 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-11-13 18:04 |
It seems the error is different, though. This is what I get after Ctrl-C
(I omit the tracebacks from the other processes):
Traceback (most recent call last):
File "/home/antoine/py3k/__svn__/Lib/test/test_multiprocessing.py",
line 1071, in test_number_of_objects
refs = self.manager._number_of_objects()
File "Z:\py3k\__svn__\lib\multiprocessing\managers.py", line 569, in
_number_of_objects
conn = self._Client(self._address, authkey=self._authkey)
File "Z:\py3k\__svn__\lib\multiprocessing\connection.py", line 134, in
Client
c = SocketClient(address)
File "Z:\py3k\__svn__\lib\multiprocessing\connection.py", line 253, in
SocketClient
s.connect(address)
socket.error: [Errno 2] No such file or directory
|
msg95196 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-11-13 19:04 |
Ok, the problem is that a finalizer calls _decref which tries to open an
XMLRPC connection (!) to an address on which nobody listens anymore.
Then stupid SocketClient (in multiprocessing.connection) loops endlessly
because it is designed to do so when it gets ECONNREFUSED.
(when you Ctrl-C, you don't see all this because it happens in a
finalizer, and only a terse message is displayed)
So there are two problems:
- why does test_multiprocessing think it should open an XMLRPC
connection to a closed endpoint
- why doesn't SocketClient have some kind of timeout rather than looping
stupidly
|
msg95203 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2009-11-13 21:39 |
Yes, this is quite different.
After seeing this, I created issue 7314.
I didn't realize that it had anything to do with my patch but rather
blamed it on my brand new windows 7.
|
msg95204 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-11-13 21:41 |
> Yes, this is quite different.
> After seeing this, I created issue 7314.
> I didn't realize that it had anything to do with my patch but rather
> blamed it on my brand new windows 7.
I don't think it has anything to do with your patch. While diagnosing
the problem, I noticed that adding some random print() lines would
sometimes make the problem disappear. So it is really timing-dependent
and your patch happens to trigger the right (or wrong ;-)) timing.
|
msg95205 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2009-11-13 21:42 |
See my analysis from issue 7314:
The other processes are all waiting to ReadFile(). Maybe they have yet
to run to the point where they open the socket to which the master
process is trying to connect to.
The mystery is also: Why does the harmless act of setting an instance
attribute on a context manager (and without any traceback) trigger this
behaviour?
|
msg95206 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) * |
Date: 2009-11-13 21:44 |
Ah, okay then. Let's focus on solving 7414 then.
|
msg95372 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-11-17 10:07 |
I don't think this is a blocker anymore. Kristján's patch has been
ported and the test_multiprocessing freezes have been "fixed" by adding
a timeout.
As for the original purpose of this ticket, issue7105 is still (part
of?) the answer.
|
msg203249 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-11-18 08:13 |
I never seen this issue, can we close it?
#7105 has been fixed in Python 3, not in Python 2.7.
|
msg205226 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-12-04 15:09 |
Okay, let's say it is fixed. Adding Richard to nosy so that he can review the issue if he's interested.
|
msg320627 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2018-06-27 21:30 |
> RuntimeError: dictionary changed size during iteration
I didn't see this bug the last 5 years, so I consider that the issue has been fixed in the meanwhile :-)
|
|
Date |
User |
Action |
Args |
2022-04-11 14:56:53 | admin | set | github: 51309 |
2018-06-27 21:30:55 | vstinner | set | status: open -> closed resolution: out of date -> fixed messages:
+ msg320627
|
2013-12-04 15:09:49 | pitrou | set | resolution: out of date messages:
+ msg205226 stage: needs patch -> resolved |
2013-11-18 08:13:54 | vstinner | set | nosy:
+ vstinner messages:
+ msg203249
|
2010-08-27 14:13:22 | asksol | set | nosy:
+ asksol
|
2009-11-17 10:07:19 | pitrou | set | priority: release blocker -> normal
messages:
+ msg95372 |
2009-11-13 21:44:04 | kristjan.jonsson | set | messages:
+ msg95206 |
2009-11-13 21:42:47 | kristjan.jonsson | set | messages:
+ msg95205 |
2009-11-13 21:41:24 | pitrou | set | messages:
+ msg95204 |
2009-11-13 21:39:10 | kristjan.jonsson | set | messages:
+ msg95203 |
2009-11-13 19:29:14 | pitrou | set | dependencies:
+ multiprocessing should not wait endlessly |
2009-11-13 19:04:46 | pitrou | set | messages:
+ msg95196 |
2009-11-13 18:04:33 | pitrou | set | messages:
+ msg95195 |
2009-11-13 17:35:39 | r.david.murray | set | messages:
+ msg95194 |
2009-11-13 17:35:22 | r.david.murray | set | messages:
- msg95193 |
2009-11-13 17:34:57 | r.david.murray | set | messages:
+ msg95193 |
2009-11-13 17:34:35 | r.david.murray | set | messages:
- msg95190 |
2009-11-13 17:26:16 | r.david.murray | set | messages:
+ msg95190 |
2009-11-13 17:04:40 | pitrou | set | messages:
+ msg95188 |
2009-11-13 16:16:30 | kristjan.jonsson | set | messages:
+ msg95184 |
2009-11-13 16:12:57 | pitrou | set | messages:
+ msg95183 |
2009-11-13 16:10:58 | kristjan.jonsson | set | messages:
+ msg95182 |
2009-11-13 15:04:44 | pitrou | set | messages:
+ msg95178 |
2009-11-13 09:16:22 | kristjan.jonsson | set | messages:
+ msg95175 |
2009-11-13 02:37:21 | benjamin.peterson | set | priority: normal -> release blocker
messages:
+ msg95173 |
2009-10-24 12:59:50 | pitrou | set | messages:
+ msg94413 |
2009-10-11 17:32:42 | jon | set | nosy:
+ jon
|
2009-10-11 17:29:50 | pitrou | set | dependencies:
+ weak dict iterators are fragile because of unpredictable GC runs |
2009-10-11 14:08:33 | kristjan.jonsson | set | messages:
+ msg93859 |
2009-10-11 12:32:10 | benjamin.peterson | set | messages:
+ msg93857 |
2009-10-11 12:14:41 | pitrou | set | messages:
+ msg93856 |
2009-10-11 12:06:47 | kristjan.jonsson | set | messages:
+ msg93855 |
2009-10-11 12:03:10 | pitrou | set | nosy:
+ pitrou, benjamin.peterson messages:
+ msg93854
|
2009-10-10 11:04:36 | kristjan.jonsson | set | messages:
+ msg93825 |
2009-10-10 00:55:10 | r.david.murray | set | nosy:
+ kristjan.jonsson messages:
+ msg93815
|
2009-10-05 15:16:21 | jnoller | set | messages:
+ msg93605 |
2009-10-05 13:18:11 | jnoller | set | nosy:
+ gregory.p.smith messages:
+ msg93601
|
2009-10-05 02:14:53 | r.david.murray | set | messages:
+ msg93571 |
2009-10-05 01:41:17 | r.david.murray | set | nosy:
jnoller, r.david.murray components:
+ Library (Lib) |
2009-10-05 01:40:55 | r.david.murray | create | |