classification
Title: O(n**2) behaviour when adding/removing classes
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: gvanrossum, haypo, kristjan.jonsson, pitrou, python-dev, rhettinger
Priority: normal Keywords: patch

Created on 2013-05-08 15:58 by kristjan.jonsson, last changed 2013-10-29 20:34 by pitrou. This issue is now closed.

Files
File name Uploaded Description Edit
slowness.py kristjan.jonsson, 2013-05-08 15:58 demonstrates a problem.
subtype.patch kristjan.jonsson, 2013-05-08 15:59
subtype.patch kristjan.jonsson, 2013-05-11 14:32 review
subtype2.patch pitrou, 2013-05-13 09:39 review
subtype.patch kristjan.jonsson, 2013-05-13 17:29 review
subclasses_dict.patch pitrou, 2013-05-25 10:42 review
subclasses_dict2.patch pitrou, 2013-10-28 20:24 review
subclasses_dict3.patch pitrou, 2013-10-29 20:13 review
Messages (32)
msg188724 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-05-08 15:58
We came across this curious phenomenon, when our progam was leaking dynamically created classes.  It started spending CPU, to be fixed when gc  was increased.  The attached .py file demonstrates the problem.

The problem is due to how child classes are added to the parent class, in this case, "object".  Obsolete code tries to look for a NULL pointer in the entire list of children.

In addition, removing child classes is unnecessarily slow.

the attached patch fixes these performance issues.
msg188725 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-05-08 15:59
Adding the code change patch
msg188726 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-08 16:07
Funny, I had once noticed this theoretical problem, but it didn't seem to matter concretely, so I hadn't posted about it :)
msg188820 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-05-10 09:14
So, any objections?  It is pretty straighforward.  Btw, the searching for Py_NONE seems to be a rudiment of some older version, there is no place where a Py_NONE is put in there.

Also, what is the policy for 2.7?  Are performance bugs proper bugs?
msg188823 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-10 09:35
Py_None is what PyWeakref_GET_OBJECT() returns when the object is dead.
(same as calling () on a weakref, really)

The patch looks straightforward to me. 2.7 doesn't receive performance fixes, though, except for large regressions. Also, we've had enough regressions in the last 2.7 bugfix release, IMHO.

(I had worked on a patch which replaced the list with a dict, but I had refcounting issues with it, and I'm not interested enough in the issue to push any further)
msg188825 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-05-10 09:53
Right, I misread the code.
Since the remove_subclass() function is called when a subclass dies, I don't think it is necessary to clear out weak references in add_subclass().  They are there merely to break the reference cycle.

Btw, the 'patch diff' stuff seems to be broken in the tracker ....
msg188827 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-10 10:10
> Right, I misread the code.
> Since the remove_subclass() function is called when a subclass dies,
> I don't think it is necessary to clear out weak references in
> add_subclass().  They are there merely to break the reference cycle.

Agreed.
msg188924 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-05-11 14:32
It turned out to be slightly more compilcated.  Two additions make this complete:
1) check for the subtype OR the Py_None when removing subclass.  This removes any dependency on the order in which weakrefs are cleared.
2) When the type is cleared, manually remove ourselves from all the base classes.

It is because of the lack of 2) that the original version was always clearing out all stale weakrefs when new subclasses were added.
msg189111 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-13 09:19
I think the logic is slightly wrong in remove_subclass. When you encounter Py_None, you can't be sure it's the weakref for the current type; theoretically, it could be any other one (depending on oddities in cleanup order, cycle collection, etc.). So you have to continue walking the list.
msg189113 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-13 09:39
Attaching an alternative implementation for remove_subclass().
msg189117 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-05-13 10:15
Basically the logic is this:  When the class goes away, it _always_ calls remove_subclass().  Now, this may be before or after the weakref has been clear so that it will either find itself in a weakref, or a clear weakref.

In case this logic is flawed, we know that when remove_subclass() is called, exactly one child is removed.  Whether it is us, or some previous class, is irrelevant.
msg189118 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-13 10:21
> In case this logic is flawed, we know that when remove_subclass() is
> called, exactly one child is removed.  Whether it is us, or some
> previous class, is irrelevant.

remove_subclass() is called when __bases__ is assigned to, so it is
not irrelevant:

>>> class A: pass
... 
>>> class B(A): pass
... 
>>> class C: pass
... 
>>> A.__subclasses__()
[<class '__main__.B'>]
>>> B.__bases__ = C,
>>> A.__subclasses__()
[]
>>> C.__subclasses__()
[<class '__main__.B'>]
msg189119 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-05-13 10:21
There are two cases when remove_subclass is called:
One when we are changing base classes(the original use of this function), and in this case we must find the correct one.

The second one is when the class is being deleted, for housekeeping of the weakrefs.
I worry that your alternative will cause us to walk the entire list in the second case because it will be called when the weakref has been cleared, so  it will never find itself in the list, only None.  I'll make some tests to verify.

I think perhaps a small adjustment, an "exact" flag or something, can be added to differentiate between the two cases....
msg189120 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-13 10:26
> The second one is when the class is being deleted, for housekeeping
> of the weakrefs.
> I worry that your alternative will cause us to walk the entire list
> in the second case because it will be called when the weakref has
> been cleared, so  it will never find itself in the list, only None.

I don't think that matters much. In both cases the expected complexity
is O(n).
msg189121 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-05-13 10:33
Actually, in a program that dynamically creates a class, and then deletes it, you would expect a O(1) complexity.  adding children at the end, and searching from the end, is a way to achieve this.
While I admit that I oversaw the exact requirement for __bases__, I think that allowing for a "None" to be sufficient when the class is being deleted is an important improvement.  I realize that the therotetical worst case is O(n), but the practical common case is still O(1)
msg189122 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-13 11:06
> While I admit that I oversaw the exact requirement for __bases__, I think
> that allowing for a "None" to be sufficient when the class is being deleted 
> is an important improvement.

You can't be sure the "None" corresponds to the type being deleted and not another one. If you want guaranteed (almost :-)) O(1) behaviour, you must switch to a dict.
msg189124 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-05-13 11:40
Yes I am sure.  Please see the previous reasoning.  Igoring assigning to __bases__ for the moment...

Every class deletion will try to remove itself from the list.  Either it will 
a) find an existing reference in the weakref and remove it
b) find a None in the weakref and remove it.

b) is guaranteed to be the old weakref, because every previous class removal has also succeeded.  At any time, there is at most a single stale weakref in the list.

And _even_if_ there is a flaw in the argument, every call will remove _one_ stale (or just about to become stale) weakref from the list.  Whether it is its own weakref or that of a previous deletion is immaterial.  The removal of _one_ entry for each class deletion keeps the list at its optimal size.

For the case of assigning to __bases__, of course, the weakrefs are not stale and we must find the correct class to remove for correctness.

I don't want guaranteed O(1) behaviour, just O(1) for the common case of creating a class, or a few classes, and then removing them.
msg189133 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-13 14:05
> I don't want guaranteed O(1) behaviour, just O(1) for the common case
> of creating a class, or a few classes, and then removing them.

Then just make sure you call remove_subclass() before PyObject_ClearWeakRefs()
in type_dealloc().

Really, this thing should be demonstrably correct. Performance of dynamic
class creation is not critical enough to take shortcuts.
msg189153 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-05-13 17:29
That is not sufficient.  The weakrefs may have been cleared already if the deletion comes as a result of garbage collection (which is currently the only way classes get deleted.)

It is still easily demonstratably correct:
The previous version _only_ removed subclasses on setting __bases__.  Housekeeping of stale weakrefs was done when new classes were created.

This proposed version still properly removes subclasses when setting __bases__, but housekeeping of dead weakrefs is now moved to the point when a class is deleted.  To do housekeeping of stale weakrefs, it is sufficient to remove _one_ stale weakref for each class that is deleted.  It is not important that this is the correct stale weakref, all stale weakrefs are the same.

I've uploaded an updated patch with added in-line comments explaining the case.
msg189155 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-13 17:42
One thing: "int i" should be "Py_ssize_t i".
msg189230 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-05-14 15:43
Yes, thanks for pointing that out, Antoine, I have made the change locally.
msg189950 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-25 10:42
Here is another approach to make tp_subclasses a dict, rather than a list. I'll let you benchmark and choose whichever you prefer :)
msg189951 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-05-25 10:56
Looks good, I'll give it a spin.  This is probably smarter then trying to do tricks with lists.

One question:  I saw you clearing exceptions in the tp_clear() function, isn't it better to use PyErr_PrintUnraisable()?  Is there a guideline to when to use that one in internal code?  When is it safe to just shrug and ignore?
msg189953 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-25 10:58
> One question:  I saw you clearing exceptions in the tp_clear()
> function, isn't it better to use PyErr_PrintUnraisable()?

You're right, that would be better. I was just being lazy.
msg189959 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-25 13:16
Note that making tp_subclasses a dict makes the __subclasses__ return order undefined. I don't think I've ever seen __subclasses__ actually used, so I'm not convinced it's a problem, but perhaps it's worth floating the idea on python-dev.
msg190005 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-05-25 21:10
I can't think of a use for the order of __subclasses__ so no objection here.
—
Sent from Mailbox for iPad

On Sat, May 25, 2013 at 6:16 AM, Antoine Pitrou <report@bugs.python.org>
wrote:

> Antoine Pitrou added the comment:
> Note that making tp_subclasses a dict makes the __subclasses__ return order undefined. I don't think I've ever seen __subclasses__ actually used, so I'm not convinced it's a problem, but perhaps it's worth floating the idea on python-dev.
> ----------
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue17936>
> _______________________________________
msg201574 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-10-28 20:24
For the record, an updated patch.
msg201613 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-10-29 09:24
Just updating for recent changes, right?
LGTM.  Shouldn't this just get committed?
msg201615 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-10-29 09:45
I see I haven't addressed your previous comment about PyErr_Clear(), I should probably address it before anything gets committed.
msg201661 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-10-29 20:13
Updated patch replacing PyErr_Clear() with PyErr_WriteUnraisable(), and PyDict_Check() with PyDict_CheckExact().
msg201668 - (view) Author: Roundup Robot (python-dev) Date: 2013-10-29 20:33
New changeset a7f1ce6fe293 by Antoine Pitrou in branch 'default':
Issue #17936: Fix O(n**2) behaviour when adding or removing many subclasses of a given type.
http://hg.python.org/cpython/rev/a7f1ce6fe293
msg201669 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-10-29 20:34
Actually, PyErr_Clear() was necessary in some cases. Now committed.
History
Date User Action Args
2013-10-29 20:34:40pitrousetstatus: pending -> closed
2013-10-29 20:34:35pitrousetstatus: open -> pending
resolution: fixed
messages: + msg201669

stage: patch review -> resolved
2013-10-29 20:33:55python-devsetnosy: + python-dev
messages: + msg201668
2013-10-29 20:13:33pitrousetfiles: + subclasses_dict3.patch

messages: + msg201661
2013-10-29 09:45:11pitrousetmessages: + msg201615
2013-10-29 09:24:57kristjan.jonssonsetmessages: + msg201613
2013-10-28 20:24:33pitrousetfiles: + subclasses_dict2.patch
2013-10-28 20:24:21pitrousetmessages: + msg201574
2013-05-25 21:10:53gvanrossumsetmessages: + msg190005
2013-05-25 13:16:23pitrousetmessages: + msg189959
2013-05-25 10:58:04pitrousetmessages: + msg189953
2013-05-25 10:56:11kristjan.jonssonsetmessages: + msg189951
2013-05-25 10:42:29pitrousetfiles: + subclasses_dict.patch
nosy: + gvanrossum, rhettinger
messages: + msg189950

2013-05-14 15:43:43kristjan.jonssonsetmessages: + msg189230
2013-05-13 17:42:23pitrousetmessages: + msg189155
2013-05-13 17:29:59kristjan.jonssonsetfiles: + subtype.patch

messages: + msg189153
2013-05-13 14:05:08pitrousetmessages: + msg189133
2013-05-13 11:40:09kristjan.jonssonsetmessages: + msg189124
2013-05-13 11:06:45pitrousetmessages: + msg189122
2013-05-13 10:33:19kristjan.jonssonsetmessages: + msg189121
2013-05-13 10:26:31pitrousetmessages: + msg189120
2013-05-13 10:21:23kristjan.jonssonsetmessages: + msg189119
2013-05-13 10:21:18pitrousetmessages: + msg189118
2013-05-13 10:15:29kristjan.jonssonsetmessages: + msg189117
2013-05-13 09:39:50pitrousetfiles: + subtype2.patch

messages: + msg189113
2013-05-13 09:19:07pitrousetmessages: + msg189111
2013-05-11 14:32:32kristjan.jonssonsetfiles: + subtype.patch

messages: + msg188924
2013-05-10 10:10:31pitrousetmessages: + msg188827
2013-05-10 09:53:58kristjan.jonssonsetmessages: + msg188825
2013-05-10 09:35:32pitrousetmessages: + msg188823
2013-05-10 09:14:11kristjan.jonssonsetmessages: + msg188820
2013-05-09 22:11:14hayposetnosy: + haypo
2013-05-08 16:07:32pitrousetnosy: + pitrou
title: O(2) behaviour when adding/removing classes -> O(n**2) behaviour when adding/removing classes
messages: + msg188726

versions: - Python 2.7, Python 3.3
stage: patch review
2013-05-08 15:59:59kristjan.jonssonsetfiles: + subtype.patch

messages: + msg188725
2013-05-08 15:58:46kristjan.jonssoncreate