classification
Title: Warn against subclassing builtins, and overriding their methods
Type: enhancement Stage: resolved
Components: Documentation Versions: Python 3.6, Python 3.5, Python 2.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: Kirk Hansen, cvrebert, docs@python, r.david.murray, rhettinger, steven.daprano
Priority: low Keywords:

Created on 2016-07-18 13:30 by Kirk Hansen, last changed 2019-08-22 23:39 by rhettinger. This issue is now closed.

Messages (9)
msg270754 - (view) Author: Kirk Hansen (Kirk Hansen) Date: 2016-07-18 13:30
I tried subclassing dict, and overriding its __setitem__ and __getitem__ and got some interesting results. See http://stackoverflow.com/questions/38362420/subclassing-dict-dict-update-returns-incorrrect-value-python-bug?noredirect=1#comment64142710_38362420 for an example.

I tried sub-classing UserDict.UserDict and experienced some of the same problems. Eventually, I discovered that subclassing MutableMapping was my best bet.

After an hour or two of searching and reading I discovered that CPython will not call overridden built-in methods of the same object (https://pypy.readthedocs.io/en/latest/cpython_differences.html#subclasses-of-built-in-types). This behaviour could (and did) cause some hard to track down bugs in code. 

I briefly looked at the later versions of python documentation and didn't see a mention of this (sorry if it's there), but python2.7 definitely does not mention this. In fact, python2.7 seems to __encourage__ users to subclass the built-ins (see the note https://docs.python.org/2.7/library/stdtypes.html?highlight=dict#built-in-types). Subclassing dict to __extend__ functionality is great, but there should be a big bad warning when trying to __override__ built-ins like __setitem__ and __getitem__.
msg270760 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-07-18 14:43
Where in the docs do you think such a note should go?  I suppose it could go in the intro to the chapter you link to.  I doubt many people will find it there, though :(
msg270783 - (view) Author: Kirk Hansen (Kirk Hansen) Date: 2016-07-18 18:18
I think it could make sense at the top of the intro. It could also make sense to have it sit https://docs.python.org/2.7/library/stdtypes.html?highlight=dict#mapping-types-dict and have the intro link to it, or vice-versa. Thoughts?
msg270789 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-07-18 20:15
It doesn't just apply to dicts, it applies to all the built in collection types (and possibly to other built in types).  But not to all the methods, which will make the documentation somewhat vague ("you may not be able to successfully override base class methods such as __setitem__...")  Oh, and I suppose it needs to be marked as a CPython implementation detail.
msg270790 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-07-18 21:31
This is an example of the open-closed-principle (the class is open for extension but closed for modification).  It is good thing to have because it allows subclassers to extend or override a method without unintentionally triggering behavior changes in others and without breaking the classes's invariants.

We even do this in pure python code as well; for example, inside the pure python ordered dict code, the class local call from __init__ to update() is done using name mangling.  This allows a subclasser to override update() without accidentally breaking __init__.

Sometimes, this is inconvenient.  It means that a subclasser has to override every method whose behavior they want to change including get(), update(), and others.  However, there are offsetting benefits (protection of internal invariants, preventing implementation details from leaking from the abstraction, and allowing users to assume the methods are independent of one another).

This style (chosen by Guido from the outset) is the default for the builtin types (otherwise we would forever be fighting segfaulting invariant violations) and for some pure python classes.

We do document when there is a departure from the default.  For example, the cmd module uses the framework design pattern, letting the user define various do_action() methods.  Also, some of the http modules do the same, specifically documentation that the user's do_get method gets called and that some specifically named user handler method gets called.

In the absence of specifically documented method hooks (i.e. those listed above or methods like __missing__), a subclasser should presume method independence.  Otherwise, how are you to know whether __getitem__ calls get() under the hood or vice-versa?

FWIW, this isn't unique to Python.  It comes up quite a bit in object oriented programming.  Correctly designed classes either document root methods that affect the behavior of other methods or they are presumed to be independent.

There may need to be a FAQ for this, but nothing is broken or wrong here (other than Python having way to many dict variants to chose from).  Also, R David Murray almost no one would be helped by a note in some random place in the docs.  If someone assumes or believes that __getitem__ must be called by the other accessor methods, they find out very quickly that assumption is wrong (that is if they run even minimal tests on the code).
msg270793 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-07-18 21:52
Raymond: yes, I was dubious about the benefit of the doc note.
msg270821 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2016-07-19 12:30
Raymond, that was a fantastic explanation. Would you be willing to turn it into a FAQ? Or if you don't have the time, to allow somebody to steal your description and use it?
msg270830 - (view) Author: Kirk Hansen (Kirk Hansen) Date: 2016-07-19 14:47
Raymond: Thanks for essentially answering my stackoverflow question (if you are a member, and add most of your response, I'll accept it).

I agree enough with your documentation statement. An entry in an FAQ would have been just as helpful to me as a note in the documentation, and the FAQ is probably more accessible.
msg271027 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-07-22 20:18
Will do.
History
Date User Action Args
2019-08-22 23:39:54rhettingersetstatus: open -> closed
resolution: not a bug
stage: resolved
2016-07-22 20:18:17rhettingersetmessages: + msg271027
2016-07-22 16:49:37cvrebertsetnosy: + cvrebert
2016-07-19 14:47:34Kirk Hansensetmessages: + msg270830
2016-07-19 12:30:43steven.dapranosetnosy: + steven.daprano
messages: + msg270821
2016-07-18 21:52:05r.david.murraysetmessages: + msg270793
2016-07-18 21:31:24rhettingersetpriority: normal -> low
2016-07-18 21:31:02rhettingersetassignee: docs@python -> rhettinger

messages: + msg270790
nosy: + rhettinger
2016-07-18 20:15:09r.david.murraysetmessages: + msg270789
2016-07-18 18:18:13Kirk Hansensetmessages: + msg270783
2016-07-18 14:43:41r.david.murraysetnosy: + r.david.murray

messages: + msg270760
versions: - Python 3.2, Python 3.3, Python 3.4
2016-07-18 13:30:51Kirk Hansencreate