classification
Title: Special method lookup fails on uninitialized types
Type: behavior Stage: patch review
Components: Interpreter Core Versions: Python 3.7, Python 3.6, Python 3.5, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Oren Milman, eric.smith, gvanrossum, python-dev, serhiy.storchaka, terry.reedy, ztane
Priority: high Keywords:

Created on 2016-05-02 13:03 by ztane, last changed 2017-04-01 05:48 by serhiy.storchaka.

Files
File name Uploaded Description Edit
init_method_descr_types.patch serhiy.storchaka, 2016-05-02 14:46 review
init_types-2.7.patch serhiy.storchaka, 2016-05-04 13:49 review
init_type_in_pytype_lookup.patch serhiy.storchaka, 2016-05-04 17:33 review
Messages (25)
msg264647 - (view) Author: Antti Haapala (ztane) * Date: 2016-05-02 13:07
This is an annoying heisenbug; it seems that some objects cannot be formatted until you explicitly do obj.__format__. For example `object.__reduce__` behaves like this:

    Python 2.7.10 (default, Oct 14 2015, 16:09:02)
    [GCC 5.2.1 20151010] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> format(object.__reduce__)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: Type method_descriptor doesn't define __format__
    >>> format(object.__reduce__)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: Type method_descriptor doesn't define __format__
    >>> object.__reduce__.__format__
    <built-in method __format__ of method_descriptor object at 0x7f67563ed0e0>
    >>> format(object.__reduce__)
    "<method '__reduce__' of 'object' objects>"

I can replicate this in 2.7.9, .10 and .11 on Ubuntu and Debian, though it works on Windows Python, works in 2.6.6, and Pythons 3 wherever I've tried, but I've heard this also failing on Python 3.
msg264648 - (view) Author: Antti Haapala (ztane) * Date: 2016-05-02 13:09
s/explicitly do/explicitly access/
msg264651 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-05-02 14:46
Proposed patch makes method descriptors types to be explicitly initialized as in 3.x.
msg264730 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-05-03 14:15
There is similar issue on 3.x:

>>> import array
>>> it = iter(array.array('i'))
>>> format(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Type arrayiterator doesn't define __format__
>>> type(it).__format__
<method '__format__' of 'object' objects>
>>> format(it)
'<arrayiterator object at 0xb703f4ec>'
msg264816 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-05-04 13:49
A number of other types are not initialized until you request an attribute.

Here is larger patch for 2.7 that makes 38 types to be explicitly initialized.
msg264828 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-05-04 15:46
An alternative way is just call PyType_Ready from _PyType_Lookup if type->tp_mro is NULL.

Here is a patch against 2.7 that restores the solution from issue551412, but returns NULL if type->tp_mro is still NULL after calling PyType_Ready. I found one place in tests when this is happened (CIOTest.test_IOBase_finalize in test_io).
msg264840 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-05-04 16:28
Serhiy, I'm happy to help, but I'm not sure what you're asking me to do. Decide between different patches? I can't even repro the issue.
msg264843 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-05-04 17:33
Added a check for Py_TPFLAGS_READYING to prevent recursive calling.
msg264844 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-05-04 17:57
The problem is that format() fails for instances of some classes, because the type still is not initialized. The simplest example -- list iterator.

>>> format(iter([]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Type listiterator doesn't define __format__

After forcing type initialization (for example by getting any type's attribute), format() becomes working.

>>> type(iter([])).foo
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'listiterator' has no attribute 'foo'
>>> format(iter([]))
'<listiterator object at 0xb708d0ec>'

I afraid that format() is just one example, and there are other functions or operators that don't work or work incorrectly if the type was not initialized.

init_types-2.7.patch adds explicit initialization of 38 types (I didn't check that all of them need this, but I suppose they do). This is large patch, and I'm not sure that it fixes all types.

Other way is to try to initialize the type just in _PyType_Lookup if it is not initialized. This is simple change, but a comment in _PyType_Lookup warns me. I found that this solution was already applied as an attempt to fix issue551412, but then reverted. Since you seem to be the most knowledgeable with this code, I'm asking you what was wrong with this approach and how we can fix this.

Python 3.x also suffers from this bug, but it is reproduced with less types. For example it isn't reproduced for list iterator. I don't know why.
msg264912 - (view) Author: Antti Haapala (ztane) * Date: 2016-05-05 18:44
I can reproduce the bug in 3.5.0+ Ubuntu with list iterator, if I execute python with -S:

    % python3.5 -S
    Python 3.5.0+ (default, Oct 11 2015, 09:05:38) 
    [GCC 5.2.1 20151010] on linux
    >>> format(iter([]))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: Type list_iterator doesn't define __format__

Thus here it depends on the stuff that site does or doesn't do. Iterating over a list iterator does *not* trigger the initialization. Printing it doesn't help either, or anything else that does not touch the non-magic attributes. I am not even sure what the site.py and such are doing to the list iterator class to trigger the initialization.
msg264920 - (view) Author: Antti Haapala (ztane) * Date: 2016-05-05 20:10
And to the other things failing, I was trying to find out which of the magic method ops fail, and for that tried to find out the `dir` of list iterator. Well...

    % python3.5 -S          
    Python 3.5.0+ (default, Oct 11 2015, 09:05:38) 
    [GCC 5.2.1 20151010] on linux
    >>> dir(iter([]))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: object does not provide __dir__
msg264990 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-05-06 16:38
Sadly it's been a very long time since I wrote that code and I don't recall
much about it. I presume there was a good reason for not to do it in
_PyType_Lookup(), but who knows -- maybe the oroginal approach was just too
naive and nobody cared? I'm not excited by a patch that does this for 38
types -- invariably there will be another type that still surfaces the same
bug.
msg265009 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2016-05-06 19:23
Is there a way to have format() try to force the initialization, by explicitly doing the equivalent of obj.__format__, at least for types, instead of raising the TypeError?
msg265010 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-05-06 19:30
But the problem isn't limited to format()... Why would format() be special?
msg265018 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-05-06 20:09
There is one test (ClassPropertiesAndMethods.test_mutable_bases_with_failing_mro in test_descr) that crashes with the code from issue551412 because _PyType_Lookup() is recursive called from PyType_Ready(). Is this the reason? My patch prevents recursive calls.

Here is minimal example (for Python 3):

class M(type):
    def mro(self):
        hasattr(self, 'foo')
        return type.mro(self)

class C(metaclass=M):
    pass

When class C is created, C.mro() is called while C still is not ready. Resolving an attribute calls _PyType_Lookup() which calls PyType_Ready() which calls mro() etc.
msg265023 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-05-06 20:25
Probably.
msg265058 - (view) Author: Antti Haapala (ztane) * Date: 2016-05-07 11:14
I am not an expert on PyType internals, so I am wondering why is the PyType_Ready'ing done implicitly at all?
msg265073 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-05-07 14:51
Because the data structure that defines a type is just data, and at some
point PyType_Ready() must be called. The question is how to do this, given
that nobody can (or needs to) produce a definitive list of all types.
msg265081 - (view) Author: Antti Haapala (ztane) * Date: 2016-05-07 17:10
Could it be possible to to make the debug build absolutely abort on any usage of PyType's that are not readied, usage including instantiating them. Then, instead of changing all `static` linkages to `extern`s (as in Serhiy's first patch) one could rather make per-compilation unit initialization functions that are called from objects.c; that way it would be easier to use preprocessor to turn on and off the very existence of certain types in a compilation unit based on a preprocessor flag.

Likewise the C-API docs for PyType_Ready should perhaps say "This must be called on all type objects to finish their initialization." instead of "should"
msg277875 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-10-02 09:54
Similar bug just was introduced in issue21124.
msg277912 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-10-02 21:10
Yet one similar bug: issue11702.
msg277913 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-10-02 22:05
Serhiy -- please do what do you think we should do. At this point I'm open to just about anything, but I don't feel comfortable creating or reviewing patches any more.
msg278247 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-10-07 15:04
Yet one demonstration of this bug:

$ ./python -IS
>>> import operator
>>> operator.length_hint(iter("abc"))
0
>>> import collections.abc
>>> operator.length_hint(iter("abc"))
3
msg278287 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-10-08 09:28
New changeset bbaf6c928526 by Serhiy Storchaka in branch '3.5':
Issue #26906: Resolving special methods of uninitialized type now causes
https://hg.python.org/cpython/rev/bbaf6c928526

New changeset 3119f08802a5 by Serhiy Storchaka in branch '2.7':
Issue #26906: Resolving special methods of uninitialized type now causes
https://hg.python.org/cpython/rev/3119f08802a5

New changeset 888a26fac9d2 by Serhiy Storchaka in branch '3.6':
Issue #26906: Resolving special methods of uninitialized type now causes
https://hg.python.org/cpython/rev/888a26fac9d2

New changeset d24f1467a297 by Serhiy Storchaka in branch 'default':
Issue #26906: Resolving special methods of uninitialized type now causes
https://hg.python.org/cpython/rev/d24f1467a297
msg278290 - (view) Author: Oren Milman (Oren Milman) * Date: 2016-10-08 10:55
(Just to save time for anyone interested)
The last demonstration of the bug Serhiy mentioned is caused by the following (this was right only until Serhiy's patch earlier today):
    - before importing collections.abc, str_iterator is not initialized, which means:
        * Its tp_mro is NULL.
        * _PyType_Lookup returns NULL (when called to lookup __length_hint__ in str_iterator (as part of operator.length_hint))
    - on import, collections.abc also does 'Iterator.register(str_iterator)', which leads to the following call chain: ABCMeta.register(Iterator, str_iterator) => issubclass(str_iterator, Iterator) => PyObject_IsSubclass(str_iterator, Iterator) => Iterator.__subclasscheck__(Iterator, str_iterator) => Iterator.__subclasshook__(str_iterator) => collections.abc._check_methods(str_iterator, '__iter__', '__next__')
    And _check_methods first does 'mro = C.__mro__', which ultimately calls type_getattro (which calls PyType_Ready in case tp_dict is NULL).


Anyway, with regard to the disconcerting comment:
    /* If mro is NULL, the type is either not yet initialized
       by PyType_Ready(), or already cleared by type_clear().
       Either way the safest thing to do is to return NULL. */
Sorry for the newbie question, but why not add a Py_TPFLAGS_CLEARED flag to tp_flags?
Then we could assert in _PyType_Lookup (and maybe also in other places that call PyType_Ready, such as type_getattro) that the Py_TPFLAGS_CLEARED is not set..

I realize adding such a flag is really a big deal, but maybe it's worth catching sneaky bugs caused by Python's equivalent of Use-After-Free bugs?
History
Date User Action Args
2017-04-01 05:48:04serhiy.storchakasetpull_requests: - pull_request967
2017-03-31 16:36:23dstufftsetpull_requests: + pull_request967
2016-10-08 14:43:42serhiy.storchakasettitle: Special method lookup fails on unitialized types -> Special method lookup fails on uninitialized types
2016-10-08 10:55:55Oren Milmansetnosy: + Oren Milman
messages: + msg278290
2016-10-08 09:28:54python-devsetnosy: + python-dev
messages: + msg278287
2016-10-08 09:27:56serhiy.storchakasettitle: format(object.__reduce__) fails intermittently -> Special method lookup fails on unitialized types
2016-10-07 15:04:43serhiy.storchakasetmessages: + msg278247
2016-10-02 22:05:25gvanrossumsetassignee: gvanrossum -> serhiy.storchaka
messages: + msg277913
2016-10-02 21:10:38serhiy.storchakasetmessages: + msg277912
2016-10-02 09:54:35serhiy.storchakasetpriority: normal -> high

messages: + msg277875
versions: + Python 3.7
2016-05-07 17:10:24ztanesetmessages: + msg265081
2016-05-07 14:51:26gvanrossumsetmessages: + msg265073
2016-05-07 11:14:09ztanesetmessages: + msg265058
2016-05-06 20:25:34gvanrossumsetmessages: + msg265023
2016-05-06 20:09:08serhiy.storchakasetkeywords: - patch

messages: + msg265018
2016-05-06 19:30:04gvanrossumsetmessages: + msg265010
2016-05-06 19:23:32terry.reedysetnosy: + terry.reedy
messages: + msg265009
2016-05-06 16:38:54gvanrossumsetmessages: + msg264990
2016-05-05 20:10:12ztanesetmessages: + msg264920
2016-05-05 18:44:42ztanesetmessages: + msg264912
2016-05-04 17:57:16serhiy.storchakasetmessages: + msg264844
2016-05-04 17:33:04serhiy.storchakasetfiles: + init_type_in_pytype_lookup.patch

messages: + msg264843
2016-05-04 17:32:04serhiy.storchakasetfiles: - init_type_in_pytype_lookup.patch
2016-05-04 16:28:52gvanrossumsetmessages: + msg264840
2016-05-04 15:46:38serhiy.storchakasetfiles: + init_type_in_pytype_lookup.patch

nosy: + gvanrossum
messages: + msg264828

assignee: gvanrossum
2016-05-04 13:49:06serhiy.storchakasetfiles: + init_types-2.7.patch

messages: + msg264816
2016-05-03 14:15:53serhiy.storchakasetmessages: + msg264730
versions: + Python 3.5, Python 3.6
2016-05-02 14:46:31serhiy.storchakasetfiles: + init_method_descr_types.patch

components: + Interpreter Core

keywords: + patch
nosy: + serhiy.storchaka
messages: + msg264651
stage: patch review
2016-05-02 13:17:30eric.smithsetnosy: + eric.smith
type: behavior
2016-05-02 13:09:03ztanesetmessages: + msg264648
2016-05-02 13:07:24ztanesetmessages: + msg264647
title: __reduce__ format -> format(object.__reduce__) fails intermittently
2016-05-02 13:03:48ztanecreate