classification
Title: New C API for declaring Python types
Type: enhancement Stage:
Components: Interpreter Core Versions: Python 3.2
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: Nosy List: belopolsky, brett.cannon, dmalcolm, eric.smith, ghaering, gregory.p.smith, jcea, larry, loewis, nnorwitz, pitrou, rnk
Priority: normal Keywords: patch

Created on 2009-04-29 02:59 by larry, last changed 2011-03-28 19:51 by loewis. This issue is now closed.

Files
File name Uploaded Description Edit
lch.type.r72081.diff larry, 2009-04-29 02:58
Messages (7)
msg86775 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2009-04-29 02:58
EXECUTIVE SUMMARY

I've written a patch against py3k trunk creating a new function-based
API for creating  extension types in C.  This allows PyTypeObject to
become a (mostly) private structure.


THE PROBLEM

Here's how you create an extension type using the current API.

  * First, find some code that already has a working type declaration.
    Copy and paste their fifty-line PyTypeObject declaration, then
    hack it up until it looks like what you need.

  * Next--hey!  There *is* no next, you're done.  You can immediately
    create an object using your type and pass it into the Python
    interpreter and it would work fine.  You are encouraged to call
    PyType_Ready(), but this isn't required and it's often skipped.

This approach causes two problems.

  1) The Python interpreter *must support* and *cannot change*
     the PyTypeObject structure, forever.  Any meaningful change to
     the structure will break every extension.   This has many
     consequences:
       a) Fields that are no longer used must be left in place,
          forever, as ignored placeholders if need be.  Py3k cleaned
	  up a lot of these, but it's already picked up a new one
	  ("tp_compare" is now "tp_reserved").
       b) Internal implementation details of the type system must
          be public.
       c) The interpreter can't even use a different structure
          internally, because extensions are free to pass in objects
	  using PyTypeObjects the interpreter has never seen before.

  2) As a programming interface this lacks a certain gentility.  It
     clearly *works*, but it requires programmers to copy and paste
     with a large structure mostly containing NULLs, which they must
     pick carefully through to change just a few fields.


THE SOLUTION

My patch creates a new function-based extension type definition API.
You create a type by calling PyType_New(), then call various accessor
functions on the type (PyType_SetString and the like), and when your
type has been completely populated you must call PyType_Activate()
to enable it for use.

With this API available, extension authors no longer need to directly
see the innards of the PyTypeObject structure.  Well, most of the
fields anyway.  There are a few shortcut macros in CPython that need
to continue working for performance reasons, so the "tp_flags" and
"tp_dealloc" fields need to remain publically visible.

One feature worth mentioning is that the API is type-safe.  Many such
APIs would have had one generic "PyType_SetPointer", taking an
identifier for the field and a void * for its value, but this would
have lost type safety.  Another approach would have been to have one
accessor per field ("PyType_SetAddFunction"), but this would have
exploded the number of functions in the API.  My API splits the
difference: each distinct *type* has its own set of accessors
("PyType_GetSSizeT") which takes an identifier specifying which
field you wish to get or set.


SIDE-EFFECTS OF THE API

The major change resulting from this API: all PyTypeObjects must now
be *pointers* rather than static instances.  For example, the external
declaration of PyType_Type itself changes from this:
	PyAPI_DATA(PyTypeObject) PyType_Type;
to this:
	PyAPI_DATA(PyTypeObject *) PyType_Type;

This gives rise to the first headache caused by the API: type casts
on type objects.  It took me a day and a half to realize that this,
from Modules/_weakref.c:
        PyModule_AddObject(m, "ref",
                           (PyObject *) &_PyWeakref_RefType);
really needed to be this:
        PyModule_AddObject(m, "ref",
                           (PyObject *) _PyWeakref_RefType);

Hopefully I've already found most of these in CPython itself, but
this sort of code surely lurks in extensions yet to be touched.

(Pro-tip: if you're working with this patch, and you see a crash,
and gdb shows you something like this at the top of the stack:
	#0  0x081056d8 in visit_decref (op=0x8247aa0, data=0x0)
	               at Modules/gcmodule.c:323
	323             if (PyObject_IS_GC(op)) {
your problem is an errant &, likely on a type object you're passing
in to the interpreter.  Think--what did you touch recently?  Or debug
it by salting your code with calls to collect(NUM_GENERATIONS-1).)


Another irksome side-effect of the API: because of "tp_flags" and
"tp_dealloc", I now have two declarations of PyTypeObject.  There's
the externally-visible one in Include/object.h, which lets external
parties see "tp_dealloc" and "tp_flags".  Then there's the internal
one in Objects/typeprivate.h which is the real structure.  Since
declaring a type twice is a no-no, the external one is gated on
	#ifndef PY_TYPEPRIVATE
If you're a normal Python extension programmer, you'd include Python.h
as normal:
	#include "Python.h"
Python implementation files that need to see the real PyTypeObject
structure now look like this:
	#define PY_TYPEPRIVATE
	#include "Python.h"
	#include "../Objects/typeprivate.h"

Also, since the structure of PyTypeObject hasn't yet changed, there
are a bunch of fields in PyTypeObject that are externally visible that
I don't want to be visible.  To ensure no one was using them, I renamed
them to "mysterious_object_0" and "mysterious_object_1" and the like.
Before this patch gets accepted, I want to reorder the fields in
PyTypeObject (which we can! because it's private!) so that these public
fields are at the top of the both the external and internal structures.


THE UPGRADE PATH

Python internally declares a great many types, and I haven't attempted
to convert them all.  Instead there's an conversion header file that
does most of the work for you.  Here's how one would apply it to an
existing type.

1. Where your file currently has this:
	#include "Python.h"
   change it to this:
	#define PY_TYPEPRIVATE
	#include "Python.h"
	#include "pytypeconvert.h"

2. Whenever you declare a type, change it from this:
	static PyTypeObject YourExtension_Type = {
   to this:
	static PyTypeObject *YourExtension_Type;
	static PyTypeObject _YourExtension_Type = {

   Use NULL for your metaclass.  For example, change this:
	PyObject_HEAD_INIT(&PyType_Type),
   to this:
	PyObject_HEAD_INIT(NULL),

   Also use NULL for your baseclass.  For example, change this:
	&PyDict_Type, /* tp_base */
   to this:
	NULL, /* tp_base */
   setting it to NULL instead.

3. In your module's init function, add this:
	CONVERT_TYPE(YourExtension_Type,
	    metaclass, baseclass, "description of type");
   "metaclass" and "baseclass" should be the metaclass and baseclass
   for your type, the ones you just set to NULL in step 3.  If you
   had NULL before the baseclass, use NULL here too.

4. If you have any static object declarations, set their ob_type to
   NULL in the static declaration, then set it explicitly in your
   init function.  If your object uses a locally-defined type,
   be sure to do this *after* the CONVERT_TYPE line for that type.
   (See _Py_EllipsisObject for an example.)

5. Anywhere you're using existing Python type declarations
   you must remove the & from the front.

The conversion header file *also* redefines PyTypeObject.  But this
time it redefines it to the existing definition, and that definition
will stay the same forever.  That's the whole point: if you have an
existing Python 3.0 extension, it won't have to change if we change
the internal definition of PyTypeObject.

(Why bother with this conversion process, with few py3k extensions
in the wild?  This patch was started quite a while ago, when it
seemed plausible the API would get backported to 2.x.  Now I'm not
so sure that will happen.)




THE CURRENT PATCH

The patch below applies cleanly to py3k/trunk (r72081).  But the
code is awfully grubby.

 * I haven't dealt with any types I can't build, and I can't build
   a lot of the extensions.  I'm using Linux, and I don't have the
   dev headers for many libraries on my laptop, and I haven't touched
   Windows or Mac stuff.

 * I created some new build warnings which should obviously be fixed.

 * With the patch installed, py3k trunk builds and installs.  It does
   *not* pass the regression test suite.  (It crashes.)  I don't think
   this'll be too bad, it's just taken me this long to get it as far
   as I have.

 * There are some internal scaffolds and hacks that should be purged
   by the final patch.

 * There's no documentation.  If you'd like to see how you'd use the
   new API, currently the best way to learn is to read
   Include/pytypeconvert.h.

 * I don't like the PY_TYPEPRIVATE hack.  I only used it 'cause it
   sucks less than the other approaches I've thought of.  I welcome
   your suggestions.

   The second-best approach I've come up with: make PyTypeObject
   genuinely private, and declare a different structure containing just
   the head of PyTypeObject.   Let's call it PyTypeObjectHead.  Then,
   for the convenience macros that use "dealloc" and "flags", cast the
   object to PyTypeObjectHead before dereferencing.  This abandons type
   safety, and given my longing for type safety while developing this
   patch I'd prefer to not make loss of type safety an official API.


My thanks to Neal Norwitz for suggesting this project, and Brett Cannon
for some recent encouragement.  (And another person who I discussed it
with so long ago I forgot who it was... maybe Fredik Lundh?)
msg86809 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2009-04-29 15:34
I think this is great stuff, Larry. It's a definite improvement.

Unfortunately with my workload of other Python issues, I'm not going to
be able to review this before the 3.1 beta. I don't use many
PyTypeObject's, so my review would be mostly theoretical, anyway. But I
do want to encourage you to get this included in Python, at least in
concept.
msg94657 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-10-29 11:34
This patch is huge.

Some things:
- you should provide an example of using the new API. Your description
is not very, well, descriptive :)
- "all PyTypeObjects must now be *pointers* rather than static
instances": is it mandatory, or did you add this out of "purity"? It
would be nice if we could minimize disruption (that is, provide the new
API as a feature, but not force ourselves and other people to rewrite
huge chunks of code)
- same for the private declaration of PyTypeObject: is it really
necessary to hide it from outside code?

Lastly, since your patch is not ready for consumption, I would advocate
creating a branch somewhere (on the SVN sandbox if you have access
rights, or in a separate e.g. bitbucket repository by cloning the
existing Mercurial mirror).
msg94687 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2009-10-29 22:55
Antoine: As the patch matured I would obviously provide documentation
and examples and such.  The point of submitting the patch in this form
was a) so it was alive somewhere besides my hard drive, and b) to get
some public review from the core team to ensure I was going in a valid
direction.

As for mandatory vs purity: The whole purpose of the patch was to make
PyTypeObject a private type; see "THE PROBLEM" / "THE SOLUTION".  This
requires that all the public interfaces take pointers.  So within the
context of what the patch is trying to accomplish, it's mandatory.

The patch attempts to mitigate this as much as possible with the
backwards-compatibility 

p.s. By "huge" I suspect you mean "large", though on first reading I
thought you meant "fabulous".  That's what I get for working with a big
crew of 20-somethings.

p.p.s. Did this patch get mentioned recently or something?  After months
of inactivity, there have been two nosy+ this week.

p.p.p.s. I have not touched this patch since submitting it.  The
tribulations of working at a startup.
msg94688 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2009-10-29 22:57
Whoops!  I think I'll finish that unfinished sentence.

"The patch attempts to mitigate this as much as possible with the
backwards-compatibility" 

header file; it minimizes as much as possible the changes you need to
perform to get your code up and working again.  But I fear I've taken
that concept about as far as it makes sense to go.
msg94689 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-10-29 23:11
> As the patch matured I would obviously provide documentation
> and examples and such.

What I meant is that it's difficult for me (and perhaps others) to
assess how much more practical your patch makes it to create C types.

> This
> requires that all the public interfaces take pointers.  So within the
> context of what the patch is trying to accomplish, it's mandatory.

Hmm. That public interfaces take pointers is one thing, but that doesn't
mean the PyTypeObject structure itself must be concealed. It could be
exposed as an implementation detail.

Please note that your approach will make it difficult for third-party C
extensions to remain compatible accross several Python versions (those
before and after the API switch). While we sometimes change or deprecate
APIs, we never do it as massively as that (even the 2.x -> 3.x
transition is quite gentle).

> p.s. By "huge" I suspect you mean "large", though on first reading I
> thought you meant "fabulous".  That's what I get for working with a big
> crew of 20-somethings.

I meant "large" indeed :)

> p.p.s. Did this patch get mentioned recently or something?  After months
> of inactivity, there have been two nosy+ this week.

Well, a nosy+ bumps up the issue at the top of the recently modified
issues, which means other people notice it as well.
msg132415 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-03-28 19:51
With PEP 384, this is now obsolete, so closing it as such. If you think that selected features of this patch should still be added, please submit them as separate patches.
History
Date User Action Args
2011-03-28 19:51:18loewissetstatus: open -> closed

nosy: + loewis
messages: + msg132415

resolution: out of date
2010-07-18 17:40:59rnksetnosy: + rnk

versions: + Python 3.2, - Python 3.1
2010-04-25 03:29:33jceasetnosy: + jcea
2010-03-25 14:00:59ghaeringsetnosy: + ghaering
2010-02-24 18:57:42dmalcolmsetnosy: + dmalcolm
2009-10-29 23:11:41pitrousetmessages: + msg94689
2009-10-29 22:57:08larrysetmessages: + msg94688
2009-10-29 22:55:09larrysetmessages: + msg94687
2009-10-29 11:34:07pitrousetnosy: + pitrou, nnorwitz
messages: + msg94657
2009-10-28 02:12:25belopolskysetnosy: + belopolsky
2009-04-30 22:44:33gregory.p.smithsetnosy: + gregory.p.smith
2009-04-29 15:34:28eric.smithsetmessages: + msg86809
2009-04-29 08:03:11eric.smithsetnosy: + eric.smith
2009-04-29 03:12:47brett.cannonsetnosy: + brett.cannon
2009-04-29 02:59:42larrycreate