Author larry
Recipients larry
Date 2009-04-29.02:58:53
SpamBayes Score 1.11022e-16
Marked as misclassified No
Message-id <1240973984.42.0.154845882855.issue5872@psf.upfronthosting.co.za>
In-reply-to
Content
EXECUTIVE SUMMARY

I've written a patch against py3k trunk creating a new function-based
API for creating  extension types in C.  This allows PyTypeObject to
become a (mostly) private structure.


THE PROBLEM

Here's how you create an extension type using the current API.

  * First, find some code that already has a working type declaration.
    Copy and paste their fifty-line PyTypeObject declaration, then
    hack it up until it looks like what you need.

  * Next--hey!  There *is* no next, you're done.  You can immediately
    create an object using your type and pass it into the Python
    interpreter and it would work fine.  You are encouraged to call
    PyType_Ready(), but this isn't required and it's often skipped.

This approach causes two problems.

  1) The Python interpreter *must support* and *cannot change*
     the PyTypeObject structure, forever.  Any meaningful change to
     the structure will break every extension.   This has many
     consequences:
       a) Fields that are no longer used must be left in place,
          forever, as ignored placeholders if need be.  Py3k cleaned
	  up a lot of these, but it's already picked up a new one
	  ("tp_compare" is now "tp_reserved").
       b) Internal implementation details of the type system must
          be public.
       c) The interpreter can't even use a different structure
          internally, because extensions are free to pass in objects
	  using PyTypeObjects the interpreter has never seen before.

  2) As a programming interface this lacks a certain gentility.  It
     clearly *works*, but it requires programmers to copy and paste
     with a large structure mostly containing NULLs, which they must
     pick carefully through to change just a few fields.


THE SOLUTION

My patch creates a new function-based extension type definition API.
You create a type by calling PyType_New(), then call various accessor
functions on the type (PyType_SetString and the like), and when your
type has been completely populated you must call PyType_Activate()
to enable it for use.

With this API available, extension authors no longer need to directly
see the innards of the PyTypeObject structure.  Well, most of the
fields anyway.  There are a few shortcut macros in CPython that need
to continue working for performance reasons, so the "tp_flags" and
"tp_dealloc" fields need to remain publically visible.

One feature worth mentioning is that the API is type-safe.  Many such
APIs would have had one generic "PyType_SetPointer", taking an
identifier for the field and a void * for its value, but this would
have lost type safety.  Another approach would have been to have one
accessor per field ("PyType_SetAddFunction"), but this would have
exploded the number of functions in the API.  My API splits the
difference: each distinct *type* has its own set of accessors
("PyType_GetSSizeT") which takes an identifier specifying which
field you wish to get or set.


SIDE-EFFECTS OF THE API

The major change resulting from this API: all PyTypeObjects must now
be *pointers* rather than static instances.  For example, the external
declaration of PyType_Type itself changes from this:
	PyAPI_DATA(PyTypeObject) PyType_Type;
to this:
	PyAPI_DATA(PyTypeObject *) PyType_Type;

This gives rise to the first headache caused by the API: type casts
on type objects.  It took me a day and a half to realize that this,
from Modules/_weakref.c:
        PyModule_AddObject(m, "ref",
                           (PyObject *) &_PyWeakref_RefType);
really needed to be this:
        PyModule_AddObject(m, "ref",
                           (PyObject *) _PyWeakref_RefType);

Hopefully I've already found most of these in CPython itself, but
this sort of code surely lurks in extensions yet to be touched.

(Pro-tip: if you're working with this patch, and you see a crash,
and gdb shows you something like this at the top of the stack:
	#0  0x081056d8 in visit_decref (op=0x8247aa0, data=0x0)
	               at Modules/gcmodule.c:323
	323             if (PyObject_IS_GC(op)) {
your problem is an errant &, likely on a type object you're passing
in to the interpreter.  Think--what did you touch recently?  Or debug
it by salting your code with calls to collect(NUM_GENERATIONS-1).)


Another irksome side-effect of the API: because of "tp_flags" and
"tp_dealloc", I now have two declarations of PyTypeObject.  There's
the externally-visible one in Include/object.h, which lets external
parties see "tp_dealloc" and "tp_flags".  Then there's the internal
one in Objects/typeprivate.h which is the real structure.  Since
declaring a type twice is a no-no, the external one is gated on
	#ifndef PY_TYPEPRIVATE
If you're a normal Python extension programmer, you'd include Python.h
as normal:
	#include "Python.h"
Python implementation files that need to see the real PyTypeObject
structure now look like this:
	#define PY_TYPEPRIVATE
	#include "Python.h"
	#include "../Objects/typeprivate.h"

Also, since the structure of PyTypeObject hasn't yet changed, there
are a bunch of fields in PyTypeObject that are externally visible that
I don't want to be visible.  To ensure no one was using them, I renamed
them to "mysterious_object_0" and "mysterious_object_1" and the like.
Before this patch gets accepted, I want to reorder the fields in
PyTypeObject (which we can! because it's private!) so that these public
fields are at the top of the both the external and internal structures.


THE UPGRADE PATH

Python internally declares a great many types, and I haven't attempted
to convert them all.  Instead there's an conversion header file that
does most of the work for you.  Here's how one would apply it to an
existing type.

1. Where your file currently has this:
	#include "Python.h"
   change it to this:
	#define PY_TYPEPRIVATE
	#include "Python.h"
	#include "pytypeconvert.h"

2. Whenever you declare a type, change it from this:
	static PyTypeObject YourExtension_Type = {
   to this:
	static PyTypeObject *YourExtension_Type;
	static PyTypeObject _YourExtension_Type = {

   Use NULL for your metaclass.  For example, change this:
	PyObject_HEAD_INIT(&PyType_Type),
   to this:
	PyObject_HEAD_INIT(NULL),

   Also use NULL for your baseclass.  For example, change this:
	&PyDict_Type, /* tp_base */
   to this:
	NULL, /* tp_base */
   setting it to NULL instead.

3. In your module's init function, add this:
	CONVERT_TYPE(YourExtension_Type,
	    metaclass, baseclass, "description of type");
   "metaclass" and "baseclass" should be the metaclass and baseclass
   for your type, the ones you just set to NULL in step 3.  If you
   had NULL before the baseclass, use NULL here too.

4. If you have any static object declarations, set their ob_type to
   NULL in the static declaration, then set it explicitly in your
   init function.  If your object uses a locally-defined type,
   be sure to do this *after* the CONVERT_TYPE line for that type.
   (See _Py_EllipsisObject for an example.)

5. Anywhere you're using existing Python type declarations
   you must remove the & from the front.

The conversion header file *also* redefines PyTypeObject.  But this
time it redefines it to the existing definition, and that definition
will stay the same forever.  That's the whole point: if you have an
existing Python 3.0 extension, it won't have to change if we change
the internal definition of PyTypeObject.

(Why bother with this conversion process, with few py3k extensions
in the wild?  This patch was started quite a while ago, when it
seemed plausible the API would get backported to 2.x.  Now I'm not
so sure that will happen.)




THE CURRENT PATCH

The patch below applies cleanly to py3k/trunk (r72081).  But the
code is awfully grubby.

 * I haven't dealt with any types I can't build, and I can't build
   a lot of the extensions.  I'm using Linux, and I don't have the
   dev headers for many libraries on my laptop, and I haven't touched
   Windows or Mac stuff.

 * I created some new build warnings which should obviously be fixed.

 * With the patch installed, py3k trunk builds and installs.  It does
   *not* pass the regression test suite.  (It crashes.)  I don't think
   this'll be too bad, it's just taken me this long to get it as far
   as I have.

 * There are some internal scaffolds and hacks that should be purged
   by the final patch.

 * There's no documentation.  If you'd like to see how you'd use the
   new API, currently the best way to learn is to read
   Include/pytypeconvert.h.

 * I don't like the PY_TYPEPRIVATE hack.  I only used it 'cause it
   sucks less than the other approaches I've thought of.  I welcome
   your suggestions.

   The second-best approach I've come up with: make PyTypeObject
   genuinely private, and declare a different structure containing just
   the head of PyTypeObject.   Let's call it PyTypeObjectHead.  Then,
   for the convenience macros that use "dealloc" and "flags", cast the
   object to PyTypeObjectHead before dereferencing.  This abandons type
   safety, and given my longing for type safety while developing this
   patch I'd prefer to not make loss of type safety an official API.


My thanks to Neal Norwitz for suggesting this project, and Brett Cannon
for some recent encouragement.  (And another person who I discussed it
with so long ago I forgot who it was... maybe Fredik Lundh?)
History
Date User Action Args
2009-04-29 02:59:45larrysetrecipients: + larry
2009-04-29 02:59:44larrysetmessageid: <1240973984.42.0.154845882855.issue5872@psf.upfronthosting.co.za>
2009-04-29 02:59:41larrylinkissue5872 messages
2009-04-29 02:59:36larrycreate