Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New C API for declaring Python types #50122

Closed
larryhastings opened this issue Apr 29, 2009 · 7 comments
Closed

New C API for declaring Python types #50122

larryhastings opened this issue Apr 29, 2009 · 7 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@larryhastings
Copy link
Contributor

BPO 5872
Nosy @loewis, @brettcannon, @gpshead, @jcea, @abalkin, @pitrou, @larryhastings, @ericvsmith, @rnk, @davidmalcolm
Files
  • lch.type.r72081.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2011-03-28.19:51:18.979>
    created_at = <Date 2009-04-29.02:59:42.037>
    labels = ['interpreter-core', 'type-feature']
    title = 'New C API for declaring Python types'
    updated_at = <Date 2011-03-28.19:51:18.977>
    user = 'https://github.com/larryhastings'

    bugs.python.org fields:

    activity = <Date 2011-03-28.19:51:18.977>
    actor = 'loewis'
    assignee = 'none'
    closed = True
    closed_date = <Date 2011-03-28.19:51:18.979>
    closer = 'loewis'
    components = ['Interpreter Core']
    creation = <Date 2009-04-29.02:59:42.037>
    creator = 'larry'
    dependencies = []
    files = ['13809']
    hgrepos = []
    issue_num = 5872
    keywords = ['patch']
    message_count = 7.0
    messages = ['86775', '86809', '94657', '94687', '94688', '94689', '132415']
    nosy_count = 12.0
    nosy_names = ['loewis', 'nnorwitz', 'brett.cannon', 'gregory.p.smith', 'jcea', 'ghaering', 'belopolsky', 'pitrou', 'larry', 'eric.smith', 'rnk', 'dmalcolm']
    pr_nums = []
    priority = 'normal'
    resolution = 'out of date'
    stage = None
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue5872'
    versions = ['Python 3.2']

    @larryhastings
    Copy link
    Contributor Author

    EXECUTIVE SUMMARY

    I've written a patch against py3k trunk creating a new function-based
    API for creating extension types in C. This allows PyTypeObject to
    become a (mostly) private structure.

    THE PROBLEM

    Here's how you create an extension type using the current API.

    • First, find some code that already has a working type declaration.
      Copy and paste their fifty-line PyTypeObject declaration, then
      hack it up until it looks like what you need.

    • Next--hey! There *is* no next, you're done. You can immediately
      create an object using your type and pass it into the Python
      interpreter and it would work fine. You are encouraged to call
      PyType_Ready(), but this isn't required and it's often skipped.

    This approach causes two problems.

    1. The Python interpreter *must support* and *cannot change*
      the PyTypeObject structure, forever. Any meaningful change to
      the structure will break every extension. This has many
      consequences:
      a) Fields that are no longer used must be left in place,
      forever, as ignored placeholders if need be. Py3k cleaned
      up a lot of these, but it's already picked up a new one
      ("tp_compare" is now "tp_reserved").
      b) Internal implementation details of the type system must
      be public.
      c) The interpreter can't even use a different structure
      internally, because extensions are free to pass in objects
      using PyTypeObjects the interpreter has never seen before.

    2. As a programming interface this lacks a certain gentility. It
      clearly *works*, but it requires programmers to copy and paste
      with a large structure mostly containing NULLs, which they must
      pick carefully through to change just a few fields.

    THE SOLUTION

    My patch creates a new function-based extension type definition API.
    You create a type by calling PyType_New(), then call various accessor
    functions on the type (PyType_SetString and the like), and when your
    type has been completely populated you must call PyType_Activate()
    to enable it for use.

    With this API available, extension authors no longer need to directly
    see the innards of the PyTypeObject structure. Well, most of the
    fields anyway. There are a few shortcut macros in CPython that need
    to continue working for performance reasons, so the "tp_flags" and
    "tp_dealloc" fields need to remain publically visible.

    One feature worth mentioning is that the API is type-safe. Many such
    APIs would have had one generic "PyType_SetPointer", taking an
    identifier for the field and a void * for its value, but this would
    have lost type safety. Another approach would have been to have one
    accessor per field ("PyType_SetAddFunction"), but this would have
    exploded the number of functions in the API. My API splits the
    difference: each distinct *type* has its own set of accessors
    ("PyType_GetSSizeT") which takes an identifier specifying which
    field you wish to get or set.

    SIDE-EFFECTS OF THE API

    The major change resulting from this API: all PyTypeObjects must now
    be *pointers* rather than static instances. For example, the external
    declaration of PyType_Type itself changes from this:
    PyAPI_DATA(PyTypeObject) PyType_Type;
    to this:
    PyAPI_DATA(PyTypeObject *) PyType_Type;

    This gives rise to the first headache caused by the API: type casts
    on type objects. It took me a day and a half to realize that this,
    from Modules/_weakref.c:
    PyModule_AddObject(m, "ref",
    (PyObject *) &_PyWeakref_RefType);
    really needed to be this:
    PyModule_AddObject(m, "ref",
    (PyObject *) _PyWeakref_RefType);

    Hopefully I've already found most of these in CPython itself, but
    this sort of code surely lurks in extensions yet to be touched.

    (Pro-tip: if you're working with this patch, and you see a crash,
    and gdb shows you something like this at the top of the stack:
    #0 0x081056d8 in visit_decref (op=0x8247aa0, data=0x0)
    at Modules/gcmodule.c:323
    323 if (PyObject_IS_GC(op)) {
    your problem is an errant &, likely on a type object you're passing
    in to the interpreter. Think--what did you touch recently? Or debug
    it by salting your code with calls to collect(NUM_GENERATIONS-1).)

    Another irksome side-effect of the API: because of "tp_flags" and
    "tp_dealloc", I now have two declarations of PyTypeObject.  There's
    the externally-visible one in Include/object.h, which lets external
    parties see "tp_dealloc" and "tp_flags".  Then there's the internal
    one in Objects/typeprivate.h which is the real structure.  Since
    declaring a type twice is a no-no, the external one is gated on
    	#ifndef PY_TYPEPRIVATE
    If you're a normal Python extension programmer, you'd include Python.h
    as normal:
    	#include "Python.h"
    Python implementation files that need to see the real PyTypeObject
    structure now look like this:
    	#define PY_TYPEPRIVATE
    	#include "Python.h"
    	#include "../Objects/typeprivate.h"
    

    Also, since the structure of PyTypeObject hasn't yet changed, there
    are a bunch of fields in PyTypeObject that are externally visible that
    I don't want to be visible. To ensure no one was using them, I renamed
    them to "mysterious_object_0" and "mysterious_object_1" and the like.
    Before this patch gets accepted, I want to reorder the fields in
    PyTypeObject (which we can! because it's private!) so that these public
    fields are at the top of the both the external and internal structures.

    THE UPGRADE PATH

    Python internally declares a great many types, and I haven't attempted
    to convert them all. Instead there's an conversion header file that
    does most of the work for you. Here's how one would apply it to an
    existing type.

    1. Where your file currently has this:
    	#include "Python.h"
       change it to this:
    	#define PY_TYPEPRIVATE
    	#include "Python.h"
    	#include "pytypeconvert.h"
    1. Whenever you declare a type, change it from this:
      static PyTypeObject YourExtension_Type = {
      to this:
      static PyTypeObject *YourExtension_Type;
      static PyTypeObject _YourExtension_Type = {

      Use NULL for your metaclass. For example, change this:
      PyObject_HEAD_INIT(&PyType_Type),
      to this:
      PyObject_HEAD_INIT(NULL),

      Also use NULL for your baseclass. For example, change this:
      &PyDict_Type, /* tp_base */
      to this:
      NULL, /* tp_base */
      setting it to NULL instead.

    2. In your module's init function, add this:
      CONVERT_TYPE(YourExtension_Type,
      metaclass, baseclass, "description of type");
      "metaclass" and "baseclass" should be the metaclass and baseclass
      for your type, the ones you just set to NULL in step 3. If you
      had NULL before the baseclass, use NULL here too.

    3. If you have any static object declarations, set their ob_type to
      NULL in the static declaration, then set it explicitly in your
      init function. If your object uses a locally-defined type,
      be sure to do this *after* the CONVERT_TYPE line for that type.
      (See _Py_EllipsisObject for an example.)

    4. Anywhere you're using existing Python type declarations
      you must remove the & from the front.

    The conversion header file *also* redefines PyTypeObject. But this
    time it redefines it to the existing definition, and that definition
    will stay the same forever. That's the whole point: if you have an
    existing Python 3.0 extension, it won't have to change if we change
    the internal definition of PyTypeObject.

    (Why bother with this conversion process, with few py3k extensions
    in the wild? This patch was started quite a while ago, when it
    seemed plausible the API would get backported to 2.x. Now I'm not
    so sure that will happen.)

    THE CURRENT PATCH

    The patch below applies cleanly to py3k/trunk (r72081). But the
    code is awfully grubby.

    • I haven't dealt with any types I can't build, and I can't build
      a lot of the extensions. I'm using Linux, and I don't have the
      dev headers for many libraries on my laptop, and I haven't touched
      Windows or Mac stuff.

    • I created some new build warnings which should obviously be fixed.

    • With the patch installed, py3k trunk builds and installs. It does
      *not* pass the regression test suite. (It crashes.) I don't think
      this'll be too bad, it's just taken me this long to get it as far
      as I have.

    • There are some internal scaffolds and hacks that should be purged
      by the final patch.

    • There's no documentation. If you'd like to see how you'd use the
      new API, currently the best way to learn is to read
      Include/pytypeconvert.h.

    • I don't like the PY_TYPEPRIVATE hack. I only used it 'cause it
      sucks less than the other approaches I've thought of. I welcome
      your suggestions.

      The second-best approach I've come up with: make PyTypeObject
      genuinely private, and declare a different structure containing just
      the head of PyTypeObject. Let's call it PyTypeObjectHead. Then,
      for the convenience macros that use "dealloc" and "flags", cast the
      object to PyTypeObjectHead before dereferencing. This abandons type
      safety, and given my longing for type safety while developing this
      patch I'd prefer to not make loss of type safety an official API.

    My thanks to Neal Norwitz for suggesting this project, and Brett Cannon
    for some recent encouragement. (And another person who I discussed it
    with so long ago I forgot who it was... maybe Fredik Lundh?)

    @larryhastings larryhastings added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Apr 29, 2009
    @ericvsmith
    Copy link
    Member

    I think this is great stuff, Larry. It's a definite improvement.

    Unfortunately with my workload of other Python issues, I'm not going to
    be able to review this before the 3.1 beta. I don't use many
    PyTypeObject's, so my review would be mostly theoretical, anyway. But I
    do want to encourage you to get this included in Python, at least in
    concept.

    @pitrou
    Copy link
    Member

    pitrou commented Oct 29, 2009

    This patch is huge.

    Some things:

    • you should provide an example of using the new API. Your description
      is not very, well, descriptive :)
    • "all PyTypeObjects must now be *pointers* rather than static
      instances": is it mandatory, or did you add this out of "purity"? It
      would be nice if we could minimize disruption (that is, provide the new
      API as a feature, but not force ourselves and other people to rewrite
      huge chunks of code)
    • same for the private declaration of PyTypeObject: is it really
      necessary to hide it from outside code?

    Lastly, since your patch is not ready for consumption, I would advocate
    creating a branch somewhere (on the SVN sandbox if you have access
    rights, or in a separate e.g. bitbucket repository by cloning the
    existing Mercurial mirror).

    @larryhastings
    Copy link
    Contributor Author

    Antoine: As the patch matured I would obviously provide documentation
    and examples and such. The point of submitting the patch in this form
    was a) so it was alive somewhere besides my hard drive, and b) to get
    some public review from the core team to ensure I was going in a valid
    direction.

    As for mandatory vs purity: The whole purpose of the patch was to make
    PyTypeObject a private type; see "THE PROBLEM" / "THE SOLUTION". This
    requires that all the public interfaces take pointers. So within the
    context of what the patch is trying to accomplish, it's mandatory.

    The patch attempts to mitigate this as much as possible with the
    backwards-compatibility

    p.s. By "huge" I suspect you mean "large", though on first reading I
    thought you meant "fabulous". That's what I get for working with a big
    crew of 20-somethings.

    p.p.s. Did this patch get mentioned recently or something? After months
    of inactivity, there have been two nosy+ this week.

    p.p.p.s. I have not touched this patch since submitting it. The
    tribulations of working at a startup.

    @larryhastings
    Copy link
    Contributor Author

    Whoops! I think I'll finish that unfinished sentence.

    "The patch attempts to mitigate this as much as possible with the
    backwards-compatibility"

    header file; it minimizes as much as possible the changes you need to
    perform to get your code up and working again. But I fear I've taken
    that concept about as far as it makes sense to go.

    @pitrou
    Copy link
    Member

    pitrou commented Oct 29, 2009

    As the patch matured I would obviously provide documentation
    and examples and such.

    What I meant is that it's difficult for me (and perhaps others) to
    assess how much more practical your patch makes it to create C types.

    This
    requires that all the public interfaces take pointers. So within the
    context of what the patch is trying to accomplish, it's mandatory.

    Hmm. That public interfaces take pointers is one thing, but that doesn't
    mean the PyTypeObject structure itself must be concealed. It could be
    exposed as an implementation detail.

    Please note that your approach will make it difficult for third-party C
    extensions to remain compatible accross several Python versions (those
    before and after the API switch). While we sometimes change or deprecate
    APIs, we never do it as massively as that (even the 2.x -> 3.x
    transition is quite gentle).

    p.s. By "huge" I suspect you mean "large", though on first reading I
    thought you meant "fabulous". That's what I get for working with a big
    crew of 20-somethings.

    I meant "large" indeed :)

    p.p.s. Did this patch get mentioned recently or something? After months
    of inactivity, there have been two nosy+ this week.

    Well, a nosy+ bumps up the issue at the top of the recently modified
    issues, which means other people notice it as well.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Mar 28, 2011

    With PEP-384, this is now obsolete, so closing it as such. If you think that selected features of this patch should still be added, please submit them as separate patches.

    @loewis loewis mannequin closed this as completed Mar 28, 2011
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants