Issue5872
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2009-04-29 02:59 by larry, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
lch.type.r72081.diff | larry, 2009-04-29 02:58 |
Messages (7) | |||
---|---|---|---|
msg86775 - (view) | Author: Larry Hastings (larry) * | Date: 2009-04-29 02:58 | |
EXECUTIVE SUMMARY I've written a patch against py3k trunk creating a new function-based API for creating extension types in C. This allows PyTypeObject to become a (mostly) private structure. THE PROBLEM Here's how you create an extension type using the current API. * First, find some code that already has a working type declaration. Copy and paste their fifty-line PyTypeObject declaration, then hack it up until it looks like what you need. * Next--hey! There *is* no next, you're done. You can immediately create an object using your type and pass it into the Python interpreter and it would work fine. You are encouraged to call PyType_Ready(), but this isn't required and it's often skipped. This approach causes two problems. 1) The Python interpreter *must support* and *cannot change* the PyTypeObject structure, forever. Any meaningful change to the structure will break every extension. This has many consequences: a) Fields that are no longer used must be left in place, forever, as ignored placeholders if need be. Py3k cleaned up a lot of these, but it's already picked up a new one ("tp_compare" is now "tp_reserved"). b) Internal implementation details of the type system must be public. c) The interpreter can't even use a different structure internally, because extensions are free to pass in objects using PyTypeObjects the interpreter has never seen before. 2) As a programming interface this lacks a certain gentility. It clearly *works*, but it requires programmers to copy and paste with a large structure mostly containing NULLs, which they must pick carefully through to change just a few fields. THE SOLUTION My patch creates a new function-based extension type definition API. You create a type by calling PyType_New(), then call various accessor functions on the type (PyType_SetString and the like), and when your type has been completely populated you must call PyType_Activate() to enable it for use. With this API available, extension authors no longer need to directly see the innards of the PyTypeObject structure. Well, most of the fields anyway. There are a few shortcut macros in CPython that need to continue working for performance reasons, so the "tp_flags" and "tp_dealloc" fields need to remain publically visible. One feature worth mentioning is that the API is type-safe. Many such APIs would have had one generic "PyType_SetPointer", taking an identifier for the field and a void * for its value, but this would have lost type safety. Another approach would have been to have one accessor per field ("PyType_SetAddFunction"), but this would have exploded the number of functions in the API. My API splits the difference: each distinct *type* has its own set of accessors ("PyType_GetSSizeT") which takes an identifier specifying which field you wish to get or set. SIDE-EFFECTS OF THE API The major change resulting from this API: all PyTypeObjects must now be *pointers* rather than static instances. For example, the external declaration of PyType_Type itself changes from this: PyAPI_DATA(PyTypeObject) PyType_Type; to this: PyAPI_DATA(PyTypeObject *) PyType_Type; This gives rise to the first headache caused by the API: type casts on type objects. It took me a day and a half to realize that this, from Modules/_weakref.c: PyModule_AddObject(m, "ref", (PyObject *) &_PyWeakref_RefType); really needed to be this: PyModule_AddObject(m, "ref", (PyObject *) _PyWeakref_RefType); Hopefully I've already found most of these in CPython itself, but this sort of code surely lurks in extensions yet to be touched. (Pro-tip: if you're working with this patch, and you see a crash, and gdb shows you something like this at the top of the stack: #0 0x081056d8 in visit_decref (op=0x8247aa0, data=0x0) at Modules/gcmodule.c:323 323 if (PyObject_IS_GC(op)) { your problem is an errant &, likely on a type object you're passing in to the interpreter. Think--what did you touch recently? Or debug it by salting your code with calls to collect(NUM_GENERATIONS-1).) Another irksome side-effect of the API: because of "tp_flags" and "tp_dealloc", I now have two declarations of PyTypeObject. There's the externally-visible one in Include/object.h, which lets external parties see "tp_dealloc" and "tp_flags". Then there's the internal one in Objects/typeprivate.h which is the real structure. Since declaring a type twice is a no-no, the external one is gated on #ifndef PY_TYPEPRIVATE If you're a normal Python extension programmer, you'd include Python.h as normal: #include "Python.h" Python implementation files that need to see the real PyTypeObject structure now look like this: #define PY_TYPEPRIVATE #include "Python.h" #include "../Objects/typeprivate.h" Also, since the structure of PyTypeObject hasn't yet changed, there are a bunch of fields in PyTypeObject that are externally visible that I don't want to be visible. To ensure no one was using them, I renamed them to "mysterious_object_0" and "mysterious_object_1" and the like. Before this patch gets accepted, I want to reorder the fields in PyTypeObject (which we can! because it's private!) so that these public fields are at the top of the both the external and internal structures. THE UPGRADE PATH Python internally declares a great many types, and I haven't attempted to convert them all. Instead there's an conversion header file that does most of the work for you. Here's how one would apply it to an existing type. 1. Where your file currently has this: #include "Python.h" change it to this: #define PY_TYPEPRIVATE #include "Python.h" #include "pytypeconvert.h" 2. Whenever you declare a type, change it from this: static PyTypeObject YourExtension_Type = { to this: static PyTypeObject *YourExtension_Type; static PyTypeObject _YourExtension_Type = { Use NULL for your metaclass. For example, change this: PyObject_HEAD_INIT(&PyType_Type), to this: PyObject_HEAD_INIT(NULL), Also use NULL for your baseclass. For example, change this: &PyDict_Type, /* tp_base */ to this: NULL, /* tp_base */ setting it to NULL instead. 3. In your module's init function, add this: CONVERT_TYPE(YourExtension_Type, metaclass, baseclass, "description of type"); "metaclass" and "baseclass" should be the metaclass and baseclass for your type, the ones you just set to NULL in step 3. If you had NULL before the baseclass, use NULL here too. 4. If you have any static object declarations, set their ob_type to NULL in the static declaration, then set it explicitly in your init function. If your object uses a locally-defined type, be sure to do this *after* the CONVERT_TYPE line for that type. (See _Py_EllipsisObject for an example.) 5. Anywhere you're using existing Python type declarations you must remove the & from the front. The conversion header file *also* redefines PyTypeObject. But this time it redefines it to the existing definition, and that definition will stay the same forever. That's the whole point: if you have an existing Python 3.0 extension, it won't have to change if we change the internal definition of PyTypeObject. (Why bother with this conversion process, with few py3k extensions in the wild? This patch was started quite a while ago, when it seemed plausible the API would get backported to 2.x. Now I'm not so sure that will happen.) THE CURRENT PATCH The patch below applies cleanly to py3k/trunk (r72081). But the code is awfully grubby. * I haven't dealt with any types I can't build, and I can't build a lot of the extensions. I'm using Linux, and I don't have the dev headers for many libraries on my laptop, and I haven't touched Windows or Mac stuff. * I created some new build warnings which should obviously be fixed. * With the patch installed, py3k trunk builds and installs. It does *not* pass the regression test suite. (It crashes.) I don't think this'll be too bad, it's just taken me this long to get it as far as I have. * There are some internal scaffolds and hacks that should be purged by the final patch. * There's no documentation. If you'd like to see how you'd use the new API, currently the best way to learn is to read Include/pytypeconvert.h. * I don't like the PY_TYPEPRIVATE hack. I only used it 'cause it sucks less than the other approaches I've thought of. I welcome your suggestions. The second-best approach I've come up with: make PyTypeObject genuinely private, and declare a different structure containing just the head of PyTypeObject. Let's call it PyTypeObjectHead. Then, for the convenience macros that use "dealloc" and "flags", cast the object to PyTypeObjectHead before dereferencing. This abandons type safety, and given my longing for type safety while developing this patch I'd prefer to not make loss of type safety an official API. My thanks to Neal Norwitz for suggesting this project, and Brett Cannon for some recent encouragement. (And another person who I discussed it with so long ago I forgot who it was... maybe Fredik Lundh?) |
|||
msg86809 - (view) | Author: Eric V. Smith (eric.smith) * | Date: 2009-04-29 15:34 | |
I think this is great stuff, Larry. It's a definite improvement. Unfortunately with my workload of other Python issues, I'm not going to be able to review this before the 3.1 beta. I don't use many PyTypeObject's, so my review would be mostly theoretical, anyway. But I do want to encourage you to get this included in Python, at least in concept. |
|||
msg94657 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2009-10-29 11:34 | |
This patch is huge. Some things: - you should provide an example of using the new API. Your description is not very, well, descriptive :) - "all PyTypeObjects must now be *pointers* rather than static instances": is it mandatory, or did you add this out of "purity"? It would be nice if we could minimize disruption (that is, provide the new API as a feature, but not force ourselves and other people to rewrite huge chunks of code) - same for the private declaration of PyTypeObject: is it really necessary to hide it from outside code? Lastly, since your patch is not ready for consumption, I would advocate creating a branch somewhere (on the SVN sandbox if you have access rights, or in a separate e.g. bitbucket repository by cloning the existing Mercurial mirror). |
|||
msg94687 - (view) | Author: Larry Hastings (larry) * | Date: 2009-10-29 22:55 | |
Antoine: As the patch matured I would obviously provide documentation and examples and such. The point of submitting the patch in this form was a) so it was alive somewhere besides my hard drive, and b) to get some public review from the core team to ensure I was going in a valid direction. As for mandatory vs purity: The whole purpose of the patch was to make PyTypeObject a private type; see "THE PROBLEM" / "THE SOLUTION". This requires that all the public interfaces take pointers. So within the context of what the patch is trying to accomplish, it's mandatory. The patch attempts to mitigate this as much as possible with the backwards-compatibility p.s. By "huge" I suspect you mean "large", though on first reading I thought you meant "fabulous". That's what I get for working with a big crew of 20-somethings. p.p.s. Did this patch get mentioned recently or something? After months of inactivity, there have been two nosy+ this week. p.p.p.s. I have not touched this patch since submitting it. The tribulations of working at a startup. |
|||
msg94688 - (view) | Author: Larry Hastings (larry) * | Date: 2009-10-29 22:57 | |
Whoops! I think I'll finish that unfinished sentence. "The patch attempts to mitigate this as much as possible with the backwards-compatibility" header file; it minimizes as much as possible the changes you need to perform to get your code up and working again. But I fear I've taken that concept about as far as it makes sense to go. |
|||
msg94689 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2009-10-29 23:11 | |
> As the patch matured I would obviously provide documentation > and examples and such. What I meant is that it's difficult for me (and perhaps others) to assess how much more practical your patch makes it to create C types. > This > requires that all the public interfaces take pointers. So within the > context of what the patch is trying to accomplish, it's mandatory. Hmm. That public interfaces take pointers is one thing, but that doesn't mean the PyTypeObject structure itself must be concealed. It could be exposed as an implementation detail. Please note that your approach will make it difficult for third-party C extensions to remain compatible accross several Python versions (those before and after the API switch). While we sometimes change or deprecate APIs, we never do it as massively as that (even the 2.x -> 3.x transition is quite gentle). > p.s. By "huge" I suspect you mean "large", though on first reading I > thought you meant "fabulous". That's what I get for working with a big > crew of 20-somethings. I meant "large" indeed :) > p.p.s. Did this patch get mentioned recently or something? After months > of inactivity, there have been two nosy+ this week. Well, a nosy+ bumps up the issue at the top of the recently modified issues, which means other people notice it as well. |
|||
msg132415 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2011-03-28 19:51 | |
With PEP 384, this is now obsolete, so closing it as such. If you think that selected features of this patch should still be added, please submit them as separate patches. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:48 | admin | set | github: 50122 |
2011-03-28 19:51:18 | loewis | set | status: open -> closed nosy: + loewis messages: + msg132415 resolution: out of date |
2010-07-18 17:40:59 | rnk | set | nosy:
+ rnk versions: + Python 3.2, - Python 3.1 |
2010-04-25 03:29:33 | jcea | set | nosy:
+ jcea |
2010-03-25 14:00:59 | ghaering | set | nosy:
+ ghaering |
2010-02-24 18:57:42 | dmalcolm | set | nosy:
+ dmalcolm |
2009-10-29 23:11:41 | pitrou | set | messages: + msg94689 |
2009-10-29 22:57:08 | larry | set | messages: + msg94688 |
2009-10-29 22:55:09 | larry | set | messages: + msg94687 |
2009-10-29 11:34:07 | pitrou | set | nosy:
+ pitrou, nnorwitz messages: + msg94657 |
2009-10-28 02:12:25 | belopolsky | set | nosy:
+ belopolsky |
2009-04-30 22:44:33 | gregory.p.smith | set | nosy:
+ gregory.p.smith |
2009-04-29 15:34:28 | eric.smith | set | messages: + msg86809 |
2009-04-29 08:03:11 | eric.smith | set | nosy:
+ eric.smith |
2009-04-29 03:12:47 | brett.cannon | set | nosy:
+ brett.cannon |
2009-04-29 02:59:42 | larry | create |