Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate PC/python3.def by scraping headers #68091

Closed
zware opened this issue Apr 10, 2015 · 36 comments
Closed

Generate PC/python3.def by scraping headers #68091

zware opened this issue Apr 10, 2015 · 36 comments
Labels
3.8 only security fixes 3.9 only security fixes build The build process and cross-build OS-windows topic-C-API type-feature A feature request or enhancement

Comments

@zware
Copy link
Member

zware commented Apr 10, 2015

BPO 23903
Nosy @loewis, @nascheme, @pfmoore, @tjguk, @encukou, @zware, @serhiy-storchaka, @zooba
Dependencies
  • bpo-29058: Mark new limited C API
  • bpo-29083: Readd PyArg_VaParse to the stable API
  • Files
  • add_python3defgen.diff
  • 23903_with_python3.def.diff
  • python3def.patch
  • python3def.patch: Regenerated for review
  • genpython3def.sh
  • python3def-3.6.patch
  • 23903_1.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-06-01.14:44:50.970>
    created_at = <Date 2015-04-10.03:49:20.959>
    labels = ['build', '3.8', '3.9', 'expert-C-API', 'type-feature', 'OS-windows']
    title = 'Generate PC/python3.def by scraping headers'
    updated_at = <Date 2021-06-01.14:44:50.969>
    user = 'https://github.com/zware'

    bugs.python.org fields:

    activity = <Date 2021-06-01.14:44:50.969>
    actor = 'petr.viktorin'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-06-01.14:44:50.970>
    closer = 'petr.viktorin'
    components = ['Build', 'Windows', 'C API']
    creation = <Date 2015-04-10.03:49:20.959>
    creator = 'zach.ware'
    dependencies = ['29058', '29083']
    files = ['38886', '38967', '45978', '46011', '46026', '46027', '46047']
    hgrepos = []
    issue_num = 23903
    keywords = ['patch']
    message_count = 36.0
    messages = ['240409', '240754', '241262', '241281', '243994', '243997', '244309', '264614', '264619', '264624', '264626', '264652', '283715', '283770', '283784', '283791', '283872', '283891', '283901', '283902', '283907', '283911', '283972', '283973', '284039', '284043', '284047', '284077', '284079', '284083', '284105', '284108', '355732', '387118', '393078', '394859']
    nosy_count = 9.0
    nosy_names = ['loewis', 'nascheme', 'paul.moore', 'tim.golden', 'petr.viktorin', 'python-dev', 'zach.ware', 'serhiy.storchaka', 'steve.dower']
    pr_nums = []
    priority = 'normal'
    resolution = 'rejected'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue23903'
    versions = ['Python 3.8', 'Python 3.9']

    @zware
    Copy link
    Member Author

    zware commented Apr 10, 2015

    The attached patch adds a script that can scrape header files to pick out PyAPI_DATA and PyAPI_FUNCs to generate PC/python3.def. It's not complete and is not integrated into the build at all yet. I'm planning to work on this further at the PyCon sprints.

    Running this and comparing output to the current checked-in python3.def shows that we're not keeping it up-to-date well at all.

    @zware zware self-assigned this Apr 10, 2015
    @zware zware added build The build process and cross-build OS-windows type-feature A feature request or enhancement labels Apr 10, 2015
    @zware
    Copy link
    Member Author

    zware commented Apr 13, 2015

    Same patch, with diff to python3.def to make it available.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Apr 16, 2015

    I think the approach taken in this script is incorrect. It leads to false modifications of the stable ABI, making it in fact unstable. Four examples:

    PyAST_FromNode does not belong to the stable ABI, because "struct _node" doesn't belong to the stable ABI (see PEP-384 for a list of structures that belong to the stable ABI). In fact, ast.h isn't even a header file that extension modules should regularly include.

    All deletions are (e.g. PyCFunction_New) are certainly incorrect: functions must never be removed from the stable ABI. In fact, Objects/methodobject.c preserves PyCFunction_New just for the sake of the stable ABI (despite there being no longer a declaration in the header file).

    PyErr_SetFromWindowsErrWithFilename could belong to the stable ABI, or not, depending on the intention of stability in its addition. Since it was there since 3.0, and had not been in the stable ABI back then, I'd claim that it doesn't belong to the stable ABI (apparently, nobody has missed it). OTOH

    PyExc_BrokenPipeError might be a useful addition to the stable ABI. However, it was not there in 3.0, so the header files ought to version-guard its declaration, but currently don't.

    So I prefer if the def file keeps being maintained manually; I don't mind if there is a tool that checks for inconsistencies, leaving their resolution to a maintainer.

    @zooba
    Copy link
    Member

    zooba commented Apr 16, 2015

    That would be the preferred way to do it, but we got no traction from anyone at PyCon or on python-dev when we tried to get people to look at what's changed.

    I suspect we haven't marketed the stable API well enough to have a lot of users, and the build tools still mark everything as 3.x-specific rather than 3-specific. Keeping it as a bare minimum interface has its value, but it's not unreasonable to expect that the functions available from the headers under Py_LIMITED_API are actually exported by python3.dll.

    @zware
    Copy link
    Member Author

    zware commented May 24, 2015

    What about including this in Tools/scripts/patchcheck.py? I think that might strike a good enough balance between making changes noticeable and not automatically changing the def file.

    As far as all the differences this has uncovered, I'll try to make a patch to #ifdef them in/out.

    @zooba
    Copy link
    Member

    zooba commented May 24, 2015

    Could the latter break people already using the stable ABI on other platforms? Does it even work for others? I don't know what the equivalent of python3.dll would be.

    If it doesn't exist outside of Windows, then assuming that everything not in python3.def isn't stable is fine. Otherwise I think we need to go the other way.

    @zooba
    Copy link
    Member

    zooba commented May 28, 2015

    It might be nice to have a test that tries to import all of the stable ABI at build time so that the buildbots break if someone adds a new API and doesn't think about what they've done.

    As an aside, it'd be nice for the script to take the version numbers as args so we can integrate it into the MSBuild steps (which already know the version) and avoid having to update this each time we update the Python version.

    @zware
    Copy link
    Member Author

    zware commented May 2, 2016

    Unassigning since I haven't made it anywhere with this in a year; if anybody else would like to pick it up, feel free. Otherwise I'll get to it some day :)

    @zware zware removed their assignment May 2, 2016
    @serhiy-storchaka
    Copy link
    Member

    This tool would be very useful, since new API often accidentally marked as stable API in header files. The script can play just deliberative role, warn if new unstable API is not wrapped with "#ifndef Py_LIMITED_API" or if stable API is not documented. May be we need separate list of public API in more regular format for comparison with header, documentation, python3.def, refcounts.dat, etc.

    There is a code for parsing preprocessor instructions in Argument Clinic. May be it can be reused here.

    @zooba
    Copy link
    Member

    zooba commented May 2, 2016

    As I recall, the tool is ready to go. The problem is the argument over what to do with all the APIs that currently are incorrect. Zach had a list at PyCon last year, and the consensus seemed to be that "someone" should figure out whether each API should be stable or not. Since Zach understandably doesn't want to be that someone, there's really no way to reliably tool this process when there are a number of existing violations.

    (FWIW, I believe we should just add them to the stable list, as that's the only way to guarantee source compatibility. They've already shipped in 3.5 and people can use them on non-Windows OSs.)

    @serhiy-storchaka
    Copy link
    Member

    See also closely related bpo-26900.

    @zware
    Copy link
    Member Author

    zware commented May 2, 2016

    There is a code for parsing preprocessor instructions in Argument Clinic.
    May be it can be reused here.

    It already is, actually; its availability was a major factor in my thinking this would be possible and starting to work on it.

    @zooba
    Copy link
    Member

    zooba commented Dec 21, 2016

    I just ran into this because PyModule_AddFunctions() was added to the stable ABI in 3.5, but was not added to python3.dll.

    This is blatantly a compilation error. We _need_ python3.def and the stable ABI to be in sync.

    I've rerun the script against Python 3.6 and attached it. I'm also going to post to python-dev to attract some attention.

    I think at this stage we ought to update python3.dll to include everything that could be used from the headers. Unfortunately, that means extensions could potentially break if built on 3.6.1 and run against 3.6.0, but it's inevitable that there will be some breakage and this way it's only temporary.

    @zooba zooba added the 3.7 (EOL) end of life label Dec 21, 2016
    @zooba
    Copy link
    Member

    zooba commented Dec 21, 2016

    Guess I should at least pose the question - Ned, would you consider the change in my patch for 3.6.0? Only the one file is affected, and it only contributes to one DLL that is (a) recommended, (b) not widely used and (c) very incompatible with the included headers.

    @ned-deily
    Copy link
    Member

    I don't know enough about how python3.dll is used in the Windows world to ask the right questions. One obvious one: would this change likely affect any third-party binaries already built for 3.6.0? On the other hand, if we have to have one, it would be better to have a compatibility break at 3.6.0 rather than at 3.6.1. I really don't want to take any more changes at this point for 3.6.0 but if you are convinced it is the right thing to do and won't break anything, get it in for 3.6.1 now and I'll reluctantly cherrypick it for 3.6.0.

    @zooba
    Copy link
    Member

    zooba commented Dec 21, 2016

    would this change likely affect any third-party binaries already built for 3.6.0? On the other hand, if we have to have one, it would be better to have a compatibility break at 3.6.0 rather than at 3.6.1.

    Assuming we only make additive changes, then nothing that currently exists will break. However, extensions may then be built using 3.6.1 that will fail on earlier versions. This goes against the entire purpose of the limited API - making the change for 3.6.0 at least means they will work for the entire 3.6 series.

    This is one of those "all options are bad" scenarios. Making the change today will prevent us from fixing the functions that should not have leaked into the limited API, but then again they've already leaked and people can compile against them.

    Perhaps the best approach is to revise all functions in the limited API, restrict back to those that were in 3.3 and carefully add new ones from there. Then we just apologise for 3.5.[0-2] and 3.6.0 potentially producing invalid binaries and recommend getting a newer version to build with.

    I've convinced myself that 3.6.0 is too soon to make the right fix here, so it'll probably always be a bad version in this respect. Hopefully by 3.6.1 we can fix up the limited API and make it reliable again :(

    @serhiy-storchaka
    Copy link
    Member

    PyCmpWrapper_Type is not defined.

    Some underscored names are defined only in special builds (with Py_REF_DEBUG, Py_TRACE_REFS).

    PyAST_* and PyNode_* functions are not documented, ast.h and node.h are private headers.

    Names like _PyArg_Parse_SizeT shouldn't be removed.

    @zooba
    Copy link
    Member

    zooba commented Dec 23, 2016

    ast.h and node.h are private headers.

    You're right, these should be excluded (possibly from the release too?)

    I've been playing with the script in a separate context and I think I've hit a few issues (though I have made some modifications, so YMMV). It's probably just as well we waited. I'll thoroughly validate it before coming back with a new patch.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Dec 23, 2016

    New changeset 9cb87e53e324 by Serhiy Storchaka in branch '3.5':
    Sort and remove duplicates from PC/python3.def (issue bpo-23903).
    https://hg.python.org/cpython/rev/9cb87e53e324

    New changeset 0927b5c80c50 by Serhiy Storchaka in branch '3.6':
    Sort and remove duplicates from PC/python3.def (issue bpo-23903).
    https://hg.python.org/cpython/rev/0927b5c80c50

    New changeset e64a82371d72 by Serhiy Storchaka in branch 'default':
    Sort and remove duplicates from PC/python3.def (issue bpo-23903).
    https://hg.python.org/cpython/rev/e64a82371d72

    @serhiy-storchaka
    Copy link
    Member

    Reordered to make diffs easier for reviewing.

    @zooba
    Copy link
    Member

    zooba commented Dec 23, 2016

    Good call.

    I think I might actually replace the script with a build step that uses the C preprocessor to get all the names, then a script to generate the file. If someone who knows the POSIX build system can help out, we could generate a list on each build for various platforms and fail when it doesn't match what's committed.

    Since the preprocessor doesn't have to generate valid code, we should be able to cross-compile for other platforms, though that's less critical.

    @serhiy-storchaka
    Copy link
    Member

    See also bpo-29058.

    @serhiy-storchaka
    Copy link
    Member

    Here is a shell script that uses the C preprocessor and other Unix tools to generate PC/python3.def. It doesn't add PyAST_* and PyNode_* functions and doesn't remove PyArg_VaParse* functions. But it still removes some names that are no longer used in public interface, but should be kept in the stable ABI for compatibility. E.g. PyCFunction_New (now it is a macro expanded to PyCFunction_NewEx) or _PyTrash_deposit_object (invoked by the macros before 3.2.4). It seems to me that this is unavoidable, and the resulting file must be manually edited.

    @serhiy-storchaka
    Copy link
    Member

    Following patch against 3.6 just adds new names to PC/python3.def. The only removed names are PyExc_MemoryErrorInst and PyUnicode_SetDefaultEncoding. They don't exist in 3.x.

    @zooba
    Copy link
    Member

    zooba commented Dec 26, 2016

    Those macros like PyCFunction_New actually need to become functions again, since we need real exports to forward them to the right place. Converting stable API functions to macros breaks the ABI, since modules compiled against the old one can no longer link to newer versions, and modules compiled against newer ones cannot link to older versions.

    I'll try and figure out a way we can compile those functions directly into python3.dll, since that's going to leave us with the best ability to use macros for non-stable API builds. We will, however, need to update the header files to define actual functions rather than macros when Py_LIMITED_API is set.

    @serhiy-storchaka
    Copy link
    Member

    They do. Functions PyCFunction_New, _PyTrash_deposit_object etc are compiled and exported even if they don't represented in headers.

    Actually we can remove the PyCFunction_New function since it was converted to to a macro before Python 3.0 (faabd9867fb8). No one Python 3 extension uses it. But this is separate issue. _PyTrash_deposit_object was used in 3.2.3 and should be kept for binary compatibility.

    @zooba
    Copy link
    Member

    zooba commented Dec 26, 2016

    Attached a patch that integrates generation into the build of python3.dll. There are also a couple of fixes for headers to retain functions/data that were previously exported.

    The patch includes an updated python3.def file, though I expect that will go away and we'll check in the python3.txt file that is generated on build (which includes the return type). That way we can fail builds that modify the file (regardless of platform) and require an explicit step when adding to the stable API.

    There are still a few functions towards the end of python3.def that I need to figure out how to deal with. (Also, my sort script doesn't quite match the result from however Serhiy sorted it last, but it's as close as I could get.)

    @serhiy-storchaka
    Copy link
    Member

    It is not good that the script removes existing names and adds non-existing names. It is safe to remove names that don't exist in all releases starting from 3.2.0.

    1. PyCFunction_New shouldn't be removed in maintained releases. This name is exported, but is unlikely used by third-party code. It can be removed in separate issue, and only in 3.7.

    2. PyCmpWrapper_Type shouldn't be added. It is not defined in 3.x.

    3. _PyTrash_destroy_chain and like should not be removed for compatibility with 3.2.0-3.2.3.

    4. _PyBytes_DecodeEscape shouldn't be added. It is private.

    5. Exporting Py_hexdigits looks doubtful. This constant was added for internal use (like _Py_SwappedOp). Unlikely it is used in third-party code. Not attributing it as private looks an error.

    6. _Py_TrueStruct shouldn't be removed. It is used in the Py_True macro.

    7. *_SizeT names shouldn't be removed. They are used when PY_SSIZE_T_CLEAN is defined before including Python.h (recommended).

    8. I think that rather than adding PyODict_Type to limited API, PyODict_Check and PyODict_CheckExact should be excluded from limited API (see bpo-29058). They never worked in limited API. Note that PyDict_Type as well as other types are not included in limited API. We can consider exposing PyODict_Check and PyODict_CheckExact as functions in limited API in 3.7, but this is other issue.

    As for PyArg_VaParse and PyArg_VaParseTupleAndKeywords see bpo-11626. Seems they were excluded from limited API by mistake.

    I suggest first fix errors in python3.def, remove non-existing names, add new names, and only after this add the ability of generating it if it doesn't break python3.def.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Dec 27, 2016

    New changeset 8423f86486b3 by Serhiy Storchaka in branch '3.5':
    Issue bpo-23903: Fixed errors and remove non-existing names in python3.def.
    https://hg.python.org/cpython/rev/8423f86486b3

    New changeset b5470d08969c by Serhiy Storchaka in branch '3.6':
    Issue bpo-23903: Fixed errors and remove non-existing names in python3.def.
    https://hg.python.org/cpython/rev/b5470d08969c

    New changeset 45507a5751d8 by Serhiy Storchaka in branch 'default':
    Issue bpo-23903: Fixed errors and remove non-existing names in python3.def.
    https://hg.python.org/cpython/rev/45507a5751d8

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Dec 27, 2016

    New changeset d95fee442e27 by Serhiy Storchaka in branch '3.5':
    Issue bpo-23903: Added missed names to PC/python3.def.
    https://hg.python.org/cpython/rev/d95fee442e27

    New changeset cb864fc4b3be by Serhiy Storchaka in branch '3.6':
    Issue bpo-23903: Added missed names to PC/python3.def.
    https://hg.python.org/cpython/rev/cb864fc4b3be

    New changeset 513852ad0c5c by Serhiy Storchaka in branch 'default':
    Issue bpo-23903: Added missed names to PC/python3.def.
    https://hg.python.org/cpython/rev/513852ad0c5c

    @zooba
    Copy link
    Member

    zooba commented Dec 27, 2016

    Unfortunately, there's no way too remove defined names from the DLL during Python 3 at all, except where the prototype was never provided. PyCFunction_New has always been a macro, and PySys_SetDefaultEncoding looks to have been removed before the API was committed.

    Maintaining an export and a macro is going to be awkward, so I think we should just change them back to functions. Any half-decent optimizing compiler is going to inline it within core, and it's better than having two implementations in different places of the same function.

    But yes I agree, fix the def first and then automate. But I don't want to fix it but adding things that were never usable.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Dec 27, 2016

    New changeset 2e5ad97c9c19 by Serhiy Storchaka in branch '3.5':
    Issue bpo-23903: Added missed Windows-specific names to PC/python3.def.
    https://hg.python.org/cpython/rev/2e5ad97c9c19

    New changeset 86a412584c02 by Serhiy Storchaka in branch '3.6':
    Issue bpo-23903: Added missed Windows-specific names to PC/python3.def.
    https://hg.python.org/cpython/rev/86a412584c02

    New changeset 3a2595f82447 by Serhiy Storchaka in branch 'default':
    Issue bpo-23903: Added missed Windows-specific names to PC/python3.def.
    https://hg.python.org/cpython/rev/3a2595f82447

    @zooba
    Copy link
    Member

    zooba commented Oct 30, 2019

    Reawakening this issue so we can finish it off - the stable ABI has been declared "clean" on python-dev, so there shouldn't be any more concerns about APIs leaking in now. We can assume that everything accessible from the header files should be included in python3.def.

    If it has been cleaned up though, we might need to go through and restore anything that used to be in python3.def but wasn't meant to be. Those are stable.

    @zooba zooba added 3.8 only security fixes 3.9 only security fixes labels Oct 30, 2019
    @ned-deily ned-deily removed the 3.7 (EOL) end of life label Jun 25, 2020
    @encukou
    Copy link
    Member

    encukou commented Feb 16, 2021

    I plan to do this slightly differently, see PEP-652.

    @encukou
    Copy link
    Member

    encukou commented May 6, 2021

    The symbols exported by python3.dll are now generated from Misc/stable_abi.txt as per PEP-652.
    See the devguide for more details: https://devguide.python.org/c-api/

    This is slightly more work than generating the list fully automatically, but the extra work is negligible compared to all the things you should think about when adding API that'll be be supported until Python 4.0.

    Headers are now checked on Unix (GCC) only. If anyone wants to work on a cross-platform C parser/checker, please let me know.

    @encukou
    Copy link
    Member

    encukou commented Jun 1, 2021

    PEP-652 (bpo-43795) is now in; I'm closing this issue.

    A cross-platform C parser/checker would still be an improvement, but the constraints (no external dependencies in CPython) and benefits (not many compared to the goal of strict separation of internal/public/stable headers) aren't really aligned in its favor.

    @encukou encukou closed this as completed Jun 1, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes 3.9 only security fixes build The build process and cross-build OS-windows topic-C-API type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants