C extension naming doesn't take bitness into account #67169

Closed
pitrou opened this issue Dec 2, 2014 · 96 comments
Labels
deferred-blocker type-feature A feature request or enhancement

Comments

@pitrou
Member

pitrou commented Dec 2, 2014

BPO 22980
Nosy @malemburg, @warsaw, @doko42, @ncoghlan, @pitrou, @larryhastings, @tjguk, @ned-deily, @ericsnowcurrently, @berkerpeksag, @zware, @1st1, @zooba, @dstufft
Files
  • 22980_windows.patch
  • 22980_2.patch: Patch with version in tag
  • abi_bitness.patch
  • ma.diff: alternate patch
  • ma2.diff
  • larry.whatsnew35.ext.module.suffix.diff.1.txt
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2015-09-12.16:13:19.053>
    created_at = <Date 2014-12-02.14:35:25.510>
    labels = ['deferred-blocker', 'type-feature']
    title = "C extension naming doesn't take bitness into account"
    updated_at = <Date 2017-01-14.07:25:16.112>
    user = 'https://github.com/pitrou'

    bugs.python.org fields:

    activity = <Date 2017-01-14.07:25:16.112>
    actor = 'python-dev'
    assignee = 'none'
    closed = True
    closed_date = <Date 2015-09-12.16:13:19.053>
    closer = 'larry'
    components = []
    creation = <Date 2014-12-02.14:35:25.510>
    creator = 'pitrou'
    dependencies = []
    files = ['37379', '37384', '38155', '38964', '39027', '40420']
    hgrepos = []
    issue_num = 22980
    keywords = ['patch']
    message_count = 96.0
    messages = ['232000', '232001', '232002', '232003', '232008', '232014', '232016', '232026', '232030', '232031', '232033', '232034', '232036', '232037', '232038', '232039', '232040', '232041', '232042', '232043', '232044', '232045', '232047', '232052', '232063', '232065', '232251', '232256', '232266', '232268', '232270', '232272', '232277', '232297', '232710', '232711', '232729', '232740', '232759', '232760', '232764', '232765', '232807', '233822', '236109', '236111', '237558', '237559', '240516', '240548', '240746', '240748', '240750', '240761', '240762', '240769', '240775', '241065', '241127', '241139', '241141', '241159', '241167', '241183', '241211', '241223', '241228', '241230', '241231', '241236', '241242', '241248', '241249', '241250', '241251', '241252', '241253', '241254', '241283', '241285', '241287', '241291', '241512', '241513', '246247', '249725', '250045', '250299', '250302', '250308', '250341', '250424', '250425', '250431', '250524', '285464']
    nosy_count = 16.0
    nosy_names = ['lemburg', 'barry', 'doko', 'ncoghlan', 'pitrou', 'larry', 'tim.golden', 'ned.deily', 'Arfrever', 'python-dev', 'eric.snow', 'berker.peksag', 'zach.ware', 'yselivanov', 'steve.dower', 'dstufft']
    pr_nums = []
    priority = 'deferred blocker'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue22980'
    versions = ['Python 3.5']

    @pitrou
    Member Author

    pitrou commented Dec 2, 2014

    Currently, C extensions are named something like "_helperlib.cpython-34dm.so". This doesn't take into account the bitness of the interpreter (32- vs. 64-bit), which makes it awkward to use the same working copy with two different interpreters (you have to rebuild everything each time you switch bitnesses).

    Worse, under Windows it seems ABI tags aren't even used, giving generic names such as "_helperlib.pyd". Why is that?

    @pitrou pitrou added the type-feature A feature request or enhancement label Dec 2, 2014
    @vstinner
    Member

    vstinner commented Dec 2, 2014

    See also the PEP-3149.

    @pitrou
    Member Author

    pitrou commented Dec 2, 2014

    PEP-3149 says """It is not currently clear that the facilities in this PEP are even useful for Windows""". Well, it seems I have found a use for it :-)

    @pitrou
    Member Author

    pitrou commented Dec 2, 2014

    Ideally, we would use distutils.util.get_platform(). However, there are two cases where it relies on other modules:

    • the re module under Cygwin
    • the sysconfig and _osx_support under OS X

    Of course, ideally we should be able to hardcode this into the compiled CPython executable...

    @pitrou
    Member Author

    pitrou commented Dec 2, 2014

    As a side-note, it is interesting to note that Python currently wrongly identifies 32-bit builds under 64-bit Linux:

    Python 3.5.0a0 (default:64a54f0c87d7, Nov  2 2014, 17:18:13) 
    [GCC 4.9.1] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys, os, sysconfig
    >>> sys.maxsize
    2147483647
    >>> os.uname()
    posix.uname_result(sysname='Linux', nodename='fsol', release='3.16.0-25-generic', version='#33-Ubuntu SMP Tue Nov 4 12:06:54 UTC 2014', machine='x86_64')
    >>> sysconfig.get_platform()
    'linux-x86_64'

    AFAIU, sysconfig.get_platform() (or its sibling distutils.util.get_platform()) is used for the naming of binary distributions...
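
The mismatch shown in the session above can be checked from inside any interpreter. A minimal sketch, assuming only standard-library calls; the helper name `describe_interpreter` is made up for illustration and is not part of any patch on this issue:

```python
import platform
import struct
import sys
import sysconfig


def describe_interpreter():
    """Collect values that can disagree for a 32-bit build on a 64-bit kernel."""
    return {
        # Pointer size of the running interpreter, in bits.
        "pointer_bits": struct.calcsize("P") * 8,
        # sys.maxsize is 2**31 - 1 or 2**63 - 1 on common builds.
        "maxsize_bits": sys.maxsize.bit_length() + 1,
        # Machine type as reported by the operating system.
        "machine": platform.machine(),
        # Platform string used when naming binary distributions.
        "platform": sysconfig.get_platform(),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

On a build like the one above, `pointer_bits` would be 32 while `machine` and `platform` still describe the 64-bit host.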

    @pitrou
    Member Author

    pitrou commented Dec 2, 2014

    The MULTIARCH variable can help at least under Linux:

    >>> import sysconfig
    >>> sysconfig.get_platform()
    'linux-x86_64'
    >>> sysconfig.get_config_var('MULTIARCH')
    'i386-linux-gnu'
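
`MULTIARCH` is a build-time configuration variable and is not defined everywhere, in which case `sysconfig.get_config_var()` returns `None`. A hedged sketch of a lookup with a fallback; the function name `multiarch_or_platform` is hypothetical:

```python
import sysconfig


def multiarch_or_platform():
    """Prefer the MULTIARCH build variable, else fall back to get_platform()."""
    # get_config_var() returns None when the variable is not defined
    # for this build (for example on Windows).
    multiarch = sysconfig.get_config_var("MULTIARCH")
    return multiarch or sysconfig.get_platform()


if __name__ == "__main__":
    print(multiarch_or_platform())
```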

    @vstinner
    Member

    vstinner commented Dec 2, 2014

    There is also platform.architecture(). I don't like its implementation, it relies on the external file program :-(

    @zooba
    Member

    zooba commented Dec 2, 2014

    I'm very much in favor of adding this for .pyds on Windows.

    I assume the hard part will be getting the details for Linux (doesn't bitness have to be compiled in there? For Windows it can be determined at compile-time...), but preferring an extension with the ABI tag and falling back on one without seems easy enough.

    (Would/could this also work for .py files? So a 2.7/3.x or Jython/CPython/IronPython package could include tags in pure-Python code files?)

    @malemburg
    Member

    Note that there's a difference between the platform's architecture (which is what get_platform() returns) and the pointer size of the currently running Python executable.

    On 64-bit Linux, it's rather rare to have an application built as 32-bit executable. On 64-bit Windows, it's rather common to have 32-bit applications running.

    The best way to determine 32-bit vs. 64-bit is by using the struct module:

        # Determine bitsize used by Python (not necessarily the same as
        # the one used by the platform)
        import struct
        bits = struct.calcsize('P') * 8

    This should be portable across all platforms and will always refer to the pointer size of the currently running Python executable.

    @pitrou
    Member Author

    pitrou commented Dec 2, 2014

    > Note that there's a difference between the platform's architecture

    Yes, that's pointed out above.

    @pitrou
    Member Author

    pitrou commented Dec 2, 2014

    Nothing new should be necessary for pyc files under Windows:

    Python 3.4.2 |Continuum Analytics, Inc.| (default, Oct 22 2014, 11:51:45) [MSC v.1600 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> sys.implementation.cache_tag
    'cpython-34'

    The problem is with C extensions:

    >>> import _imp
    >>> _imp.extension_suffixes()
    ['.pyd']

    Compare with Linux:

    >>> import _imp
    >>> _imp.extension_suffixes()
    ['.cpython-35dm.so', '.abi3.so', '.so']
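
The same list is available through the public `importlib.machinery.EXTENSION_SUFFIXES`. As a sketch of how a tagged file name relates to its module name, the hypothetical helper `split_extension` below strips the longest matching suffix; the suffix list in the usage line is the Linux one quoted above, passed explicitly so the result does not depend on the running interpreter:

```python
import importlib.machinery


def split_extension(filename, suffixes=None):
    """Return (module_name, suffix) for an extension file, or None if no suffix matches."""
    if suffixes is None:
        suffixes = importlib.machinery.EXTENSION_SUFFIXES
    # Try longer suffixes first so '.cpython-35dm.so' wins over plain '.so'.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if filename.endswith(suffix):
            return filename[: -len(suffix)], suffix
    return None


if __name__ == "__main__":
    linux_suffixes = [".cpython-35dm.so", ".abi3.so", ".so"]
    print(split_extension("_helperlib.cpython-35dm.so", linux_suffixes))
    print(importlib.machinery.EXTENSION_SUFFIXES)
```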

    @pitrou
    Member Author

    pitrou commented Dec 2, 2014

    > I assume the hard part will be getting the details for Linux (doesn't bitness
    > have to be compiled in there? For Windows it can be determined at compile-
    > time...), but preferring an extension with the ABI tag and falling back on
    > one without seems easy enough.

    Sticking to bitness should be easy (although I wonder if it would be desirable for platforms with fat binaries - Ned?). If we can go the extra mile and include platform identification all the better, of course.

    @zooba
    Member

    zooba commented Dec 2, 2014

    I was more interested in source file resolution than bytecode caching. If Python 3.5 would prefer "spam.cpython-35.py" or "spam.cpython-3.py" over "spam.py" and Python 2 preferred "spam.py", then I can more easily separate the code that won't parse in the alternative.

    Happy to be told it's unrelated and I should raise it separately, but from my POV resolving .pyd filenames looks very similar to resolving .py files.

    @malemburg
    Member

    On 02.12.2014 19:02, Antoine Pitrou wrote:

    > Sticking to bitness should be easy (although I wonder if it would be desirable for platforms with fat binaries - Ned?). If we can go the extra mile and include platform identification all the better, of course.

    I hear the "can of worms" alarm ringing :-)

    Seriously, I think that putting platform infos into the file name
    is bound to cause more trouble than it tries to solve. Fat builds
    leave the decision to the linker, which is a good method and avoids
    the file name clashes.

    I think we should only focus on platforms where fat builds are
    uncommon, while at the same time you do have to support multiple
    architectures, like e.g. Windows:

    http://en.wikipedia.org/wiki/Fat_binary

    Note that on Linux, 32-bit and 64-bit versions are typically placed
    into different directory trees:

    http://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard

    so I'm not sure whether it's a real problem on Linux.

    @zooba
    Copy link
    Member

    zooba commented Dec 2, 2014

    But since you pointed out cache-tag, should that distinguish for bitness as well? It seems to be 'cpython-34' for both 32-bit and 64-bit interpreters on Windows, which isn't really a problem now, but may become one if we start allowing/encouraging sharing packages between interpreters.

    In fact, it probably is an issue now with user site-packages, since that path is the same for both 32-bit and 64-bit...

    @pitrou
    Member Author

    pitrou commented Dec 2, 2014

    Fat binaries seem to exist under:

    • OS X: yes, that's why I was asking for Ned's advice
    • Linux: "A proof-of-concept Ubuntu 9.04 image is available"... enough said
    • DOS: perhaps MicroPython is interested :-)

    http://en.wikipedia.org/wiki/Fat_binary

    @malemburg
    Member

    On 02.12.2014 19:40, Steve Dower wrote:

    > I was more interested in source file resolution than bytecode caching. If Python 3.5 would prefer "spam.cpython-35.py" or "spam.cpython-3.py" over "spam.py" and Python 2 preferred "spam.py", then I can more easily separate the code that won't parse in the alternative.
    >
    > Happy to be told it's unrelated and I should raise it separately, but from my POV resolving .pyd filenames looks very similar to resolving .py files.

    That's an interesting idea, but indeed unrelated to this ticket :-)

    @pitrou
    Member Author

    pitrou commented Dec 2, 2014

    > Note that on Linux, 32-bit and 64-bit versions are typically placed
    > into different directory trees

    By whom? Our standard installer doesn't (it uses ../lib/python-X.Y for all builds).

    Also, one of the problems (and actually the problem which triggered this tracker entry) is when doing development inside a working copy (either through "setup.py develop" or "setup.py build_ext --inplace" - both copy C extensions directly into the source tree).

    @pitrou
    Member Author

    pitrou commented Dec 2, 2014

    @Steve: IIRC, pyc files should be portable, so there's no need to differentiate between various bitnesses.

    @zooba
    Member

    zooba commented Dec 2, 2014

    @antoine: You're right. I hereby withdraw all contributions to this thread after my first statement of support :)

    @malemburg
    Member

    On 02.12.2014 19:46, Antoine Pitrou wrote:

    > Note that on Linux, 32-bit and 64-bit versions are typically placed
    > into different directory trees

    By whom? Our standard installer doesn't (it uses ../lib/python-X.Y for all builds).

    By the system vendors. Packages (with extensions) will automatically
    pick up their configuration.

    Also, one of the problems (and actually the problem which triggered this tracker entry) is when doing development inside a working copy (either through "setup.py develop" or "setup.py build_ext --inplace" - both copy C extensions directly into the source tree).

    Fair enough; it's a rare use case, but may be worth supporting.

    My main point was that we shouldn't start adding tags for e.g.
    PPC, Intel, ARM, etc. since platforms needing to support multiple
    such architectures will typically support fat builds anyway.

    How about using these flags:

    b0 - 16-bit
    b1 - 32-bit
    b2 - 64-bit
    b3 - 128-bit
    and so on

    @pitrou
    Member Author

    pitrou commented Dec 2, 2014

    Le 02/12/2014 19:59, Marc-Andre Lemburg a écrit :

    My main point was that we shouldn't start adding tags for e.g.
    PPC, Intel, ARM, etc. since platforms needing to support multiple
    such architectures will typically support fat builds anyway.

    How about using these flags:

    b0 - 16-bit
    b1 - 32-bit
    b2 - 64-bit
    b3 - 128-bit

    Fair enough, although I think we only need 32-bit and 64-bit for now,
    and "32b" vs. "64b" would probably be more readable :-)
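
Both spellings discussed here can be derived from the pointer size. A minimal sketch, assuming the `bN` flags follow the pattern listed (16-bit is `b0`, doubling with each step); `compact_flag` and `readable_flag` are hypothetical names, and neither scheme is confirmed by this thread to be what was finally adopted:

```python
import struct


def current_bits():
    """Pointer size of the running interpreter, in bits."""
    return struct.calcsize("P") * 8


def compact_flag(bits):
    """Map 16 -> 'b0', 32 -> 'b1', 64 -> 'b2', 128 -> 'b3', and so on."""
    # Only powers of two starting at 16 fit the proposed scheme.
    if bits < 16 or bits & (bits - 1):
        raise ValueError("bits must be a power of two >= 16")
    # bit_length() - 1 is log2(bits) for a power of two; 16 is 2**4.
    return "b%d" % (bits.bit_length() - 1 - 4)


def readable_flag(bits):
    """Map 32 -> '32b', 64 -> '64b'."""
    return "%db" % bits


if __name__ == "__main__":
    bits = current_bits()
    print(compact_flag(bits), readable_flag(bits))
```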

    @malemburg
    Member

    On 02.12.2014 20:10, Antoine Pitrou wrote:

    Antoine Pitrou added the comment:

    Le 02/12/2014 19:59, Marc-Andre Lemburg a écrit :
    >
    > My main point was that we shouldn't start adding tags for e.g.
    > PPC, Intel, ARM, etc. since platforms needing to support multiple
    > such architectures will typically support fat builds anyway.
    >
    > How about using these flags:
    >
    > b0 - 16-bit
    > b1 - 32-bit
    > b2 - 64-bit
    > b3 - 128-bit

    Fair enough, although I think we only need 32-bit and 64-bit for now,
    and "32b" vs. "64b" would probably be more readable :-)

    True, I'm just not sure what the parsing requirements are and
    the ABI version names are too long already, IMO. PEP-425 used
    a nice short variant for the Python part.

    @vstinner
    Member

    vstinner commented Dec 2, 2014

    Would it be possible to add something to the sys module, computed
    during the compilation, instead of having to rely on platform,
    sysconfig, struct or something else?

    Note: There is also the funny x32 platform project :-)
    https://sites.google.com/site/x32abi/ 32-bit pointer on 64-bit CPU.

    @ncoghlan
    Contributor

    ncoghlan commented Dec 2, 2014

    My initial thought is to add an "abitags" attribute to sys.implementation
    (plural so we can also indicate stable ABI support).

    If we define the algorithm clearly, then setuptools & distlib could make it
    available on earlier Python versions.

    @ned-deily
    Member

    Re PEP-3149 file names: it hadn't struck me until fairly recently that PEP-3149-style extension file names were never implemented for OS X, i.e. they are still of the form _helperlib.so. I'm not sure why that is the case since other aspects of PEP-3149-like file names do exist on OS X, including naming libpython; perhaps it was just erring on the side of caution.

    Re bitness: As Marc-Andre points out, Apple addressed the multi-arch problem with the concept of universal (or "fat") binary files, implemented for executables, libs (static and dynamic), and bundles (e.g .so's). In general, dealing with multiple architectures is abstracted away by the compiler tool chain at build time and the dynamic loader at run time and it's not something either Python or the user have to deal with (usually), as various combinations of architectures (currently up to 4 on OS X) are contained within the same file; for example:

    $ file _socket.so
    _socket.so: Mach-O universal binary with 3 architectures
    _socket.so (for architecture x86_64):	Mach-O 64-bit bundle x86_64
    _socket.so (for architecture i386):	Mach-O bundle i386
    _socket.so (for architecture ppc7400):	Mach-O bundle ppc
    $ file /usr/bin/python
    /usr/bin/python: Mach-O universal binary with 3 architectures
    /usr/bin/python (for architecture x86_64):	Mach-O 64-bit executable x86_64
    /usr/bin/python (for architecture i386):	Mach-O executable i386
    /usr/bin/python (for architecture ppc7400):	Mach-O executable ppc

    So, I agree with Marc-Andre that adding arch info (like bitness) to extension module file names on OS X would add unneeded complexity for little, if any, benefit. That part works well today. Changing builds on OS X to use today's PEP-3149 file names is a separate question. It could help in the case where one site-packages library is used with multiple Python instances but, even there, that is probably not a big issue outside of developer environments: (1) I don't know of any distributor of Python for OS X who supports multiple ABIs (e.g. non-debug vs debug) in one package; (2) Python OS X framework builds, used by python.org, Apple, and most third-parties, generally have separate install locations including their lib-dynload and site-packages directories so installing multiple instances of the same Python version from different vendors isn't a big deal. It would be nice to be able to allow non-debug vs debug builds to co-exist better (the primary use case I see for PEP-3149 file names for Py3 on OS X) but I don't recall anyone asking for it. If we were to change OS X to use today's PEP-3149 file names, I would only want to do it in a new release, not backport it.

    @zooba
    Member

    zooba commented Dec 6, 2014

    What can I do to help move this along?

    It sounds like for Windows builds we could change _imp.extension_suffixes() from ['.pyd'] to ['.{}.pyd'.format(distutils.util.get_platform()), '.pyd'] and update distutils to produce the more specific name (I've got some work to do on distutils anyway for 3.5, so I'm happy to do this part). This would also include somehow hard-coding the get_platform() result into the executable (probably a #define in pyconfig.h)
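
The "prefer the tagged name, fall back to plain .pyd" lookup can be sketched in pure Python. This is an illustration under stated assumptions, not the actual import machinery: `find_extension` and `demo` are hypothetical helpers, and the `win-amd64` and `win32` tags stand in for whatever platform string ends up hard-coded into the executable:

```python
import os
import tempfile


def find_extension(directory, name, suffixes):
    """Return the first existing file for `name`, trying suffixes in order."""
    for suffix in suffixes:
        candidate = os.path.join(directory, name + suffix)
        if os.path.isfile(candidate):
            return candidate
    return None


def demo():
    """Create a tagged and an untagged file, then resolve for two interpreters."""
    with tempfile.TemporaryDirectory() as directory:
        for filename in ("_helperlib.win-amd64.pyd", "_helperlib.pyd"):
            with open(os.path.join(directory, filename), "wb"):
                pass  # empty placeholder file
        # A 64-bit interpreter lists its own tag first and finds the tagged file.
        tagged = find_extension(directory, "_helperlib", [".win-amd64.pyd", ".pyd"])
        # A 32-bit interpreter has no matching tagged file and falls back.
        fallback = find_extension(directory, "_helperlib", [".win32.pyd", ".pyd"])
        return os.path.basename(tagged), os.path.basename(fallback)


if __name__ == "__main__":
    print(demo())
```

Because the order of the suffix list decides which file wins, putting the most specific suffix first is what lets builds for two bitnesses share one directory.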

    I'm more inclined towards get_platform() than adding new architecture tags. Windows at least doesn't support fat binaries - the closest equivalent is universal apps, which use separate binaries and a naming convention. Adding a debug marker here would also be nice, as I've never been a huge fan of the "_d" suffix we currently have, but it's not a big deal.

    I suspect any changes here would be completely separate from other platforms, but ISTM that we're looking at a similar change to handle the bitness/debug issue on Linux. I'm not volunteering to do that part :)

    @pitrou
    Member Author

    pitrou commented Dec 6, 2014

    Le 06/12/2014 21:11, Steve Dower a écrit :

    > I suspect any changes here would be completely separate from other
    > platforms, but ISTM that we're looking at a similar change to handle the
    > bitness/debug issue on Linux. I'm not volunteering to do that part :)

    I think committing changes on a per-platform basis is fine here. After
    all the current scheme is quite platform-specific (I was unaware of this
    until a few days ago :-)).

    So, yes, let's get the ball rolling under Windows. I think you're the
    most competent person to choose a naming scheme!

    @zooba
    Member

    zooba commented Dec 7, 2014

    The attached patch adds platform tags for .pyd files for "win32", "win-arm", "win-amd64" and "win-ia64", which are the known compilers in pyconfig.h and the potential return values from distutils.util.get_platform(). It also fixes a bug where the suffix would be incorrect if building a debug extension.

    I haven't been able to think of any scenarios where this could break other than perhaps packaging (since distutils defaults to including the tag), and we've got plenty of time to sort those issues out. A quick test installing Cython and some packages built with Cython seemed to be fine. AIUI, MinGW/cygwin builds won't use PC/pyconfig.h, and so they won't see any change.

    @doko42
    Member

    doko42 commented Apr 16, 2015

    On 04/16/2015 05:56 PM, Marc-Andre Lemburg wrote:

    Marc-Andre Lemburg added the comment:

    On 16.04.2015 17:30, Matthias Klose wrote:
    >
    > Matthias Klose added the comment:
    >
    > Nick filed issue bpo-23966 to document these changes. Yes, these tags should be documented, so that installers don't have to guess (currently they are only exposed in importlib.machinery.EXTENSION_SUFFIXES).
    >
    > What you describe as a "simple idea" is just another hack, only addressing the issue on x86 platforms, not addressing this for soft-float/hard-float calling conventions, not addressing this for endianness, not addressing this for other platform ABIs. And for all these cases there are machines where you can run the variants on the same machine. If you like to call this "a mess", fine. But this is reality. I'm not creating this mess, I'm describing this and exposing it to the interpreter.

    The simple idea Antoine had was to be able to install C extensions
    compiled for different bit architectures, but the *same platform*
    into the same directory, which is similar to what we're doing for
    Python byte code files.

    The typical use case is to have a 32-bit version and a 64-bit version
    on the same system.

    It seems that the scope of this simple idea has now completely
    blown up in trying to stuff all kinds of other platform features
    into the binary names as well.

    And while doing so, we now have different naming styles on different
    platforms, require hand written configure files to support additional
    platforms, and have yet another system for associating platform
    information with binary Python files, in addition to
    PEP-3149, PEP-425 and PEP-427.

    See http://bugs.python.org/issue22980#msg232729

    I don't think this is a good development and I can hardly imagine
    a use case where all those different ABIs will have to live on the
    same machine and in the same directory.

    At the same time you are creating incompatibilities which did
    not exist before, by requiring configure script fixes for "unknown"
    platforms.

    I'm -1 on these changes. I was +0 on Antoine's original idea,
    since that addresses a real-life use case you can run into every
    now and then.

    I'm disappointed that you discredit any other use case besides what you think as
    the typical use case as not real life use case. Maybe you are focused on x86
    only, but if you've been to PyCon 2014, you should have a nice Raspberry Pi.
    What do you run on it, a soft float, or a hard float distribution? How do you
    distribute extensions for that? Yes, you can run both at the same time. There
    are now the first 64bit Raspberry Pi like boards (https://www.96boards.org/).
    Most of the SoCs can run ARM32 hard- and soft-float binaries, but not all, and
    that's why AArch64 gets an ILP32 ABI too. Maybe you don't like the variety in
    the ARM world, but that's real life.

    @malemburg
    Member

    On 16.04.2015 18:53, Matthias Klose wrote:

    > I'm disappointed that you discredit any other use case besides what you think as
    > the typical use case as not real life use case. Maybe you are focused on x86
    > only, but if you've been to PyCon 2014, you should have a nice Raspberry Pi.
    > What do you run on it, a soft float, or a hard float distribution? How do you
    > distribute extensions for that? Yes, you can run both at the same time. There
    > are now the first 64bit Raspberry Pi like boards (https://www.96boards.org/).
    > Most of the SoCs can run ARM32 hard- and soft-float binaries, but not all, and
    > that's why AArch64 gets an ILP32 ABI too. Maybe you don't like the variety in
    > the ARM world, but that's real life.

    I'm not trying to discredit any use cases, I just don't see them.

    For package distributions you do need to make your distribution
    files unique and it makes sense adding such complex ABI tags
    to them, including even having to invest into manually maintaining
    them.

    However, for plain .so files that you have on your system (which will
    most likely not support more than 2-4 different architecture configurations
    running at the same time), I don't currently see a need to make things
    more complicated than necessary.

    Perhaps you can point me to some use cases where the triple
    platform tag is really useful.

    @malemburg
    Member

    On 16.04.2015 19:14, Marc-Andre Lemburg wrote:

    > However, for plain .so files that you have on your system (which will
    > most likely not support more than 2-4 different architecture configurations
    > running at the same time), I don't currently see a need to make things
    > more complicated than necessary.
    >
    > Perhaps you can point me to some use cases where the triple
    > platform tag is really useful.

    Antoine's ticket is the first in two decades to request being
    able to install .so extension files side-by-side, so even if
    times and platforms change, people don't seem to have big
    issues without this feature.

    If you have a need, it's not really hard to build your extensions
    for different architecture ABIs in different directories. We've
    been doing this for years, just like everyone else.

    @dstufft
    Member

    dstufft commented Apr 16, 2015

    > Perhaps you can point me to some use cases where the triple
    > platform tag is really useful.

    If I understand correctly (and ABI isn't my strong suit), it would be useful in the sense that you could utilize it to create a sort of "fat wheel" that included the .so's for multiple architectures and then pip could simply drop them all into place and have the interpreter decide which one to load. This is useful because maybe you have one .so in a wheel and 30 .py files, it's somewhat wasteful (both disk space and in cache efficiency) to have 10 different wheel files and those 30 .py files duplicated when it could be possible to have a single one serving 10 different architectures.

    To be clear, this ability doesn't yet exist in Wheel and I don't know of anyone pushing for it, but if Python is smart enough to load the right .so that makes fat wheels significantly easier to implement (in fact, you wouldn't need to add anything else to pip or the wheel spec to handle it I think).

    @ned-deily
    Member

    > Antoine's ticket is the first in two decades to request being
    > able to install .so extension files side-by-side, so even if
    > times and platforms change, people don't seem to have big
    > issues without this feature.

    That's exactly what PEP-3149 was supposed to implement, isn't it?

    @malemburg
    Member

    On 16.04.2015 19:47, Ned Deily wrote:

    Ned Deily added the comment:

    > Antoine's ticket is the first in two decades to request being
    > able to install .so extension files side-by-side, so even if
    > times and platforms change, people don't seem to have big
    > issues without this feature.

    That's exactly what PEP-3149 was supposed to implement, isn't it?

    No, PEP-3149 is about the Python ABI, following PEP-3147,
    which implements this for PYC files.

    The intent is to be able to have multiple *Python* ABI/API versions
    installed side-by-side, not multiple platform ABI versions :-)

    @malemburg
    Member

    On 16.04.2015 19:44, Donald Stufft wrote:

    Donald Stufft added the comment:

    > Perhaps you can point me to some use cases where the triple
    > platform tag is really useful.

    If I understand correctly (and ABI isn't my strong suit), it would be useful in the sense that you could utilize it to create a sort of "fat wheel" that included the .so's for multiple architectures and then pip could simply drop them all into place and have the interpreter decide which one to load. This is useful because maybe you have one .so in a wheel and 30 .py files, it's somewhat wasteful (both disk space and in cache efficiency) to have 10 different wheel files and those 30 .py files duplicated when it could be possible to have a single one serving 10 different architectures.

    Well, it's even more wasteful if you have to download 100MB wheels
    with all the different platforms when the dedicated wheel would just
    need 1.5MB.

    This approach has been considered a few times in distutils history
    and no one really liked it because it would increase the download
    requirements for the users a lot. It does make things easier for
    the packages, though, but then again, this can also be had
    by having different subdirs in the wheel or other package
    format to address the issue of having name collisions.

    Today, you usually have a web installer take care of grabbing
    only the bits you need.

    @dstufft
    Member

    dstufft commented Apr 16, 2015

    > Well, it's even more wasteful if you have to download 100MB wheels
    > with all the different platforms when the dedicated wheel would just
    > need 1.5MB.

    I think it's going to vary greatly based on how many platforms you're attempting to support and how big your .so's are compared to the rest of the Wheel. You can also mix and match, do a single bundle for the most popular platforms (which will mean that you're almost always serving out of cache) but then do individual wheels for the less popular platforms to keep the file size of the "main" wheel from bloating up with a bunch of .so's for platforms which are unlikely to be needed very often.

    Another possible (future) benefit - Right now we have executable zip files, but they can only reasonably contain pure Python files. There are rumblings of making it so it's possible to import .so's from inside of an executable zip file. If you bake in the platform ABI into the .so file name, it would mean in that possible future you could have a single executable zip file that just works across multiple platforms as long as you already have Python installed.

    I do agree that pretty much every place someone would want to do this, could possibly be implemented by having it look inside a per platform directory (you could implement fat wheels for instance by having platform sub dirs, same with a single executable zip file), however doing that causes duplication because every place you deal with .so's then need to deal with the question of platform ABI and have to come up with their own solution to it, instead of having a central solution which is managed by Python itself and can be re-used by all of these situations.

    @ned-deily
    Member

    No, PEP-3149 is about the Python ABI, following PEP-3147,
    which implements this for PYC files.

    The intent is to be able to have multiple *Python* ABI/API versions
    installed side-by-side, not multiple platform ABI versions :-)

    Well, for all practical purposes, the platform *is* part of the ABI :=)

    So, if we have been supporting multiple PEP-3147 extension modules since 3.2, I don't see this as a risky change. I don't think anyone is advocating installing distributions with dozens of extension module variants as a general practice. But it seems like there are times when it would be useful to have the capability to have more than one, and this seems like a safe and logical extension to what PEP-3147 already provides. I don't have a strong opinion about other platforms.

    For OS X, because of the complexity and usefulness of mixing and matching various fat CPU archs and OS X ABIs ("deployment target"), pip already supports selecting the proper wheel to download and wheel creators are tagging with the right metadata to make the right decisions, so I don't think the changes here bring much added value for OS X users, except for two things: (1) we now support PEP-3147 ext file names (which for some reason was never fully implemented on OS X and should have been) which is useful for the original PEP-3147 use cases (for example, if someone wants to distribute non-debug and debug versions of ext modules); (2) the addition of '-darwin' to the PEP-3147 ext file name allows for this additional use case of allowing multiple platform extensions to be stored in the same directory without fear of name clash (even if only one is eventually installed). I think both are reasonable and safe changes for OS X.

    @malemburg
    Member

    On 16.04.2015 20:17, Donald Stufft wrote:

    Donald Stufft added the comment:

    > Well, it's even more wasteful if you have to download 100MB wheels
    > with all the different platforms when the dedicated wheel would just
    > need 1.5MB.

    I think it's going to vary greatly based on how many platforms you're attempting to support and how big your .so's are compared to the rest of the Wheel. You can also mix and match, do a single bundle for the most popular platforms (which will mean that you're almost always serving out of cache) but then do individual wheels for the less popular platforms to keep the file size of the "main" wheel from bloating up with a bunch of .so's for platforms which are unlikely to be needed very often.

    Whatever you do, you're still going to force all your main users to
    download things they don't need, so I don't see the argument of
    optimizing downloads or caches.

    Another possible (future) benefit - Right now we have executable zip files, but they can only reasonably contain pure Python files. There are rumblings of making it so it's possible to import .so's from inside of an executable zip file. If you bake in the platform ABI into the .so file name, it would mean in that possible future you could have a single executable zip file that just works across multiple platforms as long as you already have Python installed.

    Since you need special support for such ZIP files (either using dlopen
    hacks or temporarily extracting them), you might as well deal with
    the platform dependencies in that handler. No need to force the
    platform tags on all your .so file for no apparent reason.

    There's a very real use case for having multiple Python versions
    installed which was the motivation for the PEPs I quoted, but this
    development is one of those YAGNI features only very few people
    will ever need.

    I do agree that pretty much every place someone would want to do this could be implemented by having it look inside a per-platform directory (you could implement fat wheels, for instance, by having platform sub dirs; same with a single executable zip file). However, doing that causes duplication, because every place that deals with .so's then needs to deal with the question of platform ABI and has to come up with its own solution to it, instead of having a central solution which is managed by Python itself and can be re-used by all of these situations.

    I'm not saying that having a central solution is wrong. All I'm
    saying is that the implementations on this ticket are not within
    the scope of the ticket and instead need a proper PEP to see where the
    real use cases are and whether this particular way of doing things
    is a way we all want to go.

    We now have four ways of describing ABI flags in Python (well, actually
    even more, since Linux, Windows and OS X use different approaches for
    the platform ABI .so flags). This can't possibly be a good approach.

    I can already see all the different OS vendors creating
    their own little platform triplet extensions. In the end, it's rather
    likely that an extension compiled with e.g. openSUSE won't run on Fedora or
    Debian anymore and vice-versa; and one compiled with vanilla Python
    probably won't run on Apple's Python anymore for similar reasons.
    Not a good perspective. This is going to make distributing binaries
    harder, not easier.

    @malemburg
    Member

    On 16.04.2015 20:21, Ned Deily wrote:

    Ned Deily added the comment:

    > No, PEP-3149 is about the Python ABI, following PEP-3147,
    > which implements this for PYC files.

    > The intent is to be able to have multiple *Python* ABI/API versions
    > installed side-by-side, not multiple platform ABI versions :-)

    Well, for all practical purposes, the platform *is* part of the ABI :=)

    Yes, but if all your files on your box share the same ABI, do you
    really want to have all of them come with an extra name extension ?
    I mean: If all your apples are green, would you write "green" on them to
    remember ? ;-)

    All Linux distributions I know place the 32-bit and 64-bit versions
    of shared libs into different directories rather than putting them all
    into a single dir and adding ABI flags to the .so files.
    Windows does this too. FreeBSD as well.

    Why should Python behave differently ? Just because we can is not
    really a good answer, IMO.

    @dstufft
    Member

    dstufft commented Apr 16, 2015

    Whatever you do, you're still going to force all your main users to
    download things they don't need, so I don't see the argument of
    optimizing downloads or caches.

    pip caches downloads by default, many systems are starting to utilize that
    cache in order to stop repeat downloads of the same file. This would make it
    so that if you had a shared pip cache amongst many architectures or platforms
    (which people are starting to do, especially with sharing caches with virtual
    boxes running on their own machines or services like Travis-CI) you'd only have
    to download that file from PyPI once ever.

    Looking at a few of the top projects on PyPI in terms of download count we
    have:

    Of those, only really lxml is large enough that adding a second or third or
    fourth copy of the .so is a really meaningful increase in size and since we
    wouldn't be making a "fat wheel" mandatory lxml could just decide not to build
    one. As far as I can tell we don't actually optimize for minimizing the amount
    downloaded (otherwise we'd use something better than a .zip file).

    Since you need special support for such ZIP files (either using dlopen
    hacks or temporarily extracting them), you might as well deal with
    the platform dependencies in that handler. No need to force the
    platform tags on all your .so file for no apparent reason.

    There are other reasons as have already been mentioned, this is just yet
    another reason (and on its own I'd agree it's not a sufficiently compelling
    use case), but when I see a pattern of things which all need the same thing
    then that speaks to me that it should live someplace centrally instead of
    having each one reimplement it.

    I'm not saying that having a central solution is wrong. All I'm
    saying is that the implementations on this ticket are not within
    the scope of the ticket and instead need a proper PEP to see where the
    real use cases are and whether this particular way of doing things
    is a way we all want to go.

    I don't care if it gets added as part of this ticket, another ticket, or as
    a PEP. I'm just listing where it'd be useful for the kinds of things I do.

    @malemburg
    Member

    On 17.04.2015 00:51, Donald Stufft wrote:

    > Since you need special support for such ZIP files (either using dlopen
    > hacks or temporarily extracting them), you might as well deal with
    > the platform dependencies in that handler. No need to force the
    > platform tags on all your .so file for no apparent reason.

    There are other reasons as have already been mentioned, this is just yet
    another reason (and on its own I'd agree it's not a sufficiently compelling
    use case), but when I see a pattern of things which all need the same thing
    then that speaks to me that it should live someplace centrally instead of
    having each one reimplement it.

    Sure, but whatever the central implementation is going to be,
    it doesn't necessarily have to require sticking platform ABI flags
    on all .so files, even those which will never need to be installed
    side-by-side. The more paths you need to stat when searching
    a shared mod, the slower Python will get.

    There's a very simple trick which some packages used in the
    past for sumo distributions - you simply modify the __path__
    attribute of the package to point to the platform dependent
    files in the __init__.py file and Python will then automagically
    use the right C extensions.

    To simplify this, the platform triplets and other platform ABI flags
    could be made available via the sys or sysconfig module for importers
    and other tools to pick up.
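A minimal sketch of the `__path__` trick described above, for illustration only. The `_platform/<tag>` directory layout, the tag scheme, and the helper names `platform_tag` and `extend_package_path` are assumptions made for this example; they are not taken from any package mentioned in this thread.

```python
import os
import sysconfig


def platform_tag():
    """Return a directory-friendly tag for the running interpreter.

    The tag scheme is hypothetical: it reuses sysconfig.get_platform(),
    which returns strings such as 'linux-x86_64' or 'win-amd64'.
    """
    return sysconfig.get_platform().replace("-", "_").replace(".", "_")


def extend_package_path(package_path, package_dir, tag=None):
    """Prepend <package_dir>/_platform/<tag> to a package's __path__ list.

    Nothing happens if that directory does not exist, so packages without
    platform-specific files keep working unchanged.
    """
    tag = tag or platform_tag()
    candidate = os.path.join(package_dir, "_platform", tag)
    if os.path.isdir(candidate) and candidate not in package_path:
        package_path.insert(0, candidate)
    return package_path


# Inside a package's __init__.py one would call something like:
#     extend_package_path(__path__, os.path.dirname(__file__))
```

With this layout the import system searches the platform-specific directory first, so the C extensions themselves need no platform tag in their file names.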

    @doko42
    Member

    doko42 commented Apr 19, 2015

    I'm not trying to discredit any use cases, I just don't see them.

    So why do you see this on x86 for 32/64-bit, but not for ARM soft-float/hard-float? The example given was pretty clear.

    All Linux distributions I know place the 32-bit and 64-bit versions
    of shared libs into different directories rather than putting them all
    into a single dir and adding ABI flags to the .so files.

    Well, then at least you don't know Debian, Ubuntu, and any of their derivatives.

    And even the Python default install on Linux installs into the same place.

    @python-dev
    Mannequin

    python-dev mannequin commented Apr 19, 2015

    New changeset 558335559383 by doko in branch 'default':

    @ned-deily
    Member

    I think we should add something to the 3.5 "What's New" document about these changes and which platforms are affected. Otherwise is there anything left to do before closing?

    @larryhastings
    Contributor

    I sure hope not.

    @larryhastings
    Contributor

    I'm leaving this open just because we're apparently waiting on some "What's New" docs.

    @larryhastings
    Contributor

    Here's an attempt at a What's New section for this change. I expect it's wrong! Maybe someone can fix it. Maybe it's actually better than not having one at all.

    Can we maybe get a round or two of edits on this and get something in for 3.5 final?

    @berkerpeksag
    Member

    Adding Yury since he and Elvis are working on Doc/whatsnew/3.5.rst and they might want to take a look at the latest patch.

    @zooba
    Member

    zooba commented Sep 9, 2015

    Only thing I'd add is that the extra tag is optional (on Windows at least), and Python will happily import extensions without it. But extensions with a mismatched tag won't be loaded.
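The behaviour described above can be sketched with `importlib.machinery.EXTENSION_SUFFIXES`, which lists the extension-module suffixes the running interpreter recognises. The helper `candidate_filenames` is a hypothetical name for this example, and the explicit suffix list in the usage comment is an assumed 64-bit Linux 3.5 configuration rather than output captured from a real build.

```python
import importlib.machinery


def candidate_filenames(module_name, suffixes=None):
    """List the file names the import system would look for in one directory.

    By default this uses the suffixes of the running interpreter; pass an
    explicit list to reason about another platform.
    """
    if suffixes is None:
        suffixes = importlib.machinery.EXTENSION_SUFFIXES
    return [module_name + suffix for suffix in suffixes]


# Assumed suffix list for a 64-bit Linux build of Python 3.5:
#     [".cpython-35m-x86_64-linux-gnu.so", ".abi3.so", ".so"]
# With it, "spam.so" is a candidate (the tag is optional), while
# "spam.cpython-35m-i386-linux-gnu.so" is not, so a file carrying a
# mismatched tag is never picked up.
```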

    @doko42
    Member

    doko42 commented Sep 9, 2015

    thanks for the draft!

    I'm not sure how to describe this properly. The extension names are derived from https://wiki.debian.org/Multiarch/Tuples

    and this again is derived from the GNU triplets/quadruplets.

    There is no "cpu" and "os" part; depending on the architecture, some ABI parts are encoded either in the "cpu" part or in the "os" part.

    So what about just enumerating the most common cases (i386-linux-gnu, x86_64-linux-gnu, arm-linux-gnueabi (still used for the old Raspberry Pi), arm-linux-gnueabihf), and then point to the "spec"? The above examples have some irregular cases, most other cases just follow the triplets.

    I wouldn't mention x86_64-linux-gnux32 explicitly. So far only unreleased or experimental distros use it.

    Not sure if it is worth mentioning that this would allow distributing "fat" wheels.
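As a rough illustration of the naming discussed above, the sketch below splits a Linux extension file name into module name, interpreter tag, and multiarch tuple. It assumes the layout `<name>.cpython-<version><flags>-<tuple>.so` and, following the point that the tuple has no fixed "cpu" and "os" parts, treats the tuple as an opaque string. The function name `split_extension_filename` is hypothetical.

```python
def split_extension_filename(filename):
    """Split 'name.cpython-35m-<tuple>.so' into (name, interpreter tag, tuple).

    Missing parts are returned as None, so untagged names such as
    'spam.so' are handled too.
    """
    if not filename.endswith(".so"):
        raise ValueError("not a .so file name: %r" % filename)
    stem = filename[:-len(".so")]
    name, sep, tag = stem.partition(".")
    if not sep:
        # Plain 'spam.so': no tag at all.
        return name, None, None
    # Split off at most 'cpython' and the version/flags part; whatever
    # remains is the multiarch tuple, kept as one opaque string.
    parts = tag.split("-", 2)
    if len(parts) < 3:
        return name, tag, None
    return name, "-".join(parts[:2]), parts[2]
```

For example, `split_extension_filename("spam.cpython-35m-arm-linux-gnueabihf.so")` yields the tuple `arm-linux-gnueabihf` unchanged, without trying to interpret `gnueabihf`.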

    @python-dev
    Mannequin

    python-dev mannequin commented Sep 10, 2015

    New changeset 1744d65705d0 by Yury Selivanov in branch '3.5':
    whatsnew/3.5: Describe changes in issue bpo-22980
    https://hg.python.org/cpython/rev/1744d65705d0

    New changeset cfbcb3a6a848 by Yury Selivanov in branch 'default':
    Merge 3.5 (issue bpo-22980, whatsnew/3.5)
    https://hg.python.org/cpython/rev/cfbcb3a6a848

    @1st1
    Member

    1st1 commented Sep 10, 2015

    Larry, Matthias, Steve, Berker - I've mentioned this issue in the whatsnew (applied Larry's patch with some modifications to address comments from Steve and Matthias). Please review.

    @zooba
    Member

    zooba commented Sep 11, 2015

    There's no dot before the debug marker on Windows. Otherwise, looks good to me. Thanks for writing this up.

    @larryhastings
    Contributor

    It's fixed! So it's finally closed.

    @python-dev
    Mannequin

    python-dev mannequin commented Jan 14, 2017

    New changeset 80fc40a9ae47 by Martin Panter in branch '3.5':
    Issue bpo-22980: Skip a sysconfig test if _ctypes is not available.
    https://hg.python.org/cpython/rev/80fc40a9ae47

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022