This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Better platform.processor support
Type:
Stage: resolved
Components:
Versions: Python 3.9
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: jaraco
Nosy List: gregory.p.smith, jaraco, lemburg, miss-islington, pitrou
Priority: normal
Keywords: patch

Created on 2019-02-11 14:29 by jaraco, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked
PR 12230 closed jaraco, 2019-03-08 02:35
PR 12239 merged jaraco, 2019-03-08 18:03
PR 12824 merged jaraco, 2019-04-14 01:18
PR 19544 merged jaraco, 2020-04-15 19:07
PR 19577 merged yan12125, 2020-04-18 01:18
Messages (33)
msg335220 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2019-02-11 14:29
or: Unable to implement 'uname' on Python due to recursive call
or: platform.uname() should avoid calling `uname` in a subprocess


In [this issue](https://github.com/jaraco/cmdix/issues/1), I stumbled across a strange and somewhat unintuitive behavior. This project attempts to supply a `uname` executable implemented in Python, so that such functionality could be exposed on any platform including Windows.

What I found, however, was that because the stdlib `platform` module actually invokes `sh -c uname -p` during most any call of the module (https://github.com/python/cpython/blob/9db56fb8faaa3cd66e7fe82740a4ae4d786bb27f/Lib/platform.py#L836), attempting to use that functionality on a system where `uname` is implemented by Python (and on the path), will probably fail after a long delay due to infinite recursion.

Moreover, the _only_ call that's currently invoking `uname` in a subprocess is the `processor` resolution, which I suspect is rarely used, in part because the results from it are inconsistent and not particularly useful.

For example, on Windows, you get a detailed description from the hardware: 'Intel64 Family 6 Model 142 Stepping 9, GenuineIntel'

On macOS, you get just 'i386'.

And on Linux, I see 'x86_64' or sometimes just '' (in docker).

To make matters even worse, this `uname -p` call happens unconditionally on non-Windows systems for nearly any call in platform. As a result, it's impossible to suppress the invocation of `uname`, especially when functionality like `pkg_resources` and its environment markers is invoked early.

I suggest instead the platform module should (a) resolve processor information in a more uniform manner and (b) not ever call uname, maybe [with something like this](https://github.com/jaraco/cmdix/blob/d61e6d3b40032a25feff0a9fb2a79afaa7dcd4e0/cmdix/command/uname.py#L53-L77).

At the very least, the `uname` call should be late-bound so that it's not invoked unconditionally for rarely-used information.

After some period for comment, I'll draft an implementation.
msg335372 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2019-02-12 21:55
Your proposal sounds fine to me.  You could fall back on platform.machine() instead of calling `uname` explicitly.
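A minimal sketch of that suggestion, written as a standalone helper rather than as a change to `platform` itself. It assumes that on POSIX `os.uname().machine` (read from the uname(2) system call, no subprocess) is an acceptable processor value, and that on Windows the `PROCESSOR_IDENTIFIER` environment variable is an acceptable source for the detailed string quoted above; `portable_processor` is a hypothetical name.

```python
import os
import platform
import sys


def portable_processor() -> str:
    """Hypothetical helper: resolve a processor string without spawning `uname`."""
    if sys.platform == "win32":
        # Windows exposes a detailed description such as
        # 'Intel64 Family 6 Model 142 Stepping 9, GenuineIntel' in the environment.
        ident = os.environ.get("PROCESSOR_IDENTIFIER")
        if ident:
            return ident
    if hasattr(os, "uname"):
        # POSIX: take the machine field straight from the uname(2) system call.
        return os.uname().machine
    # Last resort; on older Pythons platform.machine() goes through platform.uname().
    return platform.machine()
```

On Linux and macOS this yields values like those quoted above for `uname -m`, not the historical `uname -p` output, so it trades compatibility for predictability.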
msg337454 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2019-03-08 08:32
As the documentation says, the API is intended as a fairly portable implementation of the Unix uname helper across platforms. It's fine to redirect this directly to e.g. /proc output instead of using the executable, but whatever you do here, the output of platform.uname() needs to stay compatible with what the function returned prior to such a change, which usually means the output of the uname helper on a system.

Could you please check that on most systems, the output remains the same?

Thanks.
msg337475 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2019-03-08 13:58
It won't be possible in general to emit what the function returned before, as `uname` is a symbolic reference to an arbitrary executable, which can vary by platform and release and local environment.

What I might be able to do is find the implementation of "uname" and see if there's a way to get the value from the same source. I did find what I believe is the [canonical source](https://github.com/coreutils/coreutils/blob/66e2daa689fefec9ed201a04696b9f52d049d89a/src/uname.c#L301-L343).

I'll explore if those calls can be translated to Python.
msg337480 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2019-03-08 14:22
The first call I see in that routine is to "sysinfo", but the signature of that function doesn't match what I find in the [man pages for that function](http://man7.org/linux/man-pages/man2/sysinfo.2.html). So that function must be coming from elsewhere.
msg337481 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2019-03-08 14:22
Thanks. It would be good to do some before/after tests on popular
platforms, e.g. a few Linuxes, MacOS, Windows.
msg337483 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2019-03-08 14:30
Aha! It seems the 'sysinfo' call is for Solaris: https://docs.oracle.com/cd/E23823_01/html/816-5167/sysinfo-2.html
msg337485 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2019-03-08 14:38
Best I can tell, neither sysinfo nor sysctl is exposed in any way to Python, so it may not be possible to accurately load the processor information from those system calls without writing a wrapper in C. What I might try is to experiment with ctypes to see if I can prove the concept.
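A rough ctypes proof of concept along those lines, assuming a libc that provides `sysctlbyname(3)` (macOS and the BSDs do; glibc does not). `sysctl_string` is a hypothetical helper, and querying `hw.machine` is only an illustration: which sysctl entry a given `uname -p` actually consults differs between implementations.

```python
import ctypes
import ctypes.util
from typing import Optional


def sysctl_string(name: str) -> Optional[str]:
    """Read a string-valued sysctl via libc's sysctlbyname(3), if available.

    Returns None where sysctlbyname does not exist (e.g. glibc-based Linux)
    or when the call fails (e.g. unknown name).
    """
    libname = ctypes.util.find_library("c")
    if libname is None:
        return None
    libc = ctypes.CDLL(libname, use_errno=True)
    func = getattr(libc, "sysctlbyname", None)  # AttributeError -> None
    if func is None:
        return None
    func.argtypes = [
        ctypes.c_char_p,                  # name
        ctypes.c_void_p,                  # oldp: output buffer
        ctypes.POINTER(ctypes.c_size_t),  # oldlenp: in/out buffer size
        ctypes.c_void_p,                  # newp: unused, read-only query
        ctypes.c_size_t,                  # newlen
    ]
    func.restype = ctypes.c_int
    key = name.encode()
    size = ctypes.c_size_t(0)
    # First call with a NULL buffer only asks for the required size.
    if func(key, None, ctypes.byref(size), None, 0) != 0:
        return None
    buf = ctypes.create_string_buffer(size.value)
    if func(key, buf, ctypes.byref(size), None, 0) != 0:
        return None
    return buf.value.decode(errors="replace")
```

On glibc-based Linux the helper simply returns `None`, which is consistent with where the investigation below ends up regarding sysctl there.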
msg337487 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2019-03-08 14:55
Reading further, the 'sysctl' call seems to only be for BSD (https://www.freebsd.org/cgi/man.cgi?sysctl(3)). I could find the man page for sysctl for BSD but not Linux. There is a _sysctl in Linux (http://man7.org/linux/man-pages/man2/sysctl.2.html), but its use is discouraged and it doesn't provide the necessary information.

Now I suspect that the aforementioned GNU coreutils 'uname' implementation is only for non-Linux systems, as none of the underlying system calls are relevant on Linux. I expect if one compiled that uname on Linux, 'uname -p' would emit 'unknown'.

Meaning I still don't know how to get a 'uname -p' result on Linux (without invoking uname -p).
msg337488 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2019-03-08 14:59
Hmm. But if I go to the Linux man page for uname (https://linux.die.net/man/1/uname) and follow the links to the source code, I end up at the same repository. So maybe the BSD man page is suitable for Linux. I'll work from that assumption for now.
msg337499 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2019-03-08 16:07
After fussing with sysctl for a while, I'm fairly confident that one can't use sysctl on Linux reliably (https://stackoverflow.com/a/55066774/70170). I'll keep digging to see if I can find another implementation of `uname` that's used on Linux.
msg337500 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2019-03-08 16:12
[This answer](https://unix.stackexchange.com/a/307960/275034) is extremely helpful. `uname -p` isn't available on Linux except Fedora and late versions of Debian that apply the patch.

This lack of consistency means that `platform.uname().processor` and thus `platform.processor()` is an inherently unreliable interface.
msg337503 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2019-03-08 16:14
Correction on last comment: s/Debian/Ubuntu/
msg337509 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2019-03-08 16:34
Jason: StackExchange does have lots of good hints, but it's not always
correct. In this case, it's clearly wrong. uname -p has been
available on many Unix installations for decades.

I started writing the module back in 1999 and even then, the support
was already working on the systems I used at the time, and several
others, as you can see from this page:

https://www.egenix.com/www2002/python/mxCGIPython.html

The module was originally created to come up with a good name to
use for identifying platform binaries coming out of my mxCGIPython
project.

Note that the processor is not always needed to determine whether
software runs on a machine or not. The "uname -m" output often
is enough, but there are cases where e.g. compiler options are
used which produce code that only works on particular processors.

Perhaps adding a more capable API to interface to /proc/cpuinfo
would be a good idea.
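One possible shape for such an API, as a sketch only: `parse_cpuinfo` splits `/proc/cpuinfo`-style text into one dict per processor block, and `cpu_model` returns the first model-like field it finds. Both names are hypothetical, and the field names tried are assumptions; the layout of `/proc/cpuinfo` differs between architectures, and the file does not exist outside Linux.

```python
from typing import Dict, List


def parse_cpuinfo(text: str) -> List[Dict[str, str]]:
    """Split /proc/cpuinfo-style text into one {field: value} dict per block."""
    blocks: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line separates one processor's block from the next.
            if current:
                blocks.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        blocks.append(current)
    return blocks


def cpu_model(path: str = "/proc/cpuinfo") -> str:
    """Best-effort model string; '' if the file is missing or no known field is found."""
    try:
        with open(path, encoding="utf-8", errors="replace") as f:
            blocks = parse_cpuinfo(f.read())
    except OSError:
        return ""
    # Assumed field names; which one is present depends on the architecture.
    for block in blocks:
        for field in ("model name", "Processor", "cpu model", "cpu"):
            if field in block:
                return block[field]
    return ""
```

Keeping the parser separate from the file access makes it easy to test against captured samples from different machines.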
msg337511 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2019-03-08 16:45
> the output of platform.uname() needs to stay compatible to what the function returned prior

Do we really wish to retain the output for this unreliable interface, especially when it is not standardized and is returning improper information? Is it valuable for `platform.processor()` to return "i386" (a 34-year-old processor) for my 2017 MacBook Pro?

Does maintaining compatibility for `platform.uname()` also imply that `platform.processor()` needs to return `platform.uname().processor`, or could the interface on the latter change, to provide a more useful value, while retaining the behavior of `platform.uname()`?

My instinct is it's impractical to attempt to maintain all of these forks of "uname -p", especially when the result is a largely unpredictable value, so I'm considering the only other viable option I can conceive now:

 - retain the subprocess call to "uname", but bind it late, as a functools.cached_property, such that "uname -p" is only ever called when the processor property is requested. This approach would also require overriding __iter__ and __getitem__ to retain the namedtuple interface while having that element resolved late.
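A minimal sketch of that late-binding idea, assuming Python 3.8+ (for `functools.cached_property`). `LazyUname`, `_UnameBase` and `_query_processor` are hypothetical names for illustration, not the `platform` module's API, and this is not necessarily how the eventual change implements it. The five cheap fields are stored in the tuple; `processor` is computed on first access, and `__iter__`, `__getitem__` and `__len__` are overridden so the result still behaves like a six-item tuple.

```python
import functools
import subprocess
from collections import namedtuple

# The five fields that os.uname() can supply cheaply, with no subprocess.
_UnameBase = namedtuple("_UnameBase", "system node release version machine")


def _query_processor() -> str:
    """Hypothetical resolver: shell out to `uname -p` only when actually called."""
    try:
        out = subprocess.run(
            ["uname", "-p"], capture_output=True, text=True, check=True
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return ""
    return "" if out == "unknown" else out


class LazyUname(_UnameBase):
    """uname-like result whose 'processor' field is resolved on first access."""

    resolver = staticmethod(_query_processor)

    @functools.cached_property
    def processor(self) -> str:
        # Runs once per instance; the value is then cached in the instance __dict__.
        return self.resolver()

    # Keep the tuple interface six items long without storing processor eagerly.
    def __iter__(self):
        yield from super().__iter__()
        yield self.processor

    def __getitem__(self, key):
        return tuple(self)[key]

    def __len__(self):
        return len(_UnameBase._fields) + 1
```

Accessing `system` or `machine` never triggers the resolver; iterating, indexing or unpacking does. Equality and hashing still compare only the five stored fields, which is one of the compatibility wrinkles such a design has to weigh.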

I was also considering this: instead of invoking "uname" anywhere on the path, invoke it from an explicit whitelist of paths, such as /bin and /usr/bin, so that it's never self-referential. Unfortunately, that wouldn't work if a Python-based implementation were put on one of those paths, so it would be brittle at best.
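The allow-list idea could be sketched as follows; `TRUSTED_DIRS`, `find_trusted_uname` and `processor_from_trusted_uname` are hypothetical names. It relies on `shutil.which` accepting an explicit `path` argument, so the user's `PATH` is never consulted.

```python
import os
import shutil
import subprocess

# Hypothetical allow-list: only these directories are searched for `uname`.
TRUSTED_DIRS = ("/bin", "/usr/bin")


def find_trusted_uname(dirs=TRUSTED_DIRS):
    """Return the path of a `uname` executable in `dirs`, ignoring PATH."""
    return shutil.which("uname", path=os.pathsep.join(dirs))


def processor_from_trusted_uname(dirs=TRUSTED_DIRS) -> str:
    exe = find_trusted_uname(dirs)
    if exe is None:
        return ""
    try:
        out = subprocess.run(
            [exe, "-p"], capture_output=True, text=True, check=True
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return ""
    return "" if out == "unknown" else out
```

The brittleness still applies: a Python-based `uname` installed into one of the trusted directories would be found, and the recursion would return.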

Marc-Andre, I'd love your feedback in light of these challenges.
msg337513 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2019-03-08 17:00
> Perhaps adding a more capable API to interface to /proc/cpuinfo
> would be a good idea.

The core concern I want to address is that it's not possible to use any function in the platform module without invoking "uname -p", and thus it's not possible to implement "uname" in Python. No amount of supplementary interfaces will help with that.
msg337517 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2019-03-08 17:26
On 08.03.2019 18:00, Jason R. Coombs wrote:
> 
>> Perhaps adding a more capable API to interface to /proc/cpuinfo
>> would be a good idea.
> 
> The core concern I want to address is that it's not possible to use any function in the platform module without invoking "uname -p", and thus it's not possible to implement "uname" in Python. No amount of supplementary interfaces will help with that.

I don't know where you get that idea from. The uname family of APIs
do use "uname -p" on platforms where this exists, but the other
ones don't.

It's also easy to bypass that by simply seeding the global cache
for uname(): _uname_cache. Or you could call your utility
something else. Or you could monkey-patch the platform module
in your utility to work around the circular reference.

To be clear: I do not consider your use case to be common enough
to warrant changes to the module, but would welcome additions
which bring more or better functionality to the module, e.g. having
the processor variable return meaningful values where it previously did
not (i.e. where uname() returns '' for the processor entry), or adding
another API to provide more detailed information.
msg337518 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2019-03-08 17:36
In [this commit](https://github.com/jaraco/cpython/commit/acd024e2d4aa56f13d7bc165d10a35510e83a12b), I demonstrate the alternative approach I was considering that avoids calling "uname -p" until it's required, but otherwise retains compatibility by using the same logic for resolving the processor on the various platforms.
msg337519 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2019-03-08 17:50
> It's also easy to bypass that by simply seeding the global cache
> for uname(): _uname_cache. 
> Or you could monkey-patch the platform module
> in your utility to work around the circular reference.

I don't think these options are possible in the general case. It was what I attempted to do in the first place, but could not. Consider the situation where a namespace package is present or where a script uses pkg_resources to bootstrap itself (a very common case), or any other case where `platform.(anything)` is invoked before the "bypass" or "monkey-patch" has a chance to run. This happens when running the test suite for `cmdix` because pytest invokes pkg_resources to search for entry points and that code invokes `platform.system` (or similar) to evaluate environment markers long before the cmdix code has been imported.

Here's what happens:

`platform.(anything)` runs `platform.uname` and `platform.uname` invokes `uname -p` in a subprocess _unconditionally_. Python doesn't provide hooks to monkey-patch that out before it gets invoked.

> Or you could call your utility something else.

The point of this utility is to supply "coreutils" using Python. It's derived from an abandoned project called "pycoreutils", one purpose of which is to provide the core utilities on a minimal Linux distribution that doesn't have uname. Another is to supply coreutils on Windows. Having an alternate name isn't really viable when the purpose is to supply that interface.


I do think your considerations are reasonable, and I'm close to giving up. I look forward to your feedback on the 'resolved-late' branch.
msg338081 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2019-03-16 12:34
On 08.03.2019 18:50, Jason R. Coombs wrote:
> 
>> It's also easy to bypass that by simply seeding the global cache
>> for uname(): _uname_cache. 
>> Or you could monkey-patch the platform module
>> in your utility to work around the circular reference.
> 
> I don't think these options are possible in the general case. It was what I attempted to do in the first place, but could not. Consider the situation where a namespace package is present or where a script uses pkg_resources to bootstrap itself (a very common case), or any other case where `platform.(anything)` is invoked before the "bypass" or "monkey-patch" has a chance to run. This happens when running the test suite for `cmdix` because pytest invokes pkg_resources to search for entry points and that code invokes `platform.system` (or similar) to evaluate environment markers long before the cmdix code has been imported.

I don't quite follow: since you are the author of the tool, you can of
course have your uname.py import platform and then apply one of the
above tricks, e.g.

"""
#!/usr/bin/env python3
import platform

# Seed uname cache to avoid calling uname
platform._uname_cache = platform.uname_result(
    system='Linux',
    node='moon',
    release='5.99.99',
    version='#1 SMP 2020',
    machine='x86_64',
    processor='x86_64')

print ('Hello from uname.py')
print ('platform.uname() = %r' % (platform.uname(),))
"""

> Here's what happens:
> 
> `platform.(anything)` runs `platform.uname` and `platform.uname` invokes `uname -p` in a subprocess _unconditionally_. Python doesn't provide hooks to monkey-patch that out before it gets invoked.

This is only true for the platform APIs which need information from
uname. Not in general.

>> Or you could call your utility something else.
> 
> The point of this utility is to supply "coreutils" using Python. It's derived from an abandoned project called "pycoreutils", one purpose of which is to provide the core utilities on a minimal Linux distribution that doesn't have uname. Another is to supply coreutils on Windows. Having an alternate name isn't really viable when the purpose is to supply that interface.
> 
> 
> I do think your considerations are reasonable, and I'm close to giving up. I look forward to your feedback on the 'resolved-late' branch.

I don't have anything against making the call to uname lazy.
I also don't have anything against returning useful information
rather than "unknown".

Your PR is missing tests, though, to support that it actually
returns the same values as before for a set of common platforms.
msg340176 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2019-04-14 00:46
> I don't quite follow: since you are the author of the tool, you can of
> course have your uname.py import platform and then apply one of the
> above tricks.

I thought I'd tried that, but failed [ref](https://github.com/jaraco/cmdix/issues/1#issuecomment-462207845), which is why I committed [this change](https://github.com/jaraco/cmdix/commit/c53908b4b39771eed9f64fff5bed8af51baae4d0).

The problem is that, if `pkg_resources` is used to implement the entry point for the `uname` console script, or if any other library happens to call platform.*, such as in site.py, before the patch has been allowed to run, the invocation of `uname` itself ends up invoking `uname`, causing unlimited recursion. No amount of patching in the `uname` command implementation can help that.

> Your PR is missing tests, though, to support that it actually
> returns the same values as before for a set of common platforms.

Yes, that sounds like a good plan. I'll add some tests that assert the values and then update the tests to match the current output, to establish a baseline.
msg340209 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2019-04-14 14:05
In PR 12824 (https://github.com/python/cpython/pull/12824), I've developed a test that should assure the current output from uname().processor.
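A sketch of what such a baseline check could look like, assuming a POSIX system with a `uname` executable on `PATH`. It is written in the spirit of the test described here, not copied from PR 12824, and it treats `uname -p` printing `unknown` as equivalent to an empty processor field; `ProcessorBaselineTest` is a hypothetical name.

```python
import platform
import shutil
import subprocess
import sys
import unittest


class ProcessorBaselineTest(unittest.TestCase):
    @unittest.skipIf(sys.platform == "win32", "uname -p is a POSIX tool")
    @unittest.skipUnless(shutil.which("uname"), "requires a uname executable on PATH")
    def test_processor_matches_uname_p(self):
        try:
            out = subprocess.check_output(
                ["uname", "-p"], text=True, stderr=subprocess.DEVNULL
            ).strip()
        except (OSError, subprocess.CalledProcessError):
            self.skipTest("this uname does not support -p")
        # platform reports 'unknown' as an empty string.
        expected = "" if out == "unknown" else out
        self.assertEqual(platform.uname().processor, expected)
```

Run it with `python -m unittest` or any unittest-compatible runner. Pinning the observed value first, then changing the implementation, is what lets a refactor demonstrate that the output is unchanged.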

I've merged those changes with PR 12239, which if the tests pass, should illustrate the values returned are unchanged.

@lemburg, would you be willing to review these PRs to confirm they capture and address your concern?
msg366537 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020-04-15 18:32
New changeset 4b4e90a51848578dc06341777a929a0be4f4757f by Jason R. Coombs in branch 'master':
bpo-35967: Baseline values for uname -p (GH-12824)
https://github.com/python/cpython/commit/4b4e90a51848578dc06341777a929a0be4f4757f
msg366542 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020-04-15 19:05
The aforementioned test broke tests in buildbots: https://buildbot.python.org/all/#builders/105/builds/779
msg366543 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-04-15 19:28
raspbian failure https://buildbot.python.org/all/#/builders/645/builds/31
msg366551 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020-04-15 20:28
I'm hoping that PR 19544 fixes the issue.
msg366572 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020-04-15 23:55
New changeset e72cbcb346cfcc1ed7741ed6baf1929764e1ee74 by Jason R. Coombs in branch 'master':
bpo-35967: Make test_platform.test_uname_processor more lenient to satisfy build bots. (GH-19544)
https://github.com/python/cpython/commit/e72cbcb346cfcc1ed7741ed6baf1929764e1ee74
msg366594 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020-04-16 12:28
New changeset 518835f3354d6672e61c9f52348c1e4a2533ea00 by Jason R. Coombs in branch 'master':
bpo-35967 resolve platform.processor late (GH-12239)
https://github.com/python/cpython/commit/518835f3354d6672e61c9f52348c1e4a2533ea00
msg366718 - Author: miss-islington (miss-islington) Date: 2020-04-18 14:20
New changeset fb940408cea1fb34fed1418832f240f886dadf57 by Chih-Hsuan Yen in branch 'master':
bpo-35967: Skip test with `uname -p` on Android (GH-19577)
https://github.com/python/cpython/commit/fb940408cea1fb34fed1418832f240f886dadf57
msg368765 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2020-05-13 07:50
Reopening the ticket, since the implementation makes backwards incompatible changes to platform.uname(): see https://bugs.python.org/issue40570 for a discussion on a better approach to lazy evaluation of getting the processor information.

Before we head into implementation details, could you please point me to the motivation for why only the processor detail of uname() needs lazy evaluation?

Thanks.
msg368803 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020-05-13 22:28
My bad. I probably could have been more proactive about providing a reproducer. The problem, as described above (msg335220) and in the associated cmdix ticket, is that invocation of `platform.(anything)` causes shelling out to execute "uname", so it's not possible to implement uname in Python unless one can guarantee that `platform.(anything)` is not invoked prior to the Python uname implementation executing.

Here's a Dockerfile replicating the issue:

```
FROM ubuntu:focal

# Install Python
RUN apt update
RUN apt install -y python3
RUN ln -s /usr/bin/python3 /usr/bin/python

# Simulate something on the system that invokes platform early.
RUN printf 'print(__import__("platform").system())' > sitecustomize.py
ENV PYTHONPATH=/

# Create a stubbed 'uname' command built in Python
ENV PATH=/:$PATH
RUN printf '#!/usr/bin/env python\nprint("getting ready to patch platform", flush=True)' > uname
RUN chmod u+x uname

CMD uname
```

As you can see, this reproducer creates a _very_ simple 'uname' implementation. All it does is print that it's about to patch the platform module (because maybe that will make things work). Unfortunately, that behavior is never reached because before that code has a chance to run, `sitecustomize` is imported and calls `platform.system()`, which invokes `platform.uname()` which attempts to resolve the processor, which attempts to invoke `uname -p` (even on Windows), which invokes the stubbed uname command, and infinite recursion begins.

The `sitecustomize` might seem a little contrived, except that a very similar behavior occurs in a very typical environment:

- pip, when installing a package for editing, invokes setuptools to `develop` the package.
- setuptools, when installing a package for developing, creates command-line entry points using a routine that imports `pkg_resources` to ensure that the relevant packages are present before invoking the command's functionality.
- pkg_resources imports packaging to evaluate markers.
- packaging uses `platform.system()` and other behaviors to evaluate the markers.

So ultimately, the same behavior is triggered before the user's code is ever executed.

But more importantly, why should "uname -p" be invoked in a subprocess on Windows to get "platform.system()"?
msg369200 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2020-05-18 09:47
Thanks, Jason. I'll have a closer look at the issue and report back later this week.
msg415633 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2022-03-20 21:10
I'm going to close this issue again, as the implementation is now present in at least a couple of releases. May I suggest that if there are ongoing concerns or issues to open up a new issue and reference this one and loop me into the conversation? I'm also happy to re-open this one as well.
History
Date User Action Args
2022-04-11 14:59:11 admin set github: 80148
2022-03-20 23:42:53 yan12125 set nosy: - yan12125
2022-03-20 21:10:53 jaraco set status: open -> closed; resolution: fixed; messages: + msg415633
2020-05-18 09:47:57 lemburg set messages: + msg369200
2020-05-13 22:28:14 jaraco set messages: + msg368803
2020-05-13 07:50:20 lemburg set status: closed -> open; resolution: fixed -> (no value); messages: + msg368765
2020-04-18 14:20:57 miss-islington set nosy: + miss-islington; messages: + msg366718
2020-04-18 01:18:30 yan12125 set nosy: + yan12125; pull_requests: + pull_request18917
2020-04-16 12:29:40 jaraco set status: open -> closed; resolution: fixed; stage: patch review -> resolved
2020-04-16 12:28:16 jaraco set messages: + msg366594
2020-04-15 23:55:39 jaraco set messages: + msg366572
2020-04-15 20:28:31 jaraco set messages: + msg366551
2020-04-15 19:28:50 gregory.p.smith set nosy: + gregory.p.smith; messages: + msg366543
2020-04-15 19:07:25 jaraco set pull_requests: + pull_request18891
2020-04-15 19:05:49 jaraco set messages: + msg366542
2020-04-15 18:47:56 jaraco set versions: + Python 3.9, - Python 3.8
2020-04-15 18:32:10 jaraco set messages: + msg366537
2019-04-14 14:05:22 jaraco set messages: + msg340209
2019-04-14 01:18:24 jaraco set pull_requests: + pull_request12749
2019-04-14 00:46:10 jaraco set messages: + msg340176
2019-03-16 12:34:18 lemburg set messages: + msg338081
2019-03-08 18:03:29 jaraco set pull_requests: + pull_request12227
2019-03-08 17:50:41 jaraco set messages: + msg337519
2019-03-08 17:36:14 jaraco set messages: + msg337518
2019-03-08 17:26:09 lemburg set messages: + msg337517
2019-03-08 17:00:18 jaraco set messages: + msg337513
2019-03-08 16:45:11 jaraco set messages: + msg337511
2019-03-08 16:34:32 lemburg set messages: + msg337509
2019-03-08 16:14:25 jaraco set messages: + msg337503
2019-03-08 16:12:32 jaraco set messages: + msg337500
2019-03-08 16:07:53 jaraco set messages: + msg337499
2019-03-08 14:59:05 jaraco set messages: + msg337488
2019-03-08 14:55:43 jaraco set messages: + msg337487
2019-03-08 14:38:56 jaraco set messages: + msg337485
2019-03-08 14:30:24 jaraco set messages: + msg337483
2019-03-08 14:22:08 lemburg set messages: + msg337481
2019-03-08 14:22:03 jaraco set messages: + msg337480
2019-03-08 13:58:15 jaraco set messages: + msg337475
2019-03-08 08:32:23 lemburg set messages: + msg337454
2019-03-08 02:35:50 jaraco set keywords: + patch; stage: patch review; pull_requests: + pull_request12217
2019-03-08 00:25:56 jaraco set assignee: jaraco; versions: + Python 3.8
2019-02-12 21:55:24 pitrou set nosy: + pitrou; messages: + msg335372
2019-02-11 18:45:10 ned.deily set nosy: + lemburg
2019-02-11 14:29:23 jaraco create