Issue35348
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2018-11-29 12:33 by serhiy.storchaka, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Pull Requests | |||
---|---|---|---|
URL | Status | Linked | Edit |
PR 11159 | merged | vstinner, 2018-12-14 12:29 | |
PR 11160 | closed | serhiy.storchaka, 2018-12-14 14:55 | |
PR 11186 | closed | vstinner, 2018-12-17 09:01 | |
PR 11208 | closed | vstinner, 2018-12-18 11:06 |
Messages (24) | |||
---|---|---|---|
msg330686 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2018-11-29 12:33 | |
The code of _syscmd_file() in the platform module does not match the docstring. The "-b" option was removed in 685fffa (issue16112), and this leads to inclusion the executable path in the file command output. If the executable path contains some key strings like "32-bit" or "PE", platform.architecture() can return an incorrect result. I think that the "-b" option should be restored. $ python3 -c 'import platform; print(platform.architecture("/usr/bin/python3.6"))' ('64bit', 'ELF') $ cp /usr/bin/python3.6 /tmp/32-bitPE $ python3 -c 'import platform; print(platform.architecture("/tmp/32-bitPE"))' ('32bit', 'ELF') Other problem is that the code tests if the string "executable" is contained in the file command output. But it is not always contained for executables on Linux. $ file python python: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=d3cfa06c2bdcbf7b6af9e4e6be5061cb8398c086, with debug_info, not stripped $ file /usr/bin/python2.7 /usr/bin/python2.7: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=fd3904306c73383fb371287416257b82d6a3363b, stripped $ file /usr/bin/python3.6 /usr/bin/python3.6: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=9dae0eec9b3f9cb82612d20dc0c3088feab9e356, stripped |
|||
msg330689 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-11-29 14:27 | |
Why does platform has to analyze sys.executable binary to check if it's 32 or 64 bits? Can't we use sizeof(void*) for example? Is it something related to FAT binary on macOS? (single binary for 32 and 64 bits, or single binary for PPC and x86) |
|||
msg330738 - (view) | Author: Windson Yang (Windson Yang) * | Date: 2018-11-30 00:10 | |
I agreed with Serhiy. I also found the function decode the output with latin-1, I think it will be better to use utf-8 instead. |
|||
msg331816 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-12-14 12:18 | |
I removed the dependency between bpo-35346 and this issue. I don't see how they are related. |
|||
msg331817 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-12-14 12:20 | |
> I agreed with Serhiy. I also found the function decode the output with latin-1, I think it will be better to use utf-8 instead. Decoding from UTF-8 can fail with UnicodeDecodeError, whereas decoding from latin-1 never fails. file output is ASCII, so I don't see the point of using UTF-8. Currently, the command displays the filename, but I don't think that we should care of the encoding of the filename. > I think that the "-b" option should be restored. That, or strip/skip the filename in the output? |
|||
msg331820 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-12-14 12:30 | |
-b option added by: https://hg.python.org/cpython/rev/c73b90b6dadd and removed the day after by: https://hg.python.org/cpython/rev/b94a9ff13199 |
|||
msg331821 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-12-14 12:33 | |
> -b option added by: > https://hg.python.org/cpython/rev/c73b90b6dadd Oh wait, this change is for the 2.7 branch. The change in master (old "default" branch) didn't add -b, but replaced "-b" with "-b --": https://hg.python.org/cpython/rev/cd026866b333 |
|||
msg331826 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2018-12-14 13:30 | |
I tested that the "-b" option is supported on Linux, *BSD and OpenIndiana. But it is not a part of POSIX. So perhaps we should fall back to "file" without "-b" if "file -b" failed. We can also check that the output starts with executable+': ' and strip this prefix. |
|||
msg331829 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-12-14 14:21 | |
In 2.7 branch, _syscmd_file() only used -b option during one day (no Python 2.7.x release used -b): * Oct 4, 2012: commit 95038fa526c8b93e42c59b0735edf1c80b7b6449 added -b: "Closes #16112: platform.architecture does not correctly escape argument to /usr/bin/file" * Oct 5, 2012: commit 2699c9d24810196a07e0577215ec3beb7ecfb55b removed -b: "#16112: platform.architecture does not correctly escape argument to /usr/bin/file. Fix original patch" Python 3.2.0 (Feb 2011) to 3.2.3 (Sep 2012) and Python 3.3.0 (Sep 2012) used -b: * Aug 13, 2010: commit ddfb2c3a338a8c1550774648f816e96c41f59de1 added -b: "Omit the filename to avoid enconding issues, especially with non encodable characters in the Python full path." * Oct 5, 2012: commit 685fffa8f427b3a768b232170752565d58c32112 removed -b: "#16112: platform.architecture does not correctly escape argument to /usr/bin/file. Fix original patch" |
|||
msg331830 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-12-14 14:27 | |
> We can also check that the output starts with executable+': ' and strip this prefix. Technically, on UNIX, ':' is valid in a filename. Filename examples which contain ':' on my Fedora 29: /usr/share/man/man3/List::Util.3pm.gz /usr/share/usb_modeswitch/0408:f000 /proc/irq/127/ahci[0000:00:17.0] /proc/irq/131/snd_hda_intel:card0 /dev/block/259:3 /sys/kernel/slab/:0002632 /sys/module/psmouse/drivers/serio:psmouse Note: I cannot find a program name which contains ':'. |
|||
msg331831 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-12-14 14:28 | |
A convervative approach would be to leave stable branches unchanged and use -b in the master branch. |
|||
msg331834 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2018-12-14 15:04 | |
PR 11160 is an alternate solution which strips the filename in the output. It does not matter if the filename contains ":", because the format of the output in the POSIX locale is strictly specified. |
|||
msg331835 - (view) | Author: Ronald Oussoren (ronaldoussoren) * | Date: 2018-12-14 15:20 | |
BTW. A related problem with platform.architecture() is that it doesn't know how to deal with fat binaries (such as those found on macOS). As an example: $ file /usr/bin/python /usr/bin/python: Mach-O universal binary with 2 architectures: [i386:Mach-O executable i386] [x86_64:Mach-O 64-bit executable x86_64] /usr/bin/python (for architecture i386): Mach-O executable i386 /usr/bin/python (for architecture x86_64): Mach-O 64-bit executable x86_64 This will be reported as "64-bit" by platform.architecture() because there is '64-bit' in the output of file(1). Using sizeof(void*) or sys.maxsize suffers from the a simular problem: this will only detect the pointer-size of the current proces and not that the binary is capable of running with a different pointer-size as well. P.S. platform.architecture() uses file(1) because you can specify different executables than sys.executable. |
|||
msg331836 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2018-12-14 15:59 | |
What result of platform.architecture() do you expect for an universal binary? |
|||
msg331843 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-12-14 17:20 | |
I don't understand the purpose of the 'linkage' information of platform.architecture(). Does anyone care if Python is an ELF program or a WindowsPE program? Maybe it was useful 20 years ago when there were COFF on Unix, but right now ELF is the defacto standard on Unix, and WindowsPE on Windows. 32-bit and 64-bit information should be enough, no? I would suggest to just return ('%sbit' % bits, '') if executable is not set. Use struct.calcsize('P')*8 or sys.maxsize to get bits. |
|||
msg331956 - (view) | Author: Ronald Oussoren (ronaldoussoren) * | Date: 2018-12-17 09:06 | |
> What result of platform.architecture() do you expect for an universal binary? I honestly don't know. What is the purpose of this functionality in the first place? I have never had a problem where using this function was the right solution. To be honest I have the same problem with a number of other APIs in this module. As an example, platform.system_alias() looks interesting but has an API that won't work in general (macOS release version cannot be calculated from platform.uname() information, likewise for linux distribution information). |
|||
msg331958 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-12-17 09:13 | |
Ronald Oussoren gave more info on my previous PR 10780 ("platform.platform() uses mac_ver() on macOS"): https://github.com/python/cpython/pull/10780#issuecomment-444529371 """ The information does not include data about fat binaries, resulting amongst others in the following inconsistency: ronald@Menegroth[0]$ arch -i386 python3.6 -m platform Darwin-18.2.0-x86_64-i386-64bit ronald@Menegroth[0]$ arch -i386 python3.6 -c 'import sys; print(sys.maxsize)' 2147483647 This platform output includes "64bit" because the binary for python3.6 includes support for both i386 and x86_64, and doesn't show that the command is using i386 instructions. """ I made some tests: $ file /usr/local/bin/python3 /usr/local/bin/python3: Mach-O universal binary with 2 architectures: [i386:Mach-O executable i386] [x86_64:Mach-O 64-bit executable x86_64] /usr/local/bin/python3 (for architecture i386): Mach-O executable i386 /usr/local/bin/python3 (for architecture x86_64): Mach-O 64-bit executable x86_64 $ /usr/local/bin/python3 -c 'import struct, sys, platform; print(platform.architecture(), struct.calcsize("P"), sys.maxsize)' ('64bit', '') 8 9223372036854775807 $ arch -x86_64 /usr/local/bin/python3 -c 'import struct, sys, platform; print(platform.architecture(), struct.calcsize("P"), sys.maxsize)' ('64bit', '') 8 9223372036854775807 $ arch -i386 /usr/local/bin/python3 -c 'import struct, sys, platform; print(platform.architecture(), struct.calcsize("P"), sys.maxsize)' ('64bit', '') 4 2147483647 IMHO platform.architecture() should return 32bit when running "arch -i386 /usr/local/bin/python3" to be consistent with struct.calcsize("P") == 4 and sys.maxsize == 2147483647. Otherwise, how would you notice that you are using the 32-bit flavor of Python? My PR 11186 implements this fix. Ronald Oussoren: > Using sizeof(void*) or sys.maxsize suffers from the a simular problem: this will only detect the pointer-size of the current proces and not that the binary is capable of running with a different pointer-size as well. Right, but I don't think that it's possible to report that Python executable is FAT binary in platform.architecture() result. If you want to provide such information, IMHO you should write a new function or at least add a new parameter to platform.architecture(). IMHO it's more consistent to report "32bit" for "arch -i386 python3" and "64bit" for "arch -x86_64 python3". |
|||
msg331959 - (view) | Author: Ronald Oussoren (ronaldoussoren) * | Date: 2018-12-17 09:30 | |
> IMHO platform.architecture() should return 32bit when running "arch -i386 /usr/local/bin/python3" to be consistent with struct.calcsize("P") == 4 and sys.maxsize == 2147483647. Otherwise, how would you notice that you are using the 32-bit flavor of Python? I don't agree. Platform.architecture() is defined to look at a specified binary, not the currently running process. That can lead to inconsistencies like this and is not something you can avoid. > Ronald Oussoren: > > Using sizeof(void*) or sys.maxsize suffers from the a simular problem: this will only detect the pointer-size of the current proces and not that the binary is capable of running with a different pointer-size as well. > Right, but I don't think that it's possible to report that Python executable is FAT binary in platform.architecture() result. If you want to provide such information, IMHO you should write a new function or at least add a new parameter to platform.architecture(). > IMHO it's more consistent to report "32bit" for "arch -i386 python3" and "64bit" for "arch -x86_64 python3". This doesn't necessarily need a new function, platform.architecture could also return something like "32bit,64bit". But as I mentioned in my previous message I don't know why anyone would want to use this function in the first place. There are better ways to determine information about the current process (struct.calcsize, sys.maxsize, sys.byteorder), and I have never had a need to determine information about executable files that I couldn't get in a better way using other libraries (like macholib and pyelftools) |
|||
msg331962 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-12-17 09:36 | |
See also bpo-35516: "platform.system_alias(): add macOS support". |
|||
msg331965 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-12-17 10:15 | |
> I don't agree. Platform.architecture() is defined to look at a specified binary, not the currently running process. That can lead to inconsistencies like this and is not something you can avoid. architecture() looks at running Python executable by default and documents a special case when executable equals to sys.executable: "(...) then only if the executable points to the Python interpreter. Reasonable defaults are used when the above needs are not met." https://docs.python.org/dev/library/platform.html#platform.architecture > This doesn't necessarily need a new function, platform.architecture could also return something like "32bit,64bit". As an user, I don't need for this information. architecture() already contains a note: """ On Mac OS X (and perhaps other platforms), executable files may be universal files containing multiple architectures. To get at the “64-bitness” of the current interpreter, it is more reliable to query the sys.maxsize attribute: is_64bits = sys.maxsize > 2**32 """ > But as I mentioned in my previous message I don't know why anyone would want to use this function in the first place. There are better ways to determine information about the current process (struct.calcsize, sys.maxsize, sys.byteorder), and I have never had a need to determine information about executable files that I couldn't get in a better way using other libraries (like macholib and pyelftools) platform.architecture() has multiple issues: * It rely on the external program "file". It is not available on Windows. It is likely missing on small Linux containers. It doensn't report an error if the program is missing but "should be available". * Calling an external program can lead to security issues. * Parsing file output is not reliable, even if PR 11159 should make the parsing more reliable. For example, file output is locale dependent and the -b option is not standard. * The expected output on a macOS universal binary is unclear. * The purpose of the function seems to be unclear to most developers... Another solution is to deprecate the function. I agree with Ronald that sys.maxsize is enough for most use cases (get "bits"). For more accurate information, platform.architecture() is wrong and a third-party module is required. By the way, platform.architecture() is not used in the stdlib which is a sign that maybe the function is not really helpful. Moreover, sysconfig and distutils.util contain the following code: # We can't use "platform.architecture()[0]" because a # bootstrap problem. We use a dict to get an error # if some suspicious happens. bitness = {2147483647:"32bit", 9223372036854775807:"64bit"} machine += ".%s" % bitness[sys.maxsize] Serhiy, Ronald: What do you think of deprecating platform.architecture() instead of trying to fix it? |
|||
msg332015 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-12-17 17:47 | |
New changeset 0af9c33262fb43f39a6558e3f155689e83e10706 by Victor Stinner in branch 'master': bpo-35348: Fix platform.architecture() (GH-11159) https://github.com/python/cpython/commit/0af9c33262fb43f39a6558e3f155689e83e10706 |
|||
msg332016 - (view) | Author: Marc-Andre Lemburg (lemburg) * | Date: 2018-12-17 18:45 | |
Guys, please read the doc-string of the platform.architecture() function (or ask the person who wrote most of the module). It clearly refers to inspecting a specific executable and only uses the Python interpreter as default. The running process can provide some sane defaults, but is not necessarily using the same values as the given executable. The function does not support multi-architecture executables. This is simply out of scope for the function. Victor: AFAIK, I still own this module, so if you want to deprecate something, please ping me first. |
|||
msg332050 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-12-18 11:12 | |
Ok, I closed my PR 11186 which modified architecture() to only return struct.calcsize('P') if the executable argument is equal to sys.executable. > please read the doc-string of the platform.architecture() function (or ask the person who wrote most of the module). It clearly refers to inspecting a specific executable and only uses the Python interpreter as default. The running process can provide some sane defaults, but is not necessarily using the same values as the given executable. I see the platform module as a module to get info about the operating system and Python, but it seems like I misunderstood the purpose of the specific case of the architecture() function. I propose a small addition to the doc to avoid confusion: https://github.com/python/cpython/pull/11208/files |
|||
msg332072 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-12-18 16:40 | |
The initial issue has been fixed, I close the issue. Thanks for the review and feedback! |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:08 | admin | set | github: 79529 |
2018-12-18 16:40:49 | vstinner | set | status: open -> closed resolution: fixed messages: + msg332072 stage: patch review -> resolved |
2018-12-18 11:12:59 | vstinner | set | messages: + msg332050 |
2018-12-18 11:06:34 | vstinner | set | pull_requests: + pull_request10445 |
2018-12-17 18:45:51 | lemburg | set | messages: + msg332016 |
2018-12-17 17:47:28 | vstinner | set | messages: + msg332015 |
2018-12-17 10:15:35 | vstinner | set | messages: + msg331965 |
2018-12-17 09:36:26 | vstinner | set | messages: + msg331962 |
2018-12-17 09:30:17 | ronaldoussoren | set | messages: + msg331959 |
2018-12-17 09:13:54 | vstinner | set | messages: + msg331958 |
2018-12-17 09:06:47 | ronaldoussoren | set | messages: + msg331956 |
2018-12-17 09:01:41 | vstinner | set | pull_requests: + pull_request10424 |
2018-12-14 17:20:59 | vstinner | set | messages: + msg331843 |
2018-12-14 15:59:00 | serhiy.storchaka | set | messages: + msg331836 |
2018-12-14 15:20:37 | ronaldoussoren | set | nosy:
+ ronaldoussoren messages: + msg331835 |
2018-12-14 15:04:04 | serhiy.storchaka | set | messages: + msg331834 |
2018-12-14 14:55:24 | serhiy.storchaka | set | pull_requests: + pull_request10396 |
2018-12-14 14:28:01 | vstinner | set | messages: + msg331831 |
2018-12-14 14:27:31 | vstinner | set | messages: + msg331830 |
2018-12-14 14:21:25 | vstinner | set | messages: + msg331829 |
2018-12-14 13:30:57 | serhiy.storchaka | set | messages: + msg331826 |
2018-12-14 12:33:08 | vstinner | set | messages: + msg331821 |
2018-12-14 12:30:56 | vstinner | set | messages: + msg331820 |
2018-12-14 12:29:41 | vstinner | set | keywords:
+ patch stage: patch review pull_requests: + pull_request10395 |
2018-12-14 12:20:39 | vstinner | set | messages: + msg331817 |
2018-12-14 12:18:25 | vstinner | set | messages: + msg331816 |
2018-12-14 12:15:57 | vstinner | unlink | issue35346 dependencies |
2018-11-30 00:10:01 | Windson Yang | set | nosy:
+ Windson Yang messages: + msg330738 |
2018-11-29 14:27:12 | vstinner | set | messages: + msg330689 |
2018-11-29 12:34:03 | serhiy.storchaka | link | issue35346 dependencies |
2018-11-29 12:33:34 | serhiy.storchaka | create |