msg113549 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-08-10 17:21 |
Lib/platform.py was created 7 years ago by r32391. _syscmd_file() docstring was never changed whereas it is inconsistent with the implementation:
---
def _syscmd_file(target,default=''):
""" Interface to the system's file command.
The function uses the -b option of the file command to have it
ommit the filename in its output and if possible the -L option
to have the command follow symlinks. It returns default in
case the command should fail.
"""
...
target = _follow_symlinks(target).replace('"', '\\"')
...
f = os.popen('file "%s" 2> %s' % (target, DEV_NULL))
...
---
It doesn't use -L option but use Python to follow the link, and use an regex to remove the filename.
Attached patch enables -b option to avoid problem with non-ascii filenames but ascii locale encoding (see #8611 and #9425) and updates the docstring.
--
To fix the non-ascii problem, I tried a different approach by using subprocess API which gives a bytes version of stdout and so avoid the encoding issue. But I commited the patch on python trunk (2.7) which had a bootstrap issue. py3k had no bootstrap issue, but the new patch (use -b option) is simpler.
Commits: r80166 (trunk), r80167 (py3k). Reverted: r80171+r80189 (trunk) and r80190 (py3k).
More details in the following mail thread:
http://mail.python.org/pipermail/python-checkins/2010-April/092092.html
|
msg113553 - (view) |
Author: Marc-Andre Lemburg (lemburg) *  |
Date: 2010-08-10 17:56 |
STINNER Victor wrote:
>
> New submission from STINNER Victor <victor.stinner@haypocalc.com>:
>
> Lib/platform.py was created 7 years ago by r32391. _syscmd_file() docstring was never changed whereas it is inconsistent with the implementation:
> ---
> def _syscmd_file(target,default=''):
>
> """ Interface to the system's file command.
>
> The function uses the -b option of the file command to have it
> ommit the filename in its output and if possible the -L option
> to have the command follow symlinks. It returns default in
> case the command should fail.
>
> """
> ...
> target = _follow_symlinks(target).replace('"', '\\"')
> ...
> f = os.popen('file "%s" 2> %s' % (target, DEV_NULL))
> ...
> ---
>
> It doesn't use -L option but use Python to follow the link, and use an regex to remove the filename.
>
> Attached patch enables -b option to avoid problem with non-ascii filenames but ascii locale encoding (see #8611 and #9425) and updates the docstring.
The patch looks good. Just one nit: could you please indent the doc-string
to match the original indentation ?
BTW: I had a look in my archive for platform.py, but couldn't find
where the symlink logic was changed to use Python's APIs instead.
> --
>
> To fix the non-ascii problem, I tried a different approach by using subprocess API which gives a bytes version of stdout and so avoid the encoding issue. But I commited the patch on python trunk (2.7) which had a bootstrap issue. py3k had no bootstrap issue, but the new patch (use -b option) is simpler.
>
> Commits: r80166 (trunk), r80167 (py3k). Reverted: r80171+r80189 (trunk) and r80190 (py3k).
>
> More details in the following mail thread:
> http://mail.python.org/pipermail/python-checkins/2010-April/092092.html
The main reason why this doesn't work on trunk is that platform.py is
supposed to work for many different Python versions, since it's
original use case was to provide platform information to build systems
targeting multiple Python versions.
On Python3, the same is true, but only for the 3.x versions.
|
msg113727 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-08-13 00:50 |
> The patch looks good.
Can it/should it be applied to 2.7 too?
> Just one nit: could you please indent the doc-string
> to match the original indentation ?
Done. New patch attached. Is it ok like that?
|
msg113728 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-08-13 00:53 |
Hey! I don't know why, but I posted a truncated patch. It doesn't remove the code removing the filename and so it breaks the code. New try: version 3 should be ok :-)
|
msg113729 - (view) |
Author: Éric Araujo (eric.araujo) *  |
Date: 2010-08-13 00:53 |
Is it guaranteed that the -b option will be present in every version of file?
|
msg113741 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2010-08-13 02:04 |
Guaranteed? No. -b is not required by posix/SUS. I bet it exists everywhere we care about, though. (cf. http://en.wikipedia.org/wiki/File_(command))
|
msg113755 - (view) |
Author: Marc-Andre Lemburg (lemburg) *  |
Date: 2010-08-13 10:35 |
Éric Araujo wrote:
>
> Éric Araujo <merwok@netwok.org> added the comment:
>
> Is it guaranteed that the -b option will be present in every version of file?
Looking at the only use case of _syscmd_file(), it may not even be worth
the trouble of adding the -b option. Fixing the doc-string may be enough:
# Get data from the 'file' system command
if executable:
output = _syscmd_file(executable, '')
else:
output = ''
...
# Split the output into a list of strings omitting the filename
fileout = _architecture_split(output)[1:]
Note how architecture() already chops off the filename, so it
relies on the existing behavior, rather than the one described
in the doc-string of _syscmd_file().
Sorry, Victor, for not spotting this earlier on.
|
msg113769 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-08-13 13:32 |
> Looking at the only use case of _syscmd_file(), it may not even
> be worth the trouble of adding the -b option. Fixing the doc-string
> may be enough: (...)
Well, my problem is that _syscmd_file() fails with a non encodable filename on Linux because the file program writes the filename to stdout, but os.popen() is unable to parse it because the filename is not decodable from the filesystem encoding (without surrogateescape).
I would like to add -b option just to avoid this encoding problem.
The bug:
---
$ cp /bin/ls $(echo -ne "ls-\x80")
$ ./python -c "import platform; platform.architecture(executable='ls-\uDC80')"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/SHARE/SVN/py3k/Lib/platform.py", line 1059, in architecture
output = _syscmd_file(executable, '')
File "/home/SHARE/SVN/py3k/Lib/platform.py", line 1006, in _syscmd_file
output = f.read().strip()
File "/home/SHARE/SVN/py3k/Lib/codecs.py", line 300, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 24: invalid start byte
---
See also #8611 and #9425.
|
msg113770 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-08-13 13:32 |
@lemburg: Your mail client likes to change the issues' title by adding some spaces :-)
|
msg113775 - (view) |
Author: Marc-Andre Lemburg (lemburg) *  |
Date: 2010-08-13 14:04 |
STINNER Victor wrote:
>
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
>
>> Looking at the only use case of _syscmd_file(), it may not even
>> be worth the trouble of adding the -b option. Fixing the doc-string
>> may be enough: (...)
>
> Well, my problem is that _syscmd_file() fails with a non encodable filename on Linux because the file program writes the filename to stdout, but os.popen() is unable to parse it because the filename is not decodable from the filesystem encoding (without surrogateescape).
>
> I would like to add -b option just to avoid this encoding problem.
>
> The bug:
> ---
> $ cp /bin/ls $(echo -ne "ls-\x80")
> $ ./python -c "import platform; platform.architecture(executable='ls-\uDC80')"
> Traceback (most recent call last):
> File "<string>", line 1, in <module>
> File "/home/SHARE/SVN/py3k/Lib/platform.py", line 1059, in architecture
> output = _syscmd_file(executable, '')
> File "/home/SHARE/SVN/py3k/Lib/platform.py", line 1006, in _syscmd_file
> output = f.read().strip()
> File "/home/SHARE/SVN/py3k/Lib/codecs.py", line 300, in decode
> (result, consumed) = self._buffer_decode(data, self.errors, final)
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 24: invalid start byte
> ---
>
> See also #8611 and #9425.
Ah, right. Well, then we probably have to use -b after all.
A note on your example: executable is really only meant to be used for
pointing platform to a non-standard location of the Python interpreter
(e.g. in case you used a freeze tool or embedded Python in an application).
However, it is still possible to have that path contain weird
characters, so the example is still valid.
> @lemburg: Your mail client likes to change the issues' title by adding some spaces :-)
The problem is related to the way Mailman splits long subject lines
in the header. The roundup email interface should probably
remove any extra spaces or simply ignore the title altogether.
|
msg113776 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-08-13 14:15 |
@r.david.murray, @lemburg: So, can I commit my patch?
> However, it is still possible to have that path contain weird
> characters, so the example is still valid.
It's not only a question of "weird" characters. Use case of #8611: install Python in a directory with a name not encodable to the filesystem encoding and try run Python. Install or use a Python installed in an external media like an USB key (encoding problem is not a rare problem with USB keys).
On Windows, it's quite easy to find a character not encodable to the filesystem encoding (mbcs). Eg. Ł for codepage 1252. On Linux, any non-ASCII can be used if you run Python with C locale (locale encoding: ascii).
#4352 and #8988 (Windows issues, so mbcs encoding) are directly related to #8611.
|
msg113780 - (view) |
Author: Marc-Andre Lemburg (lemburg) *  |
Date: 2010-08-13 14:54 |
STINNER Victor wrote:
>
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
>
> @r.david.murray, @lemburg: So, can I commit my patch?
Yes, go ahead with 3.2 patch. It should not go into 2.7, though,
since it doesn't solve any problems in 2.7.
If we get complaints about the -b option than we can add a work-around,
but I doubt that we will. It would be good if you could add a test
case for this, since without -b support, the function won't give
an error: it will simply use the defaults.
|
msg113793 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-08-13 16:30 |
Commited as r83981.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:05 | admin | set | github: 53769 |
2010-08-13 16:30:35 | vstinner | set | status: open -> closed resolution: fixed messages:
+ msg113793
|
2010-08-13 14:54:29 | lemburg | set | messages:
+ msg113780 title: platform.py: use -b option for file command in _syscmd_file() -> platform.py: use -b option for file command in _syscmd_file() |
2010-08-13 14:15:12 | vstinner | set | messages:
+ msg113776 |
2010-08-13 14:04:54 | lemburg | set | messages:
+ msg113775 title: platform.py: use -b option for file command in _syscmd_file() -> platform.py: use -b option for file command in _syscmd_file() |
2010-08-13 13:32:57 | vstinner | set | messages:
+ msg113770 |
2010-08-13 13:32:17 | vstinner | set | messages:
+ msg113769 |
2010-08-13 10:35:51 | lemburg | set | messages:
+ msg113755 title: platform.py: use -b option for file command in _syscmd_file() -> platform.py: use -b option for file command in _syscmd_file() |
2010-08-13 02:04:24 | r.david.murray | set | nosy:
+ r.david.murray messages:
+ msg113741
|
2010-08-13 00:53:41 | eric.araujo | set | nosy:
+ eric.araujo messages:
+ msg113729
|
2010-08-13 00:53:31 | vstinner | set | files:
- _syscmd_file.patch |
2010-08-13 00:53:28 | vstinner | set | files:
- _syscmd_file-2.patch |
2010-08-13 00:53:23 | vstinner | set | files:
+ _syscmd_file-3.patch
messages:
+ msg113728 |
2010-08-13 00:50:26 | vstinner | set | files:
+ _syscmd_file-2.patch
messages:
+ msg113727 |
2010-08-10 17:56:28 | lemburg | set | messages:
+ msg113553 title: platform.py: use -b option for file command in _syscmd_file() -> platform.py: use -b option for file command in _syscmd_file() |
2010-08-10 17:21:32 | vstinner | create | |