classification
Title: platform.py: use -b option for file command in _syscmd_file()
Type: Stage:
Components: Versions: Python 3.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: eric.araujo, lemburg, pitrou, r.david.murray, vstinner
Priority: normal Keywords: patch

Created on 2010-08-10 17:21 by vstinner, last changed 2010-08-13 16:30 by vstinner. This issue is now closed.

Files
File name Uploaded Description Edit
_syscmd_file-3.patch vstinner, 2010-08-13 00:53
Messages (13)
msg113549 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-08-10 17:21
Lib/platform.py was created 7 years ago by r32391. _syscmd_file() docstring was never changed whereas it is inconsistent with the implementation:
---
def _syscmd_file(target,default=''):

    """ Interface to the system's file command.

        The function uses the -b option of the file command to have it
        ommit the filename in its output and if possible the -L option
        to have the command follow symlinks. It returns default in
        case the command should fail.

    """
    ...
    target = _follow_symlinks(target).replace('"', '\\"')
    ...
    f = os.popen('file "%s" 2> %s' % (target, DEV_NULL))
    ...
---

It doesn't use -L option but use Python to follow the link, and use an regex to remove the filename.

Attached patch enables -b option to avoid problem with non-ascii filenames but ascii locale encoding (see #8611 and #9425) and updates the docstring.

--

To fix the non-ascii problem, I tried a different approach by using subprocess API which gives a bytes version of stdout and so avoid the encoding issue. But I commited the patch on python trunk (2.7) which had a bootstrap issue. py3k had no bootstrap issue, but the new patch (use -b option) is simpler.

Commits: r80166 (trunk), r80167 (py3k). Reverted: r80171+r80189 (trunk) and r80190 (py3k).

More details in the following mail thread:
http://mail.python.org/pipermail/python-checkins/2010-April/092092.html
msg113553 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-08-10 17:56
STINNER Victor wrote:
> 
> New submission from STINNER Victor <victor.stinner@haypocalc.com>:
> 
> Lib/platform.py was created 7 years ago by r32391. _syscmd_file() docstring was never changed whereas it is inconsistent with the implementation:
> ---
> def _syscmd_file(target,default=''):
> 
>     """ Interface to the system's file command.
> 
>         The function uses the -b option of the file command to have it
>         ommit the filename in its output and if possible the -L option
>         to have the command follow symlinks. It returns default in
>         case the command should fail.
> 
>     """
>     ...
>     target = _follow_symlinks(target).replace('"', '\\"')
>     ...
>     f = os.popen('file "%s" 2> %s' % (target, DEV_NULL))
>     ...
> ---
> 
> It doesn't use -L option but use Python to follow the link, and use an regex to remove the filename.
> 
> Attached patch enables -b option to avoid problem with non-ascii filenames but ascii locale encoding (see #8611 and #9425) and updates the docstring.

The patch looks good. Just one nit: could you please indent the doc-string
to match the original indentation ?

BTW: I had a look in my archive for platform.py, but couldn't find
where the symlink logic was changed to use Python's APIs instead.

> --
> 
> To fix the non-ascii problem, I tried a different approach by using subprocess API which gives a bytes version of stdout and so avoid the encoding issue. But I commited the patch on python trunk (2.7) which had a bootstrap issue. py3k had no bootstrap issue, but the new patch (use -b option) is simpler.
> 
> Commits: r80166 (trunk), r80167 (py3k). Reverted: r80171+r80189 (trunk) and r80190 (py3k).
> 
> More details in the following mail thread:
> http://mail.python.org/pipermail/python-checkins/2010-April/092092.html

The main reason why this doesn't work on trunk is that platform.py is
supposed to work for many different Python versions, since it's
original use case was to provide platform information to build systems
targeting multiple Python versions.

On Python3, the same is true, but only for the 3.x versions.
msg113727 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-08-13 00:50
> The patch looks good.

Can it/should it be applied to 2.7 too?

> Just one nit: could you please indent the doc-string
> to match the original indentation ?

Done. New patch attached. Is it ok like that?
msg113728 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-08-13 00:53
Hey! I don't know why, but I posted a truncated patch. It doesn't remove the code removing the filename and so it breaks the code. New try: version 3 should be ok :-)
msg113729 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-08-13 00:53
Is it guaranteed that the -b option will be present in every version of file?
msg113741 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-08-13 02:04
Guaranteed?  No.  -b is not required by posix/SUS.  I bet it exists everywhere we care about, though.  (cf. http://en.wikipedia.org/wiki/File_(command))
msg113755 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-08-13 10:35
Éric Araujo wrote:
> 
> Éric Araujo <merwok@netwok.org> added the comment:
> 
> Is it guaranteed that the -b option will be present in every version of file?

Looking at the only use case of _syscmd_file(), it may not even be worth
the trouble of adding the -b option. Fixing the doc-string may be enough:

    # Get data from the 'file' system command
    if executable:
        output = _syscmd_file(executable, '')
    else:
        output = ''

    ...

    # Split the output into a list of strings omitting the filename
    fileout = _architecture_split(output)[1:]

Note how architecture() already chops off the filename, so it
relies on the existing behavior, rather than the one described
in the doc-string of _syscmd_file().

Sorry, Victor, for not spotting this earlier on.
msg113769 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-08-13 13:32
> Looking at the only use case of _syscmd_file(), it may not even 
> be worth the trouble of adding the -b option. Fixing the doc-string 
> may be enough: (...)

Well, my problem is that _syscmd_file() fails with a non encodable filename on Linux because the file program writes the filename to stdout, but os.popen() is unable to parse it because the filename is not decodable from the filesystem encoding (without surrogateescape).

I would like to add -b option just to avoid this encoding problem.

The bug:
---
$ cp /bin/ls $(echo -ne "ls-\x80")
$ ./python -c "import platform; platform.architecture(executable='ls-\uDC80')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/SHARE/SVN/py3k/Lib/platform.py", line 1059, in architecture
    output = _syscmd_file(executable, '')
  File "/home/SHARE/SVN/py3k/Lib/platform.py", line 1006, in _syscmd_file
    output = f.read().strip()
  File "/home/SHARE/SVN/py3k/Lib/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 24: invalid start byte
---

See also #8611 and #9425.
msg113770 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-08-13 13:32
@lemburg: Your mail client likes to change the issues' title by adding some spaces :-)
msg113775 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-08-13 14:04
STINNER Victor wrote:
> 
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
> 
>> Looking at the only use case of _syscmd_file(), it may not even 
>> be worth the trouble of adding the -b option. Fixing the doc-string 
>> may be enough: (...)
> 
> Well, my problem is that _syscmd_file() fails with a non encodable filename on Linux because the file program writes the filename to stdout, but os.popen() is unable to parse it because the filename is not decodable from the filesystem encoding (without surrogateescape).
> 
> I would like to add -b option just to avoid this encoding problem.
> 
> The bug:
> ---
> $ cp /bin/ls $(echo -ne "ls-\x80")
> $ ./python -c "import platform; platform.architecture(executable='ls-\uDC80')"
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
>   File "/home/SHARE/SVN/py3k/Lib/platform.py", line 1059, in architecture
>     output = _syscmd_file(executable, '')
>   File "/home/SHARE/SVN/py3k/Lib/platform.py", line 1006, in _syscmd_file
>     output = f.read().strip()
>   File "/home/SHARE/SVN/py3k/Lib/codecs.py", line 300, in decode
>     (result, consumed) = self._buffer_decode(data, self.errors, final)
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 24: invalid start byte
> ---
> 
> See also #8611 and #9425.

Ah, right. Well, then we probably have to use -b after all.

A note on your example: executable is really only meant to be used for
pointing platform to a non-standard location of the Python interpreter
(e.g. in case you used a freeze tool or embedded Python in an application).

However, it is still possible to have that path contain weird
characters, so the example is still valid.

> @lemburg: Your mail client likes to change the issues' title by adding some spaces :-)

The problem is related to the way Mailman splits long subject lines
in the header. The roundup email interface should probably
remove any extra spaces or simply ignore the title altogether.
msg113776 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-08-13 14:15
@r.david.murray, @lemburg: So, can I commit my patch?

> However, it is still possible to have that path contain weird
> characters, so the example is still valid.

It's not only a question of "weird" characters. Use case of #8611: install Python in a directory with a name not encodable to the filesystem encoding and try run Python. Install or use a Python installed in an external media like an USB key (encoding problem is not a rare problem with USB keys).

On Windows, it's quite easy to find a character not encodable to the filesystem encoding (mbcs). Eg. Ł for codepage 1252. On Linux, any non-ASCII can be used if you run Python with C locale (locale encoding: ascii).

#4352 and #8988 (Windows issues, so mbcs encoding) are directly related to #8611.
msg113780 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-08-13 14:54
STINNER Victor wrote:
> 
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
> 
> @r.david.murray, @lemburg: So, can I commit my patch?

Yes, go ahead with 3.2 patch. It should not go into 2.7, though,
since it doesn't solve any problems in 2.7.

If we get complaints about the -b option than we can add a work-around,
but I doubt that we will. It would be good if you could add a test
case for this, since without -b support, the function won't give
an error: it will simply use the defaults.
msg113793 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-08-13 16:30
Commited as r83981.
History
Date User Action Args
2010-08-13 16:30:35vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg113793
2010-08-13 14:54:29lemburgsetmessages: + msg113780
title: platform.py: use -b option for file command in _syscmd_file() -> platform.py: use -b option for file command in _syscmd_file()
2010-08-13 14:15:12vstinnersetmessages: + msg113776
2010-08-13 14:04:54lemburgsetmessages: + msg113775
title: platform.py: use -b option for file command in _syscmd_file() -> platform.py: use -b option for file command in _syscmd_file()
2010-08-13 13:32:57vstinnersetmessages: + msg113770
2010-08-13 13:32:17vstinnersetmessages: + msg113769
2010-08-13 10:35:51lemburgsetmessages: + msg113755
title: platform.py: use -b option for file command in _syscmd_file() -> platform.py: use -b option for file command in _syscmd_file()
2010-08-13 02:04:24r.david.murraysetnosy: + r.david.murray
messages: + msg113741
2010-08-13 00:53:41eric.araujosetnosy: + eric.araujo
messages: + msg113729
2010-08-13 00:53:31vstinnersetfiles: - _syscmd_file.patch
2010-08-13 00:53:28vstinnersetfiles: - _syscmd_file-2.patch
2010-08-13 00:53:23vstinnersetfiles: + _syscmd_file-3.patch

messages: + msg113728
2010-08-13 00:50:26vstinnersetfiles: + _syscmd_file-2.patch

messages: + msg113727
2010-08-10 17:56:28lemburgsetmessages: + msg113553
title: platform.py: use -b option for file command in _syscmd_file() -> platform.py: use -b option for file command in _syscmd_file()
2010-08-10 17:21:32vstinnercreate