Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Popen should raise ValueError if pass a string when shell=False or a list when shell=True #52087

Open
bitdancer opened this issue Feb 2, 2010 · 31 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@bitdancer
Copy link
Member

BPO 7839
Nosy @csernazs, @ncoghlan, @vstinner, @larryhastings, @ericvsmith, @giampaolo, @merwok, @bitdancer, @briancurtin, @asvetlov
Files
  • test_subprocess.py: Test case for this bug, works in Python 2 and Python 3.
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2010-02-02.19:08:03.339>
    labels = ['type-feature', 'library']
    title = 'Popen should raise ValueError if pass a string when shell=False or a list when shell=True'
    updated_at = <Date 2013-11-06.19:37:27.696>
    user = 'https://github.com/bitdancer'

    bugs.python.org fields:

    activity = <Date 2013-11-06.19:37:27.696>
    actor = 'akuchling'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2010-02-02.19:08:03.339>
    creator = 'r.david.murray'
    dependencies = []
    files = ['17587']
    hgrepos = []
    issue_num = 7839
    keywords = []
    message_count = 31.0
    messages = ['98754', '100202', '100205', '100206', '100207', '100209', '107310', '107311', '107313', '107316', '107322', '107353', '107365', '107369', '107770', '111337', '157217', '157227', '157228', '157230', '157252', '157259', '157269', '157294', '157322', '157323', '157332', '157608', '157613', '157800', '160587']
    nosy_count = 15.0
    nosy_names = ['holdenweb', 'csernazs', 'ncoghlan', 'vstinner', 'dstanek', 'larry', 'eric.smith', 'giampaolo.rodola', 'eric.araujo', 'Arfrever', 'r.david.murray', 'brian.curtin', 'cvrebert', 'asvetlov', 'Alexander.Belopolsky']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue7839'
    versions = ['Python 3.3']

    @bitdancer
    Copy link
    Member Author

    Currently Popen accepts either a string or a list regardless of the value of 'shell'. In the shell=False case, a string is interpreted as the command name, no matter what it actually is. In the shell=True case, a list is interpreted as args[0] being the string to pass to sh -c, and the remaining elements are passed as additional arguments to sh.

    Neither of these behaviors is particularly useful. It is easy enough to put the command name in a single element list with shell=False, and as far as I've been able to tell there is nothing useful you can do with passing arguments to sh after the '-c [command'] argument.

    Further, the current behavior leads to common mistakes and misunderstandings, especially the acceptance of a string when shell=False. The inexperienced user will pass the complete command string, and get the confusing error message "No such file or directory".

    The behavior when passing a list when shell=True is even more mysterious. In this case, all elements of the list after args[0] appear to the programmer as if they are ignored. This problem is made worse by the fact that the documentation for this case makes it sound as if multiple strings *can* be passed; this confusion at least will be cleared up by the patch from bpo-6760.

    I would like to see Popen changed so that when shell=False, passing a string for args results in a ValueError, and likewise when shell=True, passing a list results in a ValueError.

    Since this is a backward incompatible change, we'd need to first deprecate the current behavior in 3.2, and then introduce the ValueError in 3.3. While this would be annoying to those people who needed to change their programs, I think it would be worth it for the improved error feedback the revised API would provide.

    @bitdancer bitdancer added easy stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Feb 2, 2010
    @ChristopheSimonis
    Copy link
    Mannequin

    ChristopheSimonis mannequin commented Feb 28, 2010

    In this case, why not just depreciate the shell argument ?

    @bitdancer
    Copy link
    Member Author

    Because the shell argument provides important functionality. Or are you suggesting that passing a list implies shell=False and passing a string implies shell=True? That is a possibility, but I think it would not be a good idea, because people will 'accidentally' get shell=True (which, you will note is *not* the default), which is less secure. It is much better, IMO, to make people ask explicitly for the less-safe behavior.

    @ericvsmith
    Copy link
    Member

    I think the better design is to have 2 distinct APIs: Popen_shell and Popen_exec. I'm not wild about the name Popen_exec, suggestions welcome. Neither of these would accept a shell parameter.

    For starters these could be convenience APIs that just call Popen with the correct shell= value, and check the type of args. We could ultimately deprecate Popen itself.

    I don't see a use case where you programmatically compute the value of shell: it's always known as a function of how you're building up args. And even if such a rare case exists, you could select between the two APIs to call.

    Whether at this point we can make this change is another issue, of course. I'm just trying to get at the best design. But if it's done over a number of releases I think it's doable.

    @briancurtin
    Copy link
    Member

    That seems reasonable. We already have subprocess.call, the thin wrapper around Popen. Maybe add this as subprocess.call_shell and call_exec?

    @bitdancer
    Copy link
    Member Author

    Hmm. I liked Eric's idea, and it would be easier to get in, but 'call' is actually an argument against it. It would mean that in addition to PopenExec and PopenShell we'd need call_exec and call_shell, and check_call_exec and check_call_shell. Proliferating interfaces to handle a boolean parameter doesn't seem minimalist.

    @larryhastings
    Copy link
    Contributor

    I noticed this a while ago. And FWIW it's not just 3.x; I see this same behavior in 2.6.

    I've whipped up a test case, attached, which runs in both Python 2 and Python 3. The test runs "sys.interpreter -V" four times: it tries all the combinations of shell=True/False and the cmdline argument as either a list or a string.

    When shell=True and cmdline is an array, Python runs as an interactive shell. (You have to press Ctrl-D to close it; the test warns you this is happening.)
    When shell=False and cmdline is a string, Python raises OSError, "[Errno 2] No such file or directory".
    In the other two cases the code runs correctly, producing the version number of Python.

    The test would probably be better if I used stdin=PIPE too, so you didn't have to press Ctrl-D. But I couldn't figure out how to make it produce the interactive session stuff when I did that. Anyway, it wouldn't be hard to make this into a proper regression test.

    My two cents: I assume from the discussion that this is not fixable; that is, it's not possible to have a shell and a string, or no shell and a list. Failing that, the right decision is what R. David discussed on 2010-02-28: ignore the shell argument passed in and infer it ourselves from whether cmdline is a list or a string. I don't buy the security argument--since the invalid combinations of shell and cmdline mean the function doesn't work, existing callers have already made the decision whether they can live with shell=True. We should deprecate the shell argument and update the documentation to make the behavior crystal clear.

    @larryhastings
    Copy link
    Contributor

    I realize we're down to the wire, but would it be too late to fix this in 2.7? It is a genuine bug, and it won't break any correct code.

    @larryhastings
    Copy link
    Contributor

    Sorry for spamming updates, but here's two more things to consider.

    1. What does it do on Windows? For all I know all four combinations work fine there and we should preserve the existing functionality.

    2. All four combinations work fine if you call a program without arguments. I tried it with both "/bin/ls" and sys.executable and it worked like a champ. The problem *only* happens if you specify command-line arguments.

    @bitdancer
    Copy link
    Member Author

    Unless we go the proliferating-interfaces route, it does represent a behavior change, and so if accepted would need to go through a deprecation cycle. And if we did go that route, it would be a new feature. So nothing can happen in 2.7, since it is already in the RC stage.

    By the way, I did not suggest dropping the shell parameter, I argued against doing that. The 'security' I referred to is that when you use a shell you are subject to shell-metacharacter-based attacks (and bugs!) if any elements of the command line come from user input and you don't sanitize them. This problem doesn't exist with shell=False, which is why it is the default.

    @csernazs
    Copy link
    Mannequin

    csernazs mannequin commented Jun 8, 2010

    I would say that both string and list should be accepted in args, and depending on the shell parameter, the module should create a list or a string from the specified list/string.

    We already have a list2cmdline function to convert a list to string, we would only need to create the inverse of this function to convert a string to a list.

    Executing a program whose parameters are coming from external source (eg. user input) can result security problems if those are not specified as a list and in this case I would be really happy to specify my parameters as a list to Popen and it would do the appropriate conversions as above.

    @larryhastings
    Copy link
    Contributor

    Zsolt: an excellent idea! That shouldn't change the semantics of the function, and therefore is strictly a bug fix. Which means we could potentially backport it, yes?

    Obviously we only need to change it between string / list if we're going to have this problem, and that only happens when there are parameters. So the change could be very narrow in scope. Lovely!

    R. David: Why did you remove Python 2.7 from the list? We see the behavior there too.

    @bitdancer
    Copy link
    Member Author

    Because we use the versions field to indicate which versions a patch will be applied to, if it is, and I created this bug as a feature request, and as such it is not a candidate for 2.7.

    Changing list to string for shell=True is a behavior change (currently excess list elements are passed to the shell...and I don't *know* that that is useless, someone might be exploiting it) and hard to get right. Changing string to list might be possible on unix using shlex, but is a distinctly non-trivial change, would need to be considered carefully with regard to its implications, and I have no idea what it would mean on Windows.

    Since 2.7 is in RC status, there is no way such a fix will be accepted for 2.7. You can try to make a case for this proposal as a bug fix for 2.7.1, but I'm not sanguine about its chances. Personally I'm -0.5 on such a change. I think it is better to leave the control of the command formatting in the hands of the programmer. Doing such a conversion is too close to guessing for my taste.

    If you want a wider selection of opinions you can post the issue to python-dev and request feedback.

    @ericvsmith
    Copy link
    Member

    I agree with David. For the issue raised here, at most I would make (list, shell=True) and (str, shell=False) raise errors.

    There's an issue (that I can't find right now) for creating functions that convert from str->list and list->str for cases such as this. shlex is sort of str->list, but it has issues on Windows. I've mentioned elsewhere that for Windows list->str is not always possible, because there is no standard for the corresponding str->list that each program is responsible for.

    On Unix-like systems, when the called programs are run they get a pre-parsed list (argv). This list is created by the caller, either directly or through a shell. When a shell is doing this, at least the behavior is somewhat standard.

    On Windows, the called programs get a string, and they're in charge of parsing it, if they want to. Many programs use the C library to create argv, but exactly how that parsing works changes between runtime vendors and over time within a vendor. There's no solution that will work in all cases.

    Which is not to suggest that we shouldn't try, just that it will be hard.

    @bitdancer
    Copy link
    Member Author

    See bpo-8972 for motivation for accepting this proposal. Maybe I should start working on the patch :)

    @holdenweb
    Copy link
    Member

    The test program Larry provided does not appear to function as intended on Windows, and runs without either producing error messages or requiring interactive input. Here's a typical output, in this case from Python 3.1 on Vista:

    C:\Users\sholden\Documents\bpo-7839>\python31\python test_subprocess.py
    Testing with shell= True and array= True
    ** This is the one that runs an interactive shell.
    ** You should press Ctrl-D.
    Output:
    [] Python 3.1.1
    Testing with shell= True and array= False
    Output:
    [] Python 3.1.1
    Testing with shell= False and array= True
    Output:
    [] Python 3.1.1
    Testing with shell= False and array= False
    Output:
    [] Python 3.1.1

    The same behavior was observed on 2.6 and 2.5.2.

    [I also removed "Christophe Simonis" from the nosy list as the tracker was complaining that there was no such user].

    @asvetlov
    Copy link
    Contributor

    I like Eric's proposition, e.g. raising error if (list, shell=True) or (str, shell=False)

    If nobody object I can try to make initial patch for that.

    @ericvsmith
    Copy link
    Member

    While I still think raising those errors is a good thing, we should also look at Nick's shell-command:
    http://shell-command.readthedocs.org/en/latest/index.html

    @asvetlov
    Copy link
    Contributor

    Nick's library is awesome and I +1 to include it into stdlib if Nick is ready to do.

    But also I like to prevent obviously bad usage of popen.
    We cannot and don't want to remove popen shell=True param, so let's add raising exception for useless parameters combination.

    @bitdancer
    Copy link
    Member Author

    Yes, Nick's library looks good, but that should be a separate issue, it isn't really relevant to this one.

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Apr 1, 2012

    For the record, my shell access convenience API proposal is in bpo-13238.

    I'm not going to push that for 3.3 though - convenient shell access from Python is currently an area with a lot of active third party development, so I think it's worthwhile continuing to be conservative on this topic rather than rushing ahead with blessing any particular approach for stdlib inclusion.

    To get back to RDM's specific proposal (that shell=True/False should constrain the acceptable types for the first positional argument to str/list respectively), I'm -1 on the idea.

    The reason I'm not a fan is the fact that, with "shell=True", you can use the *executable* argument to Popen to select a non-default shell. At that point, passing a list can make sense, even if it isn't useful for the default shell. On Windows, the implicit invocation of list2cmdline can already make this a useful thing to do.

    For the other way around, passing a string with "shell=False" can be a straightforward way to launch GUI applications from a script. Specifying executable directly can also have an effect on this case, too (although I forget the consequences)

    Now, what may make sense is to provide separate Popen.exec and Popen.shell *class methods* that correspond to shell=False and shell=True with the stricter type checking on the first argument (i.e. Popen.exec only accepts a list, POpen.shell only accepts a string).

    Another advantage of adding the two new class methods is that it would give us the opportunity to *change some of the defaults* (e.g making close_fds=True the default behaviour).

    @cvrebert
    Copy link
    Mannequin

    cvrebert mannequin commented Apr 1, 2012

    The reason I'm not a fan is the fact that, with "shell=True", you can use the *executable* argument to Popen to select a non-default shell. At that point, passing a list can make sense, even if it isn't useful for the default shell.

    Modulo Windows, at that point, why not just run the shell explicitly (i.e. shell=False, args=['my_sh', ...])? Also, a '-c' argument will be forcibly prepended to the passed-in args in your case, which may frustrate such use-cases.

    For the other way around, passing a string with "shell=False" can be a straightforward way to launch GUI applications from a script.

    And they may need to eventually pass it (an) argument(s), and when that happens, cue confusion! There's a reason the subprocess docs feature the not-strictly-necessary clause "[a str args w/ shell=False] will only work if the program is being given no arguments". The distinction regularly trips/tripped users up.

    @asvetlov
    Copy link
    Contributor

    asvetlov commented Apr 1, 2012

    After trying to make a patch I found what current test suite itself has calls like (str, shell=False), (bytes, shell=True) and (['shell command'], shell=True).

    We can:

    1. Implement Eric's suggestion with fixing/removing broken tests.
    2. Add Pope.shell and Popen.exec as Nick propose.

    I don't like former because it changes already existing contracts fixed and published by explicit unittests without strong reason. Also Nick pointed reasons why user may need existing functionality. Adding new 'safe' shell and exec classmethods as prime entrypoints for Popen make sense.

    @bitdancer
    Copy link
    Member Author

    Which just goes to show that using Popen correctly is not obvious, I suppose.

    Given that adding these errors *would* break backward compatibility, there would have to be a deprecation if it was done.

    Personally I don't see the point in adding new class methods (complicating the interface) unless we are going to deprecate the current methods...in which case why not just deprecate the incorrect way of calling it? (Note: I say incorrect because the 'call a different shell' variant *doesn't work* unless you write a custom shell that ignores '-c', and as Chris points out there's no functionality *gained* by the windows variation, and there is confusion introduced as to exactly what is going on).

    So, if we aren't going to deprecate the "incorrect" calls, I vote for just closing this issue as "won't fix".

    (I suppose that a third alternative exists: make the "incorrect" calls actually do something useful while preserving backward compatibility with the existing behavior. That may be difficult, especially doing it in a way that logically consistent cross-platform.)

    @asvetlov
    Copy link
    Contributor

    asvetlov commented Apr 1, 2012

    BTW we need to drop win9x and win2000 support, see bpo-14470

    @asvetlov
    Copy link
    Contributor

    asvetlov commented Apr 1, 2012

    I'm +1 for going though deprecation process for Popen args to make parameters combination clean and obvious.

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Apr 1, 2012

    Everyone missed my other argument in favour of alternate constructor
    methods: fixing the currently wrong default arguments.

    There is no good reason to break working code when beginner confusion can
    be better addressed by telling them to avoid calling the Popen constructor
    directly in favour of newer APIs that have benefitted from years of
    experience with the current API.

    @asvetlov
    Copy link
    Contributor

    asvetlov commented Apr 5, 2012

    What's about test which pass bytes to Popen?
    Should it be deprecated and should Popen accept only unicode strings only — I mean str type?
    As I know the trend of py3k to get rid of bytes in filesystem names.

    @bitdancer
    Copy link
    Member Author

    No, you need to be able to pass bytes to Popen, just like you do to the os.exec[xx] functions. When the OS doesn't fully support unicode, that is sometimes the only option. As for filenames; again, as long as the underlying systems use bytes filenames we need to support it. Currently we encode them when received using surrogateescape and decode them back to bytes when used.

    I am not sure what os.exec[xx] does with strings containing non-ascii. Presumably it uses some default encoding or other, which seems to be utf-8 on my system. (It doesn't seem to be explicitly documented where those functions are discussed in the os module.)

    @asvetlov
    Copy link
    Contributor

    asvetlov commented Apr 8, 2012

    If fylesystem doesn't support unicode (only Windows directly does as I know) str filename should to be converted by sys.getfilesystemencoding(). os.exec family already does it as well as other fs functions — but they are supports bytes also.
    Mayby deprecation bytes for high-level Popen is good idea.

    @cvrebert
    Copy link
    Mannequin

    cvrebert mannequin commented May 14, 2012

    Re: Nick's comments:

    Besides close_fds, what other Popen parameters currently have suboptimal defaults?

    @akuchling akuchling removed the easy label Nov 6, 2013
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    8 participants