classification
Title: Distutils doesn't quote Windows command lines properly
Type: behavior Stage: resolved
Components: Distutils, Windows Versions: Python 3.8, Python 3.7, Python 2.7
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: Nosy List: Eric.Wieser, barry, christopher.hogan, dstufft, eric.araujo, jaraco, mgiuca, ncoghlan, r.david.murray, steve.dower, tarek, zach.ware
Priority: normal Keywords: needs review, patch

Created on 2010-06-13 05:22 by mgiuca, last changed 2021-02-03 18:07 by steve.dower. This issue is now closed.

Files
File name Uploaded Description Edit
spawn.patch mgiuca, 2010-06-13 05:22 review
quote-args.patch christopher.hogan, 2015-08-18 18:42 _nt_quote_args fix from Intel review
quote-args-ext.tar.gz christopher.hogan, 2015-08-18 18:42 C extension example for quote-args.patch
libpath-fix.patch christopher.hogan, 2015-08-21 17:54 fix formation of LIBPATH linker args review
Messages (21)
msg107722 - (view) Author: Matt Giuca (mgiuca) Date: 2010-06-13 05:22
I discovered this investigating a bug report that python-cjson doesn't compile properly on Windows (http://pypi.python.org/pypi/python-cjson). Cjson's setup.py asks distutils to compile with the flag '-DMODULE_VERSION="1.0.5"', but distutils.spawn._nt_quote_args is not escaping the quotes correctly.

Specifically, the current behaviour is:
>>> distutils.spawn._nt_quote_args(['-DMODULE_VERSION="1.0.5"'])
['-DMODULE_VERSION="1.0.5"']

I expect the following:
>>> distutils.spawn._nt_quote_args(['-DMODULE_VERSION="1.0.5"'])
['"-DMODULE_VERSION=""1.0.5"""']

Not surprising, since that function contains a big comment:
    # XXX this doesn't seem very robust to me -- but if the Windows guys
    # say it'll work, I guess I'll have to accept it.  (What if an arg
    # contains quotes?  What other magic characters, other than spaces,
    # have to be escaped?  Is there an escaping mechanism other than
    # quoting?)

It only escapes spaces, and that's it. I've proposed a patch which escapes the following characters properly: "&()<>^| (as far as I can tell, these are the "reserved" characters on Windows).

Note: I did not escape * or ?, the wildcard characters. As far as I can tell, these only have special meaning on the command-line itself, and not when supplied to a program.

Alternatively, it could call subprocess.list2cmdline (but there seem to be issues with that: http://bugs.python.org/issue8972).
msg107772 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-06-14 05:09
Note that list2cmdline does correct quoting (which includes the "s) if you are passing the string directly to a program.  In that case cmd.exe's metacharacters aren't special.  (As I noted in issue 8972, I believe that list2cmdline's current quoting of '|' characters is in error).

Perhaps a canonical list2cmdline actually belongs in shutil?
msg214164 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2014-03-20 01:16
I’m more inclined to use the proposed patch rather than list2cmdline, given the issue linked.

I would not like a fix for this to break people’s setup.py scripts; it would be good to know if people work around this bug by monkey-patching the _nt_quote_args function, escaping quotes themselves, or something else.

If the proposed patch is the right solution, unit tests for _nt_quote_args would be needed, plus a high-level test that builds an extension module with a flag like '-DMODULE_VERSION="1.0.5"', so that the windows buildbots can confirm this works.
msg248785 - (view) Author: Chris Hogan (christopher.hogan) * Date: 2015-08-18 18:42
At Intel, we've run into problems with external modules giving paths to _nt_quote_args that contain trailing backslashes, which escapes the final quote and breaks the command.  This fix takes care of special characters, trailing backslashes, and embedded quotes as in Matt's example.  I've also added unit tests for these cases, as well as a high level test that builds an extension and defines some macros that contain quotes and special characters. 

setup.py - Includes a library directory with a trailing backslash. I compiled a simple library, test_module.lib, and put it in that directory to test the linking.  This fails without my fix.  The macro definitions also fail without this fix.

testmodule.c - A simple C extension.  It just makes sure everything worked and returns the string "Success!"

As for current workarounds in setup scripts, I know numpy uses their own distutils.  They wrote a quote_args() function that they use instead of _nt_quote_args(). It checks for spaces, but only if the argument isn't already quoted.  It doesn't account for other special characters or trailing backslashes.
msg248855 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-08-19 20:15
So I think the only objection to committing this as a bug fix would be the unfortunately real possibility that doing so will break someone's workaround.  My *guess* is that such a workaround would most likely take the form of replacing _nt_quote_args as the one example we have in had did.  So I'm in favor of fixing this.

I've added some people to nosy who might have an opinion and even perhaps some way to look for other examples of people coping with this (in open source code bases that also target windows).

I haven't reviewed the patch yet but I will.
msg248856 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-08-19 21:45
There are two features of this I have questions about.  If I'm understanding correctly, if passed a quoted string you are not re-quoting it, but you are always stripping a trailing slash even if it is inside quotes.

For the first, that seems wrong.  Either we should be quoting the quotes, or we should assume that if passed a string that starts and ends with quotes, that it is already fully quoted, and not change it.

For the second, doesn't that potentially change the semantics of the string passed in?  You can't actually know that the backslash is part of a path, nor can you be sure that a trailing backslash is not significant even if it is a path.  And in fact if there is a backslash before a double quote, arguably that is supposed to be a double quote that doesn't end the string (see point one above).

I tried list2cmdline, and what it does is to backslash escape appropriately, including preserving blackslashes that are in front of double quotes, and does not add surrounding quotes if there are no internal blanks.  I think list2cmdline is correct here (and it has certainly seen a lot of use), and I think we should either replicate its logic or just use it.

Now, subprocess calls CreateProcess, while distutils calls os.spawnv, but my understanding is that the quoting rules are the same (that is, that the CRT's _spawnv ultimately calls CreateProcess with the arguments joined by blanks).  

Since issue 8972 has been resolved by fixing the broken behavior, I think we should just use list2cmdline.  We could leave _nt_quote_args alone and replace the call to it in _spawn_nt with:

  cmd = [list2cmdline([arg]) for arg in cmd]

or we could put the fix in _nt_quote_args.  The backward compatibility calculus is...not obvious :(
msg248858 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2015-08-19 22:23
As far as I'm concerned distutils on Windows is a farce so do what you like with it.  It usually can't pick up the version of VS that you know is correct and is installed, the error messages are less than useless, so the only damage that I see is something that is even more broken than it all ready is.  Alternatively it gets slightly further before bombing with yet another useless message.
msg248863 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2015-08-19 23:19
Switching distutils.spawn to simply use subprocess.Popen (probably keeping the explicit path search) looks good to me.

Quoting rules are different when you call with shell=True (aka "cmd.exe /c"), since then you also need to escape ^, | and > (with a ^), but if it's being passed directly to the process (via CreateProcess) you don't need to do this. Popen doesn't appear to handle this correctly either.

Because of the backwards compatibility concerns, realistically I think we can only fix this in 3.6 at this stage. The issue has existed for too long to convince Larry to take it for 3.5.0, and changing the behaviour of either distutils or subprocess between .0 and .1 feels risky to me.

(Adding Brett in case he's motivated to take the opportunity to definitively learn and implement Windows's quoting rules and becoming the first person in the world to achieve it. Sounds like his kind of thing :) )
msg248884 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-08-20 14:02
*I'm* not suggesting using Popen.  That would be a more significant change to the code, and I don't see a motivation for it.  I'm just suggesting using list2cmdline for the quoting.  (If we were rewriting distutils it would be a different story, but we're not.)

And no, Popen with shell=True does *not* handle quoting.  If you use shell=True, *you* are responsible for doing correct quoting (and is what makes using shell=True so dangerous).  That applies to both unix and windows, and the confusion about that fact was what led to the need for the "fix" backout in issue 8972.

If a fix can't go in as a bug fix (whether or not it gets into 3.5.0), then we should perhaps instead consider the rewrite for 3.6: provide a *new* distutils function that does use Popen and does things "right" (based on everything we've learned since distutils was written), leaving the old function in place, deprecated, for backward compatibility.

That doesn't solve the problem for people running in to this bug trying to build extensions for 2.7, 3.4, and 3.5, though.
msg248892 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2015-08-20 15:50
Maybe a better starting point is to monkey-patch via setuptools? I know it's nasty, but it'll give us a chance to actually iterate on this before committing to a distutils change. (IMO the biggest problem with distutils is that it gets basically no testing before a final release, other than people who have already worked around its issues.) CCompiler.spawn is fairly easy to patch too.

Adding Jason in case he's still so upset from last time I did this (finding VC compiler for 2.7) he wants to shut the suggestion down immediately :)

As an aside, escaping quotes by doubling them (as in the OP) works fine for MSVC, but is not necessarily standard across all applications. Escaping with a backslash is probably more common, but (e.g.) batch files don't need them escaped if they aren't followed by a space and can't escape them at all if they are followed by a space (then again, we all agree that cmd.exe is fundamentally broken here, so using https://msdn.microsoft.com/en-us/library/windows/desktop/bb776391.aspx (CommandLineToArgvW API) as the benchmark is reasonable).
msg248896 - (view) Author: Chris Hogan (christopher.hogan) * Date: 2015-08-20 16:30
> Since issue 8972 has been resolved by fixing the broken behavior, I think we should just use list2cmdline.  
> We could leave _nt_quote_args alone and replace the call to it in _spawn_nt with:

>  cmd = [list2cmdline([arg]) for arg in cmd]

I verified that this passes the high-level test I wrote.
msg248897 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2015-08-20 16:42
The problem is I can easily find plenty of cases where it won't work.

My biggest concern is that list2cmdline will require already-quoted arguments, which is going to break anyone who's already worked around distutils failing to do so (which makes it really difficult to justify fixing 2.7 and 3.4, and soon 3.5 unless we get it into 3.5.0). The escaping it applies for '\"' also does not take into account someone who has correctly escaped their own argument.

Particularly for distutils, we look up tools by simple name and use PATHEXT, so we may find someone's "link.bat", which then needs very different quoting from the actual "link.exe" (this one is probably unsolvable, unfortunately).

The solution isn't as simple as calling an existing function. It's probably actually as complicated as requiring callers to specify whether strings are quoted or not, and then switching between quoting modes based on what type of program (ie. EXE vs BAT) is being launched. list2cmdline is false hope here, IMO, as it just makes it harder for people to work around.
msg248898 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2015-08-20 16:44
FWIW, the problem in the original post is that '-DMODULE_VERSION="1.0.5"' is not quoted by distutils and the quotes are eaten.

This can be fixed by specifying '-DMODULE_VERSION=\"1.0.5\"'. There are no spaces in the argument, so the problem is that cl.exe eats unescaped quotes, not that we aren't automatically changing users arguments.
msg248904 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-08-20 17:42
Well, it sounds like there may be no way around the fact that people may already be doing the extra needed quoting escaping internal quotes) and therefore that this change would break those use cases.  (I note that any such code must be conditional on windows vs unix, since escaping the quotes will break the same command executed on unix).  

Unless...what if we conditionalize it on whether or not '\"' and/or '\\' appears in the argument?

And yes, the fundamental problem is that Windows provides no way to call a program with a list of not-to-be-modified strings.  The problem with a .bat parsing its argument string differently from how everything else parses its argument string is not a new problem, and is faced by anyone using subprocess.  distutils is trying to be (relatively) platform agnostic, to my understanding, so again we'd have to actually parse the input, this time looking for '.bat' and quoting accordingly.

If we solve the .bat problem (as opposed to ignoring it like we currently do in subprocess) we should fix it in subprocess too.

This is certainly getting non-trivial :(
msg248967 - (view) Author: Chris Hogan (christopher.hogan) * Date: 2015-08-21 17:54
Here's a change that might fix the trailing backslash problem for now without breaking anything.  libpath-fix.patch only affects arguments that we know are paths.  This happens before anything is quoted.

This avoids the problem when something like 'C:\path with space\trailing backslash\' is passed to _nt_quote_args() and becomes '"C:\path with space\trailing backslash\"'.  The " is escaped which mangles the string.

I'm not sure if it's the responsibility of the user to not pass in such arguments, or of cpython to sanitize these things.  Is this change harmless, or can you think of situations where it will break something?
msg248984 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2015-08-21 22:30
That would break paths like "\" or "D:\" ("D:" != "D:\" - the backslash is meaningful here), but I'm fairly sure _nt_quote_args could add '\\"' to the end if it ends in a backslash already (to escape the trailing slash):

     for i, arg in enumerate(args):
         if ' ' in arg:
-            args[i] = '"%s"' % arg
+            args[i] = '"%s%s"' % (arg, '\\' if arg.endswith('\\') else '')
     return args

For distutils, that's probably an okay fix since we're already quoting everything, and it doesn't try to make things too smart. But it would break people who are already escaping their own trailing backslash, and doesn't provide a workaround that will work both before and after this change. Especially for something like distutils, where people often have one setup.py for all Python versions, this is pretty important.
msg249005 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2015-08-23 14:03
I'm not upset by the idea of monkey patching in Setuptools for vetting certain techniques. I've even considered having Setuptools adopt distutils entirely. Today I filed https://bitbucket.org/pypa/setuptools/issues/417/adopt-distutils to seed the discussion on that possibility. For the sake of this discussion, I would happily accept a pull request to supply the forward-compatible version of _nt_quote_args (or similar), allowing a window for other libraries to make the switch against versions of Python supported by Setuptools.
msg249007 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2015-08-23 15:02
I notice you say "adopt" rather than "vendor" - effectively removing distutils from the stdlib?

It could work, but to really be able to move distutils forward we need some sort of side-by-side versioning, such that a package can declare which version of distutils is required, preferably without needing to download old versions on the fly when building. I'm sure it's possible but haven't got a complete vision yet.

Maybe it's as simple as having flags that setup scripts enable before they build?

import distutils
distutils.enable(distutils.flags.smart_quote_args)

That way we can actually change things without breaking old build scripts. Should be just as feasible if setuptools adopts distutils, but there'll probably be opposition from elsewhere along the lines of including "core" functionality in the stdlib.
msg330070 - (view) Author: Eric Wieser (Eric.Wieser) * Date: 2018-11-19 02:08
> then we should perhaps instead consider the rewrite for 3.6: provide a *new* distutils function that does use Popen and does things "right" (based on everything we've learned since distutils was written), leaving the old function in place, deprecated, for backward compatibility.

Was any progress made towards achieving this? It's frustrating that the correct quoting behavior we get with `subprocess.Popen(List[str])` is not used by `spawn`, and now every level of distutils code has to think about whether it needs to quote its arguments for the shell.
msg330089 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-11-19 11:36
There's been progress in terms of setuptools eventually adopting distutils, so that looks likely and we're basically planning on it. Adding this as an enhancement to setuptools is going to be a better approach than adding it to distutils in my opinion.
msg386262 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-02-03 18:07
Distutils is now deprecated (see PEP 632) and all tagged issues are being closed. From now until removal, only release blocking issues will be considered for distutils.

If this issue does not relate to distutils, please remove the component and reopen it. If you believe it still requires a fix, most likely the issue should be re-reported at https://github.com/pypa/setuptools
History
Date User Action Args
2021-02-03 18:07:55steve.dowersetstatus: open -> closed
resolution: out of date
messages: + msg386262

stage: test needed -> resolved
2020-01-24 23:28:31brett.cannonsetnosy: - brett.cannon
2018-11-19 11:36:57steve.dowersetmessages: + msg330089
versions: + Python 3.7, Python 3.8, - Python 3.4, Python 3.5
2018-11-19 02:12:25BreamoreBoysetnosy: - BreamoreBoy
2018-11-19 02:08:45Eric.Wiesersetnosy: + Eric.Wieser
messages: + msg330070
2015-08-23 15:02:57steve.dowersetmessages: + msg249007
2015-08-23 14:03:58jaracosetmessages: + msg249005
2015-08-21 22:30:52steve.dowersetmessages: + msg248984
2015-08-21 17:54:26christopher.hogansetfiles: + libpath-fix.patch

messages: + msg248967
2015-08-20 17:42:17r.david.murraysetmessages: + msg248904
2015-08-20 16:45:00steve.dowersetmessages: + msg248898
2015-08-20 16:42:41steve.dowersetmessages: + msg248897
2015-08-20 16:30:52christopher.hogansetmessages: + msg248896
2015-08-20 15:50:01steve.dowersetnosy: + jaraco
messages: + msg248892
2015-08-20 14:02:09r.david.murraysetmessages: + msg248884
2015-08-19 23:19:20steve.dowersetnosy: + brett.cannon
messages: + msg248863
2015-08-19 22:23:27BreamoreBoysetnosy: + BreamoreBoy
messages: + msg248858
2015-08-19 21:45:34r.david.murraysetmessages: + msg248856
2015-08-19 20:15:46r.david.murraysetnosy: + barry, ncoghlan, steve.dower, dstufft
messages: + msg248855
2015-08-18 19:16:50zach.waresetnosy: + zach.ware
2015-08-18 18:42:41christopher.hogansetfiles: + quote-args-ext.tar.gz
2015-08-18 18:42:06christopher.hogansetfiles: + quote-args.patch
nosy: + christopher.hogan
messages: + msg248785

2014-03-20 01:16:43eric.araujosetmessages: + msg214164

assignee: tarek ->
components: + Windows
keywords: + needs review
stage: test needed
2014-03-20 01:11:43eric.araujosetmessages: - msg107751
2014-03-20 01:01:19BreamoreBoysetversions: + Python 2.7, Python 3.4, Python 3.5, - Python 2.6
2010-06-14 05:09:14r.david.murraysetnosy: + r.david.murray
messages: + msg107772
2010-06-13 20:58:39eric.araujosetnosy: + eric.araujo
messages: + msg107751
2010-06-13 05:23:47mgiucasettype: behavior
2010-06-13 05:22:51mgiucacreate