classification
Title: [Windows] exec*/spawn* problem with spaces in args
Components: Distutils, Windows
Versions: Python 3.1, Python 2.7
process
Status: closed
Resolution: rejected
Assigned To: tarek
Nosy List: akuchling, tarek, tim.peters, wom-work
Priority: normal

Created on 2001-06-26 03:17 by wom-work, last changed 2022-04-10 16:04 by admin. This issue is now closed.

Files
File name: print_args.c
Uploaded: wom-work, 2001-06-26 03:18
Description: C test program that prints arguments in C notation
Messages (6)
msg5186 - (view) Author: Ben Hutchings (wom-work) Date: 2001-06-26 03:17
DOS and Windows processes are not given an argument
vector, as Unix processes are; instead they are given a
command line and are expected to perform any necessary
argument parsing themselves. Each C run-time library
must convert command lines into argument vectors for
the main() function, and if it includes exec* and
spawn* functions then those must convert argument
vectors into a command line. Naturally, the various
implementations differ in interesting ways.

The Visual C++ run-time library (MSVCRT) implementation
of the exec* and spawn* functions is particularly awful
in that it simply concatenates the strings with spaces
in-between (see source file cenvarg.c), which means
that arguments with embedded spaces are likely to turn
into multiple arguments in the new process. Obviously,
when Python is built using Visual C++, its os.exec* and
os.spawn* functions behave in this way too. MS prefers
to work around this bug (see Knowledge Base article
Q145937) rather than to fix it. Therefore I think
Python must work around it too when built with Visual C++.
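
To make the failure concrete, here is a minimal sketch of what plain concatenation does to argument boundaries. The helper names (naive_command_line, naive_split) are hypothetical and only model the behaviour described above; they are not the MSVCRT code itself.

```python
def naive_command_line(argv):
    """Join arguments with single spaces and no quoting at all."""
    return " ".join(argv)


def naive_split(command_line):
    """Split on whitespace only, as a stand-in for the receiving side."""
    return command_line.split()


argv = ["prog.exe", "C:\\Program Files\\app", "x"]
line = naive_command_line(argv)
received = naive_split(line)

print(line)           # the single string the child process is given
print(len(argv))      # number of arguments that were sent
print(len(received))  # number of arguments the child ends up with
```

The path containing a space is split in two, so the child sees one more argument than the parent passed.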

I experimented with MSVCRT and Cygwin (using the
attached program print_args.c) and could not find a way
to convert an argument vector into a command line that
they would both convert back to the same argument
vector, but I got close.

MSVCRT's parser requires spaces that are part of an
argument to be enclosed in double-quotes. The
double-quotes do not have to enclose the whole
argument. Literal double-quotes must be escaped by
preceding them with a backslash. If an argument
contains literal backslashes before a literal or
delimiting double-quote, those backslashes must be
escaped by doubling them. If there is an unmatched
enclosing double-quote then the parser behaves as if
there was another double-quote at the end of the line.
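
The rules in the paragraph above can be sketched as a small parser. This is an illustration written from that description only, not the actual MSVCRT source: the function name parse_command_line is hypothetical, only spaces are treated as separators, and other quirks of the real parser are ignored.

```python
def parse_command_line(line):
    """Split a command line into arguments following the rules described above."""
    args = []
    current = []
    in_quotes = False
    have_arg = False  # True once the current argument has started
    i = 0
    n = len(line)
    while i < n:
        ch = line[i]
        if ch == "\\":
            # Measure the run of backslashes starting here.
            j = i
            while j < n and line[j] == "\\":
                j += 1
            count = j - i
            if j < n and line[j] == '"':
                # Before a double-quote, each pair is one literal backslash.
                current.append("\\" * (count // 2))
                if count % 2:
                    current.append('"')        # odd count: the quote is literal
                else:
                    in_quotes = not in_quotes  # even count: the quote delimits
                i = j + 1
            else:
                # Not followed by a quote: backslashes are taken literally.
                current.append("\\" * count)
                i = j
            have_arg = True
        elif ch == '"':
            in_quotes = not in_quotes
            have_arg = True
            i += 1
        elif ch == " " and not in_quotes:
            if have_arg:
                args.append("".join(current))
                current = []
                have_arg = False
            i += 1
        else:
            current.append(ch)
            have_arg = True
            i += 1
    # An unmatched quote is treated as if it were closed at the end of the line.
    if have_arg:
        args.append("".join(current))
    return args


print(parse_command_line('prog "a b" c'))
print(parse_command_line('a" "b'))
print(parse_command_line('"a b'))
```

The second call shows that the quotes need not enclose the whole argument, and the third shows the handling of an unmatched quote.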

Cygwin's parser requires spaces that are part of an
argument to be enclosed in double-quotes. The
double-quotes do not have to enclose the whole
argument. Literal double-quotes may be escaped by
preceding them with a backslash, but then they count as
enclosing double-quotes as well, which appears to be a
bug. They may also be escaped by doubling them, in
which case they must be enclosed in double-quotes;
since MSVCRT does not accept this, it's useless. As far
as I can see, literal backslashes before a literal
double-quote must not be escaped and literal
backslashes before an enclosing double-quote *cannot*
be escaped. It's really quite hard to understand what
its rules are for backslashes and double-quotes, and I
think it's broken. If there is an unmatched enclosing
double-quote then the parser behaves as if there was
another double-quote at the end of the line.

Here's a Python version of a partial fix for use in
nt.exec* and nt.spawn*.  This function modifies
argument strings so that the resulting command line
will satisfy programs that use MSVCRT, and programs
that use Cygwin if that's possible.

def escape(arg):
    import re
    # If arg contains no space or double-quote then
    # no escaping is needed.
    if not re.search(r'[ "]', arg):
        return arg
    # Otherwise the argument must be quoted and all
    # double-quotes, preceding backslashes, and
    # trailing backslashes, must be escaped.
    def repl(match):
        if match.group(2):
            return match.group(1) * 2 + '\\"'
        else:
            return match.group(1) * 2
    return '"' + re.sub(r'(\\*)("|$)', repl, arg) + '"'

This could perhaps be used as a workaround for the
problem. Unfortunately it would conflict with
workarounds implemented at the Python level (which I
have been using for a while).
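
Two cases are not covered by the function above: an empty argument is returned unchanged and so would vanish from the command line, and tabs are not treated as separators. The following is a hedged variant, assuming both cases should be quoted; the names quote_arg and build_command_line are hypothetical and not part of any Python module.

```python
import re


def quote_arg(arg):
    """Variant of escape() that also quotes empty and tab-containing arguments."""
    # Non-empty arguments without space, tab or double-quote pass through.
    if arg and not re.search(r'[ \t"]', arg):
        return arg

    def repl(match):
        backslashes, quote = match.groups()
        # Double the backslashes that precede a double-quote or the end of
        # the argument, and escape the double-quote itself.
        return backslashes * 2 + ('\\"' if quote else "")

    return '"' + re.sub(r'(\\*)("|$)', repl, arg) + '"'


def build_command_line(argv):
    """Join the quoted arguments into a single command line."""
    return " ".join(quote_arg(a) for a in argv)


print(build_command_line(["prog", "a b", "", 'say "hi"', "dir name\\"]))
```

Trailing backslashes are doubled because the closing quote added by the function would otherwise be read as an escaped literal quote.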

msg5187 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2001-07-12 02:32

Note that processes using WinMain can get at argc and argv 
under MSVC via including stdlib.h and using __argc and 
__argv instead.

I agree the space behavior sucks regardless.  However, as 
you've discovered, there's nothing magical we can do about 
it without breaking the workarounds people have already 
developed on their own -- including distutils.

The right way to address this is to add more smarts to 
spawn.py in distutils, then press to adopt that in the std 
library (distutils already does *some* magical arg quoting 
on win32 systems, and could use your help to do a better 
job of it).

Accordingly, I added [Windows] to the summary line, changed 
the category to distutils, and reassigned to Greg Ward for 
consideration.
msg5188 - (view) Author: Ben Hutchings (wom-work) Date: 2001-07-12 04:30

"Note that processes using WinMain can get at argc and argv 
under MSVC via including stdlib.h and using __argc and 
__argv instead."

This is irrelevant.  The OS passes the command line into a 
process as a single string, which it makes accessible 
through the GetCommandLine() function.  The argument vector 
received by main() or accessible as __argv is generated 
from this by the C run-time library.

"The right way to address this is to add more smarts to 
spawn.py in distutils"

I disagree.  The right thing to do is to make these 
functions behave in the same way across platforms, as far 
as possible.  Perhaps this could be done in two stages - in 
the first release, make the fix optional, and in the 
second, use it all the time.
msg5189 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2001-07-12 04:57

distutils is *trying* to make spawn work the same way 
across platforms, via spawn.py.  Help it!  You're not 
likely to get anywhere with a change to the os.spawn family 
because you already know it will break code -- and it will 
break distutils in particular.  If you want to break code, 
this needs a PEP first:  write up your "two stage" approach 
in a PEP and let the community have at it.  If you read 
c.l.py, you should have a feel for how warmly that's likely 
to be received <wink>.

The bit about __argv was just FYI (you seemed unaware of 
it; I agree it's irrelevant to what you want to achieve).
msg5190 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2006-12-21 14:48
Does this argument-line parsing weirdness still have relevance to current MSVC runtimes?

Changing os.spawn() seems like a non-starter because it'll break existing code; the Python landscape has changed and subprocess.py is a higher-level, more useful way to run subprocesses (it has a MS C runtime-alike function, list2cmdline).
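
For reference, a short sketch of what list2cmdline produces. The function is importable from the subprocess module, although it is treated as an implementation detail rather than a documented API; the argument values below are made up for illustration.

```python
import subprocess

# Arguments with a space, with embedded double-quotes, and an empty one.
argv = ["prog.exe", "C:\\Program Files\\app", 'say "hi"', ""]

command_line = subprocess.list2cmdline(argv)
print(command_line)
```

Arguments containing a space or tab, and empty arguments, are wrapped in double-quotes, and embedded double-quotes are escaped with a backslash.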

Unless someone submits a patch to change _nt_quote_args in distutils/spawn.py, I'll close this bug in a few months (the next time I visit the really old bugs).
msg85575 - (view) Author: Tarek Ziadé (tarek) * (Python committer) Date: 2009-04-05 22:05
I am closing it. I have also added a test case for this function, for
future changes. r71286
History
Date User Action Args
2022-04-10 16:04:09 admin set github: 34672
2009-04-05 22:05:45 tarek set status: open -> closed; resolution: rejected; messages: + msg85575
2009-02-16 16:26:07 akitada set nosy: + tarek; assignee: tarek; versions: + Python 3.1, Python 2.7, - Python 2.6
2008-01-05 20:07:47 christian.heimes set components: + Windows; versions: + Python 2.6
2001-06-26 03:17:08 wom-work create