classification
Title: provide a shlex.split alternative for Windows shell syntax
Type: enhancement
Stage: test needed
Components: Documentation, Windows
Versions: Python 3.3
process
Status: languishing
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: eric.araujo, eric.smith, georg.brandl, gjb1002, ianbicking, janssen, pjenvey, pmoore, r.david.murray, titus
Priority: normal
Keywords:

Created on 2007-05-24 12:56 by gjb1002, last changed 2014-02-03 19:54 by BreamoreBoy.

Messages (18)
msg55118 - (view) Author: Geoffrey Bache (gjb1002) Date: 2007-05-24 12:56
What is shlex.split supposed to do on Windows? It seems to be present, but it can't handle basic Windows pathnames: shlex.split("C:\\directory\\file") returns C:directoryfile (whereas os.system happily accepts the same string).

Also, it runs in POSIX mode and there is no way to override it! Why isn't POSIX mode the default only on POSIX systems, and off on non-POSIX systems? Or could it at least be possible to say shlex.split(s, posix=False)?
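
The behaviour reported above can be reproduced on any platform, since it comes from the POSIX parsing rules rather than from the operating system. The following is a minimal sketch; the variable names are illustrative, and the single-quote workaround is an addition that is not part of the original report.

```python
import shlex

# The Python literal "C:\\directory\\file" is the string C:\directory\file
path = "C:\\directory\\file"

# In POSIX mode an unquoted backslash escapes the next character,
# so every backslash is consumed.
posix_result = shlex.split(path)
print(posix_result)    # ['C:directoryfile']

# Inside single quotes POSIX mode treats backslashes literally,
# which is one possible workaround when the input can be quoted.
quoted_result = shlex.split("'" + path + "'")
print(quoted_result)   # ['C:\\directory\\file']
```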
msg55119 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2007-05-24 13:09
The docs to shlex say: "The shlex class makes it easy to write lexical analyzers for simple syntaxes resembling that of the Unix shell."

It is not meant to be a cross-platform shell quoting handler, but an implementation of Unix shell behavior.
msg55120 - (view) Author: Geoffrey Bache (gjb1002) Date: 2007-05-24 14:19
Then what is "non-POSIX mode" if it's only supposed to work on UNIX?

I noted the initial comment but it seemed to be out of date. Especially as it
seems to work fine to copy shlex.split and correct the default value for "posix".
Seemed a very simple change that had little chance of being wrong.

If Windows behaviour is really unsupported then surely the function should not
be available there.
msg55121 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2007-05-24 14:26
Further quote: "This will often be useful for writing minilanguages, (for example, in run control files for Python applications) or for parsing quoted strings."

Why shouldn't that feature be available under Windows?
Again, shlex is *not supposed* to handle Windows command lines.

Also, http://docs.python.org/lib/shlex-parsing-rules.html clearly says what the difference between
"posix=True" and "posix=False" is.

That non-posix mode works on Windows filenames is because it does not handle backslash escapes.
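
As a rough illustration of the point about backslash escapes, the sketch below runs the shlex class in both modes on the same input. The helper name tokenize and the sample line are made up for this example; whitespace_split is enabled so that the lexer splits only on whitespace, as shlex.split does.

```python
import shlex

def tokenize(text, posix):
    """Split text on whitespace using the shlex class in the given mode."""
    lexer = shlex.shlex(text, posix=posix)
    lexer.whitespace_split = True   # split only on whitespace
    lexer.commenters = ''           # do not treat '#' as a comment start
    return list(lexer)

line = r'type C:\dir\file.txt'

# POSIX mode: each backslash escapes the character after it and disappears.
print(tokenize(line, posix=True))    # ['type', 'C:dirfile.txt']

# Non-POSIX mode: backslashes are ordinary characters and are kept.
print(tokenize(line, posix=False))   # ['type', 'C:\\dir\\file.txt']
```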
msg55122 - (view) Author: Geoffrey Bache (gjb1002) Date: 2007-05-24 14:52
But surely it's not named "POSIX mode" for no reason. It's because those rules resemble those of the UNIX shell, while the rules of "non-POSIX mode" resemble those of non-POSIX shells, such as DOS.

shlex.split seemed to be a shortcut for those wanting to simply parse a generic quoted string who weren't interested in creating a minilanguage. Surely it should be possible to avoid POSIX rules when doing this on Windows?

You haven't suggested any other way to do this. The fact is, I do want to parse a Windows command line. The only way I have found is by copying shlex.split and hacking it. That didn't seem very nice, especially as it seems a fix would be totally trivial, but it's obviously better than starting from scratch.
msg55123 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2007-05-24 14:59
The fact remains that shlex is not meant to switch modes automatically depending on the platform.

If you want to contribute a third mode that completely resembles Windows-style parsing,
you're most welcome, just submit a patch and I'll look at it.
msg55124 - (view) Author: Geoffrey Bache (gjb1002) Date: 2007-05-24 15:17
OK, but surely providing the possibility to override the POSIX flag isn't difficult. That doesn't require any change in default behaviour. 

It also seems it should be possible to request enhancements without having to do the work myself. Can we agree that parsing Windows command lines is a generally useful thing to do? We already have a module that can parse UNIX command lines which isn't unrelated. Perhaps this should be changed to "enhancement" rather than "invalid". 

Whether or not the original author of shlex cared about Windows seems to be beside the point.
msg55125 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2007-05-24 15:28
Okay, I'll turn this into a feature request.

BTW, what do you mean by "override the POSIX flag"?
msg55126 - (view) Author: Geoffrey Bache (gjb1002) Date: 2007-05-24 15:31
I mean changing shlex.split to accept an optional parameter "posix":

def split(s, comments=False, posix=True):
    lex = shlex(s, posix)
    lex.whitespace_split = True
    if not comments:
        lex.commenters = ''
    return list(lex)
msg55127 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2007-05-24 16:50
Ah, that is reasonable. I added that parameter in rev. 55549.
msg55128 - (view) Author: Geoffrey Bache (gjb1002) Date: 2007-05-24 17:54
Thanks.

On reflection, I think the docs could do with expanding, particularly on the subject of what non-posix mode actually does and what it's useful for (it's becoming increasingly clear to me that it's nothing like a DOS shell, having used it a bit). The list of rules is all very well but doesn't mean much unless you know why it's like that. There is a cryptic comment about posix=False implying "compatibility mode" but this concept is not explained.

The fact is that in today's world (now that the Mac is UNIX) "posix=False" more or less means "windows=True". At least unless you document very carefully what it's intended to mean. As you've probably noticed, it confused me thoroughly because I couldn't see what else it could be for, and I still haven't really grasped it.
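
To make the "nothing like a DOS shell" observation concrete, here is a small sketch of what posix=False does with a typical Windows-style command line. The sample command line is invented for illustration. Under Microsoft's C runtime rules the same line would yield two arguments with the quotes removed, whereas non-POSIX shlex keeps the quotes and does not recognise a quote that starts inside a word.

```python
import shlex

# Hypothetical command line: a quoted program path plus an option
# whose value contains a space.
line = r'"C:\Program Files\app.exe" --name="x y"'

tokens = shlex.split(line, posix=False)
print(tokens)
# ['"C:\\Program Files\\app.exe"', '--name="x', 'y"']
#
# Two differences from Windows argument parsing are visible:
#   1. the surrounding quotes are retained in the first token;
#   2. the quote inside --name="x y" does not protect the space,
#      so the option is split into two tokens.
```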
msg93822 - (view) Author: Philip Jenvey (pjenvey) * (Python committer) Date: 2009-10-10 06:39
FYI, I've implemented a Windows command line parser for use by subprocess
on Jython; it's available here:

http://fisheye3.atlassian.com/browse/jython/trunk/jython/Lib/subprocess.py?r=6636#l554

tests:

http://fisheye3.atlassian.com/browse/jython/trunk/jython/Lib/test/test_subprocess_jy.py?r=6464#l41

Like shlex, it wasn't built to handle ; && || to join multiple commands,
as #1521950 requests. But other than that it's complete.
msg116679 - (view) Author: Mark Lawrence (BreamoreBoy) Date: 2010-09-17 16:50
Already fixed by r55549 and r55550.
msg116681 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-09-17 17:05
No, this feature request has not been satisfied.  Georg fixed some subsidiary issues, but they did not in fact address the feature request for an shlex.split equivalent for Windows.

Since no one has expressed interest in working on this, even though model code has been offered, I'm changing the resolution to languishing rather than reopening it.  Perhaps someone with an itch to scratch will decide to pick it up eventually.
msg116687 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-09-17 17:27
Raymond Chen's blog today discusses CommandLineToArgvW, which is apparently an API that can parse command lines. It's not clear to me if this is actually called by the MSFT CRT:
http://blogs.msdn.com/b/oldnewthing/archive/2010/09/17/10063629.aspx

Here's the documentation for it:
http://msdn.microsoft.com/en-us/library/bb776391%28VS.85%29.aspx

I don't know if we could call this directly (or via ctypes), or if we could emulate it based on the documentation, which doesn't seem very complete.
msg116693 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-09-17 17:38
Now that I think about this some more, we wouldn't want to call this API. I'd rather this hypothetical function be available on non-Windows platforms, so we'd have to implement the semantics of CommandLineToArgvW or whichever CRT we decide to match.
msg130138 - (view) Author: Philip Jenvey (pjenvey) * (Python committer) Date: 2011-03-06 01:29
The code I linked to above implements those semantics in pure Python. It follows Microsoft's "Parsing C Command-Line Arguments" rules, like CommandLineToArgvW does.

Here are updated links; the older links seem to have broken:

https://fisheye3.atlassian.com/browse/jython/trunk/jython/Lib/subprocess.py?r=6636#to566

tests: https://fisheye3.atlassian.com/browse/jython/trunk/jython/Lib/test/test_subprocess_jy.py?r=6464#to41

This code is basically the inverse of subprocess's list2cmdline.

I don't mind incorporating this code into the stdlib, but we need to figure out where it would go. There was a discussion on stdlib-sig last year related to this topic, about the need for quoting and unquoting command lines.

We have some of this functionality for posix systems scattered throughout shlex and the pipes module, and then there's subprocess.list2cmdline. I think we could use a new module with all this functionality in one place.
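
The idea of a pure-Python inverse of list2cmdline can be sketched as follows. This is not the Jython code linked above; it is a simplified illustration with a hypothetical function name, split_windows, covering only the core rules: arguments are separated by spaces or tabs, double quotes group text, and a run of backslashes is special only when it precedes a double quote. It deliberately ignores the special treatment of the program name and the doubled-quote rule of newer C runtimes.

```python
import subprocess

def split_windows(cmdline):
    """Split a command line roughly per Microsoft's C runtime rules (sketch)."""
    args = []
    current = []
    in_quotes = False
    have_arg = False   # True once an argument has started (so "" yields '')
    i, n = 0, len(cmdline)
    while i < n:
        ch = cmdline[i]
        if ch == '\\':
            # Measure the whole run of backslashes.
            start = i
            while i < n and cmdline[i] == '\\':
                i += 1
            count = i - start
            if i < n and cmdline[i] == '"':
                # 2n backslashes + quote: n backslashes, quote is a delimiter.
                # 2n+1 backslashes + quote: n backslashes, literal quote.
                current.append('\\' * (count // 2))
                if count % 2:
                    current.append('"')
                    i += 1   # consume the escaped quote
            else:
                # Backslashes not followed by a quote are literal.
                current.append('\\' * count)
            have_arg = True
        elif ch == '"':
            in_quotes = not in_quotes
            have_arg = True
            i += 1
        elif ch in ' \t' and not in_quotes:
            if have_arg:
                args.append(''.join(current))
                current = []
                have_arg = False
            i += 1
        else:
            current.append(ch)
            have_arg = True
            i += 1
    if have_arg:
        args.append(''.join(current))
    return args

print(split_windows(r'copy C:\dir\file "D:\my docs\x"'))
# ['copy', 'C:\\dir\\file', 'D:\\my docs\\x']

# Round trip through subprocess.list2cmdline for a few awkward arguments.
argv = ['a b', 'c\\', 'say "hi"', 'dir\\ x\\', '']
print(split_windows(subprocess.list2cmdline(argv)) == argv)   # True
```

A hand-written state machine is used here instead of a regular expression because the backslash rule depends on what follows the whole run, which is easier to express with explicit lookahead.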
msg141238 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-07-27 16:03
> But surely it's not named "POSIX mode" for no reason. It's because
> those rules resemble those of the UNIX shell. While "non-POSIX mode"
> resemble those of non-POSIX shells, such as DOS.

Not exactly: when it comes to parsing, shells on POSIX systems don’t always follow the POSIX rules.  The non-POSIX/POSIX modes in shlex mimic that.

In #9723, it was agreed to move the undocumented pipes.quote function into the shlex module.  I think we could move list2cmdline from subprocess into shlex too (that’s probably another report, but I’m saying it here because Philip mentioned it), and also provide a Windows-compliant split function in shlex.
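
For reference, the function discussed in #9723 is available as shlex.quote from Python 3.3 onwards. The sketch below, with made-up sample arguments, shows that it pairs with the POSIX-mode shlex.split for quoting and unquoting. Note that it produces POSIX shell quoting only and is not suitable for Windows command lines.

```python
import shlex

# Sample arguments, including an apostrophe and Windows-style backslashes.
args = ["echo", "it's a file", "C:\\directory\\file"]

# Quote each argument for a POSIX shell and join them into one string.
command = " ".join(shlex.quote(a) for a in args)
print(command)

# POSIX-mode splitting recovers the original argument list.
print(shlex.split(command) == args)   # True
```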
History
Date | User | Action | Args
2014-02-03 19:54:10 | BreamoreBoy | set | nosy: - BreamoreBoy
2011-07-27 16:03:13 | eric.araujo | set | nosy: + titus, pmoore, eric.araujo, janssen, ianbicking; messages: + msg141238; versions: + Python 3.3, - Python 2.7
2011-03-06 01:29:58 | pjenvey | set | nosy: georg.brandl, gjb1002, eric.smith, pjenvey, r.david.murray, BreamoreBoy; messages: + msg130138
2010-09-17 17:38:52 | eric.smith | set | messages: + msg116693
2010-09-17 17:27:05 | eric.smith | set | messages: + msg116687
2010-09-17 17:05:05 | r.david.murray | set | status: closed -> languishing; nosy: + r.david.murray; messages: + msg116681; assignee: georg.brandl ->; resolution: fixed ->
2010-09-17 16:50:20 | BreamoreBoy | set | status: open -> closed; nosy: + BreamoreBoy; messages: + msg116679; resolution: fixed
2010-05-29 05:18:06 | eric.smith | set | nosy: + eric.smith
2009-10-10 06:39:30 | pjenvey | set | nosy: + pjenvey; messages: + msg93822
2009-03-30 16:49:40 | ajaksu2 | set | assignee: georg.brandl; stage: test needed; components: + Documentation; versions: + Python 2.7
2007-05-24 12:56:34 | gjb1002 | create |