This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: shlex doesn't differentiate escaped characters in output
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.7, Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Matthew Gamble, eric.smith
Priority: normal Keywords:

Created on 2019-05-13 00:59 by Matthew Gamble, last changed 2022-04-11 14:59 by admin.

Messages (5)
msg342276 - Author: Matthew Gamble (Matthew Gamble) Date: 2019-05-13 00:59
The outputs of the following invocations are exactly the same:

list(shlex.shlex('a ; b', posix=True, punctuation_chars=True))

list(shlex.shlex('a \; b', posix=True, punctuation_chars=True))

They both output the following:

['a', ';', 'b']

This makes it impossible to determine whether the user escaped the semicolon, as they would when terminating find's `-exec` argument.
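A minimal reproduction of the report, using a raw string for the escaped input so the backslash reaches shlex without relying on an invalid escape sequence. The helper name `tokens` is only for illustration:

```python
import shlex


def tokens(text):
    # Same options as in the report: POSIX rules plus punctuation_chars
    return list(shlex.shlex(text, posix=True, punctuation_chars=True))


unescaped = tokens('a ; b')
escaped = tokens(r'a \; b')  # raw string: backslash and ';' reach shlex unchanged

print(unescaped)             # ['a', ';', 'b']
print(escaped)               # ['a', ';', 'b']
print(unescaped == escaped)  # True
```

In POSIX mode the backslash is consumed as an escape and only the `;` it protected is kept, so nothing in the token list records that it was escaped.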
msg342383 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2019-05-13 20:21
The goal is to match POSIX shell semantics. Can you provide a concrete example where shlex.shlex does something different from a POSIX-compliant shell? With all the escaping, it's going to be tough.

Note also that your code raises a DeprecationWarning in 3.7, at least, and will be an error in the future. You should probably use r-strings in your examples.
msg342399 - Author: Matthew Gamble (Matthew Gamble) Date: 2019-05-13 23:36
The point is that it's not possible to use the output of shlex.shlex to try to match the behaviour of a POSIX-compliant shell by reliably splitting up a user's input into multiple commands. In the first case I presented (no escape character), the user entered two commands. In the second case, the user entered a single command with two arguments. However, there's no way to differentiate the two situations based on the output of shlex.
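To illustrate the consequence described above, here is a sketch of a naive command splitter built on shlex. `split_commands` is a hypothetical helper, not part of the shlex API, and it assumes that every `;` token separates two commands:

```python
import shlex


def split_commands(line):
    """Hypothetical helper: split a command line into commands at ';' tokens."""
    lexer = shlex.shlex(line, posix=True, punctuation_chars=True)
    commands, current = [], []
    for token in lexer:
        if token == ';':
            # Treat ';' as a command separator
            commands.append(current)
            current = []
        else:
            current.append(token)
    if current:
        commands.append(current)
    return commands


print(split_commands('a ; b'))    # [['a'], ['b']]
print(split_commands(r'a \; b'))  # [['a'], ['b']]
```

Both inputs come out as two commands, although a POSIX shell runs the second one as a single command `a` with the arguments `;` and `b`.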

It's also worth noting that the output is the same with this too:

list(shlex.shlex('a \\; b', posix=True, punctuation_chars=True))

I tested this code on Python 3.6.7 and 3.7.2, and didn't see any deprecation warnings at all. I also checked the history of shlex.py:

https://github.com/python/cpython/commits/master/Lib/shlex.py

The last commit was from 2017, and I don't see any usages of DeprecationWarning inside that file. I'm also not sure how r-strings are relevant, as I don't see any regular expressions used inside of the shlex class.
msg342406 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2019-05-14 00:38
Run 3.7 with -Wd:

$ python3 -Wd
Python 3.7.3 (default, Mar 29 2019, 13:03:53)
[GCC 7.4.0] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 'a \; b'
<stdin>:1: DeprecationWarning: invalid escape sequence \;
'a \\; b'
>>>

The deprecation is in relation to invalid escape sequences, not shlex.

My point is just that you should use r'a \; b' or 'a \\; b', and not rely on invalid escape sequences. For one thing, I can never remember how they're interpreted and had to look it up. r-strings don't have anything to do with regular expressions per se; they're a way of changing how Python interprets strings, no matter what they're used for.
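A short sketch of the string-literal point: the raw string and the doubled-backslash literal spell the same six characters, and the invalid-escape spelling evaluates to the same value but triggers a warning. The code compiles that spelling at run time only so the warning can be captured; the category is DeprecationWarning on Python 3.6 through 3.11 and SyntaxWarning from 3.12 onward:

```python
import warnings

raw = r'a \; b'        # raw string: the backslash is kept literally
doubled = 'a \\; b'    # escaped backslash: the same characters
assert raw == doubled
print(len(raw))        # 6

# Python source text 'a \; b', which contains the invalid escape \;
source = "'a \\; b'"
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = eval(compile(source, "<example>", "eval"))

print(value == raw)    # True: the unknown escape is kept as backslash + ';'
print([w.category.__name__ for w in caught])
```

Since all three spellings produce the same string, shlex cannot behave differently for them; the warning concerns the Python literal, not shlex.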

> The point is that it's not possible to use the output of shlex.shlex to try to match the behaviour of a POSIX-compliant shell by reliably splitting up a user's input into multiple commands. In the first case I presented (no escape character), the user entered two commands. In the second case, the user entered a single command with two arguments. However, there's no way to differentiate the two situations based on the output of shlex.

My question is: can a POSIX-compliant shell tell the difference? I don't know; it's an honest question. Can you show some shell code where it can tell the difference?
msg342410 - Author: Matthew Gamble (Matthew Gamble) Date: 2019-05-14 01:12
My apologies, I didn't realise you were talking about the invalid escape sequence. Thanks for letting me know that it's deprecated; I'll definitely keep that in mind going forward.

In a bash shell with the find command available, run the following command:

find . -type f -exec ls {} \;

You should see a list of files.

If you run this:

find . -type f -exec ls {} ;

You should see an error message from find:

"find: missing argument to `-exec'"

If I pass the first example in this message to shlex, I get no indication that the user escaped the semicolon in their input.
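To make the find example concrete, here is a sketch of a check a program might perform before running such a command. `exec_is_terminated` is a hypothetical helper that assumes a `;` token anywhere after `-exec` terminates it. Because shlex returns the same tokens for both command lines, the check accepts the unescaped one that find rejects:

```python
import shlex


def exec_is_terminated(command_line):
    """Hypothetical check: does a ';' token follow '-exec'?"""
    tokens = list(shlex.shlex(command_line, posix=True, punctuation_chars=True))
    if '-exec' not in tokens:
        return False
    return ';' in tokens[tokens.index('-exec') + 1:]


escaped = r'find . -type f -exec ls {} \;'
unescaped = 'find . -type f -exec ls {} ;'

print(exec_is_terminated(escaped))    # True
print(exec_is_terminated(unescaped))  # True, although find reports a missing argument
```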
History
Date                 User            Action  Args
2022-04-11 14:59:15  admin           set     github: 81078
2019-05-14 01:12:01  Matthew Gamble  set     messages: + msg342410
2019-05-14 00:38:56  eric.smith      set     messages: + msg342406
2019-05-13 23:36:51  Matthew Gamble  set     messages: + msg342399
2019-05-13 20:21:17  eric.smith      set     nosy: + eric.smith; messages: + msg342383
2019-05-13 00:59:51  Matthew Gamble  create