This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: shlex.split inserts extra item on backslash space space
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.6
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: eric.smith, max, peter.otten
Priority: normal Keywords:

Created on 2019-01-20 10:23 by max, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (3)
msg334081 - Author: Max (max) * Date: 2019-01-20 10:23
I believe that in both cases below the output should be ['a', 'b']; the extra ' ' inserted in the list is incorrect:

python3.6
Python 3.6.2 (default, Aug  4 2017, 14:35:04)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import shlex
>>> shlex.split('a \ b')
['a', ' b']
>>> shlex.split('a \  b')
['a', ' ', 'b']
>>>

Doc reference: https://docs.python.org/3/library/shlex.html#parsing-rules
> Non-quoted escape characters (e.g. '\') preserve the literal value of the next character that follows;

I believe this implies that backslash-space should become just a space, and that the two adjacent spaces should then act, just like a single space, as one separator between arguments.
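A minimal illustration of the quoted rule, assuming the default POSIX mode that shlex.split uses: the escaped character is kept literally and loses any special meaning, so an escaped space no longer separates words. Raw strings are used so the backslash reaches shlex unchanged.

```python
import shlex

# Escaped space: kept inside the word instead of acting as a separator.
print(shlex.split(r'a\ b'))   # ['a b']
# Escaped quote: kept literally instead of starting a quoted section.
print(shlex.split(r'a\"b'))   # ['a"b']
# A run of unescaped spaces acts as a single separator.
print(shlex.split(r'a   b'))  # ['a', 'b']
```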
msg334082 - Author: Peter Otten (peter.otten) * Date: 2019-01-20 10:33
To me the current shlex behaviour makes sense, and the shell (tested with bash) behaves the same way:

$ python3 -c 'import sys; print(sys.argv)' a   b
['-c', 'a', 'b']
$ python3 -c 'import sys; print(sys.argv)' a \  b
['-c', 'a', ' ', 'b']
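The same comparison can be mirrored without a shell. This sketch (an illustration, not part of the original thread) lets shlex.split tokenize the command line from the bash session, then starts the interpreter from that list with subprocess.run. With a list argument and the default shell=False, no shell re-parses the words, so the child's sys.argv shows exactly what shlex produced. It assumes the running interpreter (sys.executable) can stand in for python3.

```python
import shlex
import subprocess
import sys

# The command line from the bash session, as a raw string so the backslash
# reaches shlex unchanged. Note the two spaces after the backslash.
command_line = r"python3 -c 'import sys; print(sys.argv)' a \  b"

# shlex performs the word splitting that bash would otherwise do.
argv = shlex.split(command_line)
print(argv)  # ['python3', '-c', 'import sys; print(sys.argv)', 'a', ' ', 'b']

# Assumption: use the current interpreter instead of relying on "python3"
# being on PATH. No shell is involved, so the tokens arrive as-is.
argv[0] = sys.executable
result = subprocess.run(argv, stdout=subprocess.PIPE,
                        universal_newlines=True, check=True)
print(result.stdout.strip())  # ['-c', 'a', ' ', 'b']
```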
msg334091 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2019-01-20 15:27
I agree that the current behavior makes sense. I think "preserve the literal value of the next character" means the space won't be interpreted as a separator.

In the first example (I think better written as shlex.split(r'a \ b')), the first space is a separator. The second space is not a separator because of the backslash, so it's part of the second token ' b'.

In the second example (shlex.split(r'a \  b')), the first space is a separator, the second space is not a separator because of the backslash, and the third space is a separator. This explains why there's no space before the 'b'.
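The explanation above reduces to two rules: an unescaped whitespace character ends the current word (if there is one), and a backslash adds the next character to the word verbatim. The function split_words below is hypothetical, written only for this illustration. It implements just those two rules, with no quote handling and using shlex's default whitespace characters, and it agrees with shlex.split on the inputs discussed in this issue.

```python
import shlex

WHITESPACE = " \t\r\n"  # the same characters as shlex's default whitespace


def split_words(text):
    """Hypothetical minimal splitter, for illustration only.

    Unescaped whitespace separates words; a backslash makes the next
    character literal. Unlike shlex, quotes are not treated specially.
    """
    words = []
    current = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            # Escape: keep the next character verbatim, even if it is a space.
            escaped = next(chars, None)
            if escaped is None:
                raise ValueError("No escaped character")
            current.append(escaped)
        elif ch in WHITESPACE:
            # Unescaped whitespace ends the current word. Because `current`
            # is empty during a run of whitespace, no empty words appear.
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


for sample in [r"a \ b", r"a \  b", "a   b"]:
    print(repr(sample), split_words(sample), shlex.split(sample))
```

In r'a \  b' the escaped space becomes a one-character word on its own only because the unescaped space right after it closes that word, which is exactly the separator Eric identifies.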

I assume peter.otten's example is bash. I can confirm with zsh:

[~]$ python3 -c 'import sys; print(sys.argv)' a   b
['-c', 'a', 'b']
[~]$ python3 -c 'import sys; print(sys.argv)' a \ b
['-c', 'a', ' b']
[~]$ python3 -c 'import sys; print(sys.argv)' a \  b
['-c', 'a', ' ', 'b']

I'm going to close this. But if anyone wants to suggest a documentation patch, feel free to reopen this.

Also, changing this would no doubt break some code, so I'd recommend against changing it even if I didn't think it was doing the right thing.
History
Date                 User         Action  Args
2022-04-11 14:59:10  admin        set     github: 79968
2019-01-20 15:27:33  eric.smith   set     status: open -> closed
                                          type: behavior
                                          nosy: + eric.smith
                                          messages: + msg334091
                                          resolution: not a bug
                                          stage: resolved
2019-01-20 10:33:03  peter.otten  set     nosy: + peter.otten
                                          messages: + msg334082
2019-01-20 10:23:15  max          create