classification
Title: shlex.quote doesn't work on bytestrings
Type: enhancement
Stage: patch review
Components: Library (Lib)
Versions: Python 3.10
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: HassanAbouelela, Jonas Thiem, Nan Wu, The Compiler, aldwinaldwin, cheryl.sabella, hrik2001, martin.panter, r.david.murray, techfixya, willingc, xtreak
Priority: normal
Keywords: easy, patch

Created on 2015-11-06 12:52 by Jonas Thiem, last changed 2022-04-11 14:58 by admin.

Files
File name | Uploaded | Description
shlex_quote_bytes_support.patch | Nan Wu, 2015-11-10 03:27 |
Pull Requests
URL | Status | Linked
PR 10871 | closed | python-dev, 2018-12-03 18:57
PR 22657 | open | HassanAbouelela, 2020-10-12 00:42
Messages (9)
msg254186 - (view) Author: Jonas Thiem (Jonas Thiem) Date: 2015-11-06 12:52
Demonstration:

>>> import shlex
>>> shlex.quote(b"abc")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.4/shlex.py", line 285, in quote
    if _find_unsafe(s) is None:
TypeError: can't use a string pattern on a bytes-like object
>>>

You are probably wondering why anyone would not want to use Unicode strings here.

The reason is that for some operations (e.g. file access to some known paths), decoding and encoding from/to any Unicode interpretation can be lossy, specifically when the file path on the filesystem contains broken or mixed-encoding characters. In such a case, the shell command might need to be supplied as a bytestring to ensure it is sent exactly as-is, so that such broken files can still be handled without the Unicode interpretation altering some bytes of the path.

Since shlex.quote seems targeted at shell usage, it should therefore support this.
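
A minimal sketch of the lossy round trip described above. The file name is a made-up example, and UTF-8 is assumed as the "Unicode interpretation"; neither comes from the report itself.

```python
import shlex

# Hypothetical file name containing bytes that are not valid UTF-8.
raw = b"report-\xe9\xff.txt"

# A strict decode refuses the name outright.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lenient decode succeeds but is lossy: the invalid bytes become U+FFFD,
# so encoding the result no longer yields the original name.
lossy = raw.decode("utf-8", errors="replace")
round_trip = lossy.encode("utf-8")

# Passing the bytes directly to shlex.quote raises TypeError
# (the behaviour this issue reports).
try:
    shlex.quote(raw)
    bytes_accepted = True
except TypeError:
    bytes_accepted = False

print(strict_ok, round_trip == raw, bytes_accepted)
```

Either decoding mode leaves the caller without a faithful text form of the name, which is why a bytes-in, bytes-out quoting path is being requested.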
msg254196 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-11-06 13:58
I think that this is a reasonable request, and probably applies to the whole shlex module, although less strongly.

You could use the surrogateescape hack to work around the problem:

  shlex.quote(mydata.decode('ascii', 'surrogateescape')).encode('ascii', 'surrogateescape')

That might be the only practical way to handle bytes input to the shlex parser, if we do also want to tackle that.
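
The workaround can be wrapped in a small helper. This is only a sketch: the function name `quote_bytes_via_surrogateescape` is hypothetical and is not part of shlex.

```python
import shlex


def quote_bytes_via_surrogateescape(data: bytes) -> bytes:
    """Quote a bytestring for the shell by round-tripping through str.

    Non-ASCII bytes are smuggled through as lone surrogates on decode and
    restored unchanged on encode, so no byte of the input is altered.
    """
    text = data.decode('ascii', 'surrogateescape')
    return shlex.quote(text).encode('ascii', 'surrogateescape')


print(quote_bytes_via_surrogateescape(b"abc"))
print(quote_bytes_via_surrogateescape(b"caf\xe9 menu"))
```

Because shlex.quote treats anything outside its ASCII safe set as unsafe, a name containing escaped bytes ends up wrapped in single quotes rather than being passed through bare.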

Note that it is already the case that os module functions that return filenames, as well as stdin/stdout, use surrogateescape, so a naive program may actually work with binary filenames (which is why the handler is used in those contexts).
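
As a rough illustration of that point, os.fsdecode and os.fsencode apply the filesystem encoding with the surrogateescape handler on POSIX systems, so a str obtained from a bytes filename can be quoted and converted back without loss. The file name below is a made-up example, and the sketch assumes a POSIX platform.

```python
import os
import shlex

# Hypothetical file name with a byte that is not valid UTF-8.
raw_name = b"caf\xe9.txt"

# On POSIX, undecodable bytes survive as lone surrogates in the str form.
text_name = os.fsdecode(raw_name)

# Encoding back restores the original bytes exactly.
restored = os.fsencode(text_name)

# The str form can be passed through shlex.quote and then re-encoded.
quoted = os.fsencode(shlex.quote(text_name))

print(restored == raw_name)
print(quoted)
```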
msg254429 - (view) Author: Nan Wu (Nan Wu) * Date: 2015-11-10 03:27
Added a patch to support this in the `quote` function. What would be a good example, or group of examples, to demonstrate the usage in the documentation?
msg256413 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-12-14 20:06
I think the documentation needs a “Changed in version 3.6” notice.
msg326163 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2018-09-23 16:13
Thanks for the patch. Since the current workflow uses GitHub PRs, the patch can be submitted as a PR to move it forward. There seem to be some conflicts: I tried to apply the attached patch against the latest master.

Thanks
msg345569 - (view) Author: Aldwin Pollefeyt (aldwinaldwin) * Date: 2019-06-14 09:56
Python 3.9.0a0
[GCC 7.3.0] on linux
>>> import re
>>> find_unsafe_bytes = re.compile(b'[^\w@%+=:,./-]').search
<stdin>:1: SyntaxWarning: invalid escape sequence \w

When removing \w, all the tests pass.

(My regex knowledge is close to None.)

"\w stands for "word character". It always matches the ASCII characters [A-Za-z0-9_]"

Replace \w with A-Za-z0-9_? (All the tests still pass.)
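
One way the suggestion could look in practice is sketched below. It mirrors the quoting rule shlex.quote applies to str, but the names `_find_unsafe_bytes` and `quote_bytes` are hypothetical and this is not the code from the attached patch or either PR. The pattern is written as a raw bytes literal (`rb'...'`), which avoids the invalid escape sequence warning, and spells out the character class instead of using \w.

```python
import re

# Bytes counterpart of the "unsafe character" search, with \w spelled out.
_find_unsafe_bytes = re.compile(rb'[^A-Za-z0-9_@%+=:,./-]').search


def quote_bytes(s: bytes) -> bytes:
    """Return a shell-escaped version of the bytestring s (sketch only)."""
    if not s:
        return b"''"
    if _find_unsafe_bytes(s) is None:
        return s
    # Wrap in single quotes; an embedded single quote becomes '"'"'.
    return b"'" + s.replace(b"'", b"'\"'\"'") + b"'"


print(quote_bytes(b"abc"))
print(quote_bytes(b"a b"))
print(quote_bytes(b"caf\xe9"))
```

Keeping \w would also work if the pattern is a raw bytes literal, since \w in a bytes pattern only matches ASCII word characters; the explicit class just makes that visible.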
msg370274 - (view) Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2020-05-28 23:39
The first pull request has been closed, so this issue is available to be worked on. If the original patch or PR is used, please credit the original authors. Thanks!
msg391986 - (view) Author: Shatabarto Bhattacharya (hrik2001) Date: 2021-04-26 21:09
Looks like this issue has been solved?
What is there to be worked on?
History
Date | User | Action | Args
2022-04-11 14:58:23 | admin | set | github: 69753
2021-04-26 21:09:05 | hrik2001 | set | nosy: + hrik2001; messages: + msg391986
2021-01-25 07:15:34 | techfixya | set | nosy: + techfixya; messages: + msg385605
2020-10-12 00:42:09 | HassanAbouelela | set | nosy: + HassanAbouelela; pull_requests: + pull_request21634
2020-05-28 23:39:20 | cheryl.sabella | set | nosy: + cheryl.sabella; messages: + msg370274; versions: + Python 3.10, - Python 3.6
2019-06-14 09:56:39 | aldwinaldwin | set | nosy: + aldwinaldwin; messages: + msg345569
2018-12-03 18:57:36 | python-dev | set | pull_requests: + pull_request10106
2018-09-23 16:13:17 | xtreak | set | nosy: + xtreak; messages: + msg326163
2015-12-14 20:06:43 | martin.panter | set | nosy: + martin.panter; messages: + msg256413
2015-12-14 19:20:04 | willingc | set | nosy: + willingc; stage: needs patch -> patch review
2015-11-10 03:27:55 | Nan Wu | set | files: + shlex_quote_bytes_support.patch; nosy: + Nan Wu; messages: + msg254429; keywords: + patch
2015-11-06 13:58:12 | r.david.murray | set | type: behavior -> enhancement; versions: + Python 3.6, - Python 3.4; keywords: + easy; nosy: + r.david.murray; messages: + msg254196; stage: needs patch
2015-11-06 12:52:23 | Jonas Thiem | create |