
Author Gustavo Goretkin
Recipients Andrey.Kislyuk, Gustavo Goretkin, cvrebert, eric.araujo, eric.smith, evan_, ezio.melotti, ned.deily, python-dev, r.david.murray, robodan, vinay.sajip
Date 2017-01-21.00:45:00
Message-id <1484959501.15.0.500675331669.issue28595@psf.upfronthosting.co.za>
Content
>Instead of trying to enumerate all possible wordchars, I think a more robust solution is to use whitespace_split to include *all* characters not otherwise considered special.

I agree with that approach.
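Concretely, something like this is what I have in mind (a sketch; note that whitespace_split is an instance attribute, not a constructor argument):

>>> import shlex
>>> lex = shlex.shlex("cp --force=yes ./a.txt /tmp/b.txt", posix=True)
>>> lex.whitespace_split = True  # any run of non-whitespace becomes one word
>>> list(lex)
['cp', '--force=yes', './a.txt', '/tmp/b.txt']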

Also note that the dash/hyphen gets tokenized incorrectly:

>>> import shlex
>>> list(shlex.shlex("mkdir -p somepath"))
['mkdir', '-', 'p', 'somepath']
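
For comparison, shlex.split(), which enables whitespace_split internally, keeps the option intact:

>>> shlex.split("mkdir -p somepath")
['mkdir', '-p', 'somepath']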

Whitelisting all valid word characters is not workable, because the surrogateescape mechanism can introduce all sorts of "characters".
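
For example, any undecodable byte survives decoding as a lone surrogate (the \xff byte here is arbitrary), so there is no finite set of word characters to enumerate:

>>> raw = b"Bad\xffButLegalPath"  # legal, if ugly, POSIX filename bytes
>>> path = raw.decode("utf-8", "surrogateescape")
>>> path
'Bad\udcffButLegalPath'
>>> path.encode("utf-8", "surrogateescape") == raw  # round-trips losslessly
True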

In bash:

$ echo mkdir $(echo -ne "Bad\xffButLegalPath")
mkdir Bad?ButLegalPath

the path is one token.

However, shlex currently breaks it into multiple tokens:

>>> list(shlex.shlex(b"mkdir Bad\xffButLegalPath".decode("utf-8", "surrogateescape")))
['mkdir', 'Bad', '\udcff', 'ButLegalPath']
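
With whitespace_split enabled, the same input stays intact (an untested sketch of the expected result):

>>> cmd = b"mkdir Bad\xffButLegalPath".decode("utf-8", "surrogateescape")
>>> lex = shlex.shlex(cmd, posix=True)
>>> lex.whitespace_split = True
>>> list(lex)
['mkdir', 'Bad\udcffButLegalPath']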