Message285929
>Instead of trying to enumerate all possible wordchars, I think a more robust solution is to use whitespace_split to include *all* characters not otherwise considered special.
I agree with that approach.
Also note that dash/hyphen gets incorrectly tokenized.
>>> import shlex
>>> list(shlex.shlex("mkdir -p somepath"))
['mkdir', '-', 'p', 'somepath']
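For comparison, here is a minimal sketch of the same input with whitespace_split enabled, as proposed in the quoted message. The names `lexer` and `tokens` are only illustrative.

```python
import shlex

lexer = shlex.shlex("mkdir -p somepath")
# Split on whitespace only, so "-" is no longer treated as a token of its own.
lexer.whitespace_split = True
tokens = list(lexer)
print(tokens)  # ['mkdir', '-p', 'somepath']
```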
Whitelisting all valid word characters is not a good approach, because the surrogateescape mechanism can introduce all sorts of "characters".
In bash:
$ echo mkdir $(echo -ne "Bad\xffButLegalPath")
mkdir Bad?ButLegalPath
the path is one token.
However, shlex currently breaks it into multiple tokens:
>>> list(shlex.shlex(b"mkdir Bad\xffButLegalPath".decode("utf-8", "surrogateescape")))
['mkdir', 'Bad', '\udcff', 'ButLegalPath']
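A sketch of how whitespace_split would handle the `\xff` byte from the bash example above, assuming the bytes are decoded with the "surrogateescape" error handler. The names `command`, `lexer`, and `tokens` are only illustrative.

```python
import shlex

# The undecodable byte 0xff becomes the lone surrogate U+DCFF.
command = b"mkdir Bad\xffButLegalPath".decode("utf-8", "surrogateescape")

lexer = shlex.shlex(command)
# Split on whitespace only; the surrogate then stays inside the word.
lexer.whitespace_split = True
tokens = list(lexer)
print(tokens)  # repr() escapes the surrogate, so printing the list is safe

# Encoding with the same error handler recovers the original bytes.
path_bytes = tokens[1].encode("utf-8", "surrogateescape")
```

With this setting the path stays a single token, which matches what bash does.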
Date | User | Action | Args
2017-01-21 00:45:01 | Gustavo Goretkin | set | recipients: + Gustavo Goretkin, vinay.sajip, eric.smith, robodan, ned.deily, ezio.melotti, eric.araujo, r.david.murray, cvrebert, python-dev, Andrey.Kislyuk, evan_
2017-01-21 00:45:01 | Gustavo Goretkin | set | messageid: <1484959501.15.0.500675331669.issue28595@psf.upfronthosting.co.za>
2017-01-21 00:45:01 | Gustavo Goretkin | link | issue28595 messages
2017-01-21 00:45:00 | Gustavo Goretkin | create |