Message 279980 - Python tracker



Author	evan_
Recipients	Andrey.Kislyuk, cvrebert, eric.araujo, eric.smith, evan_, ezio.melotti, python-dev, r.david.murray, robodan, vinay.sajip
Date	2016-11-03.09:42:16
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1478166136.94.0.995190174918.issue28595@psf.upfronthosting.co.za>
In-reply-to

Content
The changes to shlex due to land in 3.6 use a predefined set of characters to "augment" wordchars, but this set is incomplete. For example, 'foo,bar' should be parsed as a single token, as the shell does, but it is split on the comma:

$ echo foo,bar
foo,bar

>>> import shlex
>>> list(shlex.shlex('foo,bar', punctuation_chars=True))
['foo', ',', 'bar']
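
To see which characters the lexer adds to wordchars when punctuation_chars is enabled, the two wordchars attributes can be compared. This is only an illustrative sketch: the variable names are mine, and the exact set of added characters may differ between Python versions.

```python
import shlex

# Lexers over an empty input; only their wordchars attributes are inspected.
plain = shlex.shlex('')
augmented = shlex.shlex('', punctuation_chars=True)

# Characters that punctuation_chars=True adds to the default wordchars.
extra = sorted(set(augmented.wordchars) - set(plain.wordchars))
print(extra)

# The comma is not among them, which is why 'foo,bar' gets split.
print(',' in augmented.wordchars)
```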

(For context on where this was encountered, see https://github.com/kislyuk/argcomplete/issues/161)

Instead of trying to enumerate all possible wordchars, I think a more robust solution is to use whitespace_split to include *all* characters not otherwise considered special.
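
As a rough illustration of the intended result (not the attached patch itself), setting whitespace_split on a lexer created with punctuation_chars=True keeps the comma inside the word while an operator such as && is still returned as its own token. This sketch assumes a Python version in which the two options can be combined; the sample command line is made up.

```python
import shlex

def split_like_shell(text):
    # Hypothetical helper: enable both punctuation_chars and whitespace_split.
    lexer = shlex.shlex(text, punctuation_chars=True)
    lexer.whitespace_split = True
    return list(lexer)

# The comma is no longer treated as a separator.
single = split_like_shell('foo,bar')
print(single)

# Shell operators surrounded by whitespace still come out as separate tokens.
command = split_like_shell('echo foo,bar && ls')
print(command)
```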

Ideally this would be fixed before 3.6 is released to avoid needing to maintain backwards compatibility with the current behaviour, although I understand the timeline may make this difficult.

I've attached a patch with proposed changes, including updates to the tests to demonstrate the effective difference. I can make the corresponding documentation changes if we want this merged.

(I've added everyone to the nosy list from http://bugs.python.org/issue1521950 where these changes originated.)

History
Date	User	Action	Args
2016-11-03 09:42:17	evan_	set	recipients: + evan_, vinay.sajip, eric.smith, robodan, ezio.melotti, eric.araujo, r.david.murray, cvrebert, python-dev, Andrey.Kislyuk
2016-11-03 09:42:16	evan_	set	messageid: <1478166136.94.0.995190174918.issue28595@psf.upfronthosting.co.za>
2016-11-03 09:42:16	evan_	link	issue28595 messages
2016-11-03 09:42:16	evan_	create