Message 285932 - Python tracker



Author	evan_
Recipients	Andrey.Kislyuk, Gustavo Goretkin, cvrebert, eric.araujo, eric.smith, evan_, ezio.melotti, ned.deily, python-dev, r.david.murray, robodan, vinay.sajip
Date	2017-01-21.02:13:29
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1484964810.47.0.215851768915.issue28595@psf.upfronthosting.co.za>
In-reply-to

Content

Unfortunately, shlex.shlex's defaults will probably stay as they are for a long time in order to avoid breaking backwards compatibility. Presumably shlex.split was added so that you don't have to remember to set posix and whitespace_split yourself.
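
As a rough sketch of that point, shlex.split can be approximated by hand with three settings. The helper name manual_split and the sample string are made up for illustration, and the equivalence is an assumption based on the documented behaviour of shlex.split with its default arguments, not a copy of its source.

```python
import shlex

def manual_split(text):
    # Hypothetical helper: approximates shlex.split(text) with default arguments.
    lexer = shlex.shlex(text, posix=True)
    lexer.whitespace_split = True   # break tokens on whitespace only
    lexer.commenters = ''           # comments=False in shlex.split disables '#' handling
    return list(lexer)

sample = 'echo "hello world" foo,bar'

# Plain shlex.shlex defaults: non-POSIX mode, no whitespace_split.
default_tokens = list(shlex.shlex(sample))
manual_tokens = manual_split(sample)
library_tokens = shlex.split(sample)

print(default_tokens)
print(manual_tokens)
print(library_tokens)
```

With the defaults, the quotes are kept and the comma becomes its own token; with the two settings applied, the result should match shlex.split.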

The particular problem I'm addressing in this issue is that the new punctuation_chars argument doesn't currently work with whitespace_split.

>>> import shlex
>>> def split(text, ws=False, pc=False):
...     s = shlex.shlex(text, posix=True, punctuation_chars=pc)
...     s.whitespace_split = ws
...     return list(s)
...
>>> split('foo,bar>baz')
['foo', ',', 'bar', '>', 'baz']
>>> split('foo,bar>baz', ws=True)
['foo,bar>baz']
>>> split('foo,bar>baz', pc=True)
['foo', ',', 'bar', '>', 'baz']
>>> split('foo,bar>baz', ws=True, pc=True)
['foo,bar>baz']

With my patch, the last example outputs ['foo,bar', '>', 'baz'].
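
To check which of the two behaviours a given interpreter has, a small probe along these lines may help. The function name punctuation_split and the sample string are made up for illustration; the two possible outcomes follow the pattern shown in the session above, where the comma stays inside the word either way and only the patched behaviour splits '>' out. It assumes Python 3.6 or later, since punctuation_chars does not exist before that.

```python
import shlex

def punctuation_split(text):
    # Hypothetical helper: combine punctuation_chars with whitespace_split.
    lexer = shlex.shlex(text, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    return list(lexer)

tokens = punctuation_split('cat a,b>out')

# Patched behaviour keeps 'a,b' together but emits '>' as its own token.
patched = tokens == ['cat', 'a,b', '>', 'out']
print(tokens, 'patched' if patched else 'unpatched')
```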

Before the release of 3.6, I argued that punctuation_chars should not attempt to augment wordchars at all, since the idea of wordchars is inherently incorrect, as you point out. Now I think it's too late to change that, hence my patch treats this as a new feature in 3.7.
