Message 282923 - Python tracker



Author	barry
Recipients	barry
Date	2016-12-11.15:11:41
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1481469102.15.0.255147638686.issue28937@psf.upfronthosting.co.za>
In-reply-to

Content

This has finally bugged me enough to file an issue, although I wouldn't be able to use it until Python 3.7.  There's a subtle but documented difference in str.split() when sep=None:

>>> help(''.split)
Help on built-in function split:

split(...) method of builtins.str instance
    S.split(sep=None, maxsplit=-1) -> list of strings
    
    Return a list of the words in S, using sep as the
    delimiter string.  If maxsplit is given, at most maxsplit
    splits are done. If sep is not specified or is None, any
    whitespace string is a separator and empty strings are
    removed from the result.
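To make the documented difference concrete, here is how the two modes treat a string with leading, trailing and repeated spaces. The sample string is arbitrary and chosen only for illustration:

```python
text = "  foo   bar  "

# sep=None: any run of whitespace counts as one separator, and empty strings are dropped.
print(text.split())     # ['foo', 'bar']

# Explicit sep=" ": every single space is a separator, so the empty strings survive.
print(text.split(" "))  # ['', '', 'foo', '', '', 'bar', '', '']
```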

That is, when sep is None, empty strings are removed from the result.  This does not happen when sep is given, which leads to this kind of unfortunate code:

>>> 'foo,bar,baz'.split(',')
['foo', 'bar', 'baz']
>>> 'foo,bar,baz'.replace(',', ' ').split()
['foo', 'bar', 'baz']
>>> ''.split(',')
['']
>>> ''.replace(',', ' ').split()
[]
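For comparison, a minimal sketch of the two workarounds commonly written today for comma-separated input. The helper names are made up for illustration; note that they are not equivalent, since only the first one also discards empty fields in the middle or at the end of the string:

```python
from typing import List


def split_filtered(text: str) -> List[str]:
    # Drops every empty field: the lone '' from an empty input,
    # plus any produced by ",," or a trailing ",".
    return [field for field in text.split(",") if field]


def split_nonempty_input(text: str) -> List[str]:
    # Special-cases only the empty input; inner empty fields are kept.
    return text.split(",") if text else []


print(split_filtered(""))                # []
print(split_nonempty_input(""))          # []
print(split_filtered("foo,,bar"))        # ['foo', 'bar']
print(split_nonempty_input("foo,,bar"))  # ['foo', '', 'bar']
```

Either way, the caller needs an extra step that the sep=None form does not require.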

Specifically, code that wants to split on, say, commas but also has to handle an empty source string shouldn't have to filter out the single empty-string item as well.

Obviously we can't change existing behavior, so I propose to add a keyword argument `prune` that would make these two bits of code identical:

>>> ''.split()
[]
>>> ''.split(' ', prune=True)
[]

and would handle the case of ''.split(',') without having to resort to creating an ephemeral intermediate string.

`prune` should be a keyword-only argument, defaulting to False.
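The proposed semantics can be emulated in pure Python to see how they would behave. This is a hypothetical sketch, not an existing str.split() feature: `split_pruned` is an invented name, and it assumes pruning simply means dropping empty strings after a normal split. maxsplit is deliberately left out, because how it should interact with pruning is not settled by the proposal:

```python
from typing import List, Optional


def split_pruned(s: str, sep: Optional[str] = None, *, prune: bool = False) -> List[str]:
    """Hypothetical emulation of the proposed `prune` keyword.

    `prune` is keyword-only and defaults to False, as proposed.
    maxsplit is omitted on purpose (its interaction with pruning is unspecified).
    """
    parts = s.split(sep)
    if prune:
        # Remove empty strings, mirroring what split() already does when sep is None.
        parts = [p for p in parts if p]
    return parts


print(split_pruned("", ","))                         # ['']  (unchanged default)
print(split_pruned("", ",", prune=True))             # []
print(split_pruned("", " ", prune=True) == "".split())  # True
print(split_pruned("foo,,bar,", ",", prune=True))    # ['foo', 'bar']
```

A real implementation in C would filter while scanning, which is what avoids the intermediate string that the replace() workaround creates.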

History
Date	User	Action	Args
2016-12-11 15:11:42	barry	set	recipients: + barry
2016-12-11 15:11:42	barry	set	messageid: <1481469102.15.0.255147638686.issue28937@psf.upfronthosting.co.za>
2016-12-11 15:11:42	barry	link	issue28937 messages
2016-12-11 15:11:41	barry	create