Message 395193 - Python tracker

Author	andrei.avk
Recipients	Catherine.Devlin, Mark.Bell, Philippe Cloutier, ZackerySpytz, andrei.avk, barry, cheryl.sabella, corona10, gvanrossum, karlcow, mrabarnett, serhiy.storchaka, syeberman, veky
Date	2021-06-06.01:22:50
Message-id	<1622942570.81.0.176156393502.issue28937@roundup.psfhosted.org>

Content

> I imagine that the discussion focussed on this since this is precisely what happens when sep=None. For example, 'a b c '.split() == ['a', 'b', 'c']. I guess that the point was to provide users with explicit, manual control over whether the behaviour of split should drop all empty strings or retain all empty strings instead of this decision just being made on whether sep is None or not.

That's true on some level but it seems to me that it's somewhat more nuanced than that.

The intent of sep=None is not to remove empties but to collapse invisible whitespace of mixed types into a single separator. ' \t ' probably means a single separator because it looks like one visually. Yes, the effect is the same as removing empties but it's a relevant distinction when designing (and naming) a flag to make split() consistent with this behaviour when sep is ',', ';', etc.
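A small sketch of the distinction described above, using only the existing str.split() behaviour (the sample strings are made up for illustration). With no separator, a run of mixed whitespace acts as one separator; with an explicit separator, every occurrence splits and empty strings appear.

```python
# Default split (sep=None): a run of mixed whitespace counts as ONE separator,
# and leading/trailing whitespace yields no empty strings.
mixed = "a \t b\n c "
default_split = mixed.split()

# Explicit separator: each single occurrence splits, so empties show up.
explicit_split = "a  b ".split(" ")

print(default_split)   # ['a', 'b', 'c']
print(explicit_split)  # ['a', '', 'b', '']
```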

Because when you have 'a,,,', the most likely intent is to have 3 empty values, NOT to collapse 3 commas into a single separator; you might then have additional processing that gets rid of the empties as part of the split() operation. So it's quite a different operation, even though the end effect is the same. So is this change really making the behaviour consistent? To me, consistency implies that the intent is roughly the same, and the outcome is also roughly the same.

You might say: but practicality beats purity?

However, there are some real issues here:

- harder to explain, remember, document.
- naming issue
- not completely solving the initial issue (and it would most likely leave no practical way to patch up that corner case if this PR is accepted)

Re: naming: for example, using keep_empty=False for sep=None is confusing. It would seem that most (or even all) users would think of the operation as collapsing contiguous mixed whitespace into a single separator rather than splitting everything up and then purging empties. So this name could cause a fair bit of confusion for this case.

What if we call it `collapse_contiguous_separators`? I can live with an awkward name, but even then it doesn't work for a case like 'a,,,,': it (mostly) doesn't make sense to collapse 4 commas into one separator. Here you are actually purging empty values.
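To make the "collapsing" versus "purging" distinction concrete, here is a rough sketch using existing tools only. The regex-based collapse is my own stand-in for what a collapsing flag might mean; it is not part of the PR or of str.split().

```python
import re

text = "a,,,,"

# Collapsing (assumed interpretation): treat each run of commas as ONE separator.
collapsed = re.split(r",+", text)

# Purging: split on every comma, then drop the empty strings.
purged = [part for part in text.split(",") if part]

print(collapsed)  # ['a', '']
print(purged)     # ['a']
```

Under this interpretation the two operations do not even give the same result for a trailing run of commas, which is part of why a single name is hard to pick.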

So the consistency seems labored in that any name you pick would be confusing for some cases.

And is the consistency for this case really needed? Is it common to have something like 'a,,,,' and say "I wish to get rid of those empty values but I don't want to use filter(None, values)"?
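For reference, the filter(None, values) approach mentioned above looks like this today, with no new parameter needed (the variable names are just for illustration):

```python
values = "a,,,,".split(",")

# filter(None, ...) keeps only truthy items, i.e. drops the empty strings.
non_empty = list(filter(None, values))

print(values)     # ['a', '', '', '', '']
print(non_empty)  # ['a']
```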

Regarding the workaround you suggested, that seems fine. If this PR is accepted, any of the workarounds that people now use for ''.split(',') or similar would still work just as before.
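As an illustration of the ''.split(',') corner case, here is one commonly seen workaround. It is not necessarily the one referred to above, and split_fields is a hypothetical helper name.

```python
def split_fields(text, sep=","):
    """Hypothetical helper: like text.split(sep), but '' gives [] instead of ['']."""
    return text.split(sep) if text else []

empty_with_sep = "".split(",")   # explicit separator keeps one empty string
empty_default = "".split()       # whitespace splitting returns nothing

print(empty_with_sep)        # ['']
print(empty_default)         # []
print(split_fields(""))      # []
print(split_fields("a,b"))   # ['a', 'b']
```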
