classification
Title: string str.split() behaviour inconsistency
Type: Stage:
Components: Interpreter Core Versions:
process
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: peteshinners, rhettinger, sjoerd
Priority: normal Keywords:

Created on 2004-03-06 20:41 by peteshinners, last changed 2004-03-07 11:00 by sjoerd. This issue is now closed.

Messages (2)
msg20179 - Author: Pete Shinners (peteshinners) (Python committer) Date: 2004-03-06 20:41
The str.split() method behaves differently depending on
whether it uses the default separator (no arguments) or
one you provide. There is no way to reproduce the
behavior of the default separator if you supply
your own.

>>> s = "a b  c"
>>> s.split()
['a', 'b', 'c']
>>> s.split(" ")
['a', 'b', '', 'c']

The default split uses a different algorithm, where it
combines multiple separators into a single separator.
Providing a custom separator makes split separate each
individual separator.
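
A short sketch of the two algorithms side by side may help here. The sample string is made up for illustration; it adds leading spaces, a tab, and a trailing newline to show that the default mode also strips whitespace at the ends and treats any whitespace character as a separator, which an explicit " " does not.

```python
# Illustrative input: leading spaces, a tab, and a trailing newline.
s = "  a b \t c\n"

# Default mode: runs of any whitespace act as one separator,
# and leading/trailing whitespace produces no empty strings.
default_parts = s.split()

# Explicit separator: every single " " is a boundary, so empty
# strings appear and other whitespace stays inside the pieces.
explicit_parts = s.split(" ")

print(default_parts)   # ['a', 'b', 'c']
print(explicit_parts)  # ['', '', 'a', 'b', '\t', 'c\n']
```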

Obviously there are good reasons for forcing a separate
entry between each separator. With simple comma or
colon separated records, you want to know if an entry
is blank.
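
As an illustration of why the per-separator behavior is useful, here is a minimal sketch with a made-up comma-separated record. The record contents and variable names are assumptions for the example only.

```python
# Hypothetical record with an empty second field and an empty last field.
record = "alice,,42,"

# Each comma is a boundary, so blank entries survive as empty strings.
fields = record.split(",")

# Positions of the blank entries, which would be lost if
# adjacent separators were merged.
blank_positions = [i for i, field in enumerate(fields) if field == ""]

print(fields)           # ['alice', '', '42', '']
print(blank_positions)  # [1, 3]
```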

The problem is that there is no way to reproduce the
default behavior. This alternate behavior is also not
documented, so it is confusing why split behaves
differently once you want your own separators.

Fixing this could be a problem. Changing the actual
split() method would break many programs. But adding a
different split method is a potentially nice solution.

The other option would be to "re-use" the current
splitfields() function and have it work like the
current split, and change split() to always behave like
it does with no arguments. This would unfortunately
still "break stuff".

The easiest fix may just be documentation and letting
people know of this difference.

I've been helping some newbies through Python. When
this came up I was a little surprised, and we were
forced to conclude it was just a little "magic and scary".
msg20180 - Author: Sjoerd Mullender (sjoerd) * (Python committer) Date: 2004-03-07 11:00
Use s.split(None) to get the same behavior as without arguments.
This is documented behavior, as far as I know.  If the
documentation isn't good enough, submit a bug report
specifically for that.
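
A minimal sketch of the suggestion above: passing None explicitly selects the whitespace algorithm, which also makes it possible to combine it with a maxsplit argument. The helper split_collapsing is hypothetical, not part of the standard library; it is only one possible way to merge runs of a custom separator by dropping the empty pieces.

```python
s = "a b  c"

# None as the separator selects the same algorithm as no arguments.
same_as_default = s.split(None)

# With None in place, a maxsplit value can be given as well.
first_split_only = s.split(None, 1)


def split_collapsing(text, sep):
    """Hypothetical helper: split on sep and drop empty pieces,
    so runs of sep (and leading/trailing sep) act like one separator."""
    return [piece for piece in text.split(sep) if piece]


print(same_as_default)               # ['a', 'b', 'c']
print(first_split_only)              # ['a', 'b  c']
print(split_collapsing("a,,b,", ","))  # ['a', 'b']
```

Note that the helper discards every blank entry, so it is unsuitable for records where an empty field carries meaning.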
History
Date                 User          Action  Args
2004-03-06 20:41:54  peteshinners  create