
classification
Title: Make str.join auto-convert inputs to strings.
Type: enhancement
Stage: resolved
Components: Interpreter Core
Versions: Python 3.10

process
Status: closed
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: BTaskaya, EmilStenstrom, ajoino, christian.heimes, eric.smith, gregory.p.smith, gstarck, jack1142, kamilturek, mrabarnett, pablogsal, rhettinger, serhiy.storchaka, terry.reedy, veky, xtreak
Priority: normal
Keywords:

Created on 2021-03-18 02:29 by rhettinger, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (21)
msg388983 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-03-18 02:29
Rather than just erroring-out, it would be nice if str.join converted inputs to strings when needed.

Currently:

    data = [10, 20, 30, 40, 50]
    s = ', '.join(map(str, data))

Proposed:

    s = ', '.join(data)

That would simplify a common idiom.  That is a nice win for beginners, and it makes code more readable.

The join() method is unfriendly in a number of ways.  This would make it a bit nicer.

There is likely to be a performance win as well.  The existing idiom with map() roughly runs like this:

     * Get iterator over: map(str, data)
     * Without length knowledge, build up a list of strings,
       periodically resizing and recopying data (1st pass)
     * Loop over the list of strings to compute the combined size
       (2nd pass)
     * Allocate a buffer for the target size
     * Loop over the list of strings (3rd pass), copying each
       into the buffer and wrap the result in a string object.

But, it could run like this:
     * Use len(data) or a length-hint to presize the list of strings.
     * Loop over the data, converting each input to a string if needed,
       keeping a running total of the target size, and storing in the
       pre-sized list of strings (all this in a single 1st pass)
     * Allocate a buffer for the target size
     * Loop over the list of strings (2nd pass), copying each
       into the buffer
     * Loop over the list of strings (3rd pass), copying each
       into the buffer and wrap the result in a string object.
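
The single-pass conversion described above can be sketched in pure Python. This is only an illustration of the bookkeeping, not the C implementation: `join_autoconvert` is a hypothetical name, and the final copy into the buffer is simply delegated to the existing str.join.

```python
from operator import length_hint


def join_autoconvert(sep, data):
    """Hypothetical pure-Python model of the proposed single-pass conversion."""
    # Presize the list of strings from len(data) or a length hint (0 if unknown).
    parts = [None] * length_hint(data)
    total = 0   # running total of the target size, in characters
    count = 0   # number of items actually seen
    for item in data:
        # Convert each input to a string only if needed.
        s = item if isinstance(item, str) else str(item)
        if count < len(parts):
            parts[count] = s
        else:
            parts.append(s)   # the hint was too small (or absent)
        total += len(s)
        count += 1
    del parts[count:]         # the hint was too large
    if count:
        total += len(sep) * (count - 1)
    # A C implementation would allocate `total` characters here and copy;
    # this sketch lets str.join do the copy and only checks the size.
    result = sep.join(parts)
    assert len(result) == total
    return result


print(join_autoconvert(', ', [10, 20, 30, 40, 50]))  # 10, 20, 30, 40, 50
```

The point of the sketch is that conversion, presizing, and the size computation all happen in one loop, so only the final copy needs a second pass.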

AFAICT, the proposal is mostly backwards compatible; the only change is that code that currently errors out will succeed.

For bytes.join() and bytearray.join(), the only auto-conversion that makes sense is from ints to bytes so that you could write:  

     b' '.join(data)

instead of the current:

    b' '.join([bytes([x]) for x in data])
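
For reference, a runnable version of the current bytes idiom, with sample values chosen here purely for illustration. The `int.to_bytes` spelling is an alternative that already works today; it is not part of the proposal.

```python
data = [72, 105, 33]  # sample values, assumed for illustration

# Current idiom from the message above: wrap each int in a one-element list.
current = b' '.join([bytes([x]) for x in data])

# Equivalent spelling with int.to_bytes (byte order is irrelevant for one byte).
alternative = b' '.join(x.to_bytes(1, 'big') for x in data)

print(current)                 # b'H i !'
print(current == alternative)  # True
```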
msg388985 - (view) Author: Vedran Čačić (veky) * Date: 2021-03-18 04:01
I can't find it now, but I seem to remember making this same proposal (except the part for bytes) quite a few years ago, and you being the most vocal opponent. What changed? Of course, I'm still for it.

(Your second list has an extra fourth item. But it's clear what you wanted to say.)
msg388992 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2021-03-18 07:26
I'm +0.5. Every time this bites me, I apply the same solution, so you're probably right that str.join should just do the work itself. And it's no doubt more performant that way, anyway.

And I've probably got some code that's just waiting for the current behavior to raise an error on me if passed the wrong inputs, even if I'd prefer it to succeed.

I should be +1, but I have a nagging "refuse to guess" feeling. But it doesn't seem like much of a guess: there's no other logical thing I could mean by this code. I'm unlikely to want it to raise an exception, or do any other conversion to a str.
msg388994 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-03-18 07:51
It was proposed by newbies several times before. It was rejected because it would let errors go unnoticed. Python is dynamically but strongly typed, and that is its advantage.

I am -1.
msg389006 - (view) Author: Vedran Čačić (veky) * Date: 2021-03-18 10:15
Does strong typing mean you should write

    if bool(condition): ...

or 

    for element in iter(sequence): ...

or (more similar to this)

    my_set.symmetric_difference_update(set(some_iterable))

?

As Eric has said, if there's only one possible thing you could have meant, "strong typing" is just bureaucracy.
msg389048 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-03-18 23:22
>  What changed?

It comes up almost every week that I teach a Python course.  Eventually, I've come to see the light :-)

Also, I worked through the steps and found an efficiency gain for new code with no detriment to existing code.

Lastly, I used to worry a lot about join() also being defined for bytes() and bytearray().  But after working through the use cases, I can see that we get an even bigger win.  People seem to have a hard time figuring out how to convert a single integer to a byte.  The expression "bytes([x])" isn't at all intuitive; it doesn't look nice in a list comprehension, and is incomprehensible when used with map() and lambda.
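
A short illustration of why the single-integer conversion trips people up. The values are assumed for the example; the behavior shown is that of the existing bytes constructor.

```python
x = 65

as_byte = bytes([x])     # a one-element iterable gives the single byte b'A'
zero_filled = bytes(x)   # a bare int gives that many zero bytes instead

print(as_byte)           # b'A'
print(len(zero_filled))  # 65

# The map() and lambda spelling mentioned above.
data = [65, 66, 67]
via_map = b' '.join(map(lambda n: bytes([n]), data))
print(via_map)           # b'A B C'
```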
msg389106 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2021-03-19 18:16
I'm also -1, for the same reason as Serhiy gave. However, if it was opt-in, then I'd be OK with it.
msg389135 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2021-03-20 01:34
I am sympathetic to the 'hiding bugs' argument in general, but what bugs would this proposal hide?  What bugs does print hide by auto-converting non-strings to strings?

I recently had the same thought as Raymond's: "it would be nice if str.join converted inputs to strings when needed."

I have always known that print() is slower in IDLE than in a console.  A recent SO question https://stackoverflow.com/questions/66286367/why-is-my-function-faster-than-pythons-print-function-in-idle showed that it could be 20X slower and asked why?  It turns out that while

print(*values, sep=sep, end=end, file=file) # is equivalent to file.write(sep.join(map(str, values))+end)

print must be implemented as the C equivalent of something like

first = True
for val in values:
    if first:
        first = False
    else:
        file.write(sep)
    file.write(str(val))
file.write(end)

When sys.stdout is a screen buffer, the multiple writes effectively implement a join.  But in IDLE, each write(s) results in a separate socket.send(s.encode) and socket.receive().decode + text.insert(s, tag).  I discovered that removing nearly all the overhead from the very slow example with sep.join and end.join made the example only trivially slower on IDLE (5%) than the standard REPL.  In #43283 I added the option of speedups using .join and .format to the IDLE doc, but this workaround would be much more usable if map(str, x) were not needed.
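
The difference in the number of write() calls can be observed with a small counting stream. `CountingWriter` is a hypothetical helper written for this sketch, and the exact number of calls made by print() is a CPython implementation detail rather than a documented guarantee.

```python
import io


class CountingWriter(io.StringIO):
    """In-memory text stream that counts calls to write()."""

    def __init__(self):
        super().__init__()
        self.calls = 0

    def write(self, s):
        self.calls += 1
        return super().write(s)


values = [1, 2.5, 'three', None]

# print() writes each value, each separator, and the end string separately.
per_item = CountingWriter()
print(*values, sep=', ', file=per_item)

# Joining first produces the same text with a single write.
joined = CountingWriter()
joined.write(', '.join(map(str, values)) + '\n')

print(per_item.getvalue() == joined.getvalue())  # True
print(per_item.calls, joined.calls)
```

On a stream where every write is expensive, as described above for IDLE, the second form pays that cost once.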
msg389144 - (view) Author: Vedran Čačić (veky) * Date: 2021-03-20 05:31
Matthew: can you then answer the same question I asked Serhiy?

The example usually given when advocating strong typing is whether 2 + '3' should be '23' or 5. Our uneasiness with it doesn't stem from coercions between int and str, but from the fact that + has two distinct meanings.

Of course, binary operators are always like that, even if it's not obvious, since there's always a tension created by difference of types of the left and right operand. Even if it's obvious that 2 - '3' should coerce the second argument to int since str doesn't define -, this can't be a general rule because e.g. set does (what about 2 - {3}?).

But method calls (and many protocols) are _not_ of that kind. As I said above, my_set ^ some_list makes us uneasy (even though list doesn't implement ^), but my_set.symmetric_difference(some_list) doesn't, simply because there is no ambiguity: there is only one thing we could have meant.

The same can be said about "for x in not_an_iterator", or "if not_a_bool".
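
The contrast between the operator and the method can be checked directly. The sample values are assumed for illustration.

```python
my_set = {1, 2, 3}
some_list = [3, 4]

# Method call: any iterable is accepted and treated as a collection of elements.
via_method = my_set.symmetric_difference(some_list)
print(via_method)  # {1, 2, 4}

# Operator: both operands must be sets, so a list raises TypeError.
operator_error = None
try:
    my_set ^ some_list
except TypeError as exc:
    operator_error = exc

print(type(operator_error).__name__)  # TypeError
```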
msg389148 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-03-20 08:31
Vedran, that is not what strong typing means. Strong typing means that '2'+3 is an error instead of '23' or 5. str.join() expects an iterable of strings. If an item is not a string, it is a sign of a programming error. I prefer to get an exception rather than a silent conversion of an unexpected value to the string 'None', '[]' or '<Foo object at 0x12345678>'.

So if you want such a feature, it should be a separate method or function.

But there is another consideration. Of 721 uses of the join() method (excluding os.path.join()) in the stdlib, only 10 need forceful stringification with map(str, ...). For tests it is 842 to 20, and for Doc/venv/ it is 1388 to 30. I am sure the ratio is similar for any other large volume of code. So the feature would actually see very little use: 1-2% of uses of str.join().

Specifically for Raymond: map(str, ...) is a good opportunity to teach about iterators and to introduce itertools.
msg389170 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-03-20 17:57
> Of 721 uses of the join() method (excluding os.path.join()) 
> in the stdlib, only 10 need forceful stringification with 
> map(str, ...)

Thanks for looking at real-world code.  I'm surprised that the standard library stats aren't representative of my experience, perhaps because I tend to write numeric code and do more output formatting than is used internally.
msg389171 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-03-20 18:00
FWIW, I'm running a user poll on Twitter and have asked people to state their rationale:

    https://twitter.com/raymondh/status/1373315362062626823

Take it with a grain of salt.  Poll totals don't reflect how much thought each person put into their vote.
msg389176 - (view) Author: Emil Stenström (EmilStenstrom) * Date: 2021-03-20 19:55
Since the proposal is fully backwards compatible I don’t think preferring the old version is a reason against this nicer API. After all, people that like the current version can continue using it as they do today. 

Teaching Python to beginners is a great way to find the warts of a language (I’ve done it too). In the beginning people struggle with arrays and if-blocks, and having to go into how map and the str constructor work together to get a comma separated list of ints is just too much. Beginners are an important group of programmers that this proposal will clearly benefit.

I’m sure there will be some “None”-strings that will slip through this, but I think the upside far outweighs the downside in this case.

Big +1 from me.
msg389180 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2021-03-20 21:28
I read all the responses as of this timestamp. They left me more persuaded that joining objects with a string (or bytes) is explicit enough that the objects *must* be coerced to strings.

A problem with coercion in "1 + '2'" is that there is no 'must'.  The desired answer could be either 3 or '12', and neither can be converted to the other, so don't guess.

The desired answer for "1 + .5" is much more obviously 1.5 rather than either 1 or 2, plus the former avoids information loss and leaves the option available of rounding or converting however one wants.

One tweet answered my question about masking a bug. Suppose 'words' is intended to be an iterable of strings.

>>> words = ['This', 'is', 'a', 'list', 'of', 7, 'words']  # Buggy
>>> print(*words)  # Auto-coercion masks the bug.
This is a list of 7 words
>>> '-'.join(words)  # Current .join does not.
Traceback (most recent call last):
  File "<pyshell#8>", line 1, in <module>
    '-'.join(words)
TypeError: sequence item 5: expected str instance, int found

With the proposed change, detection of the bug is delayed, as is already the case with print.  How much do we care about this possibility?  One possible answer is to add a new method, such as 'joins' or builtin function 'join'.

Given the variety of opinions, I think a PEP and SC decision would be needed.
msg389181 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2021-03-20 21:43
-10.  I agree with Serhiy.  Automatic type conversion is rarely a feature.  It leads to silent bugs when people pass the wrong things.  Be explicit.

We are intentionally not one of those everything-is-really-a-string languages like Perl or JavaScript.

This core API behavior change is big enough to need a PEP and steering council approval.
msg389190 - (view) Author: Emil Stenström (EmilStenstrom) * Date: 2021-03-20 22:13
Terry, Gregory: The suggestion is not to change what 1 + "2" does; I fully agree that it behaves as it should. The suggestion is to change what ",".join([1, "2"]) does. There's no doubt that the intended result is "1,2". That's why it's possible to coerce.

About the example of a list with mixed types: if the reason that example is buggy is "this list should only have strings", a better way to enforce that is to add types that enforce it.
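
A minimal sketch of what "adding types" could look like for the buggy list from the earlier message. `hyphenate` is a hypothetical function, and the claim about static checking is an assumption: a checker such as mypy would be expected to reject the call because of the int, but that is not verified here. At runtime, today's str.join still raises.

```python
from typing import Iterable


def hyphenate(words: Iterable[str]) -> str:
    """Hypothetical helper whose annotation documents the str-only contract."""
    return '-'.join(words)


words = ['This', 'is', 'a', 'list', 'of', 7, 'words']  # buggy: contains an int

runtime_error = None
try:
    hyphenate(words)  # a static type checker is expected to flag this call
except TypeError as exc:
    runtime_error = exc  # current behavior: str.join rejects the int

print(runtime_error)
```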
msg389192 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2021-03-20 22:50
There is a lot of doubt.  That should clearly raise an exception because this function is intended to only operate on strings.

Trivial types examples like that gloss over the actual problem.

data_from_some_computations = [b"foo", b"bar"]  # probably returned by a function

... later on, some other place in the code ...

colon_sep_data = ":".join(data_from_some_computations)

I guarantee you that 99.999% of the time everyone wants an exception there instead of their colon_sep_data to contain `b"foo":b"bar"`.

Implicit conversions always lead to hard to pin down bugs.  An exception raised at the source of the problem is very easy to debug in comparison.
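
For concreteness, this is what the example does today and what stringifying each item would produce instead. The second part uses map(str, ...) to stand in for the proposed auto-conversion.

```python
data_from_some_computations = [b"foo", b"bar"]

# Current behavior: the bytes items are rejected at the join call.
current_error = None
try:
    ":".join(data_from_some_computations)
except TypeError as exc:
    current_error = str(exc)

# With per-item str() conversion, the bytes reprs end up in the result.
auto_converted = ":".join(map(str, data_from_some_computations))

print(current_error)
print(auto_converted)  # b'foo':b'bar'
```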
msg389213 - (view) Author: Vedran Čačić (veky) * Date: 2021-03-21 03:36
Yes, I know what strong typing means, and can you please read again what I've written? It was exactly about "In the face of ambiguity, refuse the temptation to guess.", because binary operators are inherently ambiguous when given differently typed operands. Methods are not: the method _name_ itself is resolved according to self's type, it seems obvious to me that the arguments should too. Otherwise "explicit fanatics" would probably want to write list.append(things, more) instead of things.append(more).

The only reason we're having this conversation is that when it was introduced, `join` was a function, not a method. If it were a method from the start, we would've never even questioned its stringification of the iterable elements (and of course it would do that from the start, cf. set or dict update methods).

Gregory: yes, `bytes` elements are a problem, but that's a completely orthogonal problem (probably best left for linters). The easiest way to see it: do you object to (the current behavior of)

>>> s = {2, 7}
>>> s.update(b'Veky')

? :-)
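
For readers who want to see the result of that snippet: iterating over a bytes object yields ints, so the set silently gains four integers.

```python
s = {2, 7}
s.update(b'Veky')  # iterating over bytes yields the integer value of each byte

print(sorted(s))   # [2, 7, 86, 101, 107, 121]
```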
msg389228 - (view) Author: Grégory Starck (gstarck) * Date: 2021-03-21 13:53
FWIW -1 from me too.

That should be solved by creating a new function IMO:

def joinstr(sep, *seq):
    return sep.join(str(i) for i in seq)
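
A usage sketch for the function above, with sample data assumed for illustration. Because the signature takes *seq, an existing list has to be unpacked; passing the list itself stringifies the whole list as one item.

```python
def joinstr(sep, *seq):
    return sep.join(str(i) for i in seq)


data = [10, 20, 30, 40, 50]

from_args = joinstr(', ', 10, 20, 30)  # items passed as separate arguments
from_list = joinstr(', ', *data)       # an existing list must be unpacked
whole_list = joinstr(', ', data)       # without *, the list is a single item

print(from_args)   # 10, 20, 30
print(from_list)   # 10, 20, 30, 40, 50
print(whole_list)  # [10, 20, 30, 40, 50]
```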
msg389241 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-03-21 16:06
I'm also -1 and would prefer something like Grégory's proposal instead.
msg389410 - (view) Author: Jacob Nilsson (ajoino) Date: 2021-03-23 21:19
For what my opinion is worth, I agree with Grégory's suggestion because the ',' part of ','.join(...) is almost as unintuitive as the problems Raymond's suggestions are trying to fix.

I was going to suggest a builtin to work on both str and bytes, like join(sep=None, strtype=str, *strings) but that interface looks pretty bad...

I think joinstr/joinbytes according to Grégory's suggestion (perhaps as classmethods of str/bytes?) would make the most sense.
History
Date                 User              Action  Args
2022-04-11 14:59:42  admin             set     github: 87701
2021-03-23 21:21:24  rhettinger        set     status: open -> closed; stage: resolved
2021-03-23 21:19:14  ajoino            set     nosy: + ajoino; messages: + msg389410
2021-03-21 16:06:46  christian.heimes  set     nosy: + christian.heimes; messages: + msg389241
2021-03-21 13:53:05  gstarck           set     nosy: + gstarck; messages: + msg389228
2021-03-21 12:43:49  kamilturek        set     nosy: + kamilturek
2021-03-21 03:36:58  veky              set     messages: + msg389213
2021-03-20 22:50:58  gregory.p.smith   set     messages: + msg389192
2021-03-20 22:13:48  EmilStenstrom     set     messages: + msg389190
2021-03-20 21:43:31  gregory.p.smith   set     nosy: + gregory.p.smith; messages: + msg389181
2021-03-20 21:28:24  terry.reedy       set     messages: + msg389180
2021-03-20 19:55:54  EmilStenstrom     set     nosy: + EmilStenstrom; messages: + msg389176
2021-03-20 18:00:24  rhettinger        set     messages: + msg389171
2021-03-20 17:57:52  jack1142          set     nosy: + jack1142
2021-03-20 17:57:23  rhettinger        set     messages: + msg389170
2021-03-20 17:15:46  BTaskaya          set     nosy: + BTaskaya
2021-03-20 08:31:34  serhiy.storchaka  set     messages: + msg389148
2021-03-20 05:31:39  veky              set     messages: + msg389144
2021-03-20 01:34:45  terry.reedy       set     nosy: + terry.reedy; messages: + msg389135
2021-03-19 18:16:57  mrabarnett        set     nosy: + mrabarnett; messages: + msg389106
2021-03-18 23:22:39  rhettinger        set     messages: + msg389048
2021-03-18 10:15:14  veky              set     messages: + msg389006
2021-03-18 07:51:50  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg388994
2021-03-18 07:26:45  eric.smith        set     nosy: + eric.smith; messages: + msg388992
2021-03-18 04:01:41  veky              set     nosy: + veky; messages: + msg388985
2021-03-18 03:02:22  xtreak            set     nosy: + xtreak
2021-03-18 02:29:06  rhettinger        create