classification
Title: join method for list and tuple
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Javier Dehesa, christian.heimes, eric.araujo, iamsav, josh.r, serhiy.storchaka
Priority: normal Keywords:

Created on 2018-04-03 14:33 by Javier Dehesa, last changed 2022-04-11 14:58 by admin.

Messages (9)
msg314881 - (view) Author: Javier Dehesa (Javier Dehesa) Date: 2018-04-03 14:33
It is pretty trivial to concatenate a sequence of strings:

    ''.join([str1, str2, ...])

Concatenating a sequence of lists is for some reason significantly more convoluted. Some current options include:

    sum([lst1, lst2, ...], [])
    [x for y in [lst1, lst2, ...] for x in y]
    list(itertools.chain(lst1, lst2, ...))

The first is the least advisable but the most intuitive, and the third is the fastest but the most cumbersome (see https://stackoverflow.com/questions/49631326/why-is-itertools-chain-faster-than-a-flattening-list-comprehension ). None of these looks like "the one obvious way to do it" to me. Furthermore, I feel a dedicated concatenation method could be more efficient than any of these approaches.
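
As a rough illustration of the three options above (a sketch with made-up sample data, not a benchmark), the following shows that they all produce the same flattened list. The name `lists` is an assumption introduced for the example, and `chain.from_iterable` is used in place of `chain(...)` so that the input can be a single sequence of lists:

```python
from itertools import chain

# Hypothetical sample data: a sequence of lists to concatenate.
lists = [[1, 2], [3], [], [4, 5, 6]]

# Option 1: sum() with an empty list as the start value. Each addition
# builds a new list, so the total work grows quadratically.
via_sum = sum(lists, [])

# Option 2: nested list comprehension.
via_comprehension = [x for sub in lists for x in sub]

# Option 3: itertools.chain, here through its from_iterable constructor.
via_chain = list(chain.from_iterable(lists))

print(via_sum == via_comprehension == via_chain)
```

All three give the same result on this input; they differ in readability and in how they scale with the number and size of the lists.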

If we accept that ''.join(...) is an intuitive idiom, why not provide the syntax:

    [].join([lst1, lst2, ...])

And while we are at it:

    ().join([tpl1, tpl2, ...])

Like with str, these methods should only accept sequences of objects of their own class (e.g. we could do [].join(list(s) for s in seqs) if seqs contains lists, tuples and generators). The use case for non-empty joiners would probably be less frequent than for strings, but it also solves a problem that has no clean solution with the current tools. Here is what I would probably do to join a sequence of lists with [None, 'STOP', None]:

    lsts = [lst1, lst2, ...]
    joiner = [None, 'STOP', None]
    lsts_joined = list(itertools.chain.from_iterable(lst + joiner for lst in lsts))[:-len(joiner)]

Which is awful and inefficient (I am not saying this is the best or only possible way to solve it; it is just what I, a self-described experienced Python developer, might write).
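
For comparison, here is a minimal sketch of what such a join could look like as a plain function today. The name `join_lists` is hypothetical and not part of any library. Unlike the slicing approach above, it also behaves correctly when the joiner is empty (`[:-len(joiner)]` becomes `[:-0]`, which yields an empty list):

```python
def join_lists(joiner, lists):
    """Hypothetical helper: concatenate lists, inserting joiner between them."""
    result = []
    for index, lst in enumerate(lists):
        if index:
            # Only add the joiner between items, never before the first one.
            result.extend(joiner)
        result.extend(lst)
    return result


joined = join_lists([None, 'STOP', None], [[1, 2], [3], [4, 5]])
print(joined)
```

This avoids building the temporary `lst + joiner` lists and the final slice, at the cost of an explicit loop.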
msg314882 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2018-04-03 14:40
join() is a bad choice, because new developers will confuse list.join with str.join.

We could turn list.extend(iterable) into list.extend(*iterable). Or you could just use extend with a chain iterator:

>>> l = []
>>> l.extend(itertools.chain([1], [2], [3]))
>>> l
[1, 2, 3]
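
When the number of lists is not known in advance, the same idea can be sketched with `chain.from_iterable`, which takes one iterable of iterables rather than separate arguments. The name `parts` is an assumption made up for this example:

```python
from itertools import chain

# Hypothetical input: any number of lists collected at run time.
parts = [[1], [2], [3]]

l = []
# Extend in place from a lazy iterator over all the sublists.
l.extend(chain.from_iterable(parts))
print(l)
```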
msg314883 - (view) Author: Javier Dehesa (Javier Dehesa) Date: 2018-04-03 15:06
Thanks Christian. I thought of join precisely because it performs conceptually the same function as with str, so the parallel between ''.join(), [].join() and ().join() looked more obvious. Also there is os.path.join and PurePath.joinpath, so the verb seemed well-established. As for shared method names, index and count are present both in sequences and str, although it is true that these do return the same kind of object in both cases.

I'm not saying your points aren't valid, though. Your proposed way with extend is, I guess, about the same as list(itertools.chain(...)), which could be considered enough. I just feel that it is not particularly convenient, especially for newer developers, who will probably gravitate towards sum(...) more than itertools or a nested generator expression, but I may be wrong.
msg314885 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-04-03 15:23
String concatenation: f'{a}{b}{c}'
List concatenation: [*a, *b, *c]
Tuple concatenation: (*a, *b, *c)
Set union: {*a, *b, *c}
Dict merging: {**a, **b, **c}
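
A small runnable sketch of these unpacking forms, using made-up sample values (all variable names here are assumptions for illustration):

```python
a, b, c = [1, 2], [3], [4, 5]

as_list = [*a, *b, *c]    # list concatenation
as_tuple = (*a, *b, *c)   # tuple concatenation
as_set = {*a, *b, *c}     # set union

d1, d2 = {"x": 1}, {"x": 2, "y": 3}
merged = {**d1, **d2}     # dict merging; later mappings win on duplicate keys

s1, s2 = "ab", "cd"
text = f"{s1}{s2}"        # string concatenation

print(as_list, as_tuple, as_set, merged, text)
```

Each form requires the operands to be written out individually in the source.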
msg352387 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2019-09-13 18:35
Note that all of Serhiy's examples are for a known, fixed number of things to concatenate/union/merge. str.join's API can be used for that by wrapping the arguments in an anonymous tuple/list, but it's more naturally suited to a variable number of things, and the unpacking generalizations haven't reached the point where:

    [*seq for seq in allsequences]

is allowed.

    list(itertools.chain.from_iterable(allsequences))

handles that just fine, but I could definitely see it being convenient to be able to do:

    [].join(allsequences)

That said, a big reason str provides .join is because it's not uncommon to want to join strings with a repeated separator, e.g.:

    # For not-really-csv-but-people-do-it-anyway
    ','.join(row_strings)

    # Separate words with spaces
    ' '.join(words)

    # Separate lines with newlines
    '\n'.join(lines)

I'm not seeing even one motivating use case for list.join/tuple.join that would actually join on a non-empty list or tuple ([None, 'STOP', None] being rather contrived). If that's not needed, it might make more sense to do this with an alternate constructor (a classmethod), e.g.:

    list.concat(allsequences)

which would avoid the cost of creating an otherwise unused empty list (the empty tuple is a singleton, so no cost is avoided there). It would also work equally well with both tuple and list (where making list.extend take varargs wouldn't help tuple, though it's a perfectly worthy idea on its own).
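
To make the alternate-constructor idea concrete, here is a minimal sketch written as subclasses, since the built-in list and tuple have no `concat` method. `ConcatMixin`, `ConcatList` and `ConcatTuple` are hypothetical names used only for illustration:

```python
from itertools import chain


class ConcatMixin:
    """Hypothetical mixin adding a concat() alternate constructor."""

    @classmethod
    def concat(cls, iterables):
        # Build an instance of cls from all items of all the given iterables.
        return cls(chain.from_iterable(iterables))


class ConcatList(ConcatMixin, list):
    pass


class ConcatTuple(ConcatMixin, tuple):
    pass


allsequences = [[1, 2], (3,), [4]]
print(ConcatList.concat(allsequences))
print(ConcatTuple.concat(allsequences))
```

Because `concat` is a classmethod, no throwaway empty instance is needed, and the same code serves both the list-like and the tuple-like class.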

Personally, I don't find using itertools.chain (or its from_iterable alternate constructor) all that problematic (though I almost always import it with from itertools import chain to reduce the verbosity, especially when using chain.from_iterable). I think promoting itertools more is a good idea; right now, the notes on concatenation for sequence types mention str.join, bytes.join, and replacing tuple concatenation with a list that you call extend on, but don't mention itertools.chain at all, which seems like a failure to make the best solution the discoverable/obvious solution.
msg352530 - (view) Author: Александр Семенов (iamsav) Date: 2019-09-16 09:33
In JavaScript, join() works the other way around:
['1','2','3'].join(', ')
so [].join() may confuse some people.
msg352531 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2019-09-16 09:46
> In JavaScript, join() works the other way around:
> ['1','2','3'].join(', ')
> so [].join() may confuse some people.

It would be too confusing to have two different approaches to joining strings in Python. Besides, ECMAScript 1 came out in 1997, 5 years after Python was first released. By that argument, JavaScript is the one that should change.
msg352532 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-09-16 09:53
How common is the case of a variable number of things to concatenate/union/merge?

In my experience, in most cases this looks like:

    result = []
    for ...:
        # many complex statements
        # may include continue and break
        result.extend(items) # may be intermixed with result.append(item)

So purely concatenating lists from some sequence is a very special case, and there are several ways to do it.

    result = []
    for items in seq:
        result.extend(items)
        # nothing wrong with this simple code, really

    result = [x for items in seq for x in items]
    # may be less efficient for really long sublists,
    # but looks simple

    result = list(itertools.chain.from_iterable(seq))
    # if you are an itertools addict ;-)
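
The first pattern above, where extend is mixed with append, continue and break, might look like the following sketch. The function `collect`, its input format and the 'STOP' sentinel are all assumptions invented for this example:

```python
def collect(records):
    """Hypothetical loop mixing extend() and append() with early exits."""
    result = []
    for record in records:
        if record is None:
            continue              # skip missing entries
        if record == 'STOP':
            break                 # stop at a sentinel value
        if isinstance(record, list):
            result.extend(record)  # several items at once
        else:
            result.append(record)  # a single item
    return result


print(collect([[1, 2], None, 3, [4], 'STOP', [5]]))
```

A loop of this shape cannot be replaced by a single concatenation call, which is the point being made about how rarely the pure case arises.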
msg352534 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-09-16 10:04
It is history, but in 1997 Python had the same order of arguments as ECMAScript: string.join(words [, sep]). str.join() was added only in 1999 (226ae6ca122f814dabdc40178c7b9656caf729c2).
History
Date                 User              Action  Args
2022-04-11 14:58:59  admin             set     github: 77395
2019-09-16 10:04:49  serhiy.storchaka  set     messages: + msg352534
2019-09-16 09:53:43  serhiy.storchaka  set     messages: + msg352532
2019-09-16 09:46:11  christian.heimes  set     messages: + msg352531
2019-09-16 09:33:29  iamsav            set     nosy: + iamsav
                                               messages: + msg352530
2019-09-13 18:35:07  josh.r            set     nosy: + josh.r
                                               messages: + msg352387
2018-04-06 16:29:01  eric.araujo       set     nosy: + eric.araujo
2018-04-03 15:23:56  serhiy.storchaka  set     nosy: + serhiy.storchaka
                                               messages: + msg314885
2018-04-03 15:06:33  Javier Dehesa     set     messages: + msg314883
2018-04-03 14:40:42  christian.heimes  set     nosy: + christian.heimes
                                               messages: + msg314882
                                               versions: + Python 3.8
2018-04-03 14:33:53  Javier Dehesa     create