Author terry.reedy
Recipients Sergey, jcea, serhiy.storchaka, terry.reedy
Date 2013-06-28.19:32:59
Message-id <1372447980.14.0.769605833246.issue18305@psf.upfronthosting.co.za>
In-reply-to
Content
Performance enhancements do not normally go in bugfix releases. The issue of quadratic performance of sum(sequences, null_seq) is known, which is why the doc says that sum() is for numbers and recommends .join for strings and itertools.chain for other sequences.

sum([[1,2,3]]*n, []) == [1,2,3]*n == list(chain.from_iterable([[1,2,3]]*n))

For n = 1000000, the second takes a blink of an eye and the third under a second. So there is no issue for properly written code. More generally,

sum(listlist, []) == list(chain.from_iterable(listlist))

The latter should be comparable in speed to your patch and has the advantage of not turning the iterator into a concrete list unless and until one actually needs a concrete list.
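
As a rough illustration of the comparison above, here is a minimal sketch that checks the two spellings give the same result and times them. The helper names flatten_sum and flatten_chain and the size n = 2000 are assumptions made for this example, not anything from the patch. The timings depend on the machine and on whether the interpreter optimizes sum() for lists, so no expected numbers are given.

```python
from itertools import chain
from timeit import timeit

n = 2000  # hypothetical size, kept small so the slow version still finishes quickly
listlist = [[1, 2, 3]] * n


def flatten_sum(lists):
    # Without a special-case optimization, each addition builds a new list
    # and copies everything accumulated so far, hence quadratic time.
    return sum(lists, [])


def flatten_chain(lists):
    # chain.from_iterable visits each item exactly once; list() makes the
    # result concrete only because we ask for it here.
    return list(chain.from_iterable(lists))


# Both spellings produce the same flat list.
assert flatten_sum(listlist) == flatten_chain(listlist) == [1, 2, 3] * n

# Time a few repetitions of each; results vary by machine and version.
t_sum = timeit(lambda: flatten_sum(listlist), number=5)
t_chain = timeit(lambda: flatten_chain(listlist), number=5)
print(f"sum: {t_sum:.4f}s  chain: {t_chain:.4f}s")
```

Raising n should make the gap grow roughly in proportion to n on interpreters where sum() is not special-cased for lists.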

There are two disadvantages to doing the equivalent within sum:

1. People *will* move code that depends on the internal optimization to Python versions and implementations that do not have it, and they *will* complain about their program 'freezing'. This already happened when the equivalent of str.join was built into sum. It is better to use 'proper' code that will work well enough on all CPython versions and other implementations.

2. It discourages people from carefully thinking about whether they actually need a concrete list or merely the iterator for a virtual list. The latter works for sequences that are too long to fit in memory.
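
To make the second point concrete, here is a small sketch of consuming a 'virtual list' through its iterator without ever building it. The data (repeated copies of [1, 2, 3]) and the variable names are made up for illustration.

```python
from itertools import chain, islice, repeat

# An endless virtual list: [1, 2, 3, 1, 2, 3, ...]. It could never be built
# as a concrete list, but its iterator can be consumed piece by piece.
endless = chain.from_iterable(repeat([1, 2, 3]))
first_seven = list(islice(endless, 7))
print(first_seven)  # [1, 2, 3, 1, 2, 3, 1]

# A finite but large virtual list: 100000 copies of [1, 2, 3]. Here sum() is
# used as intended, on numbers, and only one item is held at a time.
total = sum(chain.from_iterable(repeat([1, 2, 3], 100000)))
print(total)  # 600000
```

In both cases the flattened sequence never exists in memory; only the consumer decides how much of it is materialized.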

So my inclination is to reject the change.