Message 368989 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	rhettinger
Recipients	Dennis Sweeney, bbayles, rhettinger, serhiy.storchaka, tim.peters
Date	2020-05-16.00:06:59
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1589587620.22.0.435512733113.issue38938@roundup.psfhosted.org>
In-reply-to

Content
The nested generators approach looks promising. I hope you keep working on that :-) I'm not too keen on PR 18427 because it is a massive explosion in code volume and complexity. With some continued effort, I expect we'll get to something much simpler and more maintainable. The existing code already performs well for typical applications, so there is no need to "go gonzo" and throw a massive wall of code at the problem. The ensuing maintenance costs would be too high. To save time and effort, please hold-off on a C implementation until we've agreed to a pure python version of the algorithm. For timings, try using strings, tuples, or dataclass instances. Timing integer inputs isn't representative of how merge() is used and it tends to exaggerate improvements. For interesting merges, the dominant cost is the comparison step. By switching to a binary tree arrangement, the payoff we're aiming for is to avoid the double comparison in the tuple inequality logic — "('a',1)<('b',2)" tests both 'a'=='b' and 'a'<='b'. I suggest aiming for the simplest possible code that avoids the double test.

The nested generators approach looks promising. I hope you keep working on that :-)

I'm not too keen on PR 18427 because it is a massive explosion in code volume and complexity.

With some continued effort, I expect we'll get to something much simpler and more maintainable.  The existing code already performs well for typical applications, so there is no need to "go gonzo" and throw a massive wall of code at the problem.  The ensuing maintenance costs would be too high.

To save time and effort, please hold-off on a C implementation until we've agreed to a pure python version of the algorithm.

For timings, try using strings, tuples, or dataclass instances.  Timing integer inputs isn't representative of how merge() is used and it tends to exaggerate improvements.  For interesting merges, the dominant cost is the comparison step.

By switching to a binary tree arrangement, the payoff we're aiming for is to avoid the double comparison in the tuple inequality logic — "('a',1)<('b',2)" tests both 'a'=='b' and 'a'<='b'.  I suggest aiming for the simplest possible code that avoids the double test.

History
Date	User	Action	Args
2020-05-16 00:07:00	rhettinger	set	recipients: + rhettinger, tim.peters, serhiy.storchaka, bbayles, Dennis Sweeney
2020-05-16 00:07:00	rhettinger	set	messageid: <1589587620.22.0.435512733113.issue38938@roundup.psfhosted.org>
2020-05-16 00:07:00	rhettinger	link	issue38938 messages
2020-05-16 00:06:59	rhettinger	create