Issue 12911: Expose a private accumulator C API

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/57120

classification

Title:	Expose a private accumulator C API
Type:	resource usage	Stage:	resolved
Components:	Interpreter Core	Versions:	Python 3.2, Python 3.3

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	jcon, loewis, mark.dickinson, pitrou, python-dev, rhettinger
Priority:	normal	Keywords:	patch

Created on 2011-09-06 12:20 by pitrou, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
accu.patch	pitrou, 2011-09-06 16:49		review
accu2.patch	pitrou, 2011-09-07 00:31		review
accu3.patch	pitrou, 2011-10-02 22:55		review

Messages (11)
msg143598 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2011-09-06 12:20
In 47176e8d7060, I fixed json to not blow memory when serializing a large container of small objects. It turns out that the repr() of tuple objects (and, very likely, list objects and possibly other containers) has the same problem. For example, Martin's 256GB machine can't serialize a two billion-element tuple: http://www.python.org/dev/buildbot/all/builders/AMD64%20debian%20parallel%20custom/builds/6/steps/test/logs/stdio So I propose to expose a private C API for the "accumulator" pattern introduced in 47176e8d7060 (with, e.g., the _PyAccumulator prefix), and to use that API from relevant code.
msg143631 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2011-09-06 16:48
Here is a patch against 3.2. In the default branch it will also help factor out some code from the _json module.
msg143655 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2011-09-06 22:48
For the record, the patch fixes the test_bigmem crashes when testing repr() on tuples and lists: http://www.python.org/dev/buildbot/all/builders/AMD64%20debian%20parallel%20custom/builds/10/steps/test/logs/stdio
msg143658 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2011-09-07 00:31
Updated patch (mostly cosmetic stuff) after Benjamin's comments.
msg143664 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2011-09-07 06:07
I'm -1 on this approach; I don't think yet another container type is the right solution, given that we have already plenty of them. If you want to avoid creating large lists, then the StringIO type should already provide that. So I wonder why these functions couldn't be rewritten to use StringIO. If you really want to use this approach, I'd try to avoid allocating the large list if there are only few substrings. I.e. allocate it only when flushing, and only if the flush is not the final flush.
msg143669 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2011-09-07 12:23
> I'm -1 on this approach; I don't think yet another container type is > the right solution, given that we have already plenty of them. It's not a container type, just a small C struct that gets allocated on the stack. Think of it as a library, like stringlib. > If you want to avoid creating large lists, then the StringIO type > should already provide that. So I wonder why these functions couldn't > be rewritten to use StringIO. That's another possibility. But we'd have to expose a C API anyway, and this one is as good as any other. Note that StringIO will copy data twice (once when calling write(), once when calling getvalue()), while ''.join() only once (at the end, when concatenating all strings). > If you really want to use this approach, I'd try to avoid allocating > the large list if there are only few substrings. I.e. allocate it only > when flushing, and only if the flush is not the final flush. That's possible, indeed.
msg144789 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2011-10-02 22:55
New patch implementing Martin's suggested optimization (only instantiate the large list when necessary).
msg144815 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2011-10-03 11:44
> It's not a container type, just a small C struct that > gets allocated on the stack. Think of it as a library, like stringlib. That's what I call a container type: a structure with a library :-) > That's another possibility. But we'd have to expose a > C API anyway, and this one is as good as any other. No, it's not: it's additional clutter. If new API needs to be added, adding it for existing structures is better. Notice that you don't need new API, as you can use StringIO just fine from C also. > Note that StringIO will copy data twice (once when calling > write(), once when calling getvalue()), while ''.join() only once (at > the end, when concatenating all strings). Sounds like a flaw in StringIO to me. It could also manage a list of strings that have been written, rather than only using a flat buffer. Only when someone actually needs a linear buffer, it could convert it (and use a plain string.join when getvalue is called and there is no buffer at all).
msg144816 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2011-10-03 11:58
> > That's another possibility. But we'd have to expose a > > C API anyway, and this one is as good as any other. > > No, it's not: it's additional clutter. If new API needs to be added, > adding it for existing structures is better. Notice that you don't > need new API, as you can use StringIO just fine from C also. Yes, but using StringIO without a dedicated C API is more tedious and quite slower. > > Note that StringIO will copy data twice (once when calling > > write(), once when calling getvalue()), while ''.join() only once (at > > the end, when concatenating all strings). > > Sounds like a flaw in StringIO to me. It could also manage a list of > strings that have been written, rather than only using a flat buffer. > Only when someone actually needs a linear buffer, it could convert it > (and use a plain string.join when getvalue is called and there is no > buffer at all). That's what I thought as well. However, that's probably too much for a bugfix release (and this issue is meant to allow test_bigmem to pass on 3.x).
msg145026 - (view)	Author: Roundup Robot (python-dev)	Date: 2011-10-06 17:13
New changeset f9f782f2369e by Antoine Pitrou in branch '3.2': Issue #12911: Fix memory consumption when calculating the repr() of huge tuples or lists. http://hg.python.org/cpython/rev/f9f782f2369e New changeset 656c13024ede by Antoine Pitrou in branch 'default': Issue #12911: Fix memory consumption when calculating the repr() of huge tuples or lists. http://hg.python.org/cpython/rev/656c13024ede
msg145027 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2011-10-06 17:15
I added a comment insisting that the API is private and can be changed at any moment. StringIO can actually re-use that API, rather than the reverse. No need to instantiate a full-blown file object when all you want to do is to join a bunch of strings.

History
Date	User	Action	Args
2022-04-11 14:57:21	admin	set	github: 57120
2011-10-06 17:15:58	pitrou	set	status: open -> closed resolution: fixed messages: + msg145027 stage: patch review -> resolved
2011-10-06 17:13:46	python-dev	set	nosy: + python-dev messages: + msg145026
2011-10-03 11:58:02	pitrou	set	messages: + msg144816
2011-10-03 11:44:13	loewis	set	messages: + msg144815
2011-10-02 22:55:05	pitrou	set	files: + accu3.patch messages: + msg144789
2011-09-17 00:21:58	jcon	set	nosy: + jcon
2011-09-07 12:23:31	pitrou	set	messages: + msg143669
2011-09-07 06:07:49	loewis	set	messages: + msg143664
2011-09-07 00:31:43	pitrou	set	files: + accu2.patch messages: + msg143658
2011-09-06 22:48:01	pitrou	set	messages: + msg143655
2011-09-06 16:49:25	pitrou	set	files: + accu.patch keywords: + patch
2011-09-06 16:48:58	pitrou	set	messages: + msg143631 stage: patch review
2011-09-06 13:33:55	pitrou	set	nosy: + loewis
2011-09-06 12:20:09	pitrou	create