Issue 1792: o(n*n) marshal.dumps performance for largish objects with patch

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/46125

classification

Title:	o(n*n) marshal.dumps performance for largish objects with patch
Type:	performance	Stage:
Components:	Interpreter Core	Versions:	Python 2.6

process

Status:	closed	Resolution:	accepted
Dependencies:		Superseder:
Assigned To:		Nosy List:	aaron_watters, akuchling, christian.heimes, djmitche, facundobatista, rhettinger
Priority:	high	Keywords:	easy, patch

Created on 2008-01-11 14:21 by aaron_watters, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
marshal.diff	aaron_watters, 2008-01-11 14:21	proposed patch to improve dumps performance
unnamed	aaron_watters, 2008-01-11 15:14
unnamed	aaron_watters, 2008-01-11 16:21
marshal_limited.diff	facundobatista, 2008-01-25 17:58	same idea but with a limit

Pull Requests
URL	Status	Linked	Edit
PR 6992	closed	mbussonn, 2018-05-19 22:31

Messages (10)
msg59710 - (view)	Author: Aaron Watters (aaron_watters)	Date: 2008-01-11 14:21
Much to my surprise I found that one of my applications seemed to be running slow as a result of marshal.dumps. I think the culprit is the w_more(...) function, which grows the marshal buffer in 1k units. This means that a marshal of size 100k will have 100 reallocations and string copies. Other parts of Python (and java etc) have a proportional reallocation strategy which reallocates a new size based on the existing size. This mean a 100k marshal requires just 5 or so reallocations and string copies (n log n versus n**2 asymptotic performance). I humbly submit the following patch (based on python 2.6a0 source). I think it is a strict improvement on the existing code, but I've been wrong before (twice ;)). -- Aaron Watters
msg59711 - (view)	Author: Facundo Batista (facundobatista) *	Date: 2008-01-11 14:25
Why not just double the size? The "doubling + 1024" address some specific issue? If so, it should be commented. Also, do you have an example of a marshal.dumps() that suffers from this issue? Thank you!
msg59715 - (view)	Author: Aaron Watters (aaron_watters)	Date: 2008-01-11 15:14
Facundo 1) the +1024 was an accelerator to jump up to over 1k at the first resize. I think it's a good idea or at least doesn't hurt. 2) Here is an example program: def test(): from marshal import dumps from time import time testString = "abc"100000000 print "now testing" now = time() dump = dumps(testString) elapsed = time()-now print "elapsed", elapsed if __name__=="__main__": test() Here are two runs: the first with the old marshal and the second with the patched marshal. The second is better than 2 faster than the first. arw:/home/arw/test> ~/apache2/htdocs/pythonsrc/Python/python_old mtest1.py now testing elapsed 4.13367795944 arw:/home/arw/test> ~/apache2/htdocs/pythonsrc/Python/python mtest1.py now testing elapsed 1.7495341301 arw:/home/arw/test> The example that inspired this research was very complicated and involved millions of calls to dumps which caused a number of anomalies (system calls went berzerk for some reason, maybe paging). -- Aaron Watters On Jan 11, 2008 9:25 AM, Facundo Batista <report@bugs.python.org> wrote: > > Facundo Batista added the comment: > > Why not just double the size? The "doubling + 1024" address some > specific issue? If so, it should be commented. > > Also, do you have an example of a marshal.dumps() that suffers from this > issue? > > Thank you! > > ---------- > nosy: +facundobatista > > __________________________________ > Tracker <report@bugs.python.org> > <http://bugs.python.org/issue1792> > __________________________________ >
msg59719 - (view)	Author: Aaron Watters (aaron_watters)	Date: 2008-01-11 16:21
also: I just modified the code to do iterations using increasingly large data sizes and I see the kind of very unpleasant behaviour for the old implementation (max time varies wildly from min time) that I saw in my more complex program. The new implementation doesn't have these problems. First the runs and then the modified code runs arw:/home/arw/test> arw:/home/arw/test> ~/apache2/htdocs/pythonsrc/Python/python_old mtest1.pyold old 0 40 elapsed max= 2.28881835938e-05 min= 4.76837158203e-06 ratio= 4.8 1 160 elapsed max= 1.59740447998e-05 min= 9.05990600586e-06 ratio= 1.76315789474 2 640 elapsed max= 2.40802764893e-05 min= 2.19345092773e-05 ratio= 1.09782608696 3 2560 elapsed max= 8.79764556885e-05 min= 3.981590271e-05 ratio= 2.20958083832 4 10240 elapsed max= 0.000290155410767 min= 0.000148057937622 ratio= 1.95974235105 5 40960 elapsed max= 0.000867128372192 min= 0.00060510635376 ratio= 1.43301812451 6 163840 elapsed max= 0.00739598274231 min= 0.00339317321777 ratio= 2.17966554244 7 655360 elapsed max= 0.0883929729462 min= 0.0139379501343 ratio= 6.34189189189 8 2621440 elapsed max= 1.69851398468 min= 0.0547370910645 ratio= 31.0304028155 9 10485760 elapsed max= 9.98945093155 min= 0.213104963303 ratio= 46.875730986 10 41943040 elapsed max= 132.281101942 min= 0.834150075912 ratio= 158.581897625 arw:/home/arw/test> ~/apache2/htdocs/pythonsrc/Python/python mtest1.py new new 0 40 elapsed max= 2.19345092773e-05 min= 5.00679016113e-06 ratio= 4.38095238095 1 160 elapsed max= 1.00135803223e-05 min= 9.05990600586e-06 ratio= 1.10526315789 2 640 elapsed max= 3.19480895996e-05 min= 1.28746032715e-05 ratio= 2.48148148148 3 2560 elapsed max= 5.69820404053e-05 min= 3.981590271e-05 ratio= 1.43113772455 4 10240 elapsed max= 0.000186920166016 min= 0.000138998031616 ratio= 1.34476843911 5 40960 elapsed max= 0.00355315208435 min= 0.000746965408325 ratio= 4.75678263645 6 163840 elapsed max= 0.00326490402222 min= 0.00304794311523 ratio= 1.07118272841 7 655360 elapsed max= 0.0127630233765 min= 0.0122020244598 ratio= 1.04597588855 8 2621440 elapsed max= 0.0511522293091 min= 0.0484230518341 ratio= 1.05636112082 9 10485760 elapsed max= 0.198891878128 min= 0.187420129776 ratio= 1.06120873124 10 41943040 elapsed max= 0.758435964584 min= 0.729014158249 ratio= 1.04035834696 arw:/home/arw/test> Above high ratio numbers indicate strange and unpleasant performance variance. For iteration 7 and higher the old implementation has a much worse max time performance than the new one. Here is the test code: def test(): from marshal import dumps from time import time size = 10 for i in range(11): size = size4 testString = "abc"size #print "now testing", i, size minelapsed = None for j in range(11): now = time() dump = dumps(testString) elapsed = time()-now if minelapsed is None: minelapsed = elapsed maxelapsed = elapsed else: minelapsed = min(elapsed, minelapsed) maxelapsed = max(elapsed, maxelapsed) print i, size, "elapsed max=", maxelapsed, "min=", minelapsed, "ratio=", maxelapsed/minelapsed if __name__=="__main__": import sys print sys.argv[1] test() -- Aaron Watters On Jan 11, 2008 10:14 AM, Aaron Watters < report@bugs.python.org> wrote: > > Aaron Watters added the comment: > > Facundo > > 1) the +1024 was an accelerator to jump up to over 1k at the first resize. > I think it's a good idea or at least doesn't hurt. > > 2) Here is an example program: > > def test(): > from marshal import dumps > from time import time > testString = "abc"100000000 > print "now testing" > now = time() > dump = dumps(testString) > elapsed = time()-now > print "elapsed", elapsed > > if __name__=="__main__": > test() > > Here are two runs: the first with the old marshal and the second with the > patched marshal. The second is > better than 2 faster than the first. > > arw:/home/arw/test> ~/apache2/htdocs/pythonsrc/Python/python_old mtest1.py > now testing > elapsed 4.13367795944 > arw:/home/arw/test> ~/apache2/htdocs/pythonsrc/Python/python mtest1.py > now testing > elapsed 1.7495341301 > arw:/home/arw/test> > > The example that inspired this research was very complicated and involved > millions of calls to dumps > which caused a number of anomalies (system calls went berzerk for some > reason, maybe paging). > > -- Aaron Watters > > On Jan 11, 2008 9:25 AM, Facundo Batista <report@bugs.python.org> wrote: > > > > > Facundo Batista added the comment: > > > > Why not just double the size? The "doubling + 1024" address some > > specific issue? If so, it should be commented. > > > > Also, do you have an example of a marshal.dumps() that suffers from this > > issue? > > > > Thank you! > > > > ---------- > > nosy: +facundobatista > > > > __________________________________ > > Tracker <report@bugs.python.org> > > <http://bugs.python.org/issue1792> > > __________________________________ > > > > Added file: http://bugs.python.org/file9124/unnamed > > __________________________________ > Tracker <report@bugs.python.org> > <http://bugs.python.org/issue1792> > __________________________________ >
msg59730 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2008-01-11 17:48
It looks like a reasonable and good patch. Aaron's tests clearly show the speed up. However the maximum resize should be limited to a value around 256 to 1024k: size = min(2size + 1024, 5121024);
msg60179 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2008-01-19 14:48
Yet another easy C task for the bug day.
msg61686 - (view)	Author: Facundo Batista (facundobatista) *	Date: 2008-01-25 17:58
Christian: you can not do that, as you must increase the size. Anyway, I think that is a good idea to not let it duplicate after some point. Which point? It's arbitrary. You suggested 512KB, but I think that is too low. My arbitrary point is, when the newsize if greater than 32MB, only increaase at 1MB steps. I attach the patch.
msg66560 - (view)	Author: Dustin J. Mitchell (djmitche) *	Date: 2008-05-10 18:52
This test passes -- is this ready to commit? I see a reduction from 1.9s to 1.5s for the test script in msg59715.
msg66629 - (view)	Author: A.M. Kuchling (akuchling) *	Date: 2008-05-11 13:35
I've applied Facundo's version of the patch in r63059.
msg68255 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2008-06-16 01:44
Amended this fix to avoid quadratic behavior altogether. See issue 3116.

History
Date	User	Action	Args
2022-04-11 14:56:29	admin	set	github: 46125
2018-05-19 22:31:14	mbussonn	set	pull_requests: + pull_request6645
2008-06-16 01:44:33	rhettinger	set	nosy: + rhettinger messages: + msg68255
2008-05-11 13:35:09	akuchling	set	status: open -> closed resolution: accepted messages: + msg66629 nosy: + akuchling
2008-05-10 18:52:23	djmitche	set	nosy: + djmitche messages: + msg66560
2008-03-16 21:08:01	georg.brandl	set	type: resource usage -> performance
2008-01-25 17:58:33	facundobatista	set	files: + marshal_limited.diff messages: + msg61686
2008-01-19 14:48:32	christian.heimes	set	keywords: + easy messages: + msg60179
2008-01-11 17:48:21	christian.heimes	set	priority: high keywords: + patch messages: + msg59730 nosy: + christian.heimes
2008-01-11 16:21:46	aaron_watters	set	files: + unnamed messages: + msg59719
2008-01-11 15:14:58	aaron_watters	set	files: + unnamed messages: + msg59715
2008-01-11 14:25:49	facundobatista	set	nosy: + facundobatista messages: + msg59711
2008-01-11 14:21:24	aaron_watters	create