Message 203339 - Python tracker


Author	alexandre.vassalotti
Recipients	Arfrever, alexandre.vassalotti, asvetlov, mstefanro, ncoghlan, neologix, pitrou, rhettinger, serhiy.storchaka
Date	2013-11-19.07:16:37
Message-id	<1384845398.45.0.941599669229.issue17810@psf.upfronthosting.co.za>
In-reply-to

Content

I have been looking again at Stefan's previous proposal of making memoization implicit in the new pickle protocol. While I liked the smaller pickles it produced, I didn't like the invasiveness of the implementation, which requires a change for almost every opcode processed by the Unpickler. This led me to what I think is a reasonable compromise between what we have right now and Stefan's proposal. That is, we can make the argument of the PUT opcodes implicit, without making the whole opcode implicit.

I've implemented this by introducing a new opcode, MEMOIZE, which stores the top of the pickle stack in the memo, using the current size of the memo as the index. Using the memo size as the index saves us some extra bookkeeping variables and nicely handles situations where Pickler.memo.clear() or Unpickler.memo.clear() is used.
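To illustrate the idea, here is a minimal toy sketch of the two ways of storing into the memo. It is not the real Unpickler code; the helper names put() and memoize() and the plain dict/list used for the memo and stack are assumptions made for the example.

```python
# Toy model only: a dict stands in for the memo and a list for the pickle stack.
# The function names are hypothetical and do not come from pickle or _pickle.

def put(memo, stack, index):
    """PUT-style store: the index is an explicit argument read from the stream."""
    memo[index] = stack[-1]


def memoize(memo, stack):
    """MEMOIZE-style store: the index is implied by the current memo size."""
    memo[len(memo)] = stack[-1]


memo, stack = {}, []
for obj in (["a"], {"b": 1}):
    stack.append(obj)
    memoize(memo, stack)

# Indices 0, 1, ... are assigned without being encoded anywhere.
print(sorted(memo))

# After a clear, numbering restarts from 0 with no separate counter to reset.
memo.clear()
stack.append(("c",))
memoize(memo, stack)
print(sorted(memo))
```

Because the index is derived from len(memo), both sides stay in sync as long as they memoize the same objects in the same order, which is what the explicit PUT arguments encoded before.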

Size-wise, this brings some good improvements for pickles containing a lot of dicts and lists.

# Before
$ ./python.exe -c "import pickle; print(len(pickle.dumps([[] for _ in range(1000)], 4)))"
5251

# After, with the new MEMOIZE opcode
$ ./python.exe -c "import pickle; print(len(pickle.dumps([[] for _ in range(1000)], 4)))"
2015
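
For anyone who wants to see where the bytes go, the following sketch counts memo-store opcodes with pickletools. It assumes a released Python (3.4 or later) in which protocol 4 emits MEMOIZE, and it compares protocol 3 against protocol 4 rather than the two builds measured above, so the sizes it prints will not match those numbers exactly (protocol 4 also differs in other ways, such as framing). The helper name opcode_counts() is made up for the example.

```python
import pickle
import pickletools
from collections import Counter

data = [[] for _ in range(1000)]


def opcode_counts(protocol):
    """Return (pickle size in bytes, Counter of opcode names) for `data`."""
    payload = pickle.dumps(data, protocol)
    counts = Counter(op.name for op, arg, pos in pickletools.genops(payload))
    return len(payload), counts


size3, counts3 = opcode_counts(3)
size4, counts4 = opcode_counts(4)

# Protocol 3 stores with BINPUT (1-byte index) or LONG_BINPUT (4-byte index).
print("protocol 3:", size3, "bytes,",
      counts3["BINPUT"] + counts3["LONG_BINPUT"], "explicit PUT opcodes")

# Protocol 4 stores with MEMOIZE, which carries no argument at all.
print("protocol 4:", size4, "bytes,", counts4["MEMOIZE"], "MEMOIZE opcodes")
```

The saving comes from the argument bytes: once the memo grows past 255 entries, each explicit store needs a 4-byte index, while MEMOIZE stays a single byte.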

Time-wise, the change is mostly neutral. It makes pickling dicts and lists slightly faster because it simplifies the code for memo_put() in _pickle.

Report on Darwin Kernel Version 12.5.0: Sun Sep 29 13:33:47 PDT 2013; root:xnu-2050.48.12~1/RELEASE_X86_64 x86_64 i386
Total CPU cores: 4

### pickle4_dict ###
Min: 0.714912 -> 0.667203: 1.07x faster
Avg: 0.741616 -> 0.685567: 1.08x faster
Significant (t=16.25)
Stddev: 0.02033 -> 0.01346: 1.5102x smaller
Timeline: http://goo.gl/iHqCfB

### pickle4_list ###
Min: 0.414151 -> 0.398913: 1.04x faster
Avg: 0.432094 -> 0.409058: 1.06x faster
Significant (t=11.83)
Stddev: 0.01049 -> 0.00893: 1.1749x smaller
Timeline: http://goo.gl/wfQzgL

Anyhow, I have committed this improvement in my pep-3154 branch (http://hg.python.org/features/pep-3154-alexandre/rev/8a2861aaef82) for now, though I will happily revert it if people object to the change.

History
Date	User	Action	Args
2013-11-19 07:16:38	alexandre.vassalotti	set	recipients: + alexandre.vassalotti, rhettinger, ncoghlan, pitrou, Arfrever, asvetlov, neologix, serhiy.storchaka, mstefanro
2013-11-19 07:16:38	alexandre.vassalotti	set	messageid: <1384845398.45.0.941599669229.issue17810@psf.upfronthosting.co.za>
2013-11-19 07:16:38	alexandre.vassalotti	link	issue17810 messages
2013-11-19 07:16:37	alexandre.vassalotti	create