Implement PEP 3154 (pickle protocol 4) #62010
Comments
I have restarted the work on PEP-3154. Stefan Mihaila had begun an implementation as part of the Google Summer of Code 2012. Unfortunately, he hit multiple roadblocks which prevented him from finishing his work by the end of the summer. He had previously shown interest in completing his implementation, but he was constrained by time and never resumed the work. So I am taking over the implementation of the PEP. I have decided to go forward with brand-new code, using Stefan's work only as a guide. At the moment, I have completed about half of the PEP---missing only support for calling __new__ with keyword arguments and the use of qualified names for referring to objects.

Design-wise, there are still a few things that we should discuss. For example, I think Stefan's idea, which is not specified in the PEP, of eliminating PUT opcodes is interesting. His proposal was to emit an implicit PUT opcode after each object pickled and make the Pickler and Unpickler classes agree on the scheme. A drawback of this implicit scheme is that we cannot be selective about which objects we save in the memo during unpickling. That means, for example, we won't be able to make pickletools.optimize work with protocol 4 to reduce the memory footprint of the unpickling process. This scheme also alters the meaning of all previously defined opcodes because of the implicit PUTs, which is sort of okay because we are changing protocol. Alternatively, we could use an explicit scheme by defining new "fat" opcodes, for the built-in types we care about, which include memoization. This scheme would be a bit more flexible; however, it would also be slightly more involved implementation-wise. In any case, I will run benchmarks to see if either scheme is worthwhile. |
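To make the PUT discussion concrete, here is a small illustrative check (using only the public pickle and pickletools APIs) showing that under the existing protocols a shared object is memoized with an explicit PUT and referenced back with a GET:

```python
import pickle
import pickletools

shared = [1, 2]
data = pickle.dumps([shared, shared], protocol=3)

# Collect the names of the opcodes emitted for this pickle.
ops = {op.name for op, arg, pos in pickletools.genops(data)}

# The shared list is stored in the memo with an explicit PUT opcode...
assert "BINPUT" in ops
# ...and the second reference to it is fetched back with a GET opcode.
assert "BINGET" in ops
```

With the implicit scheme discussed above, those BINPUT bytes would disappear from the stream entirely.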
Thank you for reviving this :)
Are the savings worth it?
|
Link to the previous attempt: bpo-15642. |
Memoization consumes memory during pickling. For now, every memoized object requires memory for a dict entry: about 80 bytes on a 32-bit platform (and twice that on a 64-bit one). For data which contains a lot of floats this can be burdensome. |
As far as I understand, Alexandre doesn't propose to suppress
Apparently, floats don't get memoized:

>>> pickletools.dis(pickle.dumps([1.0, 2.0]))
0: \x80 PROTO 3
2: ] EMPTY_LIST
3: q BINPUT 0
5: ( MARK
6: G BINFLOAT 1.0
15: G BINFLOAT 2.0
24: e APPENDS (MARK at 5)
25: . STOP |
I would like to see Proto4 include an option for compression (zlib,bz2) or somesuch and become self-decompressing upon unpickling. The primary use cases for pickling involve writing to disk or transmitting across a wire -- both use cases benefit from compression (with reduced read/write times). |
I don't see what this would bring over explicit compression:
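For reference, explicit compression is already a one-liner on top of the existing API; a minimal sketch with zlib:

```python
import pickle
import zlib

obj = {"values": list(range(1000))}

raw = pickle.dumps(obj, protocol=2)
packed = zlib.compress(raw)  # compress the pickle stream explicitly

# Decompression plus unpickling round-trips the object.
assert pickle.loads(zlib.decompress(packed)) == obj
# Repetitive data compresses well.
assert len(packed) < len(raw)
```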
|
I agree with Charles-François. |
A proof of concept hack to enable framing on pickle showed a massive performance increase on streaming unpickling (up to 5x faster with a C file object such as io.BytesIO, up to 150x faster with a pure Python file object such as _pyio.BytesIO). There is a slight slowdown on non-streaming operation, but that could probably be optimized. |
(note: I've updated PEP-3154 with framing and GLOBAL_STACK) |
What if we just use io.BufferedReader?

    if not isinstance(file, io.BufferedReader):
        file = io.BufferedReader(file)

(at the start of _Unpickler.__init__) |
Two problems:
|
I was thinking about framing before looking at your last changes to PEP-3154, and I have two alternative proposals.
|
Padding makes it both less efficient and more annoying to handle, IMO.
No doublecopying is necessary (not in the C version, that is). That |
Agree. But there is another application for NOPs. The UTF-8 decoder (and some other decoders) works faster (up to 4x) when the input is aligned. By adding several NOPs before BINUNICODE so that the start of the encoded data is 4- or 8-byte aligned relative to the start of the frame, we can significantly speed up unpickling of long ASCII strings. I propose to add a new NOP opcode and to use it to align some alignment-sensitive data.
I just looked in the code and saw that the unpickler already has a ready infrastructure for prefetching. Now your words no longer appear so incredible. ;) It should work.
Agree, there is no doublecopying (except for large bytes objects). |
Here is a framing patch on top of Alexandre's work. There is one thing that framing breaks: pickletools.optimize(). I think it would be non-trivial to fix it. Perhaps the PREFETCH opcode is a better idea for this. Alexandre, I don't understand why you removed STACK_GLOBAL. GLOBAL is a PITA that we should not use in protocol 4 anymore, so we need either STACK_GLOBAL or some kind of BINGLOBAL. |
What is wrong with GLOBAL? |
It uses the lame "text mode" that scans for newlines, and is generally |
With framing it isn't annoying. |
Antoine, I removed STACK_GLOBAL when I found performance issues with the implementation. The changeset that added it had some unrelated changes that made it harder to debug than necessary. I am planning to re-add it once I have worked out the kinks. |
Slightly less, but you still have to wrap readline() calls in the

I have started experimenting with PREFETCH, but making the opcode |
And here is an implementation of PREFETCH over Alexandre's work. |
Here is an updated framing patch which supports pickletools.optimize(). |
The latest framing patch looks pretty nice overall. One concern is we need to make sure the C implementation calls _Pickler_OpcodeBoundary often enough to keep the frames around the target size. For example, batch_save_list and batch_save_dict can currently create a frame much larger than expected. Interestingly enough, I found that pickle, with the patch applied, crashes when handling such frames:

    13:44:43 pep-3154 $ ./python -c "import pickle, io; pickle.dump(list(range(10**5)), io.BytesIO(), 4)"

    Current thread 0x00007f5dea491700:

Also, I think we should try to make pickletools.dis display the frame boundaries to help with debugging. This could be implemented by adding an option to pickletools.genops, which could be helpful for testing the framing implementation as well. |
I don't understand how that can happen. batch_list() and batch_dict()
Interesting, I'll take a look when I have some time.
Agreed. |
Ah, you're right. I was thinking in terms of my fast dispatch patch in issue bpo-17787. Sorry for the confusion! |
I am currently fleshing out an improved implementation of the reduce protocol for version 4. One thing I am curious about is whether we should keep the special cases we currently have there for dict and list subclasses. I recall Raymond expressed disagreement in #msg83098 about this behavior. I agree that having __setitem__ called before __init__ makes it harder for dict and list subclasses to support pickling. To take advantage of the special case, subclasses need to do their required initialization in the __new__ method. On the other hand, it does decrease the memory requirements for unpickling such subclasses---i.e., we can build the object in-place instead of building an intermediary list or dict. Reading PEP-307 confirms that this was indeed the original intention.

One possible solution, other than removing the special case completely, is to make sure we initialize the object (using the BUILD opcode) before we call __setitem__ or append on it. This would be a simple change that would solve the initialization issue. However, I would still feel uneasy about the default object.__reduce__ behavior depending on the object's subtype.

I think it could be worthwhile to investigate a generic API for pickling collections in-place. For example, such an API would be helpful for pickling set subclasses in-place.

__items__()
    Return an iterator over the items in the collection.

__additems__(items)
    Add a batch of items to the collection. By default, it would be equivalent to:

        for item in items:
            self.__additem__(item)

__additem__(item)
    Add a single item to the collection.

The collections module's ABCs could then provide default implementations of this API, which would give their users efficient in-place pickling automatically. |
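A rough sketch of what that generic API could look like on a user-defined collection (the method names __items__, __additems__, and __additem__ are only the proposal here, not an existing pickle feature):

```python
class Bag:
    """Toy collection implementing the proposed in-place pickling API."""

    def __init__(self):
        self._data = []

    def __items__(self):
        # Return an iterator over the items in the collection.
        return iter(self._data)

    def __additem__(self, item):
        # Add a single item to the collection.
        self._data.append(item)

    def __additems__(self, items):
        # Add a batch of items; the default falls back to __additem__.
        for item in items:
            self.__additem__(item)


# A pickler could emit the items from __items__, and an unpickler
# could rebuild the collection in place via __additems__:
bag = Bag()
bag.__additems__([1, 2, 3])
assert list(bag.__items__()) == [1, 2, 3]
```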
Thanks Stefan for the patch. It's very much appreciated. I will try to review it soon. Of the features you proposed, the two I would like to look at again are implicit memoization and the BAIL_OUT opcode.

For the implicit memoization feature, we will need to have some performance results in hand to justify the major changes it needs. If you can work out a quick patch, I can run it through the benchmark suite for pickle and measure the impact. Hopefully, we will see a good improvement, though we can't be sure until we measure.

As for the BAIL_OUT opcode, it would be interesting to revisit its use now that we support binary framing. It could be helpful for preventing the Unpickler from hanging if the other end forgot to close the stream. I am still not totally convinced; however, if you make a good case for it, I would support including it. |
Stefan, I took a quick look at your patch. There are a couple of things that stand out.

First, I think the implementation of BINGLOBAL and BINGLOBAL_BIG should be moved to another patch. Adding a binary version of the GLOBAL opcode is a separate feature and it should be reviewed independently. Personally, I prefer the STACK_GLOBAL opcode I proposed, as it is much simpler to implement, but I am biased.

Next, the patch's formatting should be fixed to conform to PEP-7 and PEP-8. Make sure the formatting is consistent with the surrounding code. In particular, comments should be full sentences that explain why we need this code. Avoid adding comments that merely say what the code does, unless the code is complex.

In addition, please replace the uses of PyUnicode_InternFromString with _Py_IDENTIFIER as needed. The latter allows the static strings to be garbage collected when the module is deleted, which is friendlier to embedded interpreters. It also leads to cleaner code.

Finally, the class method check hack looks like a bug to me. There are multiple solutions here. For example, we could fix class methods to be cached so they always have the same ID once they are created. Or, we could remove the 'is' check completely if it is unnecessary. |
It would be nice if you could reconcile each other's work. Especially so
Well, it's part of the PEP.
I agree it sounds simpler. I hadn't thought about it when first writing |
Stefan, could you address my review comments soon? The improved support for globals is the only big piece missing from the implementation of the PEP, which I would like to get done and submitted by the end of the month. |
On 6/3/2013 9:33 PM, Alexandre Vassalotti wrote:
Yes, I apologize for the delay again. Today is my last exam this |
Alexandre, Stefan, is any of you working on this? Thanks! |
Potentially relevant to this: we hope to have PEP-451 done for 3.4, which adds a __spec__ attribute to module objects, and will also tweak runpy to ensure -m registers __main__ under its real name as well. If pickle uses __spec__.name in preference to __name__ when __spec__ is defined, then objects defined in __main__ modules run via -m should start being pickled correctly. |
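The suggested lookup order can be sketched in a few lines (module_name here is a hypothetical helper for illustration, not pickle's actual code):

```python
import json

def module_name(module):
    # Prefer __spec__.name over __name__ when a spec is available.
    spec = getattr(module, "__spec__", None)
    return spec.name if spec is not None else module.__name__

# For a normally imported module, both attributes agree; the difference
# only shows up for __main__ modules run via -m, where __name__ is
# "__main__" but __spec__.name is the real module name.
assert module_name(json) == "json"
```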
I am still working on it. I implemented support for nested globals last week (http://hg.python.org/features/pep-3154-alexandre/rev/c8991b32a47e). At this point, the only big piece remaining is the support for method descriptors. There are other minor things left, but we can worry about those later. Nick, thanks for the pointer! I didn't know about PEP-451. I will look into how we can use it in pickle. |
Hi folks, I consider my implementation of PEP-3154 mostly feature complete at this point. I still have a few things left to do. For example, I need to update the documentation about the new protocol. However, these can mostly be done along the review process. Plus, I definitely prefer getting feedback sooner. :-) Please review at: http://bugs.python.org/review/17810/ Thanks! |
I have been looking again at Stefan's previous proposal of making memoization implicit in the new pickle protocol. While I liked the smaller pickles it produced, I didn't like the invasiveness of the implementation, which requires a change for almost every opcode processed by the Unpickler. This led me to what I think is a reasonable compromise between what we have right now and Stefan's proposal. That is, we can make the argument of the PUT opcodes implicit, without making the whole opcode implicit. I've implemented this by introducing a new opcode, MEMOIZE, which stores the top of the pickle stack using the size of the memo as the index. Using the memo size as the index spares us some extra bookkeeping variables and handles nicely situations where Pickler.memo.clear() or Unpickler.memo.clear() are used.

Size-wise, this brings some good improvements for pickles containing a lot of dicts and lists.

    # Before
    # After with new MEMOIZE opcode

Time-wise, the change is mostly neutral. It makes pickling dicts and lists slightly faster because it simplifies the code for memo_put() in _pickle.

Report on Darwin Kernel Version 12.5.0: Sun Sep 29 13:33:47 PDT 2013; root:xnu-2050.48.12~1/RELEASE_X86_64 x86_64 i386

    ### pickle4_dict ###
    ### pickle4_list ###

Anyhow, I have committed this improvement to my pep-3154 branch (http://hg.python.org/features/pep-3154-alexandre/rev/8a2861aaef82) for now, though I will happily revert it if people oppose the change. |
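The effect is visible with pickletools: protocol 4 pickles use MEMOIZE where protocol 3 used BINPUT, and the index bookkeeping can be sketched with a toy model (illustrative only, not the C implementation):

```python
import pickle
import pickletools

shared = [1]
ops = {op.name for op, arg, pos
       in pickletools.genops(pickle.dumps([shared, shared], protocol=4))}
assert "MEMOIZE" in ops      # argument-less memoization...
assert "BINPUT" not in ops   # ...replaces the explicit PUT opcodes

# Toy model of the scheme: the current memo size serves as the next
# index, so pickler and unpickler stay in sync with no explicit argument.
def memoize(memo, stack):
    memo[len(memo)] = stack[-1]

memo, stack = {}, ["obj-a"]
memoize(memo, stack)
stack.append("obj-b")
memoize(memo, stack)
assert memo == {0: "obj-a", 1: "obj-b"}
```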
I propose to include the frame size in the previous frame. This would halve the number of file reads. |
Attached is a patch that takes a different approach to framing, putting it into an optional framing layer by means of a buffered reader/writer. The framing structure is the same as in PEP-3154; a separate PYFRAMES magic is prepended to guard against protocol inconsistencies and to allow for automatic detection of framing. |
New changeset 992ef855b3ed by Antoine Pitrou in branch 'default': |
I've now committed Alexandre's latest work (including the FRAME and MEMOIZE opcodes). |
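With that committed, the on-wire layout can be checked directly: a framed protocol-4 pickle opens with the PROTO opcode, then a FRAME opcode (0x95) carrying an 8-byte little-endian length (very small pickles below the minimum frame size are left unframed). A quick sanity check:

```python
import pickle
import struct

data = pickle.dumps(list(range(10)), protocol=4)

assert data[0:2] == b"\x80\x04"  # PROTO opcode, protocol 4
assert data[2:3] == b"\x95"      # FRAME opcode
(frame_len,) = struct.unpack("<Q", data[3:11])
# Everything after the 11-byte header, including STOP, is in the frame.
assert frame_len == len(data) - 11
```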
New changeset d719975f4d25 by Christian Heimes in branch 'default': |
New changeset c54becd69805 by Christian Heimes in branch 'default': |
New changeset a02adfb3260a by Christian Heimes in branch 'default': |
New changeset 3e16c8c34e69 by Christian Heimes in branch 'default': |
I've finalized the framing implementation in de9bda43d552. There will be more improvements to come until 3.4 final. However, feature-wise we are done. Thank you everyone for the help! |
[Alexandre Vassalotti]
Woo hoo! Thank YOU for the hard work - I know how much fun this is ;-) |
Here is a patch which restores the optimization for frame headers. Unfortunately, it breaks test_optional_frames. |
Isn't it a little late to be changing the pickle protocol, now that we've hit feature-freeze? If you want to check something like this in you're going to have to make a good case for it. |
This doesn't change the pickle protocol. This is just an implementation detail. |
Optimizing the output of the pickler class should be fine during the feature freeze as long as the semantics of the current opcodes stay unchanged. |
Well, Larry may expand, but I think we don't commit performance optimizations during the feature freeze either. |
I'll make you a deal. As long as the protocol remains 100% backwards and forwards compatible (3.4.0b1 can read anything written by trunk, and trunk can read anything written by 3.4.0b1), you can make optimizations until beta 2. After that you have to stop... or get permission again. |
I have opened a separate issue, bpo-19780, for this. |