Add Unladen Swallow's optimizations to Python 3's pickle. #53656
Comments
This is a big patch. Please review at http://codereview.appspot.com/1694050/show

This patch adds the most interesting optimizations from Unladen Swallow to Python 3's pickle. The core of the patch has already been reviewed by Antoine and me (http://codereview.appspot.com/33070/show).

One of the last remaining issues is the unbounded size of the internal buffer. This shouldn't be a big problem for most uses of pickle, since the size of a pickle is often several times smaller than the object hierarchy that created it. I still hope to fix this in a follow-up patch.

The patch also includes additional cleanups to the Pdata structure. These changes felt natural to make along with the other changes from Unladen Swallow. IIRC, they yield an additional 1-3% speedup.
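A quick, hypothetical illustration (not part of the patch) of the claim above that a pickle is usually several times smaller than the object hierarchy that produced it; the dictionary mirrors the one used in the benchmarks later in this thread, and the in-memory figure is only a shallow estimate built from `sys.getsizeof`:

```python
import pickle
import sys

# Same shape of test data as the benchmarks below: 10000 tuple keys.
d = {(x, 'a'): x for x in range(10000)}

pickled = pickle.dumps(d)

# Rough in-memory footprint: the dict itself plus its keys and values.
# sys.getsizeof is shallow, so we sum over the contents by hand; this
# still undercounts shared objects, but it is enough for a comparison.
in_memory = sys.getsizeof(d) + sum(
    sys.getsizeof(k) + sys.getsizeof(v) for k, v in d.items()
)

print(len(pickled), in_memory)  # the pickle ends up several times smaller
```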
On Thursday, 29 July 2010 at 06:26 +0000, Alexandre Vassalotti wrote:
I still think this should be fixed in this patch, especially since it […]
For those not familiar with Unladen Swallow, can you describe what the "most interesting optimizations" are? Maybe there is an Unladen Swallow document you can point to. Would any of these optimizations apply to the Python implementation?
I'm working on bpo-3873 to add a read buffer (fixed size, 4096 bytes) to the unpickler. It's 6 to 8 times faster with the read buffer. That patch is mainly meant to avoid the overhead introduced by the new I/O library (in Python 2, the unpickler was faster because it didn't need to call Python functions to read a few bytes). Is this feature included in this big patch?
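The buffering technique described above can be sketched in pure Python. The real change lives in the C unpickler, so the class and names here (`BufferedReaderSketch`, `BUFFER_SIZE`) are hypothetical; the point is that many small `read(n)` calls are served from one in-memory buffer instead of one underlying I/O call per pickle opcode:

```python
import io

BUFFER_SIZE = 4096  # the fixed buffer size mentioned in bpo-3873


class BufferedReaderSketch:
    """Serve small read(n) calls from an in-memory buffer, so the
    underlying stream is hit once per BUFFER_SIZE bytes rather than
    once per opcode."""

    def __init__(self, raw):
        self.raw = raw   # any file-like object with read()
        self.buf = b""
        self.pos = 0

    def read(self, n):
        # Refill the buffer until it can satisfy the request (or EOF).
        while len(self.buf) - self.pos < n:
            chunk = self.raw.read(BUFFER_SIZE)
            if not chunk:
                break
            self.buf = self.buf[self.pos:] + chunk
            self.pos = 0
        data = self.buf[self.pos:self.pos + n]
        self.pos += len(data)
        return data


reader = BufferedReaderSketch(io.BytesIO(b"abcdefgh"))
print(reader.read(3), reader.read(3))  # b'abc' b'def'
```

A short read past the end of the stream simply returns whatever bytes remain, matching the usual file-object contract.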
Look around the issues. I'm pretty sure I worked on the […]

Skip
Ah, that's right, Skip. You did fix it in Unladen Swallow's trunk. I will take a look at your solution.
The patch doesn't apply cleanly anymore. Furthermore, I discovered some additional issues. I'm working on an updated patch, fixing the aforementioned bugs and adding a buffer size limit.
Antoine, I fixed these issues in the latest patch posted on Rietveld. Also, Skip added the buffer limit in Unladen Swallow (see msg112956). We just need to merge that.
Here is a patch. Benchmark numbers:

./python -m timeit -s "import pickle, io; d={(x, 'a'): x for x in range(10000)}" "pickle.dumps(d)"
-> before: 100 loops, best of 3: 7.47 msec per loop

./python -m timeit -s "import pickle, io; d={(x, 'a'): x for x in range(10000)}; d=pickle.dumps(d)" "pickle.loads(d)"
-> before: 100 loops, best of 3: 12.1 msec per loop

./python -m timeit -s "import pickle, io; d={(x, 'a'): x for x in range(10000)}" "pickle.dump(d, io.BytesIO())"

./python -m timeit -s "import pickle, io; d={(x, 'a'): x for x in range(10000)}; d=pickle.dumps(d)" "pickle.load(io.BytesIO(d))"

As you can see, load() doesn't really benefit from the buffering improvements. The three other methods see quite massive speedups.
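The shell one-liners above can also be run as a single script via the `timeit` module, which is convenient for comparing a patched build against a stock one. This is just a sketch of that idea (the loop counts are smaller than the originals, and the numbers are entirely machine-dependent):

```python
import io
import pickle
import timeit

# Same test data as the command-line benchmarks above.
d = {(x, 'a'): x for x in range(10000)}
payload = pickle.dumps(d)

NUMBER = 10  # loops per repeat; the shell versions used 100

benchmarks = [
    ("pickle.dumps", lambda: pickle.dumps(d)),
    ("pickle.loads", lambda: pickle.loads(payload)),
    ("pickle.dump ", lambda: pickle.dump(d, io.BytesIO())),
    ("pickle.load ", lambda: pickle.load(io.BytesIO(payload))),
]

for label, stmt in benchmarks:
    # best-of-3, reported as msec per loop, matching timeit's CLI output
    best = min(timeit.repeat(stmt, repeat=3, number=NUMBER))
    print(f"{label}: {best / NUMBER * 1000:.2f} msec per loop")
```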
Gosh. My patch is based on an outdated patch :( |
OK, this patch merges my changes with Alexandre's previous patch. Performance is similar to that of the previously posted patch.
For the record, here are the unladen swallow benchmark results against stock py3k, covering the pickle, unpickle, pickle_dict, pickle_list, and unpickle_list benchmarks.
The patch was finally committed in r84653. Thanks to everyone who participated in this.