This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Pickle 4 frame headers optimization
Type: performance Stage: resolved
Components: Library (Lib) Versions: Python 3.4
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, alexandre.vassalotti, larry, pitrou, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2013-11-25 21:55 by serhiy.storchaka, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
pickle_frame_headers.patch serhiy.storchaka, 2013-11-25 21:55 review
unpickle_file_bench.diff alexandre.vassalotti, 2013-11-27 07:03
unpickle_file_bench_2.diff alexandre.vassalotti, 2013-11-30 04:23
Messages (18)
msg204424 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-25 21:55
Here is a patch which restores (removed at last moment before 3.4beta1 release) frame headers optimization for the pickle protocol 4. Frame header now saved as a part of previous frame. This decreases the number of unbuffered reads per frame from 3 to 1.
msg204447 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-11-26 01:50
Hmm... I thought your patch was only an optimization, but if you're overlapping frames (by adding 9 to the frame size), it becomes a protocol change.

I personally am against changing the PEP at this point, especially for what looks like a rather minor win.
msg204464 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-26 09:43
Frames are not overlapped. And no one opcode straddles frame boundaries. When an implementation supports optional frames, it should support optimized frames as well.

All tests are passed with this optimization except test_optional_frames which hacks pickled data to remove some frame opcodes. This test relied on implementation details.
msg204471 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-11-26 12:58
Bad wording perhaps, but:

+                    if not final:
+                        n += 9  # next frame header
                     write = self.file_write
                     write(FRAME)
                     write(pack("<Q", n))

does change how the frame length is calculated and emitted in the pickle stream.

This is not compliant with how the PEP defines it (the frame size doesn't include the header of the next frame):
http://www.python.org/dev/peps/pep-3154/#framing

> All tests are passed with this optimization

Well, perhaps there are not enough tests :-) But the protocol is standardized so that other people can implement it. The reference implementation can't do something different than the PEP does.
msg204474 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-26 13:14
If it's an optimizatio, can I see some benchmarks numbers? :-)

I would be interested of benchmarks pickle 3 vs pickle 4.
msg204478 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-26 14:46
> Bad wording perhaps, but:
> 
> +                    if not final:
> +                        n += 9  # next frame header
>                      write = self.file_write
>                      write(FRAME)
>                      write(pack("<Q", n))
> 
> does change how the frame length is calculated and emitted in the pickle
> stream.

Of course (as any optimizer). It produces more optimal pickled data which can 
be parsed by existing unpicklers.

> This is not compliant with how the PEP defines it (the frame size doesn't
> include the header of the next frame):
> http://www.python.org/dev/peps/pep-3154/#framing

"How the pickler decides to partition the pickle stream into frames is an 
implementation detail."

> > All tests are passed with this optimization
> 
> Well, perhaps there are not enough tests :-) But the protocol is
> standardized so that other people can implement it. The reference
> implementation can't do something different than the PEP does.

Could you write tests which exposes behavior difference without sticking 
implementation details?
msg204480 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-26 14:55
> If it's an optimizatio, can I see some benchmarks numbers? :-)

First create two files. Run unpatched Python:

./python -c "import pickle, lzma; data = [bytes([i])*2**16 for i in 
range(256)]; with open('test.pickle4', 'wb'): pickle.dump(data, f, 4)"

Then run the same with patched Python for 'test.pickle4opt'.

Now benchmark loading.

$ ./python -m timeit "import pickle;"  "with open('test.pickle4', 'rb', 
buffering=0) as f: pickle.load(f)"
10 loops, best of 3: 52.9 msec per loop

$ ./python -m timeit "import pickle;"  "with open('test.pickle4opt', 'rb', 
buffering=0) as f: pickle.load(f)"
10 loops, best of 3: 48.9 msec per loop

The difference is about 5%. On faster computers with slower files (sockets?) it 
should be larger.
msg204481 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-26 14:59
> On faster computers with slower files (sockets?) it should be larger.

If your patch avoids unbuffered reads, you can test using these commands before your benchmark:

    sync; echo 3 > /proc/sys/vm/drop_caches

And use one large load() instead of running it in a loop. Or call these commands in the loop.
msg204484 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-11-26 15:09
> > This is not compliant with how the PEP defines it (the frame size doesn't
> > include the header of the next frame):
> > http://www.python.org/dev/peps/pep-3154/#framing
> 
> "How the pickler decides to partition the pickle stream into frames is an 
> implementation detail."

Of course, but it still has to respect the frame structure shown in the
ASCII art drawing.

> Could you write tests which exposes behavior difference without sticking 
> implementation details?

That's a good idea. At least I'll open an issue for it.
msg204488 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-26 15:20
> If your patch avoids unbuffered reads, you can test using these commands
> before your benchmark:
> 
>     sync; echo 3 > /proc/sys/vm/drop_caches

Thank you. But starting Python needs a lot of I/O. This causes very unstable 
results for this test data.
msg204489 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-26 15:22
> This causes very unstable results for this test data.

So try os.system("sync; echo 3 > /proc/sys/vm/drop_caches") in your timeit benchmark.
msg204490 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-26 15:27
> Of course, but it still has to respect the frame 
structure shown in the
> ASCII art drawing.

I think the ASCII art drawing is just inaccurate. It 
forbids optional framing (tested in 
test_optional_frames).
msg204559 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2013-11-27 07:03
Serhiy, I am not able to reproduce your results. I don't get any significant performance improvements with your patch.

22:34:03 [ ~/PythonDev/cpython-baseline ]$ ./python.exe -m timeit "import pickle" "with open('test.pickle4', 'rb', buffering=0) as f: pickle.load(f)"
100 loops, best of 3: 9.26 msec per loop

22:36:13 [ ~/PythonDev/cpython ]$ ./python.exe -m timeit "import pickle" "with open('test.pickle4', 'rb', buffering=0) as f: pickle.load(f)"
100 loops, best of 3: 9.28 msec per loop

I wrote a benchmark to simulate slow reads. But again, there is no performance difference with the patch.

22:24:56 [ ~/PythonDev/benchmarks ]$ ./perf.py -v -b unpickle_file ../cpython-baseline/python.exe ../cpython/python.exe
### unpickle_file ###
Min: 1.103715 -> 1.103666: 1.00x faster
Avg: 1.158279 -> 1.146820: 1.01x faster
Not significant
Stddev: 0.04127 -> 0.03273: 1.2608x smaller
msg204599 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-11-27 15:22
If you want a slow file object, you could benchmark with a GzipFile or similar.
msg204722 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-29 15:00
> 22:36:13 [ ~/PythonDev/cpython ]$ ./python.exe -m timeit "import 
pickle"
> "with open('test.pickle4', 'rb', buffering=0) as f: pickle.load(f)" 100
> loops, best of 3: 9.28 msec per loop

Did you use the same test file or different files (generated by patched 
and unpatched pickler)? Try to run this command several times and take a 
minimum.

> I wrote a benchmark to simulate slow reads. But again, there is no
> performance difference with the patch.

Test data are too small, they all less than frame size.
msg204773 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2013-11-30 04:23
> Test data are too small, they all less than frame size.

Ah, good point! It seems your patch helps when the reads are *very* slow and buffering is disabled.

### unpickle_file ###
Min: 1.125320 -> 0.663367: 1.70x faster
Avg: 1.237206 -> 0.701303: 1.76x faster
Significant (t=30.77)
Stddev: 0.12031 -> 0.02634: 4.5682x smaller

That certainly is a nice improvement though that seems to be fairly narrow use case. With more normal read times, the improvement diminishes greatly:

### unpickle_file ###
Min: 0.494841 -> 0.466837: 1.06x faster
Avg: 0.540923 -> 0.511165: 1.06x faster
Significant (t=4.10)
Stddev: 0.03740 -> 0.03510: 1.0657x smaller
msg204798 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-11-30 10:43
While the speedup may be nice, I still don't think this optimization complies with the protocol definition in the PEP, so I would like to reject this patch.
msg204984 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2013-12-02 00:52
Yeah, let's close this. It is much simpler to just double the frame size target if the extra reads ever become a performance issue.
History
Date User Action Args
2022-04-11 14:57:54adminsetgithub: 63979
2013-12-02 00:52:35alexandre.vassalottisetstatus: pending -> closed

messages: + msg204984
2013-11-30 10:43:42pitrousetstatus: open -> pending
resolution: rejected
messages: + msg204798

stage: patch review -> resolved
2013-11-30 04:23:18alexandre.vassalottisetfiles: + unpickle_file_bench_2.diff

messages: + msg204773
2013-11-29 15:00:42serhiy.storchakasetmessages: + msg204722
2013-11-27 15:22:50pitrousetmessages: + msg204599
2013-11-27 07:03:10alexandre.vassalottisetfiles: + unpickle_file_bench.diff

messages: + msg204559
2013-11-26 15:27:02serhiy.storchakasetmessages: + msg204490
2013-11-26 15:22:56vstinnersetmessages: + msg204489
2013-11-26 15:20:49serhiy.storchakasetmessages: + msg204488
2013-11-26 15:09:01pitrousetmessages: + msg204484
2013-11-26 14:59:43vstinnersetmessages: + msg204481
2013-11-26 14:55:48serhiy.storchakasetmessages: + msg204480
2013-11-26 14:46:09serhiy.storchakasetmessages: + msg204478
2013-11-26 13:14:41vstinnersetnosy: + vstinner
messages: + msg204474
2013-11-26 12:58:30pitrousetmessages: + msg204471
2013-11-26 09:43:01serhiy.storchakasetmessages: + msg204464
2013-11-26 01:50:51pitrousetmessages: + msg204447
2013-11-25 22:25:08Arfreversetnosy: + Arfrever
2013-11-25 21:55:08serhiy.storchakacreate