classification
Title: Avoid creating small frames in pickle protocol 4
Type: performance Stage: resolved
Components: Library (Lib) Versions: Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Olivier.Grisel, alexandre.vassalotti, pitrou, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2018-01-06 18:36 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 5127 merged serhiy.storchaka, 2018-01-07 07:35
Messages (5)
msg309564 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-01-06 18:36
Pickle protocol 4 uses framing to reduce the overhead of calling the read() method for small chunks of data. Most chunks read are small (opcodes, small integers, short strings, etc.), and calling read() for every 1 or 4 bytes is too expensive. But framing itself adds overhead. Each frame increases the size of the pickled data by 9 bytes (a 1-byte opcode plus an 8-byte frame size), and reading a frame takes 3 reads: the opcode, the frame size, and the payload. Thus it doesn't make sense to create a frame containing fewer than 3 chunks of data.
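
As a rough illustration of where the 9 bytes come from, the sketch below builds a frame header by hand. The helper name frame_header is hypothetical and is not part of the pickle module; pickle.FRAME and struct.pack are the only library names used.

```python
import pickle
import struct


def frame_header(payload_size):
    """Build a protocol 4 frame header (hypothetical helper, for illustration).

    The header is the FRAME opcode (1 byte) followed by the payload size
    encoded as a little-endian unsigned 64-bit integer (8 bytes).
    """
    return pickle.FRAME + struct.pack("<Q", payload_size)


header = frame_header(15)
print(len(header))  # 9
```

A reader has to fetch the opcode, then the 8-byte size, then the payload, which is where the 3 reads per frame come from.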

For example, after issue31993, pickling the list [b'a'*70000, b'b'*70000] with the Python implementation produces data containing 3 frames of sizes 3, 1 and 3. Using frames here is completely redundant.
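
One way to inspect the frames in a pickle is to walk its opcodes with pickletools.genops. This is only a sketch: frame_sizes is a hypothetical helper name, and pickle.dumps normally uses the C accelerator rather than the Python implementation mentioned above, so the printed sizes depend on the interpreter version and implementation.

```python
import pickle
import pickletools


def frame_sizes(data):
    """Return the payload size of every FRAME opcode found in a pickle."""
    return [arg for opcode, arg, pos in pickletools.genops(data)
            if opcode.name == "FRAME"]


big = [b"a" * 70000, b"b" * 70000]
data = pickle.dumps(big, protocol=4)

# On an interpreter without the fix this would list the tiny frames
# described above; with the fix, no frame smaller than 4 bytes appears.
print(frame_sizes(data))
```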
msg309610 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-01-07 07:42
PR 5127 creates a frame only when the payload is at least 4 bytes long. Since 3 chunks occupy at least 3 bytes, this is the absolute minimum frame size.

It would be better to count chunks rather than bytes, but that would complicate the implementations, especially the Python implementation.
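
The byte-count check described in this message can be sketched roughly as follows. This is not the code from PR 5127: emit_chunk and FRAME_SIZE_MIN are names assumed here for illustration, and the sketch ignores everything else the pickler does when it commits a frame.

```python
import struct

FRAME = b"\x95"      # protocol 4 FRAME opcode
FRAME_SIZE_MIN = 4   # assumed name for the 4-byte threshold in this message


def emit_chunk(payload):
    """Wrap payload in a frame only if it is at least FRAME_SIZE_MIN bytes.

    Hypothetical helper: shorter payloads are written bare, which saves
    the 9-byte frame header.
    """
    if len(payload) >= FRAME_SIZE_MIN:
        return FRAME + struct.pack("<Q", len(payload)) + payload
    return payload


print(emit_chunk(b"K\x01."))      # 3 bytes: written without a frame
print(len(emit_chunk(b"abcd")))   # 4 bytes: 9-byte header + payload = 13
```

Counting bytes keeps the check to a single comparison, whereas counting chunks would require tracking every write that goes into the frame.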
msg309612 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-01-07 10:41
I don't think the overall gain is meaningful.  I'd rather not add too many special cases in the framing code.
msg310087 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-01-16 13:14
In its current form the change is trivial, just one additional check. It actually fixes a regression introduced in issue31993. Currently even empty frames can be produced (when fast=True).
msg310349 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-01-20 14:42
New changeset 1211c9a9897a174b7261ca258cabf289815a40d8 by Serhiy Storchaka in branch 'master':
bpo-32503: Avoid creating too small frames in pickles. (#5127)
https://github.com/python/cpython/commit/1211c9a9897a174b7261ca258cabf289815a40d8
History
Date User Action Args
2022-04-11 14:58:56 admin set github: 76684
2018-01-20 14:53:49 serhiy.storchaka set status: open -> closed; resolution: fixed; stage: patch review -> resolved
2018-01-20 14:42:46 serhiy.storchaka set messages: + msg310349
2018-01-16 13:14:22 serhiy.storchaka set messages: + msg310087
2018-01-07 10:41:18 pitrou set messages: + msg309612
2018-01-07 07:42:29 serhiy.storchaka set messages: + msg309610
2018-01-07 07:35:09 serhiy.storchaka set keywords: + patch; stage: patch review; pull_requests: + pull_request4988
2018-01-06 18:36:38 serhiy.storchaka create