classification
Title: Bump the default pickle protocol in shelve
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.10
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: ZackerySpytz, alexandre.vassalotti, lukasz.langa, marco-c, rhettinger, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2018-07-24 09:16 by serhiy.storchaka, last changed 2020-10-29 09:46 by cheryl.sabella. This issue is now closed.

Pull Requests
URL       Status  Linked
PR 19639  merged  ZackerySpytz, 2020-04-21 23:20
PR 22751  closed  marco-c, 2020-10-27 23:51
Messages (5)
msg322281 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-07-24 09:16
The default pickle protocol is now 4, but shelve still uses pickle protocol 3. Shouldn't it be bumped? Shouldn't shelve use pickle.DEFAULT_PROTOCOL by default?
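Independently of shelve's built-in default, a caller can already opt in by passing the protocol explicitly to shelve.open(). A minimal sketch, assuming a writable temporary directory; the helper name `roundtrip` is hypothetical, and only shelve.open() and its `protocol` argument come from the standard library:

```python
import os
import pickle
import shelve
import tempfile


def roundtrip(value, protocol=pickle.DEFAULT_PROTOCOL):
    """Hypothetical helper: store value in a fresh shelf using an explicit
    pickle protocol, then read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data")
        # Writing: the protocol argument overrides shelve's built-in default.
        with shelve.open(path, protocol=protocol) as db:
            db["key"] = value
        # Reading: the unpickler detects the protocol from the stored bytes,
        # so no protocol argument is needed here.
        with shelve.open(path, flag="r") as db:
            return db["key"]


print(roundtrip({"answer": 42}))
```

Only writing depends on the protocol setting; any protocol the running interpreter supports can be read back.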

Disadvantages:

1. By default, this will make shelve files unreadable by Python 3.3 and older, since protocol 4 was introduced in Python 3.4.
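The incompatibility comes from the pickle header: protocols 2 and higher start with the PROTO opcode (0x80) followed by the protocol number, and an unpickler refuses protocol numbers above its own HIGHEST_PROTOCOL (3 in Python 3.3). A minimal sketch of a header-only check; `declared_protocol` and `readable_by` are hypothetical helpers, not part of pickle:

```python
import pickle


def declared_protocol(data: bytes):
    """Return the protocol number from a PROTO header, or None.

    Protocols 2+ start with PROTO (0x80) and the protocol number;
    protocols 0 and 1 have no such header.
    """
    if len(data) >= 2 and data[0] == 0x80:
        return data[1]
    return None


def readable_by(data: bytes, highest_protocol: int) -> bool:
    """Header-only check: would an unpickler whose HIGHEST_PROTOCOL is
    highest_protocol accept the declared protocol of this pickle?"""
    proto = declared_protocol(data)
    return proto is None or proto <= highest_protocol


# Python 3.3's pickle.HIGHEST_PROTOCOL is 3.
for proto in (2, 3, 4):
    data = pickle.dumps({"k": "v"}, protocol=proto)
    print(proto, readable_by(data, highest_protocol=3))

# The running interpreter rejects a header newer than it supports in the
# same way, by raising ValueError.
too_new = b"\x80" + bytes([pickle.HIGHEST_PROTOCOL + 1]) + b"."
try:
    pickle.loads(too_new)
except ValueError as exc:
    print(exc)
```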

2. Protocol 4 adds 9 bytes of overhead compared with protocol 3 (a 1-byte FRAME opcode plus an 8-byte frame length). This can be too much for a shelf containing many small objects. Maybe strip the redundant frame header for small pickles?
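The suggested stripping can be sketched as follows, assuming a pickle produced by pickle.dumps() whose whole body sits in a single frame. `strip_single_frame` is a hypothetical helper, not something pickle or shelve provides; it relies on framing being optional for the unpickler, so the stripped bytes still load:

```python
import pickle
import struct

PROTO = 0x80             # opcode that starts every protocol 2+ pickle
FRAME = 0x95             # protocol 4 framing opcode
FRAME_HEADER_SIZE = 9    # FRAME opcode + 8-byte little-endian frame length


def strip_single_frame(data: bytes) -> bytes:
    """Hypothetical helper: drop the frame header when one frame spans the
    entire body of a protocol 4+ pickle; otherwise return data unchanged."""
    body_start = 2 + FRAME_HEADER_SIZE
    if len(data) > body_start and data[0] == PROTO and data[2] == FRAME:
        (frame_len,) = struct.unpack("<Q", data[3:body_start])
        if frame_len == len(data) - body_start:
            return data[:2] + data[body_start:]
    return data


for value in ("x", "spam", 12345):
    p3 = pickle.dumps(value, protocol=3)
    p4 = pickle.dumps(value, protocol=4)
    stripped = strip_single_frame(p4)
    assert pickle.loads(stripped) == value
    print(f"{value!r}: protocol 3 = {len(p3)} bytes, "
          f"protocol 4 = {len(p4)} bytes, stripped = {len(stripped)} bytes")
```

Pickles with several frames, or without a frame, are returned unchanged, so the helper never has to rewrite opcodes inside the body.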
msg370137 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-27 21:29
I wrote a short script to measure how the pickle protocol affects the size of the shelve file:
---
import glob
import os
import shelve

def shelf_size(filename):
    # Depending on the dbm backend, shelve.open() may create 'filename'
    # itself or files with extra suffixes ('.db', '.dat'/'.dir'/'.bak'),
    # so add up every matching file and delete them afterwards.
    paths = glob.glob(glob.escape(filename) + '*')
    size = sum(os.path.getsize(path) for path in paths)
    for path in paths:
        os.unlink(path)
    return size

print("== Short value ==")
for proto in (0, 1, 2, 3, 4, 5):
    filename = 'shelve-picklev%s' % proto
    with shelve.open(filename, protocol=proto) as db:
        assert db._protocol == proto
        for x in range(1000):
            db[str(x)] = str(x)
    print(f'Protocol {proto}: {shelf_size(filename)} bytes')
print()
print("== Large value ==")
large_value = [str(x) for x in range(1000)]
for proto in (0, 1, 2, 3, 4, 5):
    filename = 'shelve-picklev%s' % proto
    with shelve.open(filename, protocol=proto) as db:
        assert db._protocol == proto
        for x in range(10):
            db[str(x)] = large_value
    print(f'Protocol {proto}: {shelf_size(filename)} bytes')
---

Output with Python 3.9.0b1 (on Fedora 32):
---
== Short value ==
Protocol 0: 90112 bytes
Protocol 1: 94208 bytes
Protocol 2: 94208 bytes
Protocol 3: 94208 bytes
Protocol 4: 94208 bytes
Protocol 5: 94208 bytes

== Large value ==
Protocol 0: 139264 bytes
Protocol 1: 139264 bytes
Protocol 2: 139264 bytes
Protocol 3: 139264 bytes
Protocol 4: 98304 bytes
Protocol 5: 98304 bytes
---
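Every size in this output is a whole multiple of 4096 bytes, which suggests the dbm file grows in 4 KiB blocks (an inference from the numbers, not something the script measured). Converting the reported sizes into blocks makes the differences easier to compare; `to_blocks` is a hypothetical helper:

```python
# File sizes in bytes, copied from the output above (Python 3.9.0b1, Fedora 32).
BLOCK = 4096
short_sizes = {0: 90112, 1: 94208, 2: 94208, 3: 94208, 4: 94208, 5: 94208}
large_sizes = {0: 139264, 1: 139264, 2: 139264, 3: 139264, 4: 98304, 5: 98304}


def to_blocks(sizes):
    """Convert {protocol: bytes} to {protocol: number of 4096-byte blocks}."""
    for proto, size in sizes.items():
        if size % BLOCK:
            raise ValueError(f"protocol {proto}: {size} is not a multiple of {BLOCK}")
    return {proto: size // BLOCK for proto, size in sizes.items()}


print("short value, blocks:", to_blocks(short_sizes))
print("large value, blocks:", to_blocks(large_sizes))
saving = 1 - large_sizes[4] / large_sizes[3]
print(f"large value, protocol 4 vs 3: {saving:.1%} smaller")
```

For the short values the whole difference is one block (22 blocks for protocol 0, 23 for the others); for the large value protocol 4 needs 24 blocks instead of 34, about 29.4% less.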

For short string values, protocol 0 produces a slightly smaller file (90112 bytes) than protocols 1 and higher (94208 bytes).

For large values, protocols 4 and 5 produce smaller files (98304 bytes) than protocols 0 to 3 (139264 bytes).
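The file sizes also include the dbm backend's own storage, so it can help to measure the pickled values on their own. A minimal sketch reusing the benchmark's values; `pickled_sizes` is a hypothetical helper, and the numbers exclude keys and all dbm overhead:

```python
import pickle

PROTOCOLS = range(pickle.HIGHEST_PROTOCOL + 1)


def pickled_sizes(value):
    """Map each protocol supported here to len(pickle.dumps(value, protocol))."""
    return {proto: len(pickle.dumps(value, protocol=proto)) for proto in PROTOCOLS}


# "Short value" case: 1000 strings, each pickled separately (one per record).
short_totals = dict.fromkeys(PROTOCOLS, 0)
for x in range(1000):
    for proto, size in pickled_sizes(str(x)).items():
        short_totals[proto] += size

# "Large value" case: one list of 1000 strings, pickled as a single record.
large_value = [str(x) for x in range(1000)]

print("1000 short values:", short_totals)
print("one large value:  ", pickled_sizes(large_value))
```

For a single short string, the 9-byte frame header of protocol 4 outweighs its savings (SHORT_BINUNICODE stores a 1-byte length instead of BINUNICODE's 4 bytes, and MEMOIZE is 1 byte instead of BINPUT's 2). Inside the large list those per-string savings repeat 1000 times, while the frame header is a fixed cost per frame rather than per string.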
msg379812 - Author: Marco Castelluccio (marco-c) * Date: 2020-10-28 00:17
I've opened https://github.com/python/cpython/pull/22751 to fix this. I know there was already a PR, but it seems to have been abandoned.
msg379813 - Author: Zackery Spytz (ZackerySpytz) * (Python triager) Date: 2020-10-28 02:28
It has not been abandoned.
msg379861 - Author: miss-islington (miss-islington) Date: 2020-10-29 09:45
New changeset df59273c7a384ea8c890fa8e9b80c92825df841c by Zackery Spytz in branch 'master':
bpo-34204: Use pickle.DEFAULT_PROTOCOL in shelve (GH-19639)
https://github.com/python/cpython/commit/df59273c7a384ea8c890fa8e9b80c92825df841c
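With this change, merged for Python 3.10, shelve pickles values with pickle.DEFAULT_PROTOCOL unless a protocol is passed; before it, the default was protocol 3. One way to observe the effective default is sketched below, using the documented fact that shelve.Shelf can wrap any mapping (here a plain dict): store a value, then read the protocol number from the PROTO header of the stored bytes. The helper name is hypothetical:

```python
import pickle
import shelve
import sys


def shelve_default_protocol() -> int:
    """Hypothetical helper: report which pickle protocol shelve uses by default."""
    backing = {}                       # plain dict standing in for a dbm file
    with shelve.Shelf(backing) as db:  # no protocol argument: use the default
        db["probe"] = "value"
    stored = backing[b"probe"]         # keys are encoded with keyencoding="utf-8"
    # The default is at least 3, so the stored bytes start with PROTO (0x80)
    # followed by the protocol number.
    return stored[1]


print(sys.version_info[:2], shelve_default_protocol(), pickle.DEFAULT_PROTOCOL)
```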
History
Date                 User              Action  Args
2020-10-29 09:46:33  cheryl.sabella    set     status: open -> closed
                                               nosy: - miss-islington
                                               resolution: fixed
                                               stage: patch review -> resolved
2020-10-29 09:45:09  miss-islington    set     nosy: + miss-islington
                                               messages: + msg379861
2020-10-28 02:28:17  ZackerySpytz      set     messages: + msg379813
                                               versions: + Python 3.10, - Python 3.8
2020-10-28 00:17:30  marco-c           set     messages: + msg379812
2020-10-27 23:51:34  marco-c           set     nosy: + marco-c
                                               pull_requests: + pull_request21928
2020-10-18 19:08:59  ZackerySpytz      link    issue42071 superseder
2020-05-27 21:29:04  vstinner          set     nosy: + vstinner
                                               messages: + msg370137
2020-04-21 23:20:38  ZackerySpytz      set     keywords: + patch
                                               nosy: + ZackerySpytz
                                               pull_requests: + pull_request18963
                                               stage: patch review
2018-07-24 15:09:55  pitrou            set     nosy: + rhettinger
2018-07-24 09:16:42  serhiy.storchaka  create