Title: Bump the default pickle protocol in shelve
Components: Library (Lib) Versions: Python 3.10
Assigned To: Nosy List: ZackerySpytz, alexandre.vassalotti, lukasz.langa, marco-c, rhettinger, serhiy.storchaka, vstinner
Created on 2018-07-24 09:16 by serhiy.storchaka, last changed 2022-04-11 14:59 by admin. This issue is now closed.

PR 19639 merged ZackerySpytz, 2020-04-21 23:20
PR 22751 closed marco-c, 2020-10-27 23:51
Messages (5)
msg322281 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-07-24 09:16
The default pickle protocol is now 4, but shelve still uses pickle protocol 3. Shouldn't it be bumped? Shouldn't shelve use pickle.DEFAULT_PROTOCOL by default?


1. This will make shelve files incompatible with Python 3.3 by default.

2. Protocol 4 adds 9 bytes of overhead in comparison with protocol 3. This can be too large for a shelve containing a lot of small objects. Maybe strip the redundant frame header for small pickles?
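The 9 bytes in point 2 match the size of a FRAME opcode (1 byte) plus its 8-byte length field. A minimal sketch for checking this on a given interpreter follows; the helper names `pickle_size_by_protocol` and `has_frame` are made up for illustration, and the exact sizes may vary between Python versions because protocol 4 also uses shorter string and memo opcodes.

```python
import pickle
import pickletools


def pickle_size_by_protocol(value, protocols=(3, 4)):
    """Return {protocol: len(pickle.dumps(value, protocol))} (illustrative helper)."""
    return {p: len(pickle.dumps(value, protocol=p)) for p in protocols}


def has_frame(value, protocol):
    """Report whether the pickle of value contains a FRAME opcode (illustrative helper)."""
    data = pickle.dumps(value, protocol=protocol)
    return any(op.name == "FRAME" for op, arg, pos in pickletools.genops(data))


# A small value, similar in spirit to the entries of a shelve holding many small objects.
small = "x" * 10

print(pickle_size_by_protocol(small))
print("protocol 3 framed:", has_frame(small, 3))
print("protocol 4 framed:", has_frame(small, 4))
```

Comparing the two sizes for a few representative values gives a rough idea of the per-entry cost before changing the protocol of an existing shelve.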
msg370137 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-27 21:29
I wrote a short script to see the impact of file size depending on the protocol:
import shelve
import os.path

print("== Short value ==")
for proto in (0, 1, 2, 3, 4, 5):
    filename = 'shelve-picklev%s' % proto
    with shelve.open(filename, protocol=proto) as db:
        assert db._protocol == proto
        for x in range(1000):
            db[str(x)] = str(x)
    print(f'Protocol {proto}: {os.path.getsize(filename)} bytes')
print("== Large value ==")
large_value = [str(x) for x in range(1000)]
for proto in (0, 1, 2, 3, 4, 5):
    filename = 'shelve-picklev%s' % proto
    with shelve.open(filename, protocol=proto) as db:
        assert db._protocol == proto
        for x in range(10):
            db[str(x)] = large_value
    print(f'Protocol {proto}: {os.path.getsize(filename)} bytes')

Output with Python 3.9.0b1 (on Fedora 32):
== Short value ==
Protocol 0: 90112 bytes
Protocol 1: 94208 bytes
Protocol 2: 94208 bytes
Protocol 3: 94208 bytes
Protocol 4: 94208 bytes
Protocol 5: 94208 bytes

== Large value ==
Protocol 0: 139264 bytes
Protocol 1: 139264 bytes
Protocol 2: 139264 bytes
Protocol 3: 139264 bytes
Protocol 4: 98304 bytes
Protocol 5: 98304 bytes

For short string values, protocol 0 produces smaller files than protocol 1 and higher.

For large values, protocol 4 and higher produce smaller files than protocol 3 and lower.
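All file sizes reported above are multiples of 4096 bytes, which suggests (this is an assumption, not something stated in the output) that the dbm backend grows the file in fixed-size pages and hides small per-entry differences. A rough sketch that measures the pickled payload directly, independent of any dbm backend, could look like this; `payload_sizes` is a hypothetical helper and `"999"` is just one sample short value.

```python
import pickle

# Same shapes of data as in the script above: one short string, one large list.
short_value = "999"
large_value = [str(x) for x in range(1000)]


def payload_sizes(value):
    """Return {protocol: pickled size in bytes} for every supported protocol."""
    return {
        proto: len(pickle.dumps(value, protocol=proto))
        for proto in range(pickle.HIGHEST_PROTOCOL + 1)
    }


for label, value in (("Short value", short_value), ("Large value", large_value)):
    print(f"== {label} ==")
    for proto, size in payload_sizes(value).items():
        print(f"Protocol {proto}: {size} bytes")
```

This only counts the bytes handed to the database per value; the on-disk size still depends on the backend in use.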
msg379812 - (view) Author: Marco Castelluccio (marco-c) * Date: 2020-10-28 00:17
I've opened a PR to fix this. I know there was already a PR, but it seems to have been abandoned.
msg379813 - (view) Author: Zackery Spytz (ZackerySpytz) * (Python triager) Date: 2020-10-28 02:28
It has not been abandoned.
msg379861 - (view) Author: miss-islington (miss-islington) Date: 2020-10-29 09:45
New changeset df59273c7a384ea8c890fa8e9b80c92825df841c by Zackery Spytz in branch 'master':
bpo-34204: Use pickle.DEFAULT_PROTOCOL in shelve (GH-19639)
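For code that still needs shelve files readable by older interpreters (the compatibility concern raised in the first message), one option is to pass `protocol` explicitly instead of relying on the default. The following is a minimal sketch under that assumption; the shelf name `example-shelf` and the stored data are hypothetical.

```python
import os
import pickle
import shelve
import tempfile

# The default that shelve is meant to follow after this change.
print("pickle.DEFAULT_PROTOCOL =", pickle.DEFAULT_PROTOCOL)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example-shelf")  # hypothetical shelf name

    # Pin protocol 3 explicitly so the values stay loadable by older Pythons.
    with shelve.open(path, protocol=3) as db:
        db["key"] = ["some", "value"]

    # Reading does not need the protocol: pickle detects it from the data.
    with shelve.open(path) as db:
        restored = db["key"]

print(restored)
```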
