Author Paweł Miech
Recipients Paweł Miech
Date 2020-07-08.07:32:46
Message-id <1594193566.99.0.171960738347.issue41238@roundup.psfhosted.org>
Content
I'm porting some code from Python 2.7 to Python 3.8. Some of it uses shelve.DbfilenameShelf to store nested dictionaries containing sets. I found that, compared with Python 2.7, Python 3.8's shelve generates files that are approximately 165 times larger on disk: the Python 3.8 file is 2 027 520 bytes, while the Python 2.7 file is 12 288 bytes.
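
As a quick sanity check on the two sizes quoted above, the ratio can be computed directly. This is only arithmetic on the reported numbers; the 4096-byte `PAGE` value is an assumption used for illustration, not something the measurements establish.

```python
# Sizes in bytes, as reported by os.path.getsize in the runs in this report.
PY27_BYTES = 12288
PY38_BYTES = 2027520
PAGE = 4096  # assumed block size, for illustration only

ratio = PY38_BYTES / PY27_BYTES
print(ratio)  # 165.0

# Both sizes happen to be whole multiples of 4096 bytes.
print(PY27_BYTES % PAGE, PY38_BYTES % PAGE)    # 0 0
print(PY27_BYTES // PAGE, PY38_BYTES // PAGE)  # 3 495
```

That both sizes divide evenly by 4096 may hint at block-based allocation in the underlying database file, but that is a guess and not a confirmed cause.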

Code sample:
Filename: test_anydbm.py

#!/usr/bin/env python
import datetime
import shelve
import sys
import time
from os import path


def main():
    print(sys.version)
    fname = 'shelf_test_{}'.format(datetime.datetime.now().isoformat())
    bucket = shelve.DbfilenameShelf(fname, "n")
    now = time.time()
    limit = 1000
    key = 'some key > some key > other'
    top_dict = {}
    to_store = {
        1: {
            'page_item_numbers': set(),
            'products_on_page': None
        }
    }
    for i in range(limit):
        to_store[1]['page_item_numbers'].add(i)
        top_dict[key] = to_store
        bucket[key] = top_dict
    end = time.time()
    db_file = False
    try:
        fsize = path.getsize(fname)
    except Exception as e:
        print("file not found? {}".format(e))
        try:
            fsize = path.getsize(fname + '.db')
            db_file = True
        except Exception as e:
            print("file not found? {}".format(e))
            fsize = None
    print("Stored {} in {} filesize {}".format(limit, end - now, fsize))
    print(fname)
    bucket.close()
    bucket = shelve.DbfilenameShelf(fname, flag="r")
    if db_file:
        fname += '.db'
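    print("In file {} {}".format(fname, len(list(bucket.items()))))
    bucket.close()


# Run the test when the file is executed as a script (as the Dockerfile CMD does).
if __name__ == '__main__':
    main()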

Output of running it in a Docker image with Python 2:

Dockerfile:
FROM python:2-jessie
VOLUME /scripts
CMD scripts/test_anydbm.py

2.7.16 (default, Jul 10 2019, 03:39:20) 
[GCC 4.9.2]
Stored 1000 in 0.0814290046692 filesize 12288
shelf_test_2020-07-08T07:26:23.778769
In file shelf_test_2020-07-08T07:26:23.778769 1


So you can see the file size: 12 288 bytes.

Now running the same thing in Python 3:

Dockerfile:

FROM python:3.8-slim-buster
VOLUME /scripts
CMD scripts/test_anydbm.py

3.8.3 (default, Jun  9 2020, 17:49:41) 
[GCC 8.3.0]
Stored 1000 in 0.02681446075439453 filesize 2027520
shelf_test_2020-07-08T07:27:18.068638
In file shelf_test_2020-07-08T07:27:18.068638 1

Notice the file size: 2 027 520 bytes.

Why is this happening? Is this a bug? If I wanted to fix it, do you have any ideas about what causes this?
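
One way to narrow this down might be to check which dbm backend the interpreter actually chose, and how large the pickled value is on its own. The following is a minimal Python 3 sketch, not a diagnosis: `inspect_shelf` is a hypothetical helper name, it writes to a temporary directory instead of the timestamped file used above, and it pickles with the `pickle` module's default protocol, which may not match the protocol `shelve` uses internally.

```python
import dbm
import os
import pickle
import shelve
import tempfile


def inspect_shelf(limit=1000):
    """Repeat the write pattern from the report; return (backend, bytes on disk, pickle bytes)."""
    key = 'some key > some key > other'
    items = set()
    top_dict = {key: {1: {'page_item_numbers': items, 'products_on_page': None}}}

    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.join(tmp, 'shelf_test')
        # Overwrite the same key `limit` times with a growing value.
        with shelve.open(base, flag='n') as bucket:
            for i in range(limit):
                items.add(i)
                bucket[key] = top_dict
        # Measure only after the shelf is closed, so everything is flushed.
        backend = dbm.whichdb(base)
        # Some backends create several files (for example .dat/.dir), so sum them all.
        on_disk = sum(
            os.path.getsize(os.path.join(tmp, name)) for name in os.listdir(tmp)
        )

    # Size of the final value when pickled by itself, outside any database file.
    payload = len(pickle.dumps(top_dict))
    return backend, on_disk, payload


if __name__ == '__main__':
    backend, on_disk, payload = inspect_shelf()
    print("backend: {}".format(backend))
    print("bytes on disk: {}".format(on_disk))
    print("pickled payload bytes: {}".format(payload))
```

If the pickled payload turns out to be only a few kilobytes while the on-disk total is far larger, the difference probably lies in how the dbm backend handles repeated overwrites of one key and not in shelve's serialization. On Python 2.7 the equivalent backend check is `whichdb.whichdb(fname)`.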