I'm porting some code from Python 2.7 to Python 3.8. Some of it uses shelve.DbfilenameShelf to store nested dictionaries containing sets. I found that, compared with Python 2.7, shelve in Python 3.8 generates files that are about 165 times larger on disk: the Python 3.8 file is 2 027 520 bytes, while the Python 2.7 file is 12 288 bytes.
Code sample:
Filename: test_anydbm.py
#!/usr/bin/env python
import datetime
import shelve
import sys
import time
from os import path


def main():
    print(sys.version)
    fname = 'shelf_test_{}'.format(datetime.datetime.now().isoformat())
    bucket = shelve.DbfilenameShelf(fname, "n")
    now = time.time()
    limit = 1000
    key = 'some key > some key > other'
    top_dict = {}
    to_store = {
        1: {
            'page_item_numbers': set(),
            'products_on_page': None
        }
    }
    # Grow the set one item at a time and rewrite the same key on every pass.
    for i in range(limit):
        to_store[1]['page_item_numbers'].add(i)
        top_dict[key] = to_store
        bucket[key] = top_dict
    end = time.time()
    db_file = False
    try:
        fsize = path.getsize(fname)
    except Exception as e:
        print("file not found? {}".format(e))
        # Some dbm backends append a '.db' suffix to the file name.
        try:
            fsize = path.getsize(fname + '.db')
            db_file = True
        except Exception as e:
            print("file not found? {}".format(e))
            fsize = None
    print("Stored {} in {} filesize {}".format(limit, end - now, fsize))
    print(fname)
    bucket.close()
    bucket = shelve.DbfilenameShelf(fname, flag="r")
    if db_file:
        fname += '.db'
    print("In file {} {}".format(fname, len(list(bucket.items()))))


if __name__ == '__main__':
    main()
Output of running it in a Docker image:
Dockerfile:
FROM python:2-jessie
VOLUME /scripts
CMD scripts/test_anydbm.py
2.7.16 (default, Jul 10 2019, 03:39:20)
[GCC 4.9.2]
Stored 1000 in 0.0814290046692 filesize 12288
shelf_test_2020-07-08T07:26:23.778769
In file shelf_test_2020-07-08T07:26:23.778769 1
So the file size here is 12 288 bytes.
And now running the same thing in Python 3:
Dockerfile:
FROM python:3.8-slim-buster
VOLUME /scripts
CMD scripts/test_anydbm.py
3.8.3 (default, Jun 9 2020, 17:49:41)
[GCC 8.3.0]
Stored 1000 in 0.02681446075439453 filesize 2027520
shelf_test_2020-07-08T07:27:18.068638
In file shelf_test_2020-07-08T07:27:18.068638 1
Notice the file size: 2 027 520 bytes.
Why is this happening? Is this a bug? If I'd like to fix it, do you have any ideas about what causes it?
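One way to narrow this down, assuming the difference might come from the underlying dbm backend rather than from pickle itself, is to check which backend each interpreter picks and compare the pickled payload size with what lands on disk. The sketch below is Python 3 only (in Python 2 the equivalent lookup lives in the separate whichdb module). The function name inspect_shelf and the file name shelf_probe are made up for illustration, and pickle.dumps here uses the default protocol, which may not be the exact protocol shelve uses.

```python
import dbm
import os
import pickle
import shelve
import tempfile


def inspect_shelf(directory, limit=1000):
    """Write one nested dict to a fresh shelf and report what ended up on disk.

    `inspect_shelf` and the 'shelf_probe' file name are hypothetical helpers
    for this diagnostic, not part of the original script.
    """
    fname = os.path.join(directory, 'shelf_probe')
    value = {'some key': {1: {'page_item_numbers': set(range(limit)),
                              'products_on_page': None}}}

    # A single write of the final value, unlike the loop in the script above.
    with shelve.open(fname, 'n') as bucket:
        bucket['some key'] = value

    # Size of the pickled value alone (default protocol; shelve may differ).
    payload_bytes = len(pickle.dumps(value))

    # Backends may create one file or several (for example .dat/.dir or .db),
    # so add up everything that starts with the base name.
    disk_bytes = sum(
        os.path.getsize(os.path.join(directory, name))
        for name in os.listdir(directory)
        if name.startswith('shelf_probe')
    )

    return {
        'backend': dbm.whichdb(fname),  # e.g. 'dbm.gnu', 'dbm.ndbm', 'dbm.dumb'
        'payload_bytes': payload_bytes,
        'disk_bytes': disk_bytes,
    }


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        print(inspect_shelf(tmp))
```

If the two images report different backends, or the disk size is far above the payload size only when the same key is rewritten many times, that would point at how the backend stores and reuses space rather than at shelve or pickle.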