This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Database corruption with the shelve module
Type: behavior Stage:
Components: Demos and Tools Versions: Python 3.10, Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: HubTou, koobs, lemburg, terry.reedy
Priority: normal Keywords:

Created on 2022-03-20 17:29 by HubTou, last changed 2022-04-11 14:59 by admin.

Files
File name Uploaded Description
shelve-test.zip HubTou, 2022-03-20 17:29 Small test program to reproduce the bug
shelve-test-3.10.zip HubTou, 2022-03-26 07:09 Small test program and results
shelve-test-3.10-b.zip HubTou, 2022-03-26 13:38 Small test program and results, with better record size
Messages (7)
msg415625 - Author: Hubert Tournier (HubTou) * Date: 2022-03-20 17:29
After adding a few records, the shelve module corrupts the database keys (the database is still readable if an element key is known, but it is no longer iterable):

Traceback (most recent call last):
  File "./shelve-test.py", line 81, in <module>
    _verify_whois_cache()
  File "./shelve-test.py", line 61, in _verify_whois_cache
    for key in db.keys():
  File "/usr/local/lib/python3.8/_collections_abc.py", line 720, in __iter__
    yield from self._mapping
  File "/usr/local/lib/python3.8/shelve.py", line 95, in __iter__
    for k in self.dict.keys():
SystemError: Negative size passed to PyBytes_FromStringAndSize

I provide a short test program and data that systematically reproduce the bug. I also added a script showing the execution messages, as well as the resulting database in DB and text formats.
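For readers without the attachment, here is a minimal sketch of the kind of check that triggers the traceback above. It is not the attached shelve-test.py: the helper name count_iterable_keys and the file name demo_cache are hypothetical, and on a healthy backend the check simply succeeds.

```python
import os
import shelve
import tempfile


def count_iterable_keys(path):
    """Return the number of keys if the shelf can be iterated, else None."""
    with shelve.open(path, flag="r") as db:
        try:
            return sum(1 for _ in db.keys())
        except SystemError:
            # This is where the failure reported in this issue would surface.
            return None


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_cache")  # hypothetical file name
    with shelve.open(path) as db:
        db["185.220.102.6"] = ["example record"]  # placeholder value
    key_count = count_iterable_keys(path)
    print(key_count)
```

A single small record is not expected to corrupt anything; the sketch only shows where the iteration happens.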

Tested with Python 3.8.12 on FreeBSD 13.0-RELEASE-p8.
I suppose Python is using my system package db5-5.3.28_8 (Oracle Berkeley DB, revision 5.3).

See also similar issues:
https://bugs.python.org/issue33074
https://bugs.python.org/issue30388
msg416036 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2022-03-26 05:26
3.8 only gets security patches.  If you can, please test with a newer version.
msg416045 - Author: Hubert Tournier (HubTou) * Date: 2022-03-26 07:09
Hello,
Same results with Python 3.10.4:

[...]
Adding 185.220.102.6
Database has 62 records for 442368 bytes. Last record was 640 bytes long
Traceback (most recent call last):
  File "./shelve-test.py", line 84, in <module>
    _verify_whois_cache()
  File "./shelve-test.py", line 63, in _verify_whois_cache
    for key in db.keys():
  File "/usr/local/lib/python3.10/_collections_abc.py", line 881, in __iter__
    yield from self._mapping
  File "/usr/local/lib/python3.10/shelve.py", line 95, in __iter__
    for k in self.dict.keys():
SystemError: Negative size passed to PyBytes_FromStringAndSize
# freebsd-version -uk
13.0-RELEASE-p8
13.0-RELEASE-p10
# python3.10 --version
Python 3.10.4

The point at which the database breaks varies (from 50 to 500+ records), and the size of the database doesn't seem to be relevant (from 400K to 1800K).

The size of the record *apparently* isn't relevant either (though I'm not 100% sure I'm measuring the right figure). That said, I have used the shelve module elsewhere without issues, with many more records, but those were much smaller and less complex.
msg416063 - Author: Hubert Tournier (HubTou) * Date: 2022-03-26 13:38
I modified the test program to better reflect the size of the data structures stored in shelve (sys.getsizeof(), which I had used, was far off the real size).
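To illustrate why sys.getsizeof() is misleading here: it reports only the size of the container object, whereas shelve pickles each value before handing it to dbm. The sketch below compares the two; the helper name stored_size and the sample record are made up for illustration and are not taken from the attached test program.

```python
import pickle
import sys


def stored_size(obj, protocol=pickle.DEFAULT_PROTOCOL):
    """Return the number of bytes pickle produces for obj."""
    return len(pickle.dumps(obj, protocol))


# Hypothetical record: a nested structure loosely resembling cached whois data.
record = {
    "ip": "185.220.102.6",
    "lines": [f"line {i}: " + "x" * 80 for i in range(100)],
}

shallow = sys.getsizeof(record)  # the dict object only, not what it references
pickled = stored_size(record)    # the serialized value, contents included
print(shallow, pickled)
```

The exact numbers depend on the Python version and pickle protocol, so none are quoted here.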

I saw that the database was corrupted with big records, though even bigger previous records had not corrupted it. Records larger than 1K (mentioned in one of the other problem reports) were routinely OK. Records larger than 4K (also mentioned in another PR) were sometimes OK.

When I took a problematic record and used it alone or with a few other records, no corruption occurred.

Any idea?
msg416108 - Author: Hubert Tournier (HubTou) * Date: 2022-03-27 07:16
Additional note: the test code WORKS under Windows 8.1 / Python 3.9.1 (though the data file is suffixed .dat instead of .db) resulting in a 4 MB database with 1065 records, some of them > 11 KB.

So maybe the bug is system dependent.
msg416110 - Author: Hubert Tournier (HubTou) * Date: 2022-03-27 07:56
The storage format used under Windows is completely different from the one used under Unix (or *BSD).

Apart from the .dat datafile, there is a .dir index file with CSV lines such as "'key', (offset, length)".

Whereas under Unix (or *BSD), I have:

# file whois_cache.db
whois_cache.db: Berkeley DB 1.85 (Hash, version 2, native byte-order)

I'll run a test on a Linux Raspberry Pi, to see if the issue is *BSD specific...
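The .dat/.dir layout described above matches what the pure-Python dbm.dumb backend writes. As a rough sketch, the following creates a throwaway dbm.dumb database and prints its files and index; the file name demo_cache is hypothetical, and the sketch does not establish which backend the Windows run actually used.

```python
import dbm.dumb
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "demo_cache")  # hypothetical file name
    db = dbm.dumb.open(base, "c")
    db[b"key"] = b"value"
    db.close()

    # List the files the backend created next to the base name.
    created = sorted(os.listdir(tmp))
    print(created)

    # The .dir file is a text index: one line per key with (offset, length).
    with open(base + ".dir", encoding="latin-1") as index:
        index_text = index.read()
    print(index_text)
```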
msg416119 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2022-03-27 11:54
On 27.03.2022 09:56, Hubert Tournier wrote:
> 
> The storage format used under Windows is completely different from the one used under Unix (or *BSD).

The shelve module uses the dbm module underneath and this will pick
its storage mechanism based on what's available on the platform:

https://docs.python.org/3/library/dbm.html
https://github.com/python/cpython/blob/3.10/Lib/dbm/__init__.py

It's likely that you'll get the dbm.dumb interface on Windows.
On Linux, you typically have one of gdbm or the Berkeley DB installed.

dbm.whichdb() will tell you which type of dbm implementation your
files are likely using.
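
A minimal sketch of that check, using a throwaway shelf (the path and the stored value are placeholders). dbm.whichdb() returns the name of a dbm module as a string, an empty string if the format cannot be guessed, or None if the file cannot be opened:

```python
import dbm
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # File name borrowed from the report; any writable path works.
    path = os.path.join(tmp, "whois_cache")
    with shelve.open(path) as db:
        db["185.220.102.6"] = {"example": True}  # placeholder value

    # Ask dbm which backend it thinks wrote the file(s) behind this path.
    backend = dbm.whichdb(path)
    print(backend)
```

The result depends on which dbm backends the local Python build was compiled with.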

More on the differences of DBM style libs:
http://www.ccl.net/cca/software/UNIX/apache/apacheRH7.0/local-copies/dbm.html

Aside: You are probably better off using SQLite with a pickle
layer to store arbitrary objects. This is much more mature than
the dbm modules.
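
One possible shape for such a SQLite-plus-pickle layer is sketched below. The class name SQLiteShelf and the table layout are assumptions made for illustration, not an existing library API, and the usual pickle caveat applies: only load data you trust.

```python
import pickle
import sqlite3
from collections.abc import MutableMapping


class SQLiteShelf(MutableMapping):
    """Hypothetical dict-like store: string keys, pickled values, one table."""

    def __init__(self, path):
        self._conn = sqlite3.connect(path)
        with self._conn:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS shelf "
                "(key TEXT PRIMARY KEY, value BLOB NOT NULL)"
            )

    def __getitem__(self, key):
        row = self._conn.execute(
            "SELECT value FROM shelf WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return pickle.loads(row[0])

    def __setitem__(self, key, value):
        blob = pickle.dumps(value)
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO shelf (key, value) VALUES (?, ?)",
                (key, blob),
            )

    def __delitem__(self, key):
        with self._conn:
            cur = self._conn.execute("DELETE FROM shelf WHERE key = ?", (key,))
        if cur.rowcount == 0:
            raise KeyError(key)

    def __iter__(self):
        # Fetch all keys up front so callers can modify the store while looping.
        rows = self._conn.execute("SELECT key FROM shelf").fetchall()
        return (row[0] for row in rows)

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM shelf").fetchone()[0]

    def close(self):
        self._conn.close()


store = SQLiteShelf(":memory:")  # pass a file path for persistent storage
store["185.220.102.6"] = {"lines": ["example"]}
print(list(store), store["185.220.102.6"])
store.close()
```

Deriving from MutableMapping supplies keys(), items(), get() and similar methods for free, which keeps the sketch close to how a shelf is used.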
History
Date User Action Args
2022-04-11 14:59:57 admin set github: 91228
2022-03-27 12:01:09 Leilei set components: + Demos and Tools, - Library (Lib), FreeBSD
2022-03-27 11:54:07 lemburg set nosy: + lemburg; messages: + msg416119
2022-03-27 07:56:23 HubTou set messages: + msg416110
2022-03-27 07:16:41 HubTou set versions: + Python 3.10; nosy: + koobs; messages: + msg416108; components: + FreeBSD
2022-03-26 13:38:21 HubTou set files: + shelve-test-3.10-b.zip; messages: + msg416063
2022-03-26 07:09:18 HubTou set files: + shelve-test-3.10.zip; messages: + msg416045
2022-03-26 05:26:32 terry.reedy set nosy: + terry.reedy; messages: + msg416036
2022-03-20 17:29:23 HubTou create