dbm corrupts index on macOS (_dbm module) #77255
Environment: Python 3.6.4, macOS 10.12.6

Python 3's dbm appears to corrupt the key index on macOS if objects >4KB are inserted.

Code:

<<<<<<<<<<<
import contextlib
import dbm

with contextlib.closing(dbm.open('test', 'n')) as db:
    for k in range(128):
        db[('%04d' % k).encode()] = b'\0' * (k * 128)

with contextlib.closing(dbm.open('test', 'r')) as db:
    print(len(db))
    print(len(list(db.keys())))
>>>>>>>>>>>

On my machine, I get the following:

<<<<<<<<<<<
94
Traceback (most recent call last):
  File "test.py", line 10, in <module>
    print(len(list(db.keys())))
SystemError: Negative size passed to PyBytes_FromStringAndSize
>>>>>>>>>>>

(The error says PyString_FromStringAndSize on Python 2.x but is otherwise the same.)

The expected output, which I see on Linux (using gdbm), is 128.

I get this error with the following Pythons on my system: /usr/bin/python2.6 - Apple-supplied Python 2.6.9

This seems like a very big problem: silent data corruption with no warning. It appears related to bpo-30388, but in that case they were seeing sporadic failures. The deterministic script above causes failures in every case.

This was discovered after running some code which used shelve (which uses dbm under the hood) in Python 3, but the bug clearly applies to Python 2 as well.
(Note: the contextlib stuff is just for Python 2 compatibility; it's not necessary on Python 3.)
I highly suspect you don't have gdbm installed in your environment and
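One way to check that suspicion is to ask Python which dbm backends it can import and which one `dbm.open` actually picks. The sketch below is only illustrative: the helper names `available_backends` and `default_backend` are made up for this example, and `dbm.sqlite3` only exists on Python 3.13 and later.

```python
import dbm
import importlib
import os
import tempfile


def available_backends():
    """Return the names of the dbm submodules that import successfully."""
    found = []
    # dbm.sqlite3 exists only on Python 3.13+; a failed import is skipped.
    for name in ("dbm.sqlite3", "dbm.gnu", "dbm.ndbm", "dbm.dumb"):
        try:
            importlib.import_module(name)
        except ImportError:
            continue
        found.append(name)
    return found


def default_backend():
    """Create a throwaway database and ask dbm.whichdb what produced it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "probe")
        db = dbm.open(path, "n")
        db[b"k"] = b"v"
        db.close()
        return dbm.whichdb(path)


print(available_backends())
print(default_backend())
```

If `dbm.gnu` is missing from the first list and the second line reports `dbm.ndbm`, the database is being handled by the system ndbm library rather than gdbm.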
So we have some other problems then:

(1) It should be documented in dbm, and ideally in shelve, that keys/values over a certain limit might not work. Presently there is no hint that such a limit exists, and until you mentioned it I was unaware that POSIX only requires support for 1023-byte keys and values.

Thoughts?
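Until such a limit is documented, an application can enforce one itself. The following is a minimal sketch, not anything dbm provides: `SizeCheckedDB` and `MAX_PAIR_BYTES` are hypothetical names, and the 1023-byte figure is simply the number quoted above, so the real limit of a given ndbm build may differ. The demonstration uses `dbm.dumb` only so that it runs on any platform.

```python
import dbm.dumb
import os
import tempfile

# Assumption: 1023 bytes, the figure quoted in the comment above. The real
# limit depends on the ndbm library the _dbm module was built against.
MAX_PAIR_BYTES = 1023


def _as_bytes(obj):
    """dbm accepts str or bytes; measure both as UTF-8 encoded bytes."""
    return obj.encode("utf-8") if isinstance(obj, str) else bytes(obj)


class SizeCheckedDB:
    """Hypothetical wrapper that rejects oversized pairs before they reach dbm."""

    def __init__(self, db, max_pair_bytes=MAX_PAIR_BYTES):
        self._db = db
        self._max = max_pair_bytes

    def __setitem__(self, key, value):
        key_b, value_b = _as_bytes(key), _as_bytes(value)
        size = len(key_b) + len(value_b)
        if size > self._max:
            raise ValueError(
                "key + value is %d bytes, over the %d-byte limit" % (size, self._max)
            )
        self._db[key_b] = value_b

    def __getitem__(self, key):
        return self._db[_as_bytes(key)]

    def __len__(self):
        return len(self._db)

    def close(self):
        self._db.close()


# Demonstration against dbm.dumb in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    db = SizeCheckedDB(dbm.dumb.open(os.path.join(tmp, "guarded"), "n"))
    db[b"small"] = b"\0" * 100
    try:
        db[b"large"] = b"\0" * 4096
    except ValueError as exc:
        print("rejected:", exc)
    print(len(db))
    db.close()
```

Failing loudly at insertion time trades convenience for safety: the oversized write never reaches the backend, so the index cannot be damaged by it.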
Addressing your point (5):
If you are using MacPorts, the easiest way is to use a Python from MacPorts. For example,
The main problem is that gdbm is GPL3-licensed. Python source distributions do not include any GPL3-licensed software to avoid tainting Python itself. We therefore avoid shipping GPL3 software with python.org binary releases, like our macOS installers.
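Where gdbm cannot be installed, one possible workaround is to bypass the ndbm backend explicitly instead of relying on `dbm.open` to choose. The sketch below assumes a hypothetical helper named `open_db`; it prefers `dbm.gnu` when that module imports and otherwise falls back to the pure-Python `dbm.dumb`, which is slower and uses its own file format.

```python
import dbm.dumb

try:
    import dbm.gnu as _preferred  # only present when Python was built with gdbm
except ImportError:
    _preferred = None


def open_db(path, flag="c"):
    """Open path with dbm.gnu if present, otherwise with pure-Python dbm.dumb."""
    backend = _preferred if _preferred is not None else dbm.dumb
    return backend.open(path, flag)
```

A database created this way is not readable by the ndbm backend, and an existing ndbm file cannot be opened by it, so this only suits new data files.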
I just started a new project, thoughtlessly decided to use To reiterate: Although

At the very least there should be a warning or error that the data inserted exceeds dbm's limits, and in an ideal world dbm would not fall over from inserting a few KB of data in a single row (but I understand that's a third party problem at that point).

Can't we just ship a dbm that is backed with a more robust engine, like a SQLite key-value table?
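To make the SQLite suggestion concrete, here is a rough sketch of a bytes-to-bytes mapping stored in a single table. `SQLiteKV` and the table name `kv` are invented for this illustration and are not part of the standard library; the class only shows that the idea fits in a few dozen lines on top of the `sqlite3` module.

```python
import sqlite3
from collections.abc import MutableMapping


class SQLiteKV(MutableMapping):
    """Hypothetical bytes-to-bytes mapping stored in a single SQLite table."""

    def __init__(self, path):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv "
            "(key BLOB PRIMARY KEY, value BLOB NOT NULL)"
        )
        self._conn.commit()

    def __getitem__(self, key):
        row = self._conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __setitem__(self, key, value):
        self._conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value)
        )
        self._conn.commit()

    def __delitem__(self, key):
        cur = self._conn.execute("DELETE FROM kv WHERE key = ?", (key,))
        if cur.rowcount == 0:
            raise KeyError(key)
        self._conn.commit()

    def __iter__(self):
        # Fetch all keys up front so callers may modify the table while iterating.
        rows = self._conn.execute("SELECT key FROM kv ORDER BY key").fetchall()
        return iter([key for (key,) in rows])

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]

    def close(self):
        self._conn.close()
```

Running the loop from the original report against `SQLiteKV(":memory:")` stores all 128 rows, since SQLite BLOB columns are not bound by the ndbm page size. Python 3.13 later added a `dbm.sqlite3` backend along these lines.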
This is a duplicate of bpo-30388.