classification
Title: Shelve consistency issues
Type: enhancement
Stage:
Components: Documentation
Versions: Python 3.6, Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 2.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: Yanyan Jiang, docs@python, r.david.murray
Priority: normal
Keywords:

Created on 2015-10-19 19:52 by Yanyan Jiang, last changed 2022-04-11 14:58 by admin.

Messages (4)
msg253188 - (view) Author: Yanyan Jiang (Yanyan Jiang) Date: 2015-10-19 19:52
I am currently working on file system reliability issues. I have a disk driver that can simulate the disk states left behind by injected power failures (crash sites). The simulated disk is fully compatible with the Linux block driver semantics (see https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt), and it can produce many crash sites in which pending blocks have been only partially flushed to disk, which is common behavior for a commodity disk with a write buffer.

Our automated tool confirms that corruption can occur at a crash site after an unclean shutdown (Linux with the default ext4 settings). We also found discussions on [Stack Overflow](http://stackoverflow.com/questions/4226580/prevent-python-shelve-corruption) concerning this issue. I suggest explicitly reminding developers of this behavior.

Suggested documentation enhancement
--------------------------------------
As a minimal database library, `shelve` does not offer ACID (atomicity, consistency, isolation and durability) guarantees as strong as those of a database such as SQLite. On certain system configurations, a system crash can leave a shelve file corrupted. If you are using shelve to persist precious data such as users' documents, we suggest the following steps to ensure that data is not lost:

1. Create a temporary copy of the file.
2. Operate on the temporary copy. Closing a shelve db implies that its data is flushed to disk.
3. Rename the temporary file to replace the original file. A journaling filesystem treats the rename as an atomic operation.
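
The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the shelf is assumed to live in a single file at exactly the given path, which depends on the dbm backend (backends that add a suffix or use several files make the copy fail with `FileNotFoundError`). The names `replace_atomically` and `update_shelf` are hypothetical helpers, not part of `shelve`, and the explicit `fsync` calls are an extra precaution that the steps do not mention.

```python
import os
import shelve
import shutil
import tempfile


def replace_atomically(path, mutate):
    """Copy `path`, let `mutate(tmp_path)` change the copy, then rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file is created in the same directory so that the
    # final rename stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    os.close(fd)
    try:
        shutil.copy2(path, tmp_path)      # step 1: copy the original
        mutate(tmp_path)                  # step 2: operate on the copy
        fd = os.open(tmp_path, os.O_RDWR)
        try:
            os.fsync(fd)                  # ask the OS to write the copy to disk
        finally:
            os.close(fd)
        os.replace(tmp_path, path)        # step 3: rename over the original
    except BaseException:
        # On any failure the original file is left untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    if os.name == "posix":
        # Sync the directory so that the rename itself reaches the disk.
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)


def update_shelf(path, updates):
    """Apply `updates` (a dict) to a shelf assumed to be stored in the single file `path`."""
    def mutate(tmp_path):
        # flag="w" opens an existing database read/write and never creates one.
        with shelve.open(tmp_path, flag="w") as db:
            db.update(updates)
    replace_atomically(path, mutate)
```

A call such as `update_shelf("documents.db", {"draft": "text"})` would then either leave the old file in place or replace it with the fully updated copy. Whether this is sufficient still depends on the filesystem and on the backend actually in use.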
msg253195 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-10-19 21:21
Shelve does not itself implement any database, but it does *use* a database[*].  Any consistency concerns therefore have to be directed at the underlying database library in use.  In particular, it is not part of the shelve API to know anything about any possible underlying file or files, nor is it *necessarily* the case that there is pending data to be flushed on close.

So, if you want to suggest a documentation enhancement, it should make reference to the issue and point the user at the documentation for the underlying database they choose to use for more information.

[*] There is an open issue proposing an sqlite backend for shelve, but no one so far has had the motivation to finish it.
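
To see which underlying database a given shelf actually uses, and which file or files it created, something like the following could be tried. It is a small sketch for Python 3 only; `shelf_backend` is a hypothetical helper name, and the result depends on which dbm modules the interpreter was built with, so no particular output is implied.

```python
import dbm
import os
import shelve
import tempfile


def shelf_backend(path):
    """Return the name of the dbm module that dbm.whichdb() detects for `path`."""
    return dbm.whichdb(path)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example_shelf")
    # Create a small shelf with whatever backend is the default here.
    with shelve.open(path) as db:
        db["answer"] = 42
    backend = shelf_backend(path)
    # Some backends add suffixes or create several files.
    files = sorted(os.listdir(tmp_dir))

print("backend:", backend)
print("files on disk:", files)
```

The printed module name tells the user which library's documentation to consult for crash-safety details.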
msg253200 - (view) Author: Yanyan Jiang (Yanyan Jiang) Date: 2015-10-20 00:58
Thanks for the reminder. The issue was originally reported with the default settings. We conducted further tests with the other anydbm options (dbhash, dbm, gdbm), and none of them survived crash testing. For the detailed reasoning, please refer to an OSDI'14 research paper: https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-pillai.pdf
The paper discusses vulnerabilities in the GDBM implementation, and the other lightweight db implementations have similar problems. We have also tested SQLite, and it is much more robust: we have not found an ACID violation yet.

Personally, I think it is reasonable to have an SQLite backend, as it is much safer (and also provides thread safety). I will see what I can do about that.
msg253215 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-10-20 11:56
Yeah, if we had an sqlite backend I think we'd make it the default if sqlite was available.  There's a proof of concept implementation in the open issue 3783.  I'm not sure what remains to be done (other than docs)...I didn't read through the issue and there's a fair bit of discussion.
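
For illustration only, a shelf-like mapping on top of SQLite might look roughly like the sketch below. This is not the proof of concept from issue 3783, and `SQLiteShelf` is a hypothetical name. It assumes string keys and pickled values, as `shelve` uses, and it runs every write in its own transaction so that crash safety is left to SQLite.

```python
import pickle
import sqlite3
from collections.abc import MutableMapping


class SQLiteShelf(MutableMapping):
    """Hypothetical shelf-like mapping: string keys, pickled values, one SQLite table."""

    def __init__(self, path):
        self._conn = sqlite3.connect(path)
        with self._conn:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS shelf "
                "(key TEXT PRIMARY KEY, value BLOB NOT NULL)"
            )

    def __getitem__(self, key):
        row = self._conn.execute(
            "SELECT value FROM shelf WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return pickle.loads(row[0])

    def __setitem__(self, key, value):
        # The connection context manager commits on success and
        # rolls back if an exception is raised.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO shelf (key, value) VALUES (?, ?)",
                (key, pickle.dumps(value)),
            )

    def __delitem__(self, key):
        with self._conn:
            cursor = self._conn.execute("DELETE FROM shelf WHERE key = ?", (key,))
        if cursor.rowcount == 0:
            raise KeyError(key)

    def __iter__(self):
        for (key,) in self._conn.execute("SELECT key FROM shelf"):
            yield key

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM shelf").fetchone()[0]

    def close(self):
        self._conn.close()
```

Usage would mirror a shelf: `db = SQLiteShelf("data.sqlite"); db["key"] = value; db.close()`. A real backend would also need to handle things such as the `writeback` option and the open flags, which this sketch ignores.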
History
Date                 User            Action  Args
2022-04-11 14:58:22  admin           set     github: 69628
2015-10-20 11:56:04  r.david.murray  set     messages: + msg253215
2015-10-20 00:58:07  Yanyan Jiang    set     messages: + msg253200
2015-10-19 21:21:30  r.david.murray  set     nosy: + r.david.murray; messages: + msg253195
2015-10-19 19:52:28  Yanyan Jiang    create