classification
Title: [PATCH]Add FastDbfilenameShelf: shelf nerver sync cache even when writeback=True
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.1
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: aleax, lehmannro, loewis, r.david.murray, zhigang
Priority: normal Keywords: patch

Created on 2009-03-13 13:12 by zhigang, last changed 2010-02-11 03:32 by r.david.murray. This issue is now closed.

Files
File name Uploaded Description Edit
fast_shelf.patch zhigang, 2009-03-13 13:12 Patch to add FastDbfilenameShelf
fast_shelf-v2.patch zhigang, 2009-03-13 13:20 Errated patch to also add the new class to __all__
Messages (6)
msg83516 - (view) Author: Zhigang Wang (zhigang) Date: 2009-03-13 13:12
shelf without writeback=True is too slow, while shelves with
writeback=True takes too much time to close. And even worse, these codes
can not run:

$ cat test_shelve.py
#!/usr/bin/env python

import shelve

store = shelve.open("/tmp/shelve.db", writeback=True)

class Test(object):
    pass

def main():
    store["a"] = Test()

if __name__ == '__main__':
    main()

$ python test_shelve.py 
Exception cPickle.PicklingError: "Can't pickle <class '__main__.Test'>:
it's not the same object as __main__.Test" in  ignored

With this module, we can make it to run.

I think it's worth add this function to shelve. We can achieve great
improvement with some avoidable limitations.

The other approach to add this function is to add a extra option the
shelve.open(). We can discuss for which is better.
msg83517 - (view) Author: Zhigang Wang (zhigang) Date: 2009-03-13 13:20
Add some errata of the patch: add the new class to __all__.
msg92703 - (view) Author: Robert Lehmann (lehmannro) * Date: 2009-09-16 17:33
If I understand you correctly, your proposal is the following: use
Shelf.cache to cache *all* objects instead of only keeping live
references. Your patch retains the cache forever instead of purging it
on sync. (All these changes only apply with writeback=True, which you
enabled by default; nothing changes with writeback=False.)
This speeds up *repeated* reads/writes as they are kept in-memory
instead of querying the-- probably slow --database every time.

I do not think this is a feasible solution for two reasons:
(1) writeback was never intended to do such thing. It was introduced as
a solution to "make shelve less surprising." If you remove its
sync-on-close characteristics, shelve is as surprising as before. See
issue553171.
(2) If you intend to implement caching on *any* database I'd suggest not
using shelve for that matter. Using it for serialization is all okay but
I'd rather add an additional layer of indirection to implement caching
strategies if you want to do that The Right Way. (Shelf.cache is really
only a map of objects that were touched during runtime.)

I'm -1 on this patch but generally +1 on a generic caching wrapper.

The error you describe later on appears because Python is already
tearing down when gc'ing the Shelf (calling __del__ -> close -> sync).
With writeback=True this currently tries to pickle the cache again which
emits the error you observed. This should be handled in a separate issue.
msg92740 - (view) Author: Zhigang Wang (zhigang) Date: 2009-09-17 05:53
Thanks Robert for pointing out issue553171. 

After read that issue, I still think we paid too much to make shelf less
surprising.

We should at lease provide a option for the *smart* programmers to get
better speed and less exceptions.

The write-back-all-cache-on-close feature is easy to document, but hard
to accept when you have cached too much data.

CCing Alex and Martin for comments.
msg92774 - (view) Author: Robert Lehmann (lehmannro) * Date: 2009-09-17 15:58
I addressed the other bug you were experiencing in issue6932.
msg99195 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-02-11 03:32
I agree that caching for speed is something that should be implemented in another layer.  It certainly is orthogonal to the writeback issue.  The best caching strategy is going to depend on the application, so I don't think caching for speed belongs in Shelve itself.  Closing as rejected.
History
Date User Action Args
2010-02-11 03:32:38r.david.murraysetstatus: open -> closed
priority: normal


nosy: + r.david.murray
messages: + msg99195
resolution: rejected
stage: resolved
2009-09-17 15:58:26lehmannrosetmessages: + msg92774
2009-09-17 05:53:22zhigangsetnosy: + loewis, aleax
messages: + msg92740
2009-09-16 17:33:49lehmannrosetnosy: + lehmannro
messages: + msg92703
2009-03-13 13:20:51zhigangsetfiles: + fast_shelf-v2.patch

messages: + msg83517
2009-03-13 13:12:04zhigangcreate