classification
Title: optionally make shelve less surprising
Type: Stage:
Components: Library (Lib) Versions: Python 2.3
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: loewis Nosy List: aleax, dannu, gvanrossum, loewis, rhettinger
Priority: high Keywords: patch

Created on 2002-05-07 08:13 by aleax, last changed 2003-04-19 21:02 by loewis. This issue is now closed.

Files
File name Uploaded Description Edit
shelve.patch aleax, 2002-05-07 08:13 patch to Lib/shelve.py
shelve.patch aleax, 2003-04-19 18:38 patch to shelve.py, test_shelve, AND docs
Messages (14)
msg39921 - (view) Author: Alex Martelli (aleax) * (Python committer) Date: 2002-05-07 08:13
shelve has highly surprising behavior wrt modifiable
values:
    s = shelve.open('she.dat','c')
    s['ciao'] = range(3)
    s['ciao'].append(4)   # doesn't "TAKE"!

Explaining to beginners that s['ciao'] is returning a
temporary object and the modification is done on the
temporary thus "silently ignored" is hard indeed.  It
also makes shelve far less convenient than it could 
be (whenever modifiable values must be shelved).

Having s keep track of all values it has returned may
perhaps break some existing program (due to extra 
memory consumption and/or to lack of "implicit 
copy"/"snapshot" behavior) so I've made the 'caching' 
change optional and by default off.  However it's now 
at least possible to obtain nonsurprising behavior:
    s = shelve.open('she.dat','c',smart=1)
    s['ciao'] = range(3)
    s['ciao'].append(4)   # no surprises any more

I suspect the 'smart=1' should be made the default, 
but, if we at least put it in now, then perhaps we 
can migrate to having it as the default very slowly 
and gradually.


Alex
msg39922 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2002-05-07 16:38
Logged In: YES 
user_id=21627

Even more important than the backwards compatibility might
be the issue that it writes back all accessed objects on
close, which might be expensive if there have been many
read-only accesses.

So I think the option name could be also 'slow'; although
'writeback' might be more technical.

Also, I wonder whether write-back should be attempted if the
shelve was opened read-only.
msg39923 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2002-05-09 18:43
Logged In: YES 
user_id=80475

Nicely done!  The code is clean and runs in the smart mode 
without problems on my existing programs. I agree that the 
patch solves a real world problem.  The solution is clean, 
but a little expensive.

If there were a way to be able to tell if an entry had been 
altered, it would save the 100% writeback.  Unfortunately, 
I can't think of a way.

The docstring could read more smoothly and plainly.  Also, 
it should be clear that the cost of setting smart=1 is that 
100% of the entries get rewritten on close.

Two microscopically minor thoughts on the coding (feel free 
to disregard). Can some of the try/except blocks be 
replaced by something akin to 'if self.smart:'?  For the 
writeback loop, consider 'for k,v in cache.iteritems()' as 
it takes less memory and saves a lookup.
msg39924 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2002-05-09 20:55
Logged In: YES 
user_id=80475

A few more thoughts:

Please change the "except:" lines to specify the exception 
being caught.

Also, if GvR shows interest in the patch, we should update 
the library reference and add unittests.

The docstring should also mention that the cache is kept in 
memory -- besides persistence, one of the forces for 
shelving is memory conservation.
msg39925 - (view) Author: Holger P. Krekel (dannu) Date: 2002-05-10 00:47
Logged In: YES 
user_id=83092

I'd suggest not changing shelve at all but providing 
a "cache-commit" dictionary (ccdict) which can wrap a
shelf-instance (or any other simple dictish instance)
and provides the 'non-surprising' behaviour. 

Some proof of concept code for the following
properties is provided here

http://home.trillke.net/~hpk/ccdict.py

Current properties are:

- ccdict wraps a dictionary-like object which
  in turn only needs to provide
  __getitem__, __setitem__, __delitem__,has_key

- on first access of an element
  ccdict makes a lookup on the underlying
  dict and caches the item.
- the next accesses work with the cached thing.
  Unsurprising dict-semantics are provided.

- deleting an item is deferred and actually happens
  on commit() time. deleting an item and later on
  assigning to it works as expected (i.e. the assignment
  takes preference).

- commit() transfers the items in the
  cache to the underlying dict and clears
  the cache.Prior to issuing commit
  no writeback to the underlying dict happens.

- deleting an ccdict-instance does *not* commit any  
changes. You have to explicitely call commit().
If you want to work readonly, don't call commit.

- clear() only cleares the cache and not the underlying
  dict 

- you can explicitely prune the cache (via cache.keys()
etc.) before calling commit(). This lets you
avoid writing back unmodified objects if this
is an issue.

It seems quite impossible to figure out automagically
which objects have been modified 
and so the solution is to do it explicitely 
(or don't commit for readonly).

holger
msg39926 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2003-03-30 21:18
Logged In: YES 
user_id=21627

Alex, do you still think this should be implemented, in some
form or  other?
msg39927 - (view) Author: Alex Martelli (aleax) * (Python committer) Date: 2003-03-30 21:26
Logged In: YES 
user_id=60314

Yes, Martin, I'm still quite convinced shelve's behavior is
generally surprising and often problematic, and even though
the fixed suggested by both me and dannu are each imperfect
(given the impossibility to find out, in general, whether an object
has been modified), I think one or the other would still be better
than the current situation.
msg39928 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2003-03-30 21:43
Logged In: YES 
user_id=21627

Would you then be willing to provide a complete patch
(documentation, NEWS entry, test case)?
msg39929 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2003-03-30 21:56
Logged In: YES 
user_id=80475

The issue has arisen a couple of times
of comp.lang.python.
I think this patch would be helpful.
msg39930 - (view) Author: Alex Martelli (aleax) * (Python committer) Date: 2003-03-30 21:59
Logged In: YES 
user_id=60314

sure, but along what lines -- my previous patch's, or dannu's?  let 
me know, and I'll get to work on it as soon as I'm back from 
Python-UK & short following trip (i..e around Apr 12)
msg39931 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2003-03-31 05:15
Logged In: YES 
user_id=21627

dannu's code is currently unavailable... I see no reason to
add yet another layer of indirection, and no other
application of such a wrapper within Python.

The trickiest aspect of this educational: If the default
behaviour does not change (as it shouldn't), how can
unsuspecting users avoid running into the trap? So this is
much more a documentation problem than a code problem.
msg39932 - (view) Author: Alex Martelli (aleax) * (Python committer) Date: 2003-04-19 18:38
Logged In: YES 
user_id=60314

done -- uploading the patch to code, test and docs, with the 
optional parameter renamed to writeback; docs specifying the 
issue, possible workaround w/o writeback, and writeback's 
performance-hit risks; and following raymond's suggestions.
Not sure what "a NEWS entry" as martin requires should be a 
patch TO, though...?  So I didn't do that yet -- lemme know!!!
msg39933 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2003-04-19 18:51
Logged In: YES 
user_id=6380

Looks good to me. Martin or Raymond, can you check it in?
(Or should we just give Alex commit perms?)

One unrelated note: the 'binary' parameter should really be
upgraded to 'protocol' to match changes to the pickle module.
msg39934 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2003-04-19 21:02
Logged In: YES 
user_id=21627

I have committed the patch as

libshelve.tex 1.20
shelve.py 1.21
test_shelve.py 1.4
NEWS 1.737

In these changes, the binary parameter got upgraded to
protocol; binary was added after writeback for compatibility
and pendingly deprecated. The test case usage of binary was
mainly preserved, and protocol 2 test cases added.

Alex: NEWS is Misc/NEWS; any non-bug fix change is listed there.
History
Date User Action Args
2002-05-07 08:13:04aleaxcreate