classification
Title: Add the iterator protocol to dbm modules
Type: enhancement Stage: resolved
Components: Extension Modules Versions:
process
Status: closed Resolution: duplicate
Dependencies: Superseder: Improve dbm modules
View: 9523
Assigned To: Nosy List: akitada, eric.araujo, foobaron, loewis, rhettinger, ysj.ray
Priority: normal Keywords: patch

Created on 2009-04-11 13:47 by akitada, last changed 2011-02-12 20:17 by eric.araujo. This issue is now closed.

Files
File name Uploaded Description
issue5736.diff akitada, 2010-10-16 15:46 iter(dbm.keys())
Messages (14)
msg85856 - (view) Author: Akira Kitada (akitada) * Date: 2009-04-11 13:47
In Python 2.6, dbm modules other than bsddb don't support the iterator
protocol.

>>> import dbm
>>> d = dbm.open('spam.dbm', 'c')
>>> for k in range(5): d["key%d" % k] = "value%d" % k
... 
>>> for k in d: print k, d[k]
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'dbm.dbm' object is not iterable

Adding iterator support would make dbm modules more convenient and
easier to use.
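
A minimal sketch of the usual workaround, going through keys() instead of iterating over the database object itself. It is written for Python 3 rather than the 2.6 session above, and it uses dbm.dumb only because that backend ships with every CPython build; both choices are assumptions made for illustration. Note that keys and values come back as bytes in Python 3.

```python
import dbm.dumb
import os
import tempfile

# dbm.dumb is an assumption here: it stands in for whichever dbm flavour
# is in use, because it is pure Python and always available.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "spam")
    with dbm.dumb.open(path, "c") as db:
        for k in range(5):
            db["key%d" % k] = "value%d" % k
        # keys() returns the complete list of keys, so this loop does not
        # rely on the database object supporting the iterator protocol.
        pairs = {key: db[key] for key in db.keys()}

for key in sorted(pairs):
    print(key, pairs[key])
```

The cost of this approach is that the whole key list is built in memory before the loop starts.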
msg85859 - (view) Author: Akira Kitada (akitada) * Date: 2009-04-11 14:11
Attached is a patch that adds the iterator protocol.
Now a dbm object can be iterated over like this:

>>> for k in d: print k, d[k]
... 
key1 value1
key3 value3
key0 value0
key2 value2
key4 value4

The problem is that there is no way to move the internal pointer back to
the start, so once iteration reaches the end, you are done.

>>> for k in d: print k, d[k]
...

Possible solutions would be:
- Add a method to get the pointer back to the start
  (with {first,next}key API)
- Add a method that returns a generator
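
A rough sketch of how the two options above could combine: a generator that walks the keys through a firstkey()/nextkey(key) pair, so that every new loop starts from the beginning again. The method names follow the ones Python's gdbm binding exposes; FakeDB is a hypothetical in-memory stand-in, not part of any dbm module, and is there only to make the sketch runnable.

```python
def iter_keys(db):
    """Yield every key of an object exposing firstkey() and nextkey(key)."""
    key = db.firstkey()
    while key is not None:
        yield key
        key = db.nextkey(key)


class FakeDB:
    """Hypothetical in-memory stand-in with a gdbm-style key API."""

    def __init__(self, data):
        self._data = dict(data)
        self._order = list(self._data)

    def firstkey(self):
        return self._order[0] if self._order else None

    def nextkey(self, key):
        pos = self._order.index(key) + 1
        return self._order[pos] if pos < len(self._order) else None


db = FakeDB({"k1": "v1", "k2": "v2"})
first_pass = list(iter_keys(db))
# A fresh generator calls firstkey() again, so a second pass is not empty.
second_pass = list(iter_keys(db))
```

Because the position lives inside each generator rather than in the database object, finishing one loop does not exhaust the next one.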
msg85867 - (view) Author: Akira Kitada (akitada) * Date: 2009-04-11 16:14
Revised patch adds firstkey and nextkey to dbm.
Now the internal pointer can be reset with firstkey.
msg85878 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-04-11 22:03
Would you like to fix gdbm as well?
msg85888 - (view) Author: Akira Kitada (akitada) * Date: 2009-04-12 09:47
Here's another patch which adds iter to dbm and gdbm.

Note that the dbm and gdbm C APIs are a little different:
gdbm_nextkey requires a key as its argument, while dbm_nextkey doesn't.
So for gdbm I had to use a static variable that points to the current
position.
As a result, iterators in gdbm and dbm now work differently.

>>> import dbm
>>> d = dbm.open('foo', 'n')
>>> d['k1'] = 'v1';d['k2'] = 'v2';
>>> for i in d: print i; break
... 
k1
>>> for i in d: print i
... 
k2
>>> for i in d: print i
... 


>>> import gdbm
>>> gd = gdbm.open('foo.gdbm', 'n')
>>> gd['k1'] = 'v1';gd['k2'] = 'v2';
>>> for i in gd: print i; break
... 
k2
>>> for i in gd: print i
... 
k1
>>> for i in gd: print i
... 
k2
k1
msg85889 - (view) Author: Akira Kitada (akitada) * Date: 2009-04-12 10:11
Of course iter should work in the same way in all dbm modules.
iter in dbm/gdbm should work like dumbdbm's iter.

>>> import dumbdbm
>>> dumb = dumbdbm.open('foo', 'n')
>>> dumb['k1'] = 'v1';dumb['k2'] = 'v2';
>>> for i in dumb: print i; break
... 
k2
>>> for i in dumb: print i
... 
k2
k1
>>> for i in dumb: print i
... 
k2
k1
msg85928 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2009-04-12 23:45
Akira> Note that the dbm and gdbm C APIs are a little different:
Akira> gdbm_nextkey requires a key as its argument, while dbm_nextkey
Akira> doesn't. So for gdbm I had to use a static variable that points to
Akira> the current position.

I don't think this is going to fly.  A static variable is not thread-safe.
What's worse, even in a non-threaded environment you might want to iterate
over the gdbm file simultaneously from two different places.
msg85931 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2009-04-13 00:19
skip> What's worse, even in a non-threaded environment you might want to
skip> iterate over the gdbm file simultaneously from two different
skip> places.

Or iterate over two different gdbm files simultaneously.
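
To make the hazard concrete, here is a small Python sketch of what a single shared position does when two databases are iterated at the same time. SharedCursorDB is hypothetical; its class-level attribute only stands in for the C static variable and does not reproduce the actual patch.

```python
class SharedCursorDB:
    """Hypothetical database whose iteration position lives in one shared slot."""

    _cursor = 0  # class-level state, standing in for the C static variable

    def __init__(self, keys):
        self._keys = list(keys)

    def __iter__(self):
        return self

    def __next__(self):
        cls = type(self)
        if cls._cursor >= len(self._keys):
            raise StopIteration
        key = self._keys[cls._cursor]
        cls._cursor += 1  # every instance advances the same position
        return key


a = SharedCursorDB(["a1", "a2"])
b = SharedCursorDB(["b1", "b2"])

# b starts from position 1 because a has already moved the shared cursor.
seen = [next(a), next(b)]
# The cursor is now past the end, so the rest of a is silently lost.
rest_of_a = list(a)
```

Here b never yields its first key and a never yields its second, even though only one thread is involved.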
msg85944 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-04-13 12:56
I agree with Skip that using a static variable is not appropriate. The
proper solution probably would be to define a separate gdbm_iter object
which always preserves the last key returned.
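
A Python-level sketch of that idea, assuming a database object with firstkey() and nextkey(key) methods as in Python's gdbm binding. KeyIterator and InMemoryDB are hypothetical names used for illustration; the real change would be a C iterator type.

```python
class KeyIterator:
    """Iterator that keeps its own copy of the last key it returned."""

    def __init__(self, db):
        self._db = db
        self._last = None
        self._started = False

    def __iter__(self):
        return self

    def __next__(self):
        if not self._started:
            key = self._db.firstkey()
            self._started = True
        elif self._last is None:
            raise StopIteration  # already exhausted
        else:
            key = self._db.nextkey(self._last)
        self._last = key
        if key is None:
            raise StopIteration
        return key


class InMemoryDB:
    """Hypothetical stand-in exposing a gdbm-style firstkey()/nextkey(key) API."""

    def __init__(self, keys):
        self._keys = list(keys)

    def firstkey(self):
        return self._keys[0] if self._keys else None

    def nextkey(self, key):
        pos = self._keys.index(key) + 1
        return self._keys[pos] if pos < len(self._keys) else None


db = InMemoryDB(["k1", "k2", "k3"])
it1, it2 = KeyIterator(db), KeyIterator(db)
# Each iterator tracks its own position, so interleaving them is safe.
interleaved = [next(it1), next(it2), next(it1), next(it2)]
```

Since the state is per iterator, two loops over the same file, or over two different files, no longer interfere with each other.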
msg85946 - (view) Author: Akira Kitada (akitada) * Date: 2009-04-13 13:43
Yes, using a static variable there is wrong, and
I'm actually now working on a "dbm_iterobject", just as Martin suggested.

The dbm iterator should behave just like the one in dict.
I think I can use Objects/dictobject.c as a good example for this.

Attached are minimal tests for the dbm iterator.
msg91339 - (view) Author: Christopher Lee (foobaron) Date: 2009-08-06 00:32
Another reason this issue is really important is that the lack of a
consistent iter() interface for dbm.* makes shelve iteration
unscalable: iterating over a Shelf runs self.dict.keys(), which loads
the entire index into memory.  This seems contrary to a primary
purpose of shelve, namely storing the index on disk to avoid
keeping the whole index in memory.

I suspect that for most users, shelve is the main way they will access
the dbm.* interfaces.  Therefore, fixing the dbm.* interfaces so that
shelve is scalable seems like an important need.

Once dbm and gdbm support the iterator protocol, it will be trivial to
add an __iter__() method to shelve.Shelf that simply returns
iter(self.dict).
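
A minimal sketch of what such a method might look like, written against Python 3's shelve. LazyShelf is a hypothetical subclass, and a plain dict stands in for an iterable dbm object. The decode step is an assumption forced by Python 3, where Shelf stores its keys as encoded bytes; a Python 2 version could return iter(self.dict) directly.

```python
import shelve


class LazyShelf(shelve.Shelf):
    """Hypothetical Shelf subclass that iterates the backing store directly."""

    def __iter__(self):
        # Delegate to the backing mapping's own iterator instead of
        # building the full key list with self.dict.keys() first.
        for raw_key in iter(self.dict):
            yield raw_key.decode(self.keyencoding)


# A plain dict stands in for a dbm object that supports iteration.
shelf = LazyShelf({})
shelf["k1"] = [1, 2]
shelf["k2"] = {"x": 1}
keys = sorted(shelf)
```

Whether this actually avoids loading the index depends on the backing object yielding keys lazily, which is exactly what the dbm modules would need to provide.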
msg118874 - (view) Author: Akira Kitada (akitada) * Date: 2010-10-16 15:46
This patch just uses PyObject_GetIter to get an iterator.
(I just copied the idea from issue9523.)
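
In Python terms, the approach is roughly equivalent to the sketch below, where iteration is delegated to an iterator over the keys() result. KeysIterMixin and TinyDB are hypothetical names for illustration and do not mirror the C code in the attached diff.

```python
class KeysIterMixin:
    """Hypothetical mixin: iterate by asking keys() for an iterator."""

    def __iter__(self):
        # Python-level analogue of calling PyObject_GetIter on keys().
        return iter(self.keys())


class TinyDB(KeysIterMixin):
    """Hypothetical stand-in for a dbm object that only offers keys()."""

    def __init__(self, data):
        self._data = dict(data)

    def keys(self):
        # The whole key list is built up front, as with dbm's keys().
        return list(self._data)


db = TinyDB({"k1": "v1", "k2": "v2"})
first = list(db)
# Each loop gets a fresh iterator, so a second pass works as expected.
second = list(db)
```

This keeps iteration restartable and independent per loop, at the price of materializing every key, which is the scalability concern raised in msg91339.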
msg123358 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-12-04 15:20
This may be superseded by #9523.  There are comments and patches in both issues, so I’m not closing either as duplicate of the other.
msg128465 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-02-12 20:17
#9523 has a more comprehensive patch in progress, adding __iter__ and other mapping methods, so I’m closing this one.
History
Date User Action Args
2011-02-12 20:17:43 eric.araujo set status: open -> closed
    superseder: Improve dbm modules
    versions: - Python 3.2
    nosy: loewis, rhettinger, eric.araujo, akitada, foobaron, ysj.ray
    messages: + msg128465
    resolution: duplicate
    stage: resolved
2010-12-04 15:20:59 eric.araujo set nosy: + eric.araujo
    messages: + msg123358
2010-10-18 11:41:18 pitrou set nosy: + rhettinger
2010-10-16 15:47:01 akitada set files: + issue5736.diff
    versions: + Python 3.2, - Python 2.7
    nosy: + ysj.ray
    messages: + msg118874
2010-10-16 15:32:10 akitada set files: - issue5736.diff
2010-10-16 15:32:06 akitada set files: - test_issue5736.diff
2010-10-16 15:32:02 akitada set files: - issue5736.diff
2010-10-16 15:31:52 akitada set files: - issue5736.diff
2010-05-20 20:31:03 skip.montanaro set nosy: - skip.montanaro
2009-08-06 00:32:36 foobaron set nosy: + foobaron
    messages: + msg91339
2009-04-13 13:43:40 akitada set files: + test_issue5736.diff
    messages: + msg85946
2009-04-13 12:56:08 loewis set messages: + msg85944
2009-04-13 00:19:22 skip.montanaro set messages: + msg85931
2009-04-12 23:45:07 skip.montanaro set nosy: + skip.montanaro
    messages: + msg85928
2009-04-12 10:11:41 akitada set messages: + msg85889
2009-04-12 09:47:38 akitada set files: + issue5736.diff
    messages: + msg85888
2009-04-11 22:03:38 loewis set nosy: + loewis
    messages: + msg85878
2009-04-11 16:14:07 akitada set files: + issue5736.diff
    messages: + msg85867
2009-04-11 14:11:27 akitada set files: + issue5736.diff
    keywords: + patch
    messages: + msg85859
2009-04-11 13:47:52 akitada create