Add the iterator protocol to dbm modules #49986

Closed
akitada mannequin opened this issue Apr 11, 2009 · 14 comments
Labels
extension-modules C modules in the Modules dir type-feature A feature request or enhancement

Comments

@akitada
Mannequin

akitada mannequin commented Apr 11, 2009

BPO 5736
Nosy @loewis, @rhettinger, @merwok, @akitada
Superseder
  • bpo-9523: Improve dbm modules
Files
  • issue5736.diff: iter(dbm.keys())
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2011-02-12.20:17:43.215>
    created_at = <Date 2009-04-11.13:47:52.382>
    labels = ['extension-modules', 'type-feature']
    title = 'Add the iterator protocol to dbm modules'
    updated_at = <Date 2011-02-12.20:17:43.213>
    user = 'https://github.com/akitada'

    bugs.python.org fields:

    activity = <Date 2011-02-12.20:17:43.213>
    actor = 'eric.araujo'
    assignee = 'none'
    closed = True
    closed_date = <Date 2011-02-12.20:17:43.215>
    closer = 'eric.araujo'
    components = ['Extension Modules']
    creation = <Date 2009-04-11.13:47:52.382>
    creator = 'akitada'
    dependencies = []
    files = ['19250']
    hgrepos = []
    issue_num = 5736
    keywords = ['patch']
    message_count = 14.0
    messages = ['85856', '85859', '85867', '85878', '85888', '85889', '85928', '85931', '85944', '85946', '91339', '118874', '123358', '128465']
    nosy_count = 6.0
    nosy_names = ['loewis', 'rhettinger', 'eric.araujo', 'akitada', 'foobaron', 'ysj.ray']
    pr_nums = []
    priority = 'normal'
    resolution = 'duplicate'
    stage = 'resolved'
    status = 'closed'
    superseder = '9523'
    type = 'enhancement'
    url = 'https://bugs.python.org/issue5736'
    versions = []

    @akitada
    Mannequin Author

    akitada mannequin commented Apr 11, 2009

    In Python 2.6, dbm modules other than bsddb don't support the iterator
    protocol.

    >>> import dbm
    >>> d = dbm.open('spam.dbm', 'c')
    >>> for k in range(5): d["key%d" % k] = "value%d" % k
    ... 
    >>> for k in d: print k, d[k]
    ... 
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'dbm.dbm' object is not iterable

    Adding iterator support would make dbm modules more convenient and
    easier to use.

    @akitada akitada mannequin added extension-modules C modules in the Modules dir type-feature A feature request or enhancement labels Apr 11, 2009
    @akitada
    Mannequin Author

    akitada mannequin commented Apr 11, 2009

    Attached is a patch that adds the iterator protocol.
    Now it can be iterated over like this:

    >>> for k in d: print k, d[k]
    ... 
    key1 value1
    key3 value3
    key0 value0
    key2 value2
    key4 value4

    The problem is that there is no way to move the internal pointer back to
    the start. So once it reaches the end, you are done.

    >>> for k in d: print k, d[k]
    ...

    Possible solutions to this would be:

    • Add a method to get the pointer back to the start
      (with {first,next}key API)
    • Add a method that returns a generator

    @akitada
    Mannequin Author

    akitada mannequin commented Apr 11, 2009

    Revised patch adds firstkey and nextkey to dbm.
    Now the internal pointer can be reset with firstkey.

    @loewis
    Mannequin

    loewis mannequin commented Apr 11, 2009

    Would you like to fix gdbm as well?

    @akitada
    Mannequin Author

    akitada mannequin commented Apr 12, 2009

    Here's another patch which adds iter to dbm and gdbm.

    Note that the dbm and gdbm C APIs are a little different:
    gdbm_nextkey requires a key as its argument, while dbm_nextkey doesn't.
    So for gdbm I had to use a static variable that points to the current
    position.
    As a result, iteration in gdbm and dbm currently works differently.

    >>> import dbm
    >>> d = dbm.open('foo', 'n')
    >>> d['k1'] = 'v1';d['k2'] = 'v2';
    >>> for i in d: print i; break
    ... 
    k1
    >>> for i in d: print i
    ... 
    k2
    >>> for i in d: print i
    ... 
    
    
    >>> import gdbm
    >>> gd = gdbm.open('foo.gdbm', 'n')
    >>> gd['k1'] = 'v1';gd['k2'] = 'v2';
    >>> for i in gd: print i; break
    ... 
    k2
    >>> for i in gd: print i
    ... 
    k1
    >>> for i in gd: print i
    ... 
    k2
    k1

    @akitada
    Mannequin Author

    akitada mannequin commented Apr 12, 2009

    Of course iter should work in the same way in all dbm modules.
    iter in dbm/gdbm should work like dumbdbm's iter.

    >>> import dumbdbm
    >>> dumb = dumbdbm.open('foo', 'n')
    >>> dumb['k1'] = 'v1';dumb['k2'] = 'v2';
    >>> for i in dumb: print i; break
    ... 
    k2
    >>> for i in dumb: print i
    ... 
    k2
    k1
    >>> for i in dumb: print i
    ... 
    k2
    k1

    @smontanaro
    Contributor

    Akira> Note that the dbm and gdbm C APIs are a little different:
    Akira> gdbm_nextkey requires a key as its argument, while dbm_nextkey
    Akira> doesn't. So for gdbm I had to use a static variable that points
    Akira> to the current position.

    I don't think this is going to fly. A static variable is not thread-safe.
    What's worse, even in a non-threaded environment you might want to iterate
    over the gdbm file simultaneously from two different places.

    @smontanaro
    Contributor

    skip> What's worse, even in a non-threaded environment you might want to
    skip> iterate over the gdbm file simultaneously from two different
    skip> places.

    Or iterate over two different gdbm files simultaneously.

    @loewis
    Mannequin

    loewis mannequin commented Apr 13, 2009

    I agree with Skip that using a static variable is not appropriate. The
    proper solution probably would be to define a separate gdbm_iter object
    which always preserves the last key returned.
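The idea of a separate iterator object that remembers the last key returned can be illustrated in Python 3 as below. This is only a sketch of the concept, not the C `gdbm_iter` object itself: `KeyIterator` and `FakeDb` are hypothetical names, and `FakeDb` is an in-memory stand-in that mimics a `firstkey()` / `nextkey(key)` API ending in `None`.

```python
class FakeDb:
    """Hypothetical in-memory stand-in exposing firstkey()/nextkey(key)."""

    def __init__(self, keys):
        self._keys = list(keys)

    def firstkey(self):
        return self._keys[0] if self._keys else None

    def nextkey(self, key):
        i = self._keys.index(key) + 1
        return self._keys[i] if i < len(self._keys) else None


class KeyIterator:
    """Iterator that stores its own position (the last key it returned)."""

    def __init__(self, db):
        self._db = db
        self._last = None
        self._started = False

    def __iter__(self):
        return self

    def __next__(self):
        if not self._started:
            key = self._db.firstkey()
            self._started = True
        else:
            key = self._db.nextkey(self._last)
        if key is None:
            raise StopIteration
        self._last = key  # per-iterator state, nothing static or shared
        return key


db = FakeDb(["k1", "k2"])
other = FakeDb(["x1", "x2"])

# Two iterators over the same database, nested.
pairs = [(a, b) for a in KeyIterator(db) for b in KeyIterator(db)]

# Two different databases iterated in lockstep.
zipped = list(zip(KeyIterator(db), KeyIterator(other)))
```

Since each `KeyIterator` carries its own `_last` key, both cases Skip raised (iterating one file from two places, and iterating two files at once) work without interference.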

    @akitada
    Mannequin Author

    akitada mannequin commented Apr 13, 2009

    Yes, using a static variable there is wrong, and
    I'm now working on a "dbm_iterobject", just as Martin suggested.

    A dbm iterator should behave just like a dict iterator.
    I think I can use Objects/dictobject.c as a good example for this.

    Attached are minimal tests for the dbm iterator.
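For readers without the attachment, the dict-like behaviour described here can be checked along the following lines. This is a Python 3 sketch, not the attached test file: it uses `dbm.dumb` (the Python 3 name for `dumbdbm`), which already iterates this way, and `check_iteration` is a hypothetical helper name. In Python 3 the keys come back as `bytes`.

```python
import dbm.dumb
import os
import tempfile


def check_iteration(db):
    """Break out of one loop early, then iterate fully twice."""
    partial = []
    for key in db:
        partial.append(key)
        break  # abandon the first loop after one key
    full_1 = sorted(db)  # must still see every key
    full_2 = sorted(db)  # and again on a repeated pass
    return partial, full_1, full_2


with tempfile.TemporaryDirectory() as tmp:
    db = dbm.dumb.open(os.path.join(tmp, "foo"), "n")
    db["k1"] = "v1"
    db["k2"] = "v2"
    partial, full_1, full_2 = check_iteration(db)
    db.close()
```

The same helper could in principle be pointed at any dbm flavour that supports iteration, to confirm that an abandoned loop does not affect later ones.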

    @foobaron
    Mannequin

    foobaron mannequin commented Aug 6, 2009

    Another reason this issue is really important is that the lack of a
    consistent iter() interface for dbm.* makes shelve iteration not
    scalable; i.e. trying to iterate on a Shelf will run self.dict.keys() to
    load the entire index into memory. This seems contrary to a primary
    purpose of shelve, namely to store the index on-disk so as to avoid
    having to keep the whole index in memory.

    I suspect that for most users, shelve is the main way they will access
    the dbm.* interfaces. Therefore, fixing the dbm.* interfaces so that
    shelve is scalable seems like an important need.

    Once dbm and gdbm support the iterator protocol, it will be trivial to
    add an __iter__() method to shelve.Shelf, that simply returns
    iter(self.dict).
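A rough Python 3 sketch of that `__iter__()` idea follows. `LazyShelf` is a hypothetical subclass name, and a plain `dict` stands in for the dbm object so the example runs anywhere. One difference from the description above is an assumption about Python 3 behaviour: `Shelf` stores keys encoded as bytes, so the sketch decodes each key instead of returning `iter(self.dict)` unchanged.

```python
import shelve


class LazyShelf(shelve.Shelf):
    """Shelf whose iteration delegates to the backing object's iterator."""

    def __iter__(self):
        # Walk the backing mapping lazily instead of materialising keys().
        for raw_key in iter(self.dict):
            yield raw_key.decode(self.keyencoding)


shelf = LazyShelf({})  # a plain dict stands in for a dbm object here
shelf["spam"] = 1
shelf["eggs"] = [2, 3]
keys = sorted(shelf)
eggs = shelf["eggs"]
```

Whether this actually avoids loading the whole index depends on the backing object: it only helps if that object's own iterator is lazy, which is what this issue asks the dbm modules to provide.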

    @akitada
    Mannequin Author

    akitada mannequin commented Oct 16, 2010

    This patch just uses PyObject_GetIter to get an iterator.
    (I copied the idea from bpo-9523.)

    @merwok
    Member

    merwok commented Dec 4, 2010

    This may be superseded by bpo-9523. There are comments and patches in both issues, so I’m not closing either as duplicate of the other.

    @merwok
    Member

    merwok commented Feb 12, 2011

    bpo-9523 has a more comprehensive patch in progress, adding __iter__ and other mapping methods, so I’m closing this one.

    @merwok merwok closed this as completed Feb 12, 2011
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022