Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dbm.dumb should be consistent when the database is closed #63584

Closed
PCManticore mannequin opened this issue Oct 25, 2013 · 29 comments
Closed

dbm.dumb should be consistent when the database is closed #63584

PCManticore mannequin opened this issue Oct 25, 2013 · 29 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@PCManticore
Copy link
Mannequin

PCManticore mannequin commented Oct 25, 2013

BPO 19385
Nosy @ncoghlan, @PCManticore, @JimJJewett, @serhiy-storchaka
Files
  • dbm_dumb.patch
  • dbm_dumb1.patch
  • dbm_dumb1.patch: add space in test
  • issue19385.patch
  • issue19385_1.patch
  • issue19385_speed.patch
  • issue19385_speed_2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ncoghlan'
    closed_at = <Date 2014-05-28.15:53:06.063>
    created_at = <Date 2013-10-25.06:38:41.007>
    labels = ['type-feature', 'library']
    title = 'dbm.dumb should be consistent when the database is closed'
    updated_at = <Date 2014-05-28.15:53:06.062>
    user = 'https://github.com/PCManticore'

    bugs.python.org fields:

    activity = <Date 2014-05-28.15:53:06.062>
    actor = 'serhiy.storchaka'
    assignee = 'ncoghlan'
    closed = True
    closed_date = <Date 2014-05-28.15:53:06.063>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2013-10-25.06:38:41.007>
    creator = 'Claudiu.Popa'
    dependencies = []
    files = ['32347', '32449', '32453', '34337', '35016', '35241', '35348']
    hgrepos = []
    issue_num = 19385
    keywords = ['patch']
    message_count = 29.0
    messages = ['201209', '201819', '201823', '201824', '201900', '206050', '206058', '206069', '206094', '206151', '206183', '206184', '206185', '206416', '213069', '213840', '217091', '217093', '217097', '217116', '217216', '218435', '218440', '218441', '218444', '219083', '219277', '219281', '219283']
    nosy_count = 6.0
    nosy_names = ['ncoghlan', 'Arfrever', 'Claudiu.Popa', 'python-dev', 'Jim.Jewett', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue19385'
    versions = ['Python 3.4', 'Python 3.5']

    @PCManticore
    Copy link
    Mannequin Author

    PCManticore mannequin commented Oct 25, 2013

    This problem occurred in bpo-19282. Basicly, dbm.dumb is not consistent with dbm.gnu and dbm.ndbm when the database is closed, as seen in the following:

    >>> import dbm.dumb as d
    >>> db = d.open('test.dat', 'c')
    >>> db.close()
    >>> db.keys()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/tank/libs/cpython/Lib/dbm/dumb.py", line 212, in keys
        return list(self._index.keys())
    AttributeError: 'NoneType' object has no attribute 'keys'
    >>>

    vs

    >>> import dbm.gnu as g
    >>> db = g.open('test.dat', 'c')
    >>> db.close()
    >>> db.keys()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    _gdbm.error: GDBM object has already been closed
    >>>

    Attaching a patch.

    @PCManticore PCManticore mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Oct 25, 2013
    @ncoghlan
    Copy link
    Contributor

    Patch looks good to me, I'll apply it when I apply bpo-19282.

    @ncoghlan ncoghlan self-assigned this Oct 31, 2013
    @serhiy-storchaka
    Copy link
    Member

    Additional checks can slowdown the code. It doesn't matter in some operations, but not on __len__, __contains__, etc. EAFP approach is to catch AttributeError and raise appropriate dbm.dumb.error exception.

    @serhiy-storchaka
    Copy link
    Member

    Yet one nitpick. I think that closing check should be after argument type check and key encoding.

    @PCManticore
    Copy link
    Mannequin Author

    PCManticore mannequin commented Nov 1, 2013

    Here's the new version which addresses your last comment. Regarding the first issue, I don't believe that the result will be as readable (but I agree with you that it will be better). For instance, items will probably look like this:

    try:
    return [(key, self[key]) for key in self._index.keys()]
    except AttributeError:
    raise dbm.dumb.error(...) from None

    but will look totally different for other __len__:

    try:
    return len(self._index)
    except TypeError:
    raise dbm.dumb.error(...) from None.

    We could catch TypeError only for dunder methods though and for the rest of the methods check the value of _index before access.

    @PCManticore
    Copy link
    Mannequin Author

    PCManticore mannequin commented Dec 13, 2013

    Serhiy, what's the status for this? Should this be targeted for 3.5?

    @serhiy-storchaka
    Copy link
    Member

    Regarding the first issue, I don't believe that the result will be as readable (but I agree with you that it will be better).

    Yes, perhaps performance of the dbm.dumb module is not worst this cost. Are you have any benchmarking results?

    The patch LGTM.

    Should this be targeted for 3.5?

    I think yes.

    @PCManticore
    Copy link
    Mannequin Author

    PCManticore mannequin commented Dec 13, 2013

    Here's a benchmark:

    • with current patch:

    # ./python -S -m timeit -n 10000 -s 'import dbm.dumb as dbm; d=dbm.open("x.dat", "c");d.close()' 'try:' ' len(d)' 'except OSError:' ' pass'
    10000 loops, best of 3: 1.78 usec per loop

    • using
      try:
      return len(self._index)
      except TypeError:
      raise error('...')

    # ./python -S -m timeit -n 10000 -s 'import dbm.dumb as dbm; d=dbm.open("x.dat", "c");d.close()' 'try:' ' len(d)' 'except OSError:' ' pass'
    10000 loops, best of 3: 3.27 usec per loop

    Now this seems odd, maybe catching + reraising an exception has a greater overhead than a func call and checking if an attribute is None. Anyway, I agree that dbm.dumb is not performance critical.

    @Arfrever
    Copy link
    Mannequin

    Arfrever mannequin commented Dec 13, 2013

    IMHO it is a bug fix, not a new feature, and could be applied in 3.3 and 3.4.

    @serhiy-storchaka
    Copy link
    Member

    Now this seems odd, maybe catching + reraising an exception has a greater overhead than a func call and checking if an attribute is None.

    Yes, of course, catching + reraising an exception is costly. But when an exception is not raised, this is cheap. Usually len() is called for open database.

    IMHO it is a bug fix, not a new feature, and could be applied in 3.3 and 3.4.

    Hmm. If consider dbm.dumb as separated module, this is perhaps a new feature. But if consider it as an implementation of the dbm protocol, it should conform to other dbm modules, and this is a bugfix. What would you say Nick?

    @ncoghlan
    Copy link
    Contributor

    I think it's definitely still OK to fix this in 3.4, but I think it's borderline enough to avoid including in the last ever 3.3 release. Changing exception types always introduces a little backwards compatibility risk, so it's something I lean towards only doing in new feature releases, even in cases like this where the old exception was clearly not the best choice.

    @Arfrever
    Copy link
    Mannequin

    Arfrever mannequin commented Dec 14, 2013

    in the last ever 3.3 release.

    msg204533 suggests that there will be >=2 bug fix releases in 3.3 branch (3.3.4 soon and 3.3.5 around release of 3.4.0).

    @ncoghlan
    Copy link
    Contributor

    That's slightly more acceptable, but it's still hard to make the case
    that changing exception types is a reasonable thing to do in a
    maintenance release (it's only the specific nature of the error that
    makes me consider it reasonable even in a feature release).

    @PCManticore
    Copy link
    Mannequin Author

    PCManticore mannequin commented Dec 17, 2013

    If we agree that this should be fixed in 3.4, can somebody commit it before the second beta?

    @PCManticore
    Copy link
    Mannequin Author

    PCManticore mannequin commented Mar 10, 2014

    Updated the patch to catch dbm.dumb.error in tests (reverting a change added in c2f1bb56760d).

    @PCManticore
    Copy link
    Mannequin Author

    PCManticore mannequin commented Mar 17, 2014

    Can this patch be committed, now that 3.5 is active?

    @jimjjewett
    Copy link
    Mannequin

    jimjjewett mannequin commented Apr 23, 2014

    _check_closed sounds like you expect it to be closed, and might even assert that it is closed, except that you want the check run even in release mode and/or it might fail. Since being closed is actually the unexpectedly broken state, I would prefer that you rename it to something else, like _verify_open.

    @PCManticore
    Copy link
    Mannequin Author

    PCManticore mannequin commented Apr 23, 2014

    gzip uses the same name, _check_closed, but your suggestion sounds good. I'll update the patch.

    @jimjjewett
    Copy link
    Mannequin

    jimjjewett mannequin commented Apr 23, 2014

    I think the requested timing regression was for the non-broken case. When the database has NOT been closed, and keys() still works, will it be way slower than before?

    Note that I am not asking you to do that test (though the eventual committer might); the implementation of whichdb leaves me believing that almost anyone who is likely to care will have already followed the docunmentation's recommendation to install another *dbm ahead of dumb.

    @PCManticore
    Copy link
    Mannequin Author

    PCManticore mannequin commented Apr 24, 2014

    On my machine I get the following results for the unclosed-database case.

    With patch:

    # ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c');len(d)"
    100000 loops, best of 3: 0.0638 usec per loop

    Without patch:

    # ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c');len(d)"
    100000 loops, best of 3: 0.0634 usec per loop

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 26, 2014

    New changeset e8343cb98cc3 by Benjamin Peterson in branch '3.4':
    make operations on closed dumb databases raise a consistent exception (closes bpo-19385)
    http://hg.python.org/cpython/rev/e8343cb98cc3

    New changeset dbceba88b96e by Benjamin Peterson in branch 'default':
    merge 3.4 (bpo-19385)
    http://hg.python.org/cpython/rev/dbceba88b96e

    @python-dev python-dev mannequin closed this as completed Apr 26, 2014
    @serhiy-storchaka
    Copy link
    Member

    Claudiu, your benchmark is broken, it measure a no-op.

    Correct benchmark:

        $ ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c')"  "len(d)"

    3.4: 100000 loops, best of 3: 2.56 usec per loop
    3.5: 100000 loops, best of 3: 4.17 usec per loop

    There is 60% regression.

    @jimjjewett
    Copy link
    Mannequin

    jimjjewett mannequin commented May 13, 2014

    Did you try 3.5 without the patch, in case the issue is not with his code?
    On May 13, 2014 7:23 AM, "Serhiy Storchaka" <report@bugs.python.org> wrote:

    Serhiy Storchaka added the comment:

    Claudiu, your benchmark is broken, it measure a no-op.

    Correct benchmark:

    $ ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm;
    

    d=dbm.open('x.dat', 'c')" "len(d)"

    3.4: 100000 loops, best of 3: 2.56 usec per loop
    3.5: 100000 loops, best of 3: 4.17 usec per loop

    There is 60% regression.

    ----------
    status: closed -> open


    Python tracker <report@bugs.python.org>
    <http://bugs.python.org/issue19385\>


    @PCManticore
    Copy link
    Mannequin Author

    PCManticore mannequin commented May 13, 2014

    Right, my benchmark was indeed flawed. Here are the new results on my machine:

    Without the patch
    # ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c')" "len(d)"
    100000 loops, best of 3: 0.564 usec per loop

    With the patch
    # ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c')" "len(d)"
    100000 loops, best of 3: 0.857 usec per loop

    Even having an empty _verify_open in __len__ method leads to this:

    # ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c')" "len(d)"
    100000 loops, best of 3: 0.749 usec per loop

    @PCManticore
    Copy link
    Mannequin Author

    PCManticore mannequin commented May 13, 2014

    Here's a new patch which uses the EAFP approach for dunder methods (len, __contains__ etc) and the _verify_open method for the other methods (.keys, .iterkeys) etc. Now the results are similar with the benchmark without the patch.

    @serhiy-storchaka
    Copy link
    Member

    There is no need to speed up methods which do IO (getitem, __setitem__, __delitem__). However method which works only with an index (keys, iterkeys, __contains__, __len__) can be optimized. In the __contains__ method an exception can be raised not only by nen-existent __contains__ of None, but from __hash__ or __eq__ methods of a key, so we should distinguish these cases.

    @PCManticore
    Copy link
    Mannequin Author

    PCManticore mannequin commented May 28, 2014

    Looks good to me.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 28, 2014

    New changeset 95207bcd8298 by Serhiy Storchaka in branch '3.4':
    Restore performance of some dumb database methods (regression introduced by bpo-19385).
    http://hg.python.org/cpython/rev/95207bcd8298

    New changeset 2e59e0b579e5 by Serhiy Storchaka in branch 'default':
    Restore performance of some dumb database methods (regression introduced by bpo-19385).
    http://hg.python.org/cpython/rev/2e59e0b579e5

    @serhiy-storchaka
    Copy link
    Member

    Thanks Claudiu.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants