dbm.dumb should be consistent when the database is closed #63584

PCManticore · 2013-10-25T06:38:41Z

BPO	19385
Nosy	@ncoghlan, @PCManticore, @JimJJewett, @serhiy-storchaka
Files	dbm_dumb.patch dbm_dumb1.patch dbm_dumb1.patch: add space in test issue19385.patch issue19385_1.patch issue19385_speed.patch issue19385_speed_2.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/ncoghlan'
closed_at = <Date 2014-05-28.15:53:06.063>
created_at = <Date 2013-10-25.06:38:41.007>
labels = ['type-feature', 'library']
title = 'dbm.dumb should be consistent when the database is closed'
updated_at = <Date 2014-05-28.15:53:06.062>
user = 'https://github.com/PCManticore'

bugs.python.org fields:

activity = <Date 2014-05-28.15:53:06.062>
actor = 'serhiy.storchaka'
assignee = 'ncoghlan'
closed = True
closed_date = <Date 2014-05-28.15:53:06.063>
closer = 'serhiy.storchaka'
components = ['Library (Lib)']
creation = <Date 2013-10-25.06:38:41.007>
creator = 'Claudiu.Popa'
dependencies = []
files = ['32347', '32449', '32453', '34337', '35016', '35241', '35348']
hgrepos = []
issue_num = 19385
keywords = ['patch']
message_count = 29.0
messages = ['201209', '201819', '201823', '201824', '201900', '206050', '206058', '206069', '206094', '206151', '206183', '206184', '206185', '206416', '213069', '213840', '217091', '217093', '217097', '217116', '217216', '218435', '218440', '218441', '218444', '219083', '219277', '219281', '219283']
nosy_count = 6.0
nosy_names = ['ncoghlan', 'Arfrever', 'Claudiu.Popa', 'python-dev', 'Jim.Jewett', 'serhiy.storchaka']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue19385'
versions = ['Python 3.4', 'Python 3.5']

PCManticore · 2013-10-25T06:38:41Z

This problem occurred in bpo-19282. Basicly, dbm.dumb is not consistent with dbm.gnu and dbm.ndbm when the database is closed, as seen in the following:

>>> import dbm.dumb as d
>>> db = d.open('test.dat', 'c')
>>> db.close()
>>> db.keys()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tank/libs/cpython/Lib/dbm/dumb.py", line 212, in keys
    return list(self._index.keys())
AttributeError: 'NoneType' object has no attribute 'keys'
>>>

vs

>>> import dbm.gnu as g
>>> db = g.open('test.dat', 'c')
>>> db.close()
>>> db.keys()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_gdbm.error: GDBM object has already been closed
>>>

Attaching a patch.

ncoghlan · 2013-10-31T14:36:39Z

Patch looks good to me, I'll apply it when I apply bpo-19282.

serhiy-storchaka · 2013-10-31T15:13:47Z

Additional checks can slowdown the code. It doesn't matter in some operations, but not on __len__, __contains__, etc. EAFP approach is to catch AttributeError and raise appropriate dbm.dumb.error exception.

serhiy-storchaka · 2013-10-31T15:16:25Z

Yet one nitpick. I think that closing check should be after argument type check and key encoding.

PCManticore · 2013-11-01T14:06:23Z

Here's the new version which addresses your last comment. Regarding the first issue, I don't believe that the result will be as readable (but I agree with you that it will be better). For instance, items will probably look like this:

try:
return [(key, self[key]) for key in self._index.keys()]
except AttributeError:
raise dbm.dumb.error(...) from None

but will look totally different for other __len__:

try:
return len(self._index)
except TypeError:
raise dbm.dumb.error(...) from None.

We could catch TypeError only for dunder methods though and for the rest of the methods check the value of _index before access.

PCManticore · 2013-12-13T10:34:28Z

Serhiy, what's the status for this? Should this be targeted for 3.5?

serhiy-storchaka · 2013-12-13T11:09:14Z

Regarding the first issue, I don't believe that the result will be as readable (but I agree with you that it will be better).

Yes, perhaps performance of the dbm.dumb module is not worst this cost. Are you have any benchmarking results?

The patch LGTM.

Should this be targeted for 3.5?

I think yes.

PCManticore · 2013-12-13T12:40:41Z

Here's a benchmark:

with current patch:

# ./python -S -m timeit -n 10000 -s 'import dbm.dumb as dbm; d=dbm.open("x.dat", "c");d.close()' 'try:' ' len(d)' 'except OSError:' ' pass'
10000 loops, best of 3: 1.78 usec per loop

using
try:
return len(self._index)
except TypeError:
raise error('...')

# ./python -S -m timeit -n 10000 -s 'import dbm.dumb as dbm; d=dbm.open("x.dat", "c");d.close()' 'try:' ' len(d)' 'except OSError:' ' pass'
10000 loops, best of 3: 3.27 usec per loop

Now this seems odd, maybe catching + reraising an exception has a greater overhead than a func call and checking if an attribute is None. Anyway, I agree that dbm.dumb is not performance critical.

Arfrever · 2013-12-13T15:07:04Z

IMHO it is a bug fix, not a new feature, and could be applied in 3.3 and 3.4.

serhiy-storchaka · 2013-12-13T21:53:26Z

Now this seems odd, maybe catching + reraising an exception has a greater overhead than a func call and checking if an attribute is None.

Yes, of course, catching + reraising an exception is costly. But when an exception is not raised, this is cheap. Usually len() is called for open database.

IMHO it is a bug fix, not a new feature, and could be applied in 3.3 and 3.4.

Hmm. If consider dbm.dumb as separated module, this is perhaps a new feature. But if consider it as an implementation of the dbm protocol, it should conform to other dbm modules, and this is a bugfix. What would you say Nick?

ncoghlan · 2013-12-14T13:28:35Z

I think it's definitely still OK to fix this in 3.4, but I think it's borderline enough to avoid including in the last ever 3.3 release. Changing exception types always introduces a little backwards compatibility risk, so it's something I lean towards only doing in new feature releases, even in cases like this where the old exception was clearly not the best choice.

Arfrever · 2013-12-14T13:35:26Z

in the last ever 3.3 release.

msg204533 suggests that there will be >=2 bug fix releases in 3.3 branch (3.3.4 soon and 3.3.5 around release of 3.4.0).

ncoghlan · 2013-12-14T13:51:36Z

That's slightly more acceptable, but it's still hard to make the case
that changing exception types is a reasonable thing to do in a
maintenance release (it's only the specific nature of the error that
makes me consider it reasonable even in a feature release).

PCManticore · 2013-12-17T10:43:23Z

If we agree that this should be fixed in 3.4, can somebody commit it before the second beta?

PCManticore · 2014-03-10T18:58:37Z

Updated the patch to catch dbm.dumb.error in tests (reverting a change added in c2f1bb56760d).

PCManticore · 2014-03-17T07:15:49Z

Can this patch be committed, now that 3.5 is active?

jimjjewett · 2014-04-23T21:56:53Z

_check_closed sounds like you expect it to be closed, and might even assert that it is closed, except that you want the check run even in release mode and/or it might fail. Since being closed is actually the unexpectedly broken state, I would prefer that you rename it to something else, like _verify_open.

PCManticore · 2014-04-23T22:01:17Z

gzip uses the same name, _check_closed, but your suggestion sounds good. I'll update the patch.

jimjjewett · 2014-04-23T22:38:07Z

I think the requested timing regression was for the non-broken case. When the database has NOT been closed, and keys() still works, will it be way slower than before?

Note that I am not asking you to do that test (though the eventual committer might); the implementation of whichdb leaves me believing that almost anyone who is likely to care will have already followed the docunmentation's recommendation to install another *dbm ahead of dumb.

PCManticore · 2014-04-24T05:11:38Z

On my machine I get the following results for the unclosed-database case.

With patch:

# ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c');len(d)"
100000 loops, best of 3: 0.0638 usec per loop

Without patch:

# ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c');len(d)"
100000 loops, best of 3: 0.0634 usec per loop

python-dev · 2014-04-26T20:57:20Z

New changeset e8343cb98cc3 by Benjamin Peterson in branch '3.4':
make operations on closed dumb databases raise a consistent exception (closes bpo-19385)
http://hg.python.org/cpython/rev/e8343cb98cc3

New changeset dbceba88b96e by Benjamin Peterson in branch 'default':
merge 3.4 (bpo-19385)
http://hg.python.org/cpython/rev/dbceba88b96e

serhiy-storchaka · 2014-05-13T11:23:09Z

Claudiu, your benchmark is broken, it measure a no-op.

Correct benchmark:

    $ ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c')"  "len(d)"

3.4: 100000 loops, best of 3: 2.56 usec per loop
3.5: 100000 loops, best of 3: 4.17 usec per loop

There is 60% regression.

jimjjewett · 2014-05-13T11:39:25Z

Did you try 3.5 without the patch, in case the issue is not with his code?
On May 13, 2014 7:23 AM, "Serhiy Storchaka" <report@bugs.python.org> wrote:

Serhiy Storchaka added the comment:

Claudiu, your benchmark is broken, it measure a no-op.

Correct benchmark:
$ ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm;
d=dbm.open('x.dat', 'c')" "len(d)"

3.4: 100000 loops, best of 3: 2.56 usec per loop
3.5: 100000 loops, best of 3: 4.17 usec per loop

There is 60% regression.

----------
status: closed -> open

Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue19385\>

PCManticore · 2014-05-13T11:42:40Z

Right, my benchmark was indeed flawed. Here are the new results on my machine:

Without the patch
# ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c')" "len(d)"
100000 loops, best of 3: 0.564 usec per loop

With the patch
# ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c')" "len(d)"
100000 loops, best of 3: 0.857 usec per loop

Even having an empty _verify_open in __len__ method leads to this:

# ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c')" "len(d)"
100000 loops, best of 3: 0.749 usec per loop

PCManticore · 2014-05-13T12:10:23Z

Here's a new patch which uses the EAFP approach for dunder methods (len, __contains__ etc) and the _verify_open method for the other methods (.keys, .iterkeys) etc. Now the results are similar with the benchmark without the patch.

serhiy-storchaka · 2014-05-25T11:28:53Z

There is no need to speed up methods which do IO (getitem, __setitem__, __delitem__). However method which works only with an index (keys, iterkeys, __contains__, __len__) can be optimized. In the __contains__ method an exception can be raised not only by nen-existent __contains__ of None, but from __hash__ or __eq__ methods of a key, so we should distinguish these cases.

PCManticore · 2014-05-28T15:27:46Z

Looks good to me.

python-dev · 2014-05-28T15:51:02Z

New changeset 95207bcd8298 by Serhiy Storchaka in branch '3.4':
Restore performance of some dumb database methods (regression introduced by bpo-19385).
http://hg.python.org/cpython/rev/95207bcd8298

New changeset 2e59e0b579e5 by Serhiy Storchaka in branch 'default':
Restore performance of some dumb database methods (regression introduced by bpo-19385).
http://hg.python.org/cpython/rev/2e59e0b579e5

serhiy-storchaka · 2014-05-28T15:53:06Z

Thanks Claudiu.

PCManticore mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Oct 25, 2013

ncoghlan self-assigned this Oct 31, 2013

python-dev mannequin closed this as completed Apr 26, 2014

serhiy-storchaka reopened this May 13, 2014

serhiy-storchaka closed this as completed May 28, 2014

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dbm.dumb should be consistent when the database is closed #63584

dbm.dumb should be consistent when the database is closed #63584

PCManticore mannequin commented Oct 25, 2013

PCManticore mannequin commented Oct 25, 2013

ncoghlan commented Oct 31, 2013

serhiy-storchaka commented Oct 31, 2013

serhiy-storchaka commented Oct 31, 2013

PCManticore mannequin commented Nov 1, 2013

PCManticore mannequin commented Dec 13, 2013

serhiy-storchaka commented Dec 13, 2013

PCManticore mannequin commented Dec 13, 2013

Arfrever mannequin commented Dec 13, 2013

serhiy-storchaka commented Dec 13, 2013

ncoghlan commented Dec 14, 2013

Arfrever mannequin commented Dec 14, 2013

ncoghlan commented Dec 14, 2013

PCManticore mannequin commented Dec 17, 2013

PCManticore mannequin commented Mar 10, 2014

PCManticore mannequin commented Mar 17, 2014

jimjjewett mannequin commented Apr 23, 2014

PCManticore mannequin commented Apr 23, 2014

jimjjewett mannequin commented Apr 23, 2014

PCManticore mannequin commented Apr 24, 2014

python-dev mannequin commented Apr 26, 2014

serhiy-storchaka commented May 13, 2014

jimjjewett mannequin commented May 13, 2014

PCManticore mannequin commented May 13, 2014

PCManticore mannequin commented May 13, 2014

serhiy-storchaka commented May 25, 2014

PCManticore mannequin commented May 28, 2014

python-dev mannequin commented May 28, 2014

serhiy-storchaka commented May 28, 2014

dbm.dumb should be consistent when the database is closed #63584

dbm.dumb should be consistent when the database is closed #63584

Comments

PCManticore mannequin commented Oct 25, 2013

PCManticore mannequin commented Oct 25, 2013

ncoghlan commented Oct 31, 2013

serhiy-storchaka commented Oct 31, 2013

serhiy-storchaka commented Oct 31, 2013

PCManticore mannequin commented Nov 1, 2013

PCManticore mannequin commented Dec 13, 2013

serhiy-storchaka commented Dec 13, 2013

PCManticore mannequin commented Dec 13, 2013

Arfrever mannequin commented Dec 13, 2013

serhiy-storchaka commented Dec 13, 2013

ncoghlan commented Dec 14, 2013

Arfrever mannequin commented Dec 14, 2013

ncoghlan commented Dec 14, 2013

PCManticore mannequin commented Dec 17, 2013

PCManticore mannequin commented Mar 10, 2014

PCManticore mannequin commented Mar 17, 2014

jimjjewett mannequin commented Apr 23, 2014

PCManticore mannequin commented Apr 23, 2014

jimjjewett mannequin commented Apr 23, 2014

PCManticore mannequin commented Apr 24, 2014

python-dev mannequin commented Apr 26, 2014

serhiy-storchaka commented May 13, 2014

jimjjewett mannequin commented May 13, 2014

PCManticore mannequin commented May 13, 2014

PCManticore mannequin commented May 13, 2014

serhiy-storchaka commented May 25, 2014

PCManticore mannequin commented May 28, 2014

python-dev mannequin commented May 28, 2014

serhiy-storchaka commented May 28, 2014