This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: dbm.dumb should be consistent when the database is closed
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.4, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: ncoghlan Nosy List: Arfrever, Claudiu.Popa, Jim.Jewett, ncoghlan, python-dev, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2013-10-25 06:38 by Claudiu.Popa, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
dbm_dumb.patch Claudiu.Popa, 2013-10-25 06:40 review
dbm_dumb1.patch Claudiu.Popa, 2013-11-01 14:06 review
dbm_dumb1.patch Claudiu.Popa, 2013-11-01 21:03 add space in test review
issue19385.patch Claudiu.Popa, 2014-03-10 18:58 review
issue19385_1.patch Claudiu.Popa, 2014-04-24 05:11 review
issue19385_speed.patch Claudiu.Popa, 2014-05-13 12:10 review
issue19385_speed_2.patch serhiy.storchaka, 2014-05-25 11:28 review
Messages (29)
msg201209 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2013-10-25 06:38
This problem occurred in issue19282. Basicly, dbm.dumb is not consistent with dbm.gnu and dbm.ndbm when the database is closed, as seen in the following:


>>> import dbm.dumb as d
>>> db = d.open('test.dat', 'c')
>>> db.close()
>>> db.keys()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tank/libs/cpython/Lib/dbm/dumb.py", line 212, in keys
    return list(self._index.keys())
AttributeError: 'NoneType' object has no attribute 'keys'
>>>

vs

>>> import dbm.gnu as g
>>> db = g.open('test.dat', 'c')
>>> db.close()
>>> db.keys()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_gdbm.error: GDBM object has already been closed
>>>

Attaching a patch.
msg201819 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-10-31 14:36
Patch looks good to me, I'll apply it when I apply issue 19282.
msg201823 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-31 15:13
Additional checks can slowdown the code. It doesn't matter in some operations, but not on __len__, __contains__, etc. EAFP approach is to catch AttributeError and raise appropriate dbm.dumb.error exception.
msg201824 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-31 15:16
Yet one nitpick. I think that closing check should be after argument type check and key encoding.
msg201900 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2013-11-01 14:06
Here's the new version which addresses your last comment. Regarding the first issue, I don't believe that the result will be as readable (but I agree with you that it will be better). For instance, `items` will probably look like this:

try:         
   return [(key, self[key]) for key in self._index.keys()]
except AttributeError:
   raise dbm.dumb.error(...) from None

but will look totally different for other __len__:

try:
   return len(self._index)
except TypeError:
   raise dbm.dumb.error(...) from None.


We could catch TypeError only for dunder methods though and for the rest of the methods check the value of _index before access.
msg206050 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2013-12-13 10:34
Serhiy, what's the status for this? Should this be targeted for 3.5?
msg206058 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-13 11:09
> Regarding the first issue, I don't believe that the result will be as readable (but I agree with you that it will be better).

Yes, perhaps performance of the dbm.dumb module is not worst this cost. Are you have any benchmarking results?

The patch LGTM.

> Should this be targeted for 3.5?

I think yes.
msg206069 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2013-12-13 12:40
Here's a benchmark:

- with current patch:


# ./python -S -m timeit -n 10000 -s 'import dbm.dumb as dbm; d=dbm.open("x.dat", "c");d.close()' 'try:' '  len(d)' 'except OSError:' '  pass'
10000 loops, best of 3: 1.78 usec per loop


- using 
 try:
     return len(self._index)
 except TypeError:
     raise error('...')


# ./python -S -m timeit -n 10000 -s 'import dbm.dumb as dbm; d=dbm.open("x.dat", "c");d.close()' 'try:' '  len(d)' 'except OSError:' '  pass'
10000 loops, best of 3: 3.27 usec per loop


Now this seems odd, maybe catching + reraising an exception has a greater overhead than a func call and checking if an attribute is None. Anyway, I agree that dbm.dumb is not performance critical.
msg206094 - (view) Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * (Python triager) Date: 2013-12-13 15:07
IMHO it is a bug fix, not a new feature, and could be applied in 3.3 and 3.4.
msg206151 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-13 21:53
> Now this seems odd, maybe catching + reraising an exception has a greater overhead than a func call and checking if an attribute is None.

Yes, of course, catching + reraising an exception is costly. But when an exception is not raised, this is cheap. Usually len() is called for open database.

> IMHO it is a bug fix, not a new feature, and could be applied in 3.3 and 3.4.

Hmm. If consider dbm.dumb as separated module, this is perhaps a new feature. But if consider it as an implementation of the dbm protocol, it should conform to other dbm modules, and this is a bugfix. What would you say Nick?
msg206183 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-12-14 13:28
I think it's definitely still OK to fix this in 3.4, but I think it's borderline enough to avoid including in the last ever 3.3 release. Changing exception types always introduces a little backwards compatibility risk, so it's something I lean towards only doing in new feature releases, even in cases like this where the old exception was clearly not the best choice.
msg206184 - (view) Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * (Python triager) Date: 2013-12-14 13:35
> in the last ever 3.3 release.

msg204533 suggests that there will be >=2 bug fix releases in 3.3 branch (3.3.4 soon and 3.3.5 around release of 3.4.0).
msg206185 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-12-14 13:51
That's slightly more acceptable, but it's still hard to make the case
that changing exception types is a reasonable thing to do in a
maintenance release (it's only the specific nature of the error that
makes me consider it reasonable even in a feature release).
msg206416 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2013-12-17 10:43
If we agree that this should be fixed in 3.4, can somebody commit it before the second beta?
msg213069 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2014-03-10 18:58
Updated the patch to catch dbm.dumb.error in tests (reverting a change added in c2f1bb56760d).
msg213840 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2014-03-17 07:15
Can this patch be committed, now that 3.5 is active?
msg217091 - (view) Author: Jim Jewett (Jim.Jewett) * (Python triager) Date: 2014-04-23 21:56
_check_closed sounds like you expect it to be closed, and might even assert that it is closed, except that you want the check run even in release mode and/or it might fail.  Since being closed is actually the unexpectedly broken state, I would prefer that you rename it to something else, like _verify_open.
msg217093 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2014-04-23 22:01
gzip uses the same name, _check_closed, but your suggestion sounds good. I'll update the patch.
msg217097 - (view) Author: Jim Jewett (Jim.Jewett) * (Python triager) Date: 2014-04-23 22:38
I think the requested timing regression was for the non-broken case.  When the database has NOT been closed, and keys() still works, will it be way slower than before?

Note that I am not asking you to do that test (though the eventual committer might); the implementation of whichdb leaves me believing that almost anyone who is likely to care will have already followed the docunmentation's recommendation to install another *dbm ahead of dumb.
msg217116 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2014-04-24 05:11
On my machine I get the following results for the unclosed-database case.

With patch:

# ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c');len(d)"
100000 loops, best of 3: 0.0638 usec per loop

Without patch:

# ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c');len(d)"
100000 loops, best of 3: 0.0634 usec per loop
msg217216 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-04-26 20:57
New changeset e8343cb98cc3 by Benjamin Peterson in branch '3.4':
make operations on closed dumb databases raise a consistent exception (closes #19385)
http://hg.python.org/cpython/rev/e8343cb98cc3

New changeset dbceba88b96e by Benjamin Peterson in branch 'default':
merge 3.4 (#19385)
http://hg.python.org/cpython/rev/dbceba88b96e
msg218435 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-05-13 11:23
Claudiu, your benchmark is broken, it measure a no-op.

Correct benchmark:

    $ ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c')"  "len(d)"

3.4:  100000 loops, best of 3: 2.56 usec per loop
3.5:  100000 loops, best of 3: 4.17 usec per loop

There is 60% regression.
msg218440 - (view) Author: Jim Jewett (Jim.Jewett) * (Python triager) Date: 2014-05-13 11:39
Did you try 3.5 without the patch, in case the issue is not with his code?
On May 13, 2014 7:23 AM, "Serhiy Storchaka" <report@bugs.python.org> wrote:

>
> Serhiy Storchaka added the comment:
>
> Claudiu, your benchmark is broken, it measure a no-op.
>
> Correct benchmark:
>
>     $ ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm;
> d=dbm.open('x.dat', 'c')"  "len(d)"
>
> 3.4:  100000 loops, best of 3: 2.56 usec per loop
> 3.5:  100000 loops, best of 3: 4.17 usec per loop
>
> There is 60% regression.
>
> ----------
> status: closed -> open
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue19385>
> _______________________________________
>
msg218441 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2014-05-13 11:42
Right, my benchmark was indeed flawed. Here are the new results on my machine:

Without the patch
# ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c')"  "len(d)"
100000 loops, best of 3: 0.564 usec per loop

With the patch
# ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c')"  "len(d)"
100000 loops, best of 3: 0.857 usec per loop

Even having an empty _verify_open in __len__ method leads to this:

# ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c')"  "len(d)"
100000 loops, best of 3: 0.749 usec per loop
msg218444 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2014-05-13 12:10
Here's a new patch which uses the EAFP approach for dunder methods (__len__, __contains__ etc) and the _verify_open method for the other methods (.keys, .iterkeys) etc. Now the results are similar with the benchmark without the patch.
msg219083 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-05-25 11:28
There is no need to speed up methods which do IO (__getitem__, __setitem__, __delitem__). However method which works only with an index (keys, iterkeys, __contains__, __len__) can be optimized. In the __contains__ method an exception can be raised not only by nen-existent __contains__ of None, but from __hash__ or __eq__ methods of a key, so we should distinguish these cases.
msg219277 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2014-05-28 15:27
Looks good to me.
msg219281 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-05-28 15:51
New changeset 95207bcd8298 by Serhiy Storchaka in branch '3.4':
Restore performance of some dumb database methods (regression introduced by #19385).
http://hg.python.org/cpython/rev/95207bcd8298

New changeset 2e59e0b579e5 by Serhiy Storchaka in branch 'default':
Restore performance of some dumb database methods (regression introduced by #19385).
http://hg.python.org/cpython/rev/2e59e0b579e5
msg219283 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-05-28 15:53
Thanks Claudiu.
History
Date User Action Args
2022-04-11 14:57:52adminsetgithub: 63584
2014-05-28 15:53:06serhiy.storchakasetstatus: open -> closed

messages: + msg219283
stage: patch review -> resolved
2014-05-28 15:51:02python-devsetmessages: + msg219281
2014-05-28 15:27:46Claudiu.Popasetmessages: + msg219277
2014-05-25 11:28:53serhiy.storchakasetfiles: + issue19385_speed_2.patch

stage: resolved -> patch review
messages: + msg219083
versions: + Python 3.4
2014-05-13 12:10:23Claudiu.Popasetfiles: + issue19385_speed.patch

messages: + msg218444
2014-05-13 11:42:39Claudiu.Popasetmessages: + msg218441
2014-05-13 11:39:25Jim.Jewettsetmessages: + msg218440
2014-05-13 11:23:09serhiy.storchakasetstatus: closed -> open

messages: + msg218435
2014-04-26 20:57:20python-devsetstatus: open -> closed

nosy: + python-dev
messages: + msg217216

resolution: fixed
stage: commit review -> resolved
2014-04-24 05:11:39Claudiu.Popasetfiles: + issue19385_1.patch

messages: + msg217116
2014-04-23 22:38:07Jim.Jewettsetmessages: + msg217097
2014-04-23 22:01:17Claudiu.Popasetmessages: + msg217093
2014-04-23 21:56:52Jim.Jewettsetnosy: + Jim.Jewett
messages: + msg217091
2014-03-19 20:02:38zach.waresetversions: + Python 3.5, - Python 3.4
2014-03-17 07:15:49Claudiu.Popasetmessages: + msg213840
2014-03-10 18:58:38Claudiu.Popasetfiles: + issue19385.patch

messages: + msg213069
2013-12-17 10:43:23Claudiu.Popasetmessages: + msg206416
2013-12-14 13:51:36ncoghlansetmessages: + msg206185
2013-12-14 13:35:26Arfreversetmessages: + msg206184
2013-12-14 13:28:35ncoghlansetmessages: + msg206183
versions: + Python 3.4, - Python 3.5
2013-12-13 21:53:26serhiy.storchakasetmessages: + msg206151
2013-12-13 15:07:04Arfreversetmessages: + msg206094
2013-12-13 12:40:41Claudiu.Popasetmessages: + msg206069
2013-12-13 11:09:14serhiy.storchakasetstage: commit review
messages: + msg206058
versions: + Python 3.5, - Python 3.4
2013-12-13 10:34:28Claudiu.Popasetmessages: + msg206050
2013-11-01 21:03:28Claudiu.Popasetfiles: + dbm_dumb1.patch
2013-11-01 14:06:23Claudiu.Popasetfiles: + dbm_dumb1.patch

messages: + msg201900
2013-10-31 15:16:24serhiy.storchakasetmessages: + msg201824
2013-10-31 15:13:47serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg201823
2013-10-31 14:36:39ncoghlansetassignee: ncoghlan

messages: + msg201819
nosy: + ncoghlan
2013-10-28 22:54:52Arfreversetnosy: + Arfrever
2013-10-25 06:40:09Claudiu.Popasetfiles: + dbm_dumb.patch
keywords: + patch
2013-10-25 06:38:41Claudiu.Popacreate