msg201209 - (view) |
Author: PCManticore (Claudiu.Popa) *  |
Date: 2013-10-25 06:38 |
This problem occurred in issue19282. Basicly, dbm.dumb is not consistent with dbm.gnu and dbm.ndbm when the database is closed, as seen in the following:
>>> import dbm.dumb as d
>>> db = d.open('test.dat', 'c')
>>> db.close()
>>> db.keys()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/tank/libs/cpython/Lib/dbm/dumb.py", line 212, in keys
return list(self._index.keys())
AttributeError: 'NoneType' object has no attribute 'keys'
>>>
vs
>>> import dbm.gnu as g
>>> db = g.open('test.dat', 'c')
>>> db.close()
>>> db.keys()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
_gdbm.error: GDBM object has already been closed
>>>
Attaching a patch.
|
msg201819 - (view) |
Author: Alyssa Coghlan (ncoghlan) *  |
Date: 2013-10-31 14:36 |
Patch looks good to me, I'll apply it when I apply issue 19282.
|
msg201823 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-10-31 15:13 |
Additional checks can slowdown the code. It doesn't matter in some operations, but not on __len__, __contains__, etc. EAFP approach is to catch AttributeError and raise appropriate dbm.dumb.error exception.
|
msg201824 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-10-31 15:16 |
Yet one nitpick. I think that closing check should be after argument type check and key encoding.
|
msg201900 - (view) |
Author: PCManticore (Claudiu.Popa) *  |
Date: 2013-11-01 14:06 |
Here's the new version which addresses your last comment. Regarding the first issue, I don't believe that the result will be as readable (but I agree with you that it will be better). For instance, `items` will probably look like this:
try:
return [(key, self[key]) for key in self._index.keys()]
except AttributeError:
raise dbm.dumb.error(...) from None
but will look totally different for other __len__:
try:
return len(self._index)
except TypeError:
raise dbm.dumb.error(...) from None.
We could catch TypeError only for dunder methods though and for the rest of the methods check the value of _index before access.
|
msg206050 - (view) |
Author: PCManticore (Claudiu.Popa) *  |
Date: 2013-12-13 10:34 |
Serhiy, what's the status for this? Should this be targeted for 3.5?
|
msg206058 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-12-13 11:09 |
> Regarding the first issue, I don't believe that the result will be as readable (but I agree with you that it will be better).
Yes, perhaps performance of the dbm.dumb module is not worst this cost. Are you have any benchmarking results?
The patch LGTM.
> Should this be targeted for 3.5?
I think yes.
|
msg206069 - (view) |
Author: PCManticore (Claudiu.Popa) *  |
Date: 2013-12-13 12:40 |
Here's a benchmark:
- with current patch:
# ./python -S -m timeit -n 10000 -s 'import dbm.dumb as dbm; d=dbm.open("x.dat", "c");d.close()' 'try:' ' len(d)' 'except OSError:' ' pass'
10000 loops, best of 3: 1.78 usec per loop
- using
try:
return len(self._index)
except TypeError:
raise error('...')
# ./python -S -m timeit -n 10000 -s 'import dbm.dumb as dbm; d=dbm.open("x.dat", "c");d.close()' 'try:' ' len(d)' 'except OSError:' ' pass'
10000 loops, best of 3: 3.27 usec per loop
Now this seems odd, maybe catching + reraising an exception has a greater overhead than a func call and checking if an attribute is None. Anyway, I agree that dbm.dumb is not performance critical.
|
msg206094 - (view) |
Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) *  |
Date: 2013-12-13 15:07 |
IMHO it is a bug fix, not a new feature, and could be applied in 3.3 and 3.4.
|
msg206151 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-12-13 21:53 |
> Now this seems odd, maybe catching + reraising an exception has a greater overhead than a func call and checking if an attribute is None.
Yes, of course, catching + reraising an exception is costly. But when an exception is not raised, this is cheap. Usually len() is called for open database.
> IMHO it is a bug fix, not a new feature, and could be applied in 3.3 and 3.4.
Hmm. If consider dbm.dumb as separated module, this is perhaps a new feature. But if consider it as an implementation of the dbm protocol, it should conform to other dbm modules, and this is a bugfix. What would you say Nick?
|
msg206183 - (view) |
Author: Alyssa Coghlan (ncoghlan) *  |
Date: 2013-12-14 13:28 |
I think it's definitely still OK to fix this in 3.4, but I think it's borderline enough to avoid including in the last ever 3.3 release. Changing exception types always introduces a little backwards compatibility risk, so it's something I lean towards only doing in new feature releases, even in cases like this where the old exception was clearly not the best choice.
|
msg206184 - (view) |
Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) *  |
Date: 2013-12-14 13:35 |
> in the last ever 3.3 release.
msg204533 suggests that there will be >=2 bug fix releases in 3.3 branch (3.3.4 soon and 3.3.5 around release of 3.4.0).
|
msg206185 - (view) |
Author: Alyssa Coghlan (ncoghlan) *  |
Date: 2013-12-14 13:51 |
That's slightly more acceptable, but it's still hard to make the case
that changing exception types is a reasonable thing to do in a
maintenance release (it's only the specific nature of the error that
makes me consider it reasonable even in a feature release).
|
msg206416 - (view) |
Author: PCManticore (Claudiu.Popa) *  |
Date: 2013-12-17 10:43 |
If we agree that this should be fixed in 3.4, can somebody commit it before the second beta?
|
msg213069 - (view) |
Author: PCManticore (Claudiu.Popa) *  |
Date: 2014-03-10 18:58 |
Updated the patch to catch dbm.dumb.error in tests (reverting a change added in c2f1bb56760d).
|
msg213840 - (view) |
Author: PCManticore (Claudiu.Popa) *  |
Date: 2014-03-17 07:15 |
Can this patch be committed, now that 3.5 is active?
|
msg217091 - (view) |
Author: Jim Jewett (Jim.Jewett) *  |
Date: 2014-04-23 21:56 |
_check_closed sounds like you expect it to be closed, and might even assert that it is closed, except that you want the check run even in release mode and/or it might fail. Since being closed is actually the unexpectedly broken state, I would prefer that you rename it to something else, like _verify_open.
|
msg217093 - (view) |
Author: PCManticore (Claudiu.Popa) *  |
Date: 2014-04-23 22:01 |
gzip uses the same name, _check_closed, but your suggestion sounds good. I'll update the patch.
|
msg217097 - (view) |
Author: Jim Jewett (Jim.Jewett) *  |
Date: 2014-04-23 22:38 |
I think the requested timing regression was for the non-broken case. When the database has NOT been closed, and keys() still works, will it be way slower than before?
Note that I am not asking you to do that test (though the eventual committer might); the implementation of whichdb leaves me believing that almost anyone who is likely to care will have already followed the docunmentation's recommendation to install another *dbm ahead of dumb.
|
msg217116 - (view) |
Author: PCManticore (Claudiu.Popa) *  |
Date: 2014-04-24 05:11 |
On my machine I get the following results for the unclosed-database case.
With patch:
# ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c');len(d)"
100000 loops, best of 3: 0.0638 usec per loop
Without patch:
# ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c');len(d)"
100000 loops, best of 3: 0.0634 usec per loop
|
msg217216 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2014-04-26 20:57 |
New changeset e8343cb98cc3 by Benjamin Peterson in branch '3.4':
make operations on closed dumb databases raise a consistent exception (closes #19385)
http://hg.python.org/cpython/rev/e8343cb98cc3
New changeset dbceba88b96e by Benjamin Peterson in branch 'default':
merge 3.4 (#19385)
http://hg.python.org/cpython/rev/dbceba88b96e
|
msg218435 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2014-05-13 11:23 |
Claudiu, your benchmark is broken, it measure a no-op.
Correct benchmark:
$ ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c')" "len(d)"
3.4: 100000 loops, best of 3: 2.56 usec per loop
3.5: 100000 loops, best of 3: 4.17 usec per loop
There is 60% regression.
|
msg218440 - (view) |
Author: Jim Jewett (Jim.Jewett) *  |
Date: 2014-05-13 11:39 |
Did you try 3.5 without the patch, in case the issue is not with his code?
On May 13, 2014 7:23 AM, "Serhiy Storchaka" <report@bugs.python.org> wrote:
>
> Serhiy Storchaka added the comment:
>
> Claudiu, your benchmark is broken, it measure a no-op.
>
> Correct benchmark:
>
> $ ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm;
> d=dbm.open('x.dat', 'c')" "len(d)"
>
> 3.4: 100000 loops, best of 3: 2.56 usec per loop
> 3.5: 100000 loops, best of 3: 4.17 usec per loop
>
> There is 60% regression.
>
> ----------
> status: closed -> open
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue19385>
> _______________________________________
>
|
msg218441 - (view) |
Author: PCManticore (Claudiu.Popa) *  |
Date: 2014-05-13 11:42 |
Right, my benchmark was indeed flawed. Here are the new results on my machine:
Without the patch
# ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c')" "len(d)"
100000 loops, best of 3: 0.564 usec per loop
With the patch
# ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c')" "len(d)"
100000 loops, best of 3: 0.857 usec per loop
Even having an empty _verify_open in __len__ method leads to this:
# ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c')" "len(d)"
100000 loops, best of 3: 0.749 usec per loop
|
msg218444 - (view) |
Author: PCManticore (Claudiu.Popa) *  |
Date: 2014-05-13 12:10 |
Here's a new patch which uses the EAFP approach for dunder methods (__len__, __contains__ etc) and the _verify_open method for the other methods (.keys, .iterkeys) etc. Now the results are similar with the benchmark without the patch.
|
msg219083 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2014-05-25 11:28 |
There is no need to speed up methods which do IO (__getitem__, __setitem__, __delitem__). However method which works only with an index (keys, iterkeys, __contains__, __len__) can be optimized. In the __contains__ method an exception can be raised not only by nen-existent __contains__ of None, but from __hash__ or __eq__ methods of a key, so we should distinguish these cases.
|
msg219277 - (view) |
Author: PCManticore (Claudiu.Popa) *  |
Date: 2014-05-28 15:27 |
Looks good to me.
|
msg219281 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2014-05-28 15:51 |
New changeset 95207bcd8298 by Serhiy Storchaka in branch '3.4':
Restore performance of some dumb database methods (regression introduced by #19385).
http://hg.python.org/cpython/rev/95207bcd8298
New changeset 2e59e0b579e5 by Serhiy Storchaka in branch 'default':
Restore performance of some dumb database methods (regression introduced by #19385).
http://hg.python.org/cpython/rev/2e59e0b579e5
|
msg219283 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2014-05-28 15:53 |
Thanks Claudiu.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:52 | admin | set | github: 63584 |
2014-05-28 15:53:06 | serhiy.storchaka | set | status: open -> closed
messages:
+ msg219283 stage: patch review -> resolved |
2014-05-28 15:51:02 | python-dev | set | messages:
+ msg219281 |
2014-05-28 15:27:46 | Claudiu.Popa | set | messages:
+ msg219277 |
2014-05-25 11:28:53 | serhiy.storchaka | set | files:
+ issue19385_speed_2.patch
stage: resolved -> patch review messages:
+ msg219083 versions:
+ Python 3.4 |
2014-05-13 12:10:23 | Claudiu.Popa | set | files:
+ issue19385_speed.patch
messages:
+ msg218444 |
2014-05-13 11:42:39 | Claudiu.Popa | set | messages:
+ msg218441 |
2014-05-13 11:39:25 | Jim.Jewett | set | messages:
+ msg218440 |
2014-05-13 11:23:09 | serhiy.storchaka | set | status: closed -> open
messages:
+ msg218435 |
2014-04-26 20:57:20 | python-dev | set | status: open -> closed
nosy:
+ python-dev messages:
+ msg217216
resolution: fixed stage: commit review -> resolved |
2014-04-24 05:11:39 | Claudiu.Popa | set | files:
+ issue19385_1.patch
messages:
+ msg217116 |
2014-04-23 22:38:07 | Jim.Jewett | set | messages:
+ msg217097 |
2014-04-23 22:01:17 | Claudiu.Popa | set | messages:
+ msg217093 |
2014-04-23 21:56:52 | Jim.Jewett | set | nosy:
+ Jim.Jewett messages:
+ msg217091
|
2014-03-19 20:02:38 | zach.ware | set | versions:
+ Python 3.5, - Python 3.4 |
2014-03-17 07:15:49 | Claudiu.Popa | set | messages:
+ msg213840 |
2014-03-10 18:58:38 | Claudiu.Popa | set | files:
+ issue19385.patch
messages:
+ msg213069 |
2013-12-17 10:43:23 | Claudiu.Popa | set | messages:
+ msg206416 |
2013-12-14 13:51:36 | ncoghlan | set | messages:
+ msg206185 |
2013-12-14 13:35:26 | Arfrever | set | messages:
+ msg206184 |
2013-12-14 13:28:35 | ncoghlan | set | messages:
+ msg206183 versions:
+ Python 3.4, - Python 3.5 |
2013-12-13 21:53:26 | serhiy.storchaka | set | messages:
+ msg206151 |
2013-12-13 15:07:04 | Arfrever | set | messages:
+ msg206094 |
2013-12-13 12:40:41 | Claudiu.Popa | set | messages:
+ msg206069 |
2013-12-13 11:09:14 | serhiy.storchaka | set | stage: commit review messages:
+ msg206058 versions:
+ Python 3.5, - Python 3.4 |
2013-12-13 10:34:28 | Claudiu.Popa | set | messages:
+ msg206050 |
2013-11-01 21:03:28 | Claudiu.Popa | set | files:
+ dbm_dumb1.patch |
2013-11-01 14:06:23 | Claudiu.Popa | set | files:
+ dbm_dumb1.patch
messages:
+ msg201900 |
2013-10-31 15:16:24 | serhiy.storchaka | set | messages:
+ msg201824 |
2013-10-31 15:13:47 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages:
+ msg201823
|
2013-10-31 14:36:39 | ncoghlan | set | assignee: ncoghlan
messages:
+ msg201819 nosy:
+ ncoghlan |
2013-10-28 22:54:52 | Arfrever | set | nosy:
+ Arfrever
|
2013-10-25 06:40:09 | Claudiu.Popa | set | files:
+ dbm_dumb.patch keywords:
+ patch |
2013-10-25 06:38:41 | Claudiu.Popa | create | |