Byte/string inconsistencies between different dbm modules #48049
Comments
Consider these two timeit commands:

py3k% python3.0 -m timeit -s 'import dbm.ndbm as db' -s 'f = db.open("/tmp/trash.db", "c")' 'for i in range(1000): f[str(i)] = str(i)'
100 loops, best of 3: 5.51 msec per loop
py3k% python3.0 -m timeit -s 'import dbm.dumb as db' -s 'f = db.open("/tmp/trash.db", "c")' 'for i in range(1000): f[str(i)] = str(i)'
Traceback (most recent call last):
File "/Users/skip/local/lib/python3.0/timeit.py", line 297, in main
x = t.timeit(number)
File "/Users/skip/local/lib/python3.0/timeit.py", line 193, in timeit
timing = self.inner(it, self.timer)
File "<timeit-src>", line 7, in inner
for i in range(1000): f[str(i)] = str(i)
File "/Users/skip/local/lib/python3.0/dbm/dumb.py", line 165, in __setitem__
raise TypeError("keys must be bytes")
TypeError: keys must be bytes

Seems to me they should either both succeed or both fail. What are keys Marking it as high priority. When 3.0 is released all these modules |
Making this into a release blocker just so someone will look at it. |
Extra data point. I tried f["1"] = "a" and f[b"1"] = "a" with dbm.{gnu,ndbm,dumb,sqlite}. All worked with bytes. A except |
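A probe along the lines of the data point above can be scripted so that the same assignment is tried against every backend that happens to be built. This is only a sketch: the `probe` and `probe_all` helpers are hypothetical names, the backend list is taken from the comment (minus the sqlite backend, whose module name is not assumed here), and the results depend on the Python version it runs under, so it may not reproduce what was seen on the 3.0 pre-release.

```python
import importlib
import os
import tempfile

# Backend names from the comment above; any backend that is not built
# for this interpreter is reported as unavailable instead of failing.
BACKENDS = ("dbm.gnu", "dbm.ndbm", "dbm.dumb")


def probe(modname, workdir):
    """Try one str key and one bytes key against a single dbm backend."""
    try:
        mod = importlib.import_module(modname)
    except ImportError:
        return "unavailable"
    path = os.path.join(workdir, modname.replace(".", "_"))
    try:
        db = mod.open(path, "c")
    except OSError as exc:
        return "open failed: %s" % exc
    outcome = {}
    try:
        for key in ("1", b"1"):
            try:
                db[key] = "a"
                outcome[type(key).__name__] = "ok"
            except TypeError as exc:
                outcome[type(key).__name__] = "TypeError: %s" % exc
    finally:
        db.close()
    return outcome


def probe_all():
    """Run the probe for every backend inside a throwaway directory."""
    with tempfile.TemporaryDirectory() as workdir:
        return {name: probe(name, workdir) for name in BACKENDS}


results = probe_all()
for name, outcome in results.items():
    print(name, outcome)
```

Recording the `TypeError` message rather than letting it propagate keeps one strict backend from hiding the behaviour of the others.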
How hard would it be to fix dbm.dumb to accept strings as well? |
I'm not sure. I've never done anything with the io module. Simply File "/Users/skip/local/lib/python3.0/dbm/dumb.py", line 170, in I suppose you'd have to check if val is an instance of str and if so, That said, I've attached a patch which passes all current unit tests. Skip |
I think this isn't quite right. Ideally a fix should maintain several important properties: (1) Be able to read databases written by Python 2.x. (1a) Write databases readable by Python 2.x. (2) Use the same mapping between str and bytes as the other *dbm (2a) Return the same value for keys() as the other *dbm libraries I think (2) means that we should use UTF-8 to convert str keys to bytes, PS. I noticed the dbm module still returns bytearrays for keys and |
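For illustration, the UTF-8 mapping suggested in (2) can be observed with `dbm.dumb` as it behaves in current CPython releases, which is not necessarily what any patch attached to this issue did. The file name `demo` and the sample key are made up for the sketch.

```python
import dbm.dumb
import os
import tempfile

key_text = "caf\u00e9"  # a non-ASCII str key, chosen only for illustration

with tempfile.TemporaryDirectory() as workdir:
    db = dbm.dumb.open(os.path.join(workdir, "demo"), "c")
    try:
        # A str key and a str value are both encoded as UTF-8 on the way in.
        db[key_text] = "value"
        stored_keys = list(db.keys())
        # Lookups work with the str key or with its UTF-8 encoded bytes.
        via_str = db[key_text]
        via_bytes = db[key_text.encode("utf-8")]
    finally:
        db.close()

print(stored_keys, via_str, via_bytes)
```

Keys and values come back as bytes whichever type was used to store them, so a caller that wants str back has to decode explicitly.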
For information, Python3 trunk fails on:
test.support.TestFailed: Traceback (most recent call last):
File "Lib/test/test_dbm.py", line 157, in test_keys
self.assert_('xxx' not in self.d)
TypeError: gdbm key must be bytes, not str |
The patch causes three errors:

======================================================================
Traceback (most recent call last):
File "/home/heimes/dev/python/py3k/Lib/test/test_dbm.py", line 92, in test_anydbm_access
f = dbm.open(_fname, 'r')
File "/home/heimes/dev/python/py3k/Lib/dbm/__init__.py", line 79, in open
raise error[0]("need 'c' or 'n' flag to open new db")
dbm.error: need 'c' or 'n' flag to open new db

======================================================================
Traceback (most recent call last):
File "/home/heimes/dev/python/py3k/Lib/test/test_dbm.py", line 86, in test_anydbm_keys
f = dbm.open(_fname, 'r')
File "/home/heimes/dev/python/py3k/Lib/dbm/__init__.py", line 79, in open
raise error[0]("need 'c' or 'n' flag to open new db")
dbm.error: need 'c' or 'n' flag to open new db

======================================================================
Traceback (most recent call last):
File "/home/heimes/dev/python/py3k/Lib/test/test_dbm.py", line 80, in test_anydbm_read
f = dbm.open(_fname, 'r')
File "/home/heimes/dev/python/py3k/Lib/dbm/__init__.py", line 79, in open
raise error[0]("need 'c' or 'n' flag to open new db")
dbm.error: need 'c' or 'n' flag to open new db

Ran 16 tests in 0.429s

FAILED (errors=3) |
I don't see enough progress on this issue, and I'm not going to hold up |
If you look at the 2.7 code all it requires of keys and values in Thus I think going down the UTF-8 route is the right thing to do for |
OK, now I see why it is called 'dumb'; the thing literally just dumps |
I have attached a file that does everything internally as UTF-8 but |
r67310 has the fix. |
I am re-opening this as a deferred blocker with a patch to document that |
Have another patch that fixes all open() calls to specify the file |
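damn... my cc to report@bugs.python.org didn't work. Here's the recap:

me> ... I thought Guido was of the opinion that the 3.0 version should
me> be able to read dumb dbms written by earlier Python versions....

And write them. From msg72963:

(1) Be able to read databases written by Python 2.x.

(1a) Write databases readable by Python 2.x.

Ah, but wait a minute. I see your comment in msg76080:

If you look at the 2.7 code all it requires of keys and values in
__setitem__ is that they are strings; there is nothing about Latin-1 in
terms of specific encoding (must be a 3.0 addition to make the
str/unicode transition the easiest).

The acid test. I executed the attached mydb2write.py using Python 2.5 then
executed the attached mydb3read.py using Python 3.0. The output:

% python2.5 mydb2write.py
1 abc
2 [4, {4.2999999999999998: 12}]
3 <__main__.C instance at 0x34bb70>
% python3.0 mydb3read.py
1 b'abc'
2 [4, {4.2999999999999998: 12}]
Traceback (most recent call last):
File "mydb3read.py", line 13, in <module>
print(3, pickle.loads(db['3']))
File "/Users/skip/local/lib/python3.0/pickle.py", line 1329, in loads
return Unpickler(file, encoding=encoding, errors=errors).load()
_pickle.UnpicklingError: bad pickle data

so if the ability to read Python 2.x dumbdbm files is still a requirement I
think there's a little more work to do. |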
I think the ability to read old files is essential. The ability to

On Fri, Nov 21, 2008 at 7:36 AM, <skip@pobox.com> wrote:
>
> me> ... I thought Guido was of the opinion that the 3.0 version should
> me> be able to read dumb dbms written by earlier Python versions....
>
> And write them. From msg72963:
>
> (1) Be able to read databases written by Python 2.x.
>
> (1a) Write databases readable by Python 2.x.
>
> Ah, but wait a minute. I see your comment in msg76080:
>
> If you look at the 2.7 code all it requires of keys and values in
> __setitem__ is that they are strings; there is nothing about Latin-1 in
> terms of specific encoding (must be a 3.0 addition to make the
> str/unicode transition the easiest).
>
> The acid test. I executed the attached mydb2write.py using Python 2.5 then
> executed the attached mydb3read.py using Python 3.0. The output:
>
> % python2.5 mydb2write.py
> 1 abc
> 2 [4, {4.2999999999999998: 12}]
> 3 <__main__.C instance at 0x34bb70>
> % python3.0 mydb3read.py
> 1 b'abc'
> 2 [4, {4.2999999999999998: 12}]
> Traceback (most recent call last):
> File "mydb3read.py", line 13, in <module>
> print(3, pickle.loads(db['3']))
> File "/Users/skip/local/lib/python3.0/pickle.py", line 1329, in loads
> return Unpickler(file, encoding=encoding, errors=errors).load()
> _pickle.UnpicklingError: bad pickle data
>
> so if the ability to read Python 2.x dumbdbm files is still a requirement I
> think there's a little more work to do.
>
> cc'ing report@bugs.python.org to preserve the scripts with the ticket.
>
> Skip
> |
Reading old dumbdbm files is essential. Writing them is not. |
[fix title] |
So the use of pickle is not fair as that doesn't round-trip if you Now if you skip that one use-case in the example of pickling a In other words I think my solution works and pickle is the trouble-maker |
Brett> In other words I think my solution works and pickle is the I can buy that. Should pickle map "copy_reg" to "copyreg"? Is that the Actually, it seems this ticket should be closed and another opened about Skip |
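As a rough illustration of the pickle side of this exchange, the sketch below feeds hand-written protocol 0 pickles, of the kind Python 2 emits, to `pickle.loads` in a current CPython. The byte strings are constructed for this example and are not taken from the attached scripts. The `encoding` argument already appears in the traceback above; the `fix_imports` default that handles the `copy_reg` rename is the behaviour of later releases and is assumed here, not something the thread confirms.

```python
import copyreg
import pickle

# Hand-written protocol 0 pickles (illustrative, not from mydb2write.py).
py2_ascii = b"S'abc'\np0\n."                   # Python 2 str 'abc'
py2_high = b"S'\\xe9'\np0\n."                  # Python 2 str '\xe9'
py2_global = b"ccopy_reg\n_reconstructor\n."   # reference to a Python 2 module

# ASCII-only Python 2 strings load as str with the default encoding.
as_text = pickle.loads(py2_ascii)

# Non-ASCII 8-bit strings need an explicit choice: decode or keep as bytes.
as_latin1 = pickle.loads(py2_high, encoding="latin-1")
as_bytes = pickle.loads(py2_high, encoding="bytes")

# With fix_imports left at its default, the old module name "copy_reg"
# is resolved to the renamed module "copyreg".
resolved = pickle.loads(py2_global)

print(as_text, repr(as_latin1), as_bytes, resolved)
```

The dbm layer only hands back bytes in each case. Whether a pickled value then loads is decided by pickle's own compatibility handling.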
On Fri, Nov 21, 2008 at 10:32, Skip Montanaro <report@bugs.python.org> wrote:
Well, I still need a code review for my latest patch that changes the |
One doc nit: There is still reference to I'm confused by the encoding="Latin-1" args to _io.open for dbm.dumb. I Skip |
On Fri, Nov 21, 2008 at 11:01, Skip Montanaro report@bugs.python.org wrote:
OK, I will fix that and upload a new patch at some point.
It's so that when writing out there won't be any errors. Since the |
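One reading of the Latin-1 choice, offered as an assumption because the explanation above is cut off: Latin-1 maps each of the 256 byte values to exactly one code point, so text produced by decoding arbitrary bytes can always be encoded back without an error. UTF-8 gives no such guarantee for arbitrary bytes. A small check of that property:

```python
# Every possible byte value, in order.
all_bytes = bytes(range(256))

# Latin-1 decodes any byte string and encodes it back unchanged.
text = all_bytes.decode("latin-1")
round_trip = text.encode("latin-1")

# The same bytes are not valid UTF-8, so decoding them raises.
try:
    all_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(len(text), round_trip == all_bytes, utf8_ok)
```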
py3k patched with specify_open_encoding.diff passes test_dbm_dumb on my Skip |
specify_open_encoding.diff has been committed in r67369. I still need a review for doc_dbm_strings.diff, though, which clarifies |
Brett> I still need a review for doc_dbm_strings.diff, though, which Was my comment
and your response
not sufficient? If not, allow me to anoint said diff with virtual holy water now... Skip |
On Mon, Nov 24, 2008 at 16:25, Skip Montanaro <report@bugs.python.org> wrote:
Wasn't sure if that meant everything not mentioned was fine.
=) Thanks! |
r67380 has the fix. Thanks for the review, Skip! |