classification
Title: dbm: Can't open database with bytes-encoded filename
Type:
Stage: resolved
Components:
Versions:
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: jgoerzen, serhiy.storchaka
Priority: normal
Keywords:

Created on 2019-11-20 15:07 by jgoerzen, last changed 2019-11-23 07:28 by serhiy.storchaka. This issue is now closed.

Messages (3)
msg357078 - Author: John Goerzen (jgoerzen) Date: 2019-11-20 15:07
This simple recipe fails:

>>> import dbm
>>> dbm.open(b"foo")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/dbm/__init__.py", line 78, in open
    result = whichdb(file) if 'n' not in flag else None
  File "/usr/lib/python3.7/dbm/__init__.py", line 112, in whichdb
    f = io.open(filename + ".pag", "rb")
TypeError: can't concat str to bytes

Why does this matter?  On POSIX, a filename is any string of bytes that does not contain 0x00 or '/'.  A database with a filename containing, for instance, German characters in ISO-8859-1, can't be opened by dbm, EVEN WITH decoding.

For instance:

>>> file = b"test\xf7"
>>> dbm.open(file.decode())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf7 in position 4: invalid start byte
>>> db = dbm.open(file.decode('iso-8859-1'), 'c')
>>> db.close()

Then:

ls *.db | hd
00000000  74 65 73 74 c3 b7 2e 64  62 0a                    |test...db.|
0000000a

Note that it didn't write the byte 0xf7 here; rather, it wrote the UTF-8 sequence (c3 b7) for the division sign, which is what 0xf7 decodes to in ISO-8859-1.  It is not possible to open a file named b"test\xf7.db" with the dbm module.
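The mismatch can be reproduced without touching the filesystem. This is only a sketch, and it assumes the filesystem encoding is UTF-8, which is what the hexdump above suggests:

```python
raw = b"test\xf7"                 # the byte-string filename from the report
text = raw.decode("iso-8859-1")   # 'test\xf7', i.e. U+00F7 DIVISION SIGN
reencoded = text.encode("utf-8")  # what a UTF-8 filesystem encoding would write

print(reencoded)                  # b'test\xc3\xb7'
print(reencoded == raw)           # False: the on-disk name no longer matches
```

The decoded str is a valid name, but it is a different name from the original bytes once it is encoded again.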
msg357104 - Author: John Goerzen (jgoerzen) Date: 2019-11-20 21:07
As has been pointed out to me, the surrogateescape method could be used here; however, it is a bit of an odd duckling itself, and the system's open() call accepts bytes; couldn't this as well?
msg357362 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-11-23 07:28
Low-level functions in the os module support both str and bytes paths (they also support path-like objects and often support open file descriptors). But high-level functions support only str and maybe path-like objects. Use os.fsdecode if you need to convert a bytes path to str.
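A minimal sketch of that suggestion, assuming a POSIX system where os.fsdecode uses the surrogateescape error handler. The helper name bytes_path_to_str is hypothetical, and the dbm.open call is left commented out because whether a given dbm backend accepts such a name is not confirmed in this issue:

```python
import os

def bytes_path_to_str(path: bytes) -> str:
    """Hypothetical helper: convert a bytes path to str via os.fsdecode."""
    return os.fsdecode(path)

raw = b"test\xf7"
name = bytes_path_to_str(raw)

# Encoding the str back with os.fsencode recovers the original bytes,
# so the name still refers to the same file on disk.
print(os.fsencode(name) == raw)

# import dbm
# db = dbm.open(name, "c")  # assumption: the backend accepts this str path
```

Unlike decoding with ISO-8859-1, this conversion is reversible, so the name passed on to high-level functions maps back to the original byte string.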
History
Date                 User              Action  Args
2019-11-23 07:28:35  serhiy.storchaka  set     status: open -> closed; nosy: + serhiy.storchaka; messages: + msg357362; resolution: not a bug; stage: resolved
2019-11-20 21:07:01  jgoerzen          set     messages: + msg357104
2019-11-20 15:07:49  jgoerzen          create