classification
Title: shelve.open/bsddb.hashopen exception with unicode paths
Type: behavior Stage: needs patch
Components: Library (Lib) Versions: Python 3.4, Python 3.5, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: jcea, r.david.murray, serhiy.storchaka, vstinner, wjm251
Priority: normal Keywords: patch

Created on 2010-07-28 09:10 by wjm251, last changed 2022-04-11 14:57 by admin.

Files
File name Uploaded
dbm_open_unicode-27.patch vstinner, 2010-07-28 12:42
dbm_open_unicode-32.patch vstinner, 2010-07-28 13:46
bsddb_unicode_filename-27.patch vstinner, 2010-07-28 14:47
Messages (14)
msg111779 - (view) Author: wjm251 (wjm251) Date: 2010-07-28 09:10
Windows XP, Simplified Chinese edition.
Python 2.5: Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit (Intel)] on win32
I have a directory "D:\你好新建文件夹".
My code is as follows:
#--------------------------------------
temppath = u"D:\\你好新建文件夹\\a"
import shelve
cache = shelve.open(temppath, 'c')
#--------------------------------------

When I use temppath.encode("utf-8"), it works,
but in Python 2.6 temppath itself works properly.

But I got an error with the following traceback:

Traceback (most recent call last):
  File "D:\eclipse_workspace\pytest\src\test.py", line 5, in <module>
    cache = shelve.open(temppath, 'c')
  File "D:\eclipse_workspace\omstarv5r6\linksvn\src\UNPPython\pywindows\Lib\shelve.py", line 225, in open
    return DbfilenameShelf(filename, flag, protocol, writeback)
  File "D:\eclipse_workspace\omstarv5r6\linksvn\src\UNPPython\pywindows\Lib\shelve.py", line 209, in __init__
    Shelf.__init__(self, anydbm.open(filename, flag), protocol, writeback)
  File "D:\eclipse_workspace\omstarv5r6\linksvn\src\UNPPython\pywindows\Lib\anydbm.py", line 83, in open
    return mod.open(file, flag, mode)
  File "D:\eclipse_workspace\omstarv5r6\linksvn\src\UNPPython\pywindows\Lib\dbhash.py", line 16, in open
    return bsddb.hashopen(file, flag, mode)
  File "D:\eclipse_workspace\omstarv5r6\linksvn\src\UNPPython\pywindows\Lib\bsddb\__init__.py", line 310, in hashopen
    d.open(file, db.DB_HASH, flags, mode)
bsddb.db.DBNoSuchFileError: (2, 'No such file or directory')
msg111795 - (view) Author: wjm251 (wjm251) Date: 2010-07-28 12:26
I would expect unicode paths and GBK-encoded str objects to work on Windows,
but only UTF-8-encoded str objects do.
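
To make the observation above concrete, here is a small illustration (it is not part of the original report). It only shows that the same path yields different byte strings under UTF-8 and GBK, so a byte-oriented file API that expects one encoding will not find a name passed in the other. Which encoding the underlying library actually expects is not established by this snippet.

```python
# -*- coding: utf-8 -*-
# Illustration only: the same path encoded two ways gives different bytes.
path = u"D:\\你好新建文件夹\\a"

utf8_name = path.encode("utf-8")   # each Chinese character here takes 3 bytes
gbk_name = path.encode("gbk")      # each Chinese character here takes 2 bytes

# The byte strings differ in both length and content.
print(len(utf8_name), len(gbk_name))
print(utf8_name == gbk_name)
```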
msg111797 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-07-28 12:41
shelve uses anydbm, which uses gdbm, dbm or bsddb. The attached patch fixes gdbm and dbm (it replaces the "s" format with "et" and Py_FileSystemDefaultEncoding).

bsddb is harder to fix: bsdhashopen(), bsdbtopen() and bsdrnopen() have to be fixed, and they accept None for the filename ("z" format).
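
For readers unfamiliar with the C argument-parsing formats mentioned above, the following is a rough Python-level analogue of what parsing a filename with "et" and the filesystem encoding amounts to: unicode objects are encoded, byte strings are passed through unchanged. The helper name encode_filename is hypothetical and does not appear in the attached patches; this is a sketch of the idea, not the patch itself.

```python
# -*- coding: utf-8 -*-
import sys


def encode_filename(filename, encoding=None):
    """Hypothetical helper mimicking the "et" format for filenames."""
    if encoding is None:
        # Stand-in for Py_FileSystemDefaultEncoding on the C side.
        encoding = sys.getfilesystemencoding()
    if isinstance(filename, bytes):
        # Byte strings are passed through without recoding.
        return filename
    # Unicode strings are encoded with the chosen encoding.
    return filename.encode(encoding)
```

On a Simplified Chinese Windows system the filesystem encoding would typically be a GBK-compatible code page, so a unicode path would reach the C library in the encoding the operating system expects.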
msg111801 - (view) Author: wjm251 (wjm251) Date: 2010-07-28 13:11
Sorry, I don't understand exactly what you mean.
What do these mean: replacing the "s" format with "et", and the "z" format?
I'm not familiar with C/C++.

Do you mean that I can use the attached patch to compile a new Python DLL?
But it seems that on my PC the shelve module always uses bsddb automatically.

Could you explain in more detail?

Thank you very much.
Sincerely
msg111809 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-07-28 13:46
Same patch for Python 3.2.
msg111812 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-07-28 14:07
Victor's comments were addressed to the Python development community and concern Python internals.  Given that only bsddb exists on Windows by default, his patches unfortunately don't do you any good.  I'm adding jcea as nosy in case he wants to/can deal with the problem in bsddb.
msg111814 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-07-28 14:10
It looks like the bsddb (dbm.bsd) module no longer exists in Python 3: see issue #9397. It's now maintained as the third-party module pybsddb.
msg111817 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-07-28 14:47
New patch for the bsddb module: it creates a parse_filename() function, based on Python 3's PyUnicode_FSConverter() but accepting None. I haven't tested the patch because I'm unable to compile the module. It looks like it should use db_185.h instead of db.h, and link to another library, but neither configure nor setup.py knows that.
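
The behaviour described for parse_filename() can be sketched at the Python level as follows. This is an assumption-laden analogue, not the C code from the patch: it uses os.fsencode() (available in Python 3.2 and later) as the Python-level counterpart of PyUnicode_FSConverter(), and adds the None pass-through that the message describes.

```python
import os


def parse_filename(filename):
    """Python-level sketch of a filename converter that also accepts None."""
    if filename is None:
        # Mirrors the "z" format: None is allowed and passed through.
        return None
    # os.fsencode() encodes str with the filesystem encoding and
    # returns bytes objects unchanged.
    return os.fsencode(filename)
```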
msg227673 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-09-27 14:03
dbm_open_unicode-32.patch no longer applies cleanly due to Argument Clinic.

I'm not sure about applying patches to 2.7. I support this, but it looks like a new feature, and you should ask on the Python-Dev mailing list.
msg234236 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-01-18 10:16
Could you please update the patch Victor?
msg234247 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-01-18 14:31
> Could you please update the patch Victor?

You can update this old patch if you want.
msg234249 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-01-18 14:33
Python 3 is not affected:

Python 3.5.0a0 (default:61a045ac0006, Jan 15 2015, 00:05:43) 
[GCC 4.9.2 20141101 (Red Hat 4.9.2-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> temppath = u"D:\\你好新建文件夹\\a"
>>> import shelve
>>> cache = shelve.open(temppath, 'c')

(no error)
msg234250 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-01-18 14:36
> Python 3 is not affected:

Oh sorry, dbm_open_unicode-32.patch is still needed. Currently, filenames are encoded to UTF-8 which "works" when the filesystem encoding is UTF-8, but it doesn't work on Windows.
msg234260 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-01-18 17:03
And tests are needed.
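
As a starting point, a regression test for this issue might look something like the sketch below. It is hypothetical and not taken from any attached patch: the class name, the key and value, and the decision to skip when the filesystem encoding cannot represent the directory name are all assumptions.

```python
# -*- coding: utf-8 -*-
import os
import shelve
import shutil
import sys
import tempfile
import unittest


class NonAsciiShelvePathTest(unittest.TestCase):
    """Hypothetical regression test for shelve.open() with a unicode path."""

    def setUp(self):
        self.dirname = u"你好新建文件夹"
        try:
            self.dirname.encode(sys.getfilesystemencoding())
        except UnicodeEncodeError:
            self.skipTest("filesystem encoding cannot represent the name")
        self.tmpdir = tempfile.mkdtemp()
        # Remove the temporary tree afterwards, ignoring cleanup errors.
        self.addCleanup(shutil.rmtree, self.tmpdir, True)

    def test_open_with_unicode_path(self):
        target = os.path.join(self.tmpdir, self.dirname)
        os.mkdir(target)
        filename = os.path.join(target, u"a")
        # Create the shelf under the non-ASCII directory and store a value.
        db = shelve.open(filename, "c")
        try:
            db["key"] = "value"
        finally:
            db.close()
        # Reopen read-only and check the value survived.
        db = shelve.open(filename, "r")
        try:
            self.assertEqual(db["key"], "value")
        finally:
            db.close()
```

The test exercises whichever dbm backend shelve selects on the machine running it, so it would need to be run on Windows to cover the case reported here.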
History
Date User Action Args
2022-04-11 14:57:04 admin set github: 53639
2015-01-18 17:03:35 serhiy.storchaka set messages: + msg234260
2015-01-18 14:36:09 vstinner set messages: + msg234250; versions: + Python 3.4, Python 3.5
2015-01-18 14:33:01 vstinner set messages: + msg234249; versions: - Python 3.4, Python 3.5
2015-01-18 14:31:42 vstinner set messages: + msg234247
2015-01-18 10:16:58 serhiy.storchaka set messages: + msg234236
2014-12-13 19:14:52 serhiy.storchaka set stage: test needed -> needs patch; versions: + Python 3.4, Python 3.5, - Python 3.1, Python 3.2
2014-09-27 14:03:21 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg227673
2010-07-28 14:47:56 vstinner set files: + bsddb_unicode_filename-27.patch; messages: + msg111817
2010-07-28 14:10:41 vstinner set messages: + msg111814
2010-07-28 14:07:42 r.david.murray set versions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.5; nosy: + jcea, r.david.murray; messages: + msg111812; stage: test needed
2010-07-28 13:46:06 vstinner set files: + dbm_open_unicode-32.patch; messages: + msg111809
2010-07-28 13:11:22 wjm251 set messages: + msg111801
2010-07-28 12:42:30 vstinner set files: + dbm_open_unicode-27.patch; keywords: + patch
2010-07-28 12:41:52 vstinner set messages: + msg111797
2010-07-28 12:26:10 wjm251 set messages: + msg111795
2010-07-28 09:30:16 eric.araujo set nosy: + vstinner; type: behavior; title: shelve.open/bsddb.hashopen raise Exception'No such file or directory'for "Chinese path" -> shelve.open/bsddb.hashopen exception with unicode paths
2010-07-28 09:28:42 eric.araujo link issue9394 superseder
2010-07-28 09:10:41 wjm251 create