Issue 410465: Allow pre-encoded strings as filenames


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/34213

classification

Title:	Allow pre-encoded strings as filenames
Type:		Stage:
Components:	Interpreter Core	Versions:

process

Status:	closed	Resolution:	accepted
Dependencies:		Superseder:
Assigned To:	mhammond	Nosy List:	gvanrossum, lemburg, mhammond
Priority:	normal	Keywords:	patch

Created on 2001-03-22 05:02 by mhammond, last changed 2022-04-10 16:03 by admin. This issue is now closed.

Files
File name	Uploaded	Description
python-patch	mhammond, 2001-03-22 05:04	The whole patch

Messages (7)
msg36172 - (view)	Author: Mark Hammond (mhammond) *	Date: 2001-03-22 05:02
This patch enables most filename parameters to use pre-encoded strings. On Windows, the default of "mbcs" is used. On all other platforms, the default filename encoding is the same as the general default encoding, which in reality means there is no functional change. However, other platforms can simply plug in their own encodings.

Rationale: os.listdir() etc. already return pre-encoded strings on some platforms (notably Windows). These pre-encoded strings can already be used with all these functions. However, if you convert such an encoded string to a Unicode string, it cannot be used to open the file. This patch allows either a pre-encoded string (as now) or a Unicode representation of that same string (unlike now) to work.

Things of note:
* I invented a new "Es" PyArg_ParseTuple marker. This is very similar to "es", except it leaves string objects alone, assuming they are already encoded correctly. "es" assumes a string in the default encoding, which it will then encode in the new character set, i.e. a pre-encoded string fails here.
* This means that all affected functions have an extra string copy. This copy still happens even when strings are passed, and even on platforms where no Unicode filesystem support exists. The only other alternative was to make a much uglier patch, somehow using string objects in-place, but converting and freeing the buffer when Unicode. This could be done if desired, but I'm not sure the added code complexity is worth it.
* New method on win32: nt._getpathname(). This is almost identical to win32api.GetPathName(), except it handles encoded strings. ntpath.py has also been changed to work with this. A hidden bonus of this patch is that it will make os.path.abspath() work identically regardless of whether the Win32 extensions are installed.
* Tested on Linux, Windows 98 and Windows 2k. Still working out how to build Python on my BeOS box :)
* New test for these semantics added.
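The difference between "es" and the proposed "Es" marker can be modelled roughly in Python. This is only an illustrative sketch, not the C code from the patch: the function names `convert_es` and `convert_Es` are hypothetical, "ascii" is assumed as the default encoding, and "cp1252" is a portable stand-in for the Windows "mbcs" filename encoding.

```python
DEFAULT_ENCODING = "ascii"  # assumption: the interpreter's default string encoding


def convert_es(value, target_encoding):
    """Rough model of "es": byte strings are assumed to be in the default encoding."""
    if isinstance(value, bytes):
        # Decoding fails for pre-encoded, non-ASCII bytes.
        value = value.decode(DEFAULT_ENCODING)
    return value.encode(target_encoding)


def convert_Es(value, target_encoding):
    """Rough model of the proposed "Es": byte strings are left alone."""
    if isinstance(value, bytes):
        return value  # assumed to be encoded correctly already
    return value.encode(target_encoding)


# Stand-in for a pre-encoded name such as os.listdir() might return on Windows.
pre_encoded = "caf\u00e9.txt".encode("cp1252")

# Both spellings of the same name reach the OS as identical bytes under "Es".
same_bytes = convert_Es(pre_encoded, "cp1252") == convert_Es("caf\u00e9.txt", "cp1252")

# Under "es", the pre-encoded spelling is rejected.
try:
    convert_es(pre_encoded, "cp1252")
    es_accepts_pre_encoded = True
except UnicodeDecodeError:
    es_accepts_pre_encoded = False
```

In this model the only behavioural difference is the handling of byte strings, which matches the description above of "Es" being "very similar to es".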
msg36173 - (view)	Author: Mark Hammond (mhammond) *	Date: 2001-03-22 05:04
Doh - forgot to click the checkbox.
msg36174 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2001-03-22 21:45
Mark, I don't think you expected to get this into 2.1, did you? It's way too big.

Also, I think your patch to posixmodule.c has some bugs -- if I understand correctly, the format string "Es" requires two arguments, the encoding and the address of the C string pointer; but several functions (posix_rename and onwards) don't pass the encoding name.
msg36175 - (view)	Author: Mark Hammond (mhammond) *	Date: 2001-03-22 22:10
I appreciate it is too late for 2.1 for a change of this size.

I don't think posixmodule is wrong - at least not how you think :) posix_rename calls:

    return posix_2str(args, "EsEs:rename", rename);

However, it is posix_2str that passes the encoding, not posix_rename itself. Ditto for posix_1str and posix_do_stat.
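The division of labour described here, where the shared helper supplies the encoding and each wrapper only names the operation, can be sketched in Python. This is a hypothetical analogue rather than the C code in posixmodule.c: `two_path_call`, `encode_path`, `fake_rename` and `FILENAME_ENCODING` are invented for illustration.

```python
FILENAME_ENCODING = "utf-8"  # assumption: stand-in for the platform filename encoding

calls = []  # records what the fake OS-level function receives


def encode_path(value, encoding):
    """Leave byte strings alone; encode text with the given encoding."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


def two_path_call(args, func):
    """Hypothetical analogue of posix_2str: the helper passes the encoding."""
    first, second = args
    return func(
        encode_path(first, FILENAME_ENCODING),
        encode_path(second, FILENAME_ENCODING),
    )


def fake_rename(src, dst):
    """Stand-in for the C-level rename(); only records its arguments."""
    calls.append((src, dst))


def rename(src, dst):
    # Like posix_rename, this wrapper never mentions the encoding itself.
    return two_path_call((src, dst), fake_rename)


rename("old.txt", b"new.txt")
```

Because the encoding is named in exactly one place, a wrapper such as `rename` cannot forget to pass it, which is the point made in the reply above.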
msg36176 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2001-04-27 07:54
I like the idea of telling the arg parser to accept strings as-is, but I think that copying all the code just to implement the new "E" parser is unnecessary. Much easier would be switching on the second marker (behind the "e"), e.g. using "et" and "et#". Do you want me to look into this?
msg36177 - (view)	Author: Mark Hammond (mhammond) *	Date: 2001-04-27 12:15
MAL - please do! I generally look for the least-intrusive patch when dealing with potentially contentious issues, but I agree it makes more sense to rationalize.
msg36178 - (view)	Author: Mark Hammond (mhammond) *	Date: 2001-05-13 08:08
Checked in:
Checking in Lib/ntpath.py; new revision: 1.35; previous revision: 1.34
Checking in Lib/test/test_support.py; new revision: 1.23; previous revision: 1.22
Checking in Lib/test/test_unicode_file.py; initial revision: 1.1
Checking in Lib/test/output/test_unicode_file; initial revision: 1.1
Checking in Modules/posixmodule.c; new revision: 2.188; previous revision: 2.187
Checking in Python/bltinmodule.c; new revision: 2.206; previous revision: 2.205
Checking in Python/getargs.c; new revision: 2.56; previous revision: 2.55

History
Date	User	Action	Args
2022-04-10 16:03:53	admin	set	github: 34213
2001-03-22 05:02:42	mhammond	create