
classification
Title: Allow pre-encoded strings as filenames
Type:
Stage:
Components: Interpreter Core
Versions:
process
Status: closed
Resolution: accepted
Dependencies:
Superseder:
Assigned To: mhammond
Nosy List: gvanrossum, lemburg, mhammond
Priority: normal
Keywords: patch

Created on 2001-03-22 05:02 by mhammond, last changed 2022-04-10 16:03 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
python-patch mhammond, 2001-03-22 05:04 The whole patch
Messages (7)
msg36172 - (view) Author: Mark Hammond (mhammond) * (Python committer) Date: 2001-03-22 05:02
This patch enables most filename parameters to use pre-encoded 
strings.  On Windows, the default of "mbcs" is used.  On all 
other platforms, the default filename encoding is the same as 
the general default encoding, which in practice means there is 
no functional change.  However, other platforms can simply plug 
in their own encodings.

Rationale: os.listdir() etc. already return pre-encoded strings 
on some platforms (notably Windows).  These pre-encoded strings 
can already be used with all these functions.  However, if you 
convert such an encoded string to a Unicode string, it cannot 
be used to open the file.  This patch enables either a 
pre-encoded string to work (as now) or a Unicode representation 
of that same string (unlike now).
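
As a rough illustration of the behaviour described above (written against modern Python 3, not the 2001 patch itself), the sketch below opens the same file once by its text name and once by a pre-encoded name. The helper name read_both_ways is hypothetical, and os.fsencode() stands in for whatever filename encoding the platform uses.

```python
import os
import tempfile


def read_both_ways(directory, name, payload):
    """Write payload to directory/name, then read it back twice:
    once via a text filename, once via the same name pre-encoded."""
    text_path = os.path.join(directory, name)
    with open(text_path, "wb") as f:
        f.write(payload)

    # Pre-encode the name with the filesystem encoding (assumption:
    # this plays the role that "mbcs" plays on Windows in the patch).
    encoded_path = os.fsencode(text_path)

    with open(text_path, "rb") as f:
        via_text = f.read()
    with open(encoded_path, "rb") as f:
        via_bytes = f.read()
    return via_text, via_bytes


with tempfile.TemporaryDirectory() as tmp:
    via_text, via_bytes = read_both_ways(tmp, "example.txt", b"hello")
```

Both reads are expected to return the same payload, which is the property the patch is after: either spelling of the filename reaches the same file.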

Things of note:
* I invented a new "Es" PyArg_ParseTuple marker.  This 
is very similar to "es", except it leaves string 
objects alone, assuming they are already encoded 
correctly.  "es" assumes a string in the default 
encoding, which it will then encode in the new 
character set - i.e., a pre-encoded string fails here.

* This means that all affected functions have an extra 
string copy.  This copy still happens even when 
strings are passed, and even on platforms where no 
Unicode filesystem support exists.  The only other 
alternative was to make a much uglier patch, somehow 
using string objects in-place, but converting and 
freeing the buffer when Unicode.  This could be done 
if desired, but I'm not sure the added code complexity 
is worth it.

* New method on win32: nt._getpathname().  This is 
almost identical to win32api.GetFullPathName(), except 
it handles encoded strings.  ntpath.py has also been 
changed to work with this.  A hidden bonus of this 
patch is that it will make os.path.abspath() work 
identically regardless of whether the Win32 extensions 
are installed.

* Tested on Linux, Windows 98 and Windows 2k.  Still 
working out how to build Python on my BeOS box :)

* New test for these semantics added.
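
The difference between "es" and the proposed "Es" marker can be modelled in a few lines of Python. This is only a sketch under stated assumptions: the function names are hypothetical, bytes plays the role of an 8-bit string object, "ascii" stands in for the default encoding, and "latin-1" stands in for a platform filename encoding such as "mbcs".

```python
def convert_es(arg, encoding, default_encoding="ascii"):
    """Model of "es": an 8-bit string is first interpreted in the
    default encoding, then re-encoded into the target encoding."""
    if isinstance(arg, bytes):
        # A pre-encoded non-ASCII name fails at this decode step.
        arg = arg.decode(default_encoding)
    return arg.encode(encoding)


def convert_passthrough(arg, encoding):
    """Model of "Es": 8-bit strings are assumed to be encoded
    correctly already and are only copied; text is encoded."""
    if isinstance(arg, bytes):
        return bytes(arg)  # the extra string copy, content untouched
    return arg.encode(encoding)


name = "caf\u00e9"
pre_encoded = name.encode("latin-1")

# Both spellings of the name yield the same encoded filename.
from_text = convert_passthrough(name, "latin-1")
from_bytes = convert_passthrough(pre_encoded, "latin-1")

# The "es" model rejects the pre-encoded spelling.
try:
    convert_es(pre_encoded, "latin-1")
    es_rejected = False
except UnicodeDecodeError:
    es_rejected = True
```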
msg36173 - (view) Author: Mark Hammond (mhammond) * (Python committer) Date: 2001-03-22 05:04
doh - forgot to click the checkbox
msg36174 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2001-03-22 21:45
Mark, I don't think you expected to get this into 2.1, did
you?  It's way too big.

Also, I think your patch to posixmodule.c has some bugs --
if I understand correctly, the format string "Es" requires
two arguments, the encoding and the address of the C string
pointer; but several functions (posix_rename and onwards)
don't pass the encoding name.
msg36175 - (view) Author: Mark Hammond (mhammond) * (Python committer) Date: 2001-03-22 22:10
I appreciate it is too late for 2.1 for a change of this 
size.

I don't think posixmodule is wrong - at least not how you 
think :)

posix_rename calls:
	return posix_2str(args, "EsEs:rename", rename);

however, it is posix_2str that passes the encoding, not 
posix_rename itself.  Ditto for posix_1str and 
posix_do_stat.
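
To make the division of labour concrete, here is a small Python model of that arrangement. It is not the C code from posixmodule.c: the names FILENAME_ENCODING, encode_filename and call_with_two_paths are hypothetical, and the point is only that the shared helper supplies the encoding while the per-function wrapper never mentions it.

```python
import sys

# Assumption: a single filename encoding shared by all helpers.
FILENAME_ENCODING = sys.getfilesystemencoding()


def encode_filename(arg):
    """Leave pre-encoded names alone; encode text names."""
    if isinstance(arg, bytes):
        return arg
    return arg.encode(FILENAME_ENCODING)


def call_with_two_paths(args, func):
    """Model of a posix_2str-style helper: the helper, not its
    caller, applies the filename encoding to both arguments."""
    first, second = args
    return func(encode_filename(first), encode_filename(second))


def rename(args, impl):
    # The wrapper names only the operation; no encoding appears here.
    return call_with_two_paths(args, impl)


calls = []
rename(("a.txt", b"b.txt"), lambda src, dst: calls.append((src, dst)))
```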
msg36176 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2001-04-27 07:54
I like the idea of telling the arg parser to accept strings
as-is, but I think that copying all the code just to
implement the new "E" parser is unnecessary. Much easier would
be switching on the second marker (behind the "e"), e.g. using
"et" and "et#".

Do you want me to look into this ?
msg36177 - (view) Author: Mark Hammond (mhammond) * (Python committer) Date: 2001-04-27 12:15
MAL - please do!  I generally look for the least-intrusive 
patch when dealing with potentially contentious issues, but 
I agree it makes more sense to rationalize.
msg36178 - (view) Author: Mark Hammond (mhammond) * (Python committer) Date: 2001-05-13 08:08
checked in:

Checking in Lib/ntpath.py;
new revision: 1.35; previous revision: 1.34
Checking in Lib/test/test_support.py;
new revision: 1.23; previous revision: 1.22
Checking in Lib/test/test_unicode_file.py;
initial revision: 1.1
Checking in Lib/test/output/test_unicode_file;
initial revision: 1.1
Checking in Modules/posixmodule.c;
new revision: 2.188; previous revision: 2.187
Checking in Python/bltinmodule.c;
new revision: 2.206; previous revision: 2.205
Checking in Python/getargs.c;
new revision: 2.56; previous revision: 2.55
History
Date User Action Args
2022-04-10 16:03:53 admin set github: 34213
2001-03-22 05:02:42 mhammond create