Issue 1272: Decode __file__ and co_filename to unicode using fs default

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/45613

classification

Title:	Decode __file__ and co_filename to unicode using fs default
Type:		Stage:
Components:	Interpreter Core	Versions:	Python 3.0

process

Status:	closed	Resolution:	accepted
Dependencies:		Superseder:
Assigned To:	gvanrossum	Nosy List:	alexandre.vassalotti, christian.heimes, gvanrossum
Priority:	normal	Keywords:	patch

Created on 2007-10-12 13:41 by christian.heimes, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
py3k_file_fsenc2.patch	christian.heimes, 2007-10-12 13:41
updated_file_fsenc.patch	alexandre.vassalotti, 2007-10-13 22:10
updated_file_fsenc-2.patch	alexandre.vassalotti, 2007-10-14 01:32
updated_file_fsenc-3.patch	alexandre.vassalotti, 2007-10-14 01:42
updated_file_fsenc-4.patch	alexandre.vassalotti, 2007-10-14 04:35
updated_file_fsenc-5.patch	alexandre.vassalotti, 2007-10-14 16:17
updated_file_fsenc-6.patch	christian.heimes, 2007-10-14 17:06

Messages (31)
msg56374 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2007-10-12 13:41
I'm sending the patch in for review.
msg56388 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2007-10-13 21:19
Couple of nits: - You added a removal of hotshot from setup.py to the patch; but that's been checked in in the mean time. - Why add an 'errors' argument to the function when it's a fatal error to use it? - Using 0 to autodetect the length is scary. Normally we have two APIs for that, one ..._FromString and one ...FromStringAndSize. If you really don't want that, please use -1, which is at least an illegal value. - Why is there code in codeobject.c::PyCode_New() that still accepts a PyString for the filename? - In that file (and possibly others, I didn't check) your code uses spaces to indent while the surrounding code uses tabs. Moreover, your space indent seems to assume there are 4 spaces to a tab, but all our code (Python and C) is formatted assuming tabs are 8 spaces. (The indent isn't always 8 spaces -- but ASCII TAB characters always are 8, for us.) - Why copy the default encoding before mangling it? With a little extra care you will only have to copy it once. Also, consider not mangling at all, but assuming the encoding comes in a canonical form -- several other functions assume that, e.g. PyUnicode_Decode() and PyUnicode_AsEncodedString(). - I haven't run the unit tests yet. Will be doing that next...
msg56389 - (view)	Author: Alexandre Vassalotti (alexandre.vassalotti) *	Date: 2007-10-13 22:10
I found a few problems in your patch. In PyCode_New() the type check for the filename argument is incorrect: --- Objects/codeobject.c (revision 58412) +++ Objects/codeobject.c (working copy) @@ -59,7 +59,7 @@ freevars == NULL \|\| !PyTuple_Check(freevars) \|\| cellvars == NULL \|\| !PyTuple_Check(cellvars) \|\| name == NULL \|\| (!PyString_Check(name) && !PyUnicode_Check(name)) \|\| - filename == NULL \|\| !PyString_Check(filename) \|\| + filename == NULL \|\| (!PyString_Check(name) && !PyUnicode_Check(name)) \|\| lnotab == NULL \|\| !PyString_Check(lnotab) \|\| !PyObject_CheckReadBuffer(code)) { PyErr_BadInternalCall(); @@ -260,6 +267,8 @@ ourcellvars = PyTuple_New(0); if (ourcellvars == NULL) goto cleanup; + filename = PyUnicode_DecodeFSDefault(PyString_AS_STRING(filename), + 0, NULL); The following is unnecessary and will cause a reference leak: @@ -260,6 +267,8 @@ ourcellvars = PyTuple_New(0); if (ourcellvars == NULL) goto cleanup; + filename = PyUnicode_DecodeFSDefault(PyString_AS_STRING(filename), + 0, NULL); co = (PyObject )PyCode_New(argcount, kwonlyargcount, nlocals, stacksize, flags, I think the interface of PyUnicode_DecodeFSDefault() could be improved a bit. The function doesn't use the last argument 'errors', so why is there? I am not sure if it is useful to keep second argument, 'length', either. So, I believe the function prototype should be changed to: PyObject PyUnicode_Decode_FSDefault(const char *s); Another thing that I am not sure about is whether it is correct to consider ISO-8859-15 the same thing as Latin-1. Overall, the patch looks good to me and doesn't cause any test to fail. I attached an updated patch with the above issues fixed. Thank you, Christian, for the patch. :)
msg56390 - (view)	Author: Alexandre Vassalotti (alexandre.vassalotti) *	Date: 2007-10-13 22:20
Guido wrote: > Why copy the default encoding before mangling it? With a little > extra care you will only have to copy it once. Also, consider not > mangling at all, but assuming the encoding comes in a canonical form > -- several other functions assume that, e.g. PyUnicode_Decode() and > PyUnicode_AsEncodedString(). It is impossible guarantee that Py_FileSystemDefaultEncoding is normalized, since its value can be set using nl_langinfo(CODESET) during the bootstrapping process. PyUnicode_Decode() and other decoding/encoding functions use the codec module, which is not available during the early bootstrapping process, to normalize the encoding name.
msg56391 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2007-10-13 23:27
Guido van Rossum wrote: > - You added a removal of hotshot from setup.py to the patch; but that's > been checked in in the mean time. Oh, the change shouldn't make it into the patch. I guess I forgot a svn revert on setup.py > - Why add an 'errors' argument to the function when it's a fatal error > to use it? I wanted the signature of the method be equal to the other methods PyUnicode_Decode. I copied the FatalError from _PyUnicode_AsDefaultEncodedString(). > - Using 0 to autodetect the length is scary. Normally we have two APIs > for that, one ..._FromString and one ...FromStringAndSize. If you > really don't want that, please use -1, which is at least an illegal value. Oh right, -1 is much better for autodetect than 0. What do you prefer, a second method or -1 as auto detect? > - Why is there code in codeobject.c::PyCode_New() that still accepts a > PyString for the filename? Because it's my fault that I've overseen it. :/ > - In that file (and possibly others, I didn't check) your code uses > spaces to indent while the surrounding code uses tabs. Moreover, your > space indent seems to assume there are 4 spaces to a tab, but all our > code (Python and C) is formatted assuming tabs are 8 spaces. (The > indent isn't always 8 spaces -- but ASCII TAB characters always are 8, > for us.) Some C files like unicodeobject.c are using 4 spaces while other files are using tabs for indention. My editor may got confused by the mix. I've manually fixed it in the patch but I may have overseen a line or two. > - Why copy the default encoding before mangling it? With a little extra > care you will only have to copy it once. Also, consider not mangling at > all, but assuming the encoding comes in a canonical form -- several > other functions assume that, e.g. PyUnicode_Decode() and > PyUnicode_AsEncodedString(). My C is a bit rusty and still need to learn news tricks. I'm trying to see if I can remove the extra copy without causing a problem. The other part of your question was already answered by Alexandre. The aliases map is defined in Python code. It's not available so early in the boot strapping process. We'd have to redesign the assignment of co_filename and __file__ completely if we want to use the aliases and other codecs. For example we could store a PyString at first and redo all names once the codecs are set up. Christian
msg56393 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2007-10-14 00:35
On 10/13/07, Christian Heimes <report@bugs.python.org> wrote: > Guido van Rossum wrote: > > - Why add an 'errors' argument to the function when it's a fatal error > > to use it? > > I wanted the signature of the method be equal to the other methods > PyUnicode_Decode. I copied the FatalError from > _PyUnicode_AsDefaultEncodedString(). But that function is a terrible example; it was done that way because an earlier version of the function did allow using the errors argument and I wanted to make sure to catch all calls that were still passing an errors value. This doesn't apply here, we're creating a brand new API. > > - Using 0 to autodetect the length is scary. Normally we have two APIs > > for that, one ..._FromString and one ...FromStringAndSize. If you > > really don't want that, please use -1, which is at least an illegal value. > > Oh right, -1 is much better for autodetect than 0. What do you prefer, > a second method or -1 as auto detect? Even better is Alexandre's version: always autodetect. I think we can assume that filenames are always available as 0-terminated byte arrays, since that's how all system calls need them. Anyway, let me know if you want to change something in Alexandre's version or if you want him to check it in. Oh. Hm. I still wish that PyCode_New() could just insist that the filename argument is a PyUnicode instance. Why can't it? Perhaps the caller should be fixed instead?
msg56394 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2007-10-14 00:43
Guido van Rossum wrote: > But that function is a terrible example; it was done that way because > an earlier version of the function did allow using the errors > argument and I wanted to make sure to catch all calls that were still > passing an errors value. This doesn't apply here, we're creating a > brand new API. Ahhh! I really didn't know it. I thought that the non functional arguments have a purpose. > Anyway, let me know if you want to change something in Alexandre's > version or if you want him to check it in. I'm going to create a new patch based on his fixes and your recommendations. > Oh. Hm. I still wish that PyCode_New() could just insist that the > filename argument is a PyUnicode instance. Why can't it? Perhaps the > caller should be fixed instead? I'll try.
msg56396 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2007-10-14 00:51
> > Oh. Hm. I still wish that PyCode_New() could just insist that the > > filename argument is a PyUnicode instance. Why can't it? Perhaps the > > caller should be fixed instead? > I'll try. I figured out the problem -- it came from marshalled old code objects. If you throw away all .pyc files the problem goes away. You can also get rid of the similar checks for the 'name' argument -- this should just be a PyUnicode too. A systematic approach to invalidating all the .pyc files is updating the magic number in import.c.
msg56397 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2007-10-14 00:57
Guido van Rossum wrote: > - Why copy the default encoding before mangling it? With a little extra > care you will only have to copy it once. Now I remember why I added the strncpy() call plus encoding[31] = '\0'. I wanted to make sure that the code doesn't break even if the encoding name is longer than 31 + 1 chars long. I'm aware that it's very unlikely but I didn't want to take chances. You are allowed to call me paranoid. g Christian
msg56398 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2007-10-14 01:00
Well, you could ensure that by checking that you haven't reached the end of the mangling buffer. That will have the added advantage that when the input is something silly like 32 spaces followed by "utf-8" it will be still be mangled correctly. The slight extra cost of the check (could be a single pointer compare) is offset by saving a call to strncpy(). --Guido On 10/13/07, Christian Heimes <report@bugs.python.org> wrote: > > Christian Heimes added the comment: > > Guido van Rossum wrote: > > - Why copy the default encoding before mangling it? With a little extra > > care you will only have to copy it once. > > Now I remember why I added the strncpy() call plus encoding[31] = '\0'. > I wanted to make sure that the code doesn't break even if the encoding > name is longer than 31 + 1 chars long. I'm aware that it's very unlikely > but I didn't want to take chances. You are allowed to call me paranoid. g > > Christian > > __________________________________ > Tracker <report@bugs.python.org> > <http://bugs.python.org/issue1272> > __________________________________ >
msg56401 - (view)	Author: Alexandre Vassalotti (alexandre.vassalotti) *	Date: 2007-10-14 01:32
Guido wrote: > I figured out the problem -- it came from marshalled old code objects. > If you throw away all .pyc files the problem goes away. You can also > get rid of the similar checks for the 'name' argument -- this should > just be a PyUnicode too. A systematic approach to invalidating all the > .pyc files is updating the magic number in import.c. Done. I had to remove a few another PyString instances in pyexpat.c and _ctypes.c. So, here (hopefully) the final version of the patch. The changes from the last version are: - Correct a typo in of the comments in PyUnicode_DecodeFSDefault - Specified in the API doc of PyUnicode_DecodeFSDefault that the function take a null-terminated string. - Bumped the magic number in import.c - Fix PyCode_New calls in _ctypes and pyexpat module. - Remove the PyString type check on 'filename' and 'name' in PyCode_New. - Remove the unneeded string coercion code from PyCode_New.
msg56402 - (view)	Author: Alexandre Vassalotti (alexandre.vassalotti) *	Date: 2007-10-14 01:42
> Remove the PyString type check on 'filename' and 'name' in PyCode_New. Oops. I removed one of the ! the checks by mistake.
msg56404 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2007-10-14 02:15
Crys, is this OK with you? On 10/13/07, Alexandre Vassalotti <report@bugs.python.org> wrote: > > Alexandre Vassalotti added the comment: > > Guido wrote: > > I figured out the problem -- it came from marshalled old code objects. > > If you throw away all .pyc files the problem goes away. You can also > > get rid of the similar checks for the 'name' argument -- this should > > just be a PyUnicode too. A systematic approach to invalidating all the > > .pyc files is updating the magic number in import.c. > > Done. > > I had to remove a few another PyString instances in pyexpat.c and > _ctypes.c. So, here (hopefully) the final version of the patch. > > The changes from the last version are: > > - Correct a typo in of the comments in PyUnicode_DecodeFSDefault > - Specified in the API doc of PyUnicode_DecodeFSDefault that the > function take a null-terminated string. > - Bumped the magic number in import.c > - Fix PyCode_New calls in _ctypes and pyexpat module. > - Remove the PyString type check on 'filename' and 'name' in PyCode_New. > - Remove the unneeded string coercion code from PyCode_New. > > __________________________________ > Tracker <report@bugs.python.org> > <http://bugs.python.org/issue1272> > __________________________________ >
msg56405 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2007-10-14 03:06
Guido van Rossum wrote: > Guido van Rossum added the comment: > > Crys, is this OK with you? Alexandre's mangle loop doesn't do the same job as mine. Chars like _ and - aren't removed from the encoding name and the if clauses don't catch for example UTF-8 or ISO-8859-1 only UTF8 or ISO8859-1. Also he has overseen a PyString_Check in the code repr function. I'm working on a better mangler and I believe that I can make Py_FilesystemEncoding available much earlier in Py_InitializeEx(). after some debugging I fear that we are on the wrong path. Py_FileSystemEncoding is set much later in the boot strapping process unless its value is hard coded (Win32 and Apple only). Any attempt to get the right codec or even a normalized name without the codecs package is going to extremely hard. We have to get the codecs up and Py_FileSystemEncoding before we can decode the filenames. :( I think that the problem needs much more attention and a proper fix. Christian
msg56406 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2007-10-14 03:35
OK, in the spirit of delegation I'll leave this for you and Alexandre to work out more. If you're stuck, post to the list so others can jump in.
msg56408 - (view)	Author: Alexandre Vassalotti (alexandre.vassalotti) *	Date: 2007-10-14 04:34
Christian wrote: > Alexandre's mangle loop doesn't do the same job as mine. Chars like _ > and - aren't removed from the encoding name and the if clauses don't > catch for example UTF-8 or ISO-8859-1 only UTF8 or ISO8859-1. That isn't true. My mangler does exactly the same thing as your original one. However, I forgot to add Py_CHARMASK to the calls of tolower() and isalnum() which would cause problems on platforms with signed char. > Also he has overseen a PyString_Check in the code repr function. Fixed. > We have to get the codecs up and Py_FileSystemEncoding before we can > decode the filenames. :( I think that the problem needs much more > attention and a proper fix. Maybe adding a global variable, let's say _Py_Codecs_Ready, could be used to notify PyUnicode_DecodeFSDefault that it can use PyUnicode_Decode, instead of relying only on the built-in codecs. That would be much simpler than changing boostrapping process. Here yet another updated patch. The changes are the following: - Add Py_CHARMASK to tolower() and isalnum() calls in PyUnicode_DecodeFSDefault(). - Use PyUnicode_Check(), instead of PyString_Check(), in code_repr(). - Update comments for co_filename and co_name in PyCodeObject struct definition. - Fix a PyString_AS_STRING(co->co_name) instance in compile.c - Replace %S for %U in PyErr_Format() calls for substituting co_name. One more thing, frozen.c needs to be updated. The module data contains a code object with a PyString co_name. However, there is a bug in the imp module (it doesn't detect the encoding from modelines, which cause io.TextIOWrapper to crash) that prevents me from generating the data for frozen.c.
msg56414 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2007-10-14 15:05
Alexandre Vassalotti wrote: > That isn't true. My mangler does exactly the same thing as your > original one. > > However, I forgot to add Py_CHARMASK to the calls of tolower() and > isalnum() which would cause problems on platforms with signed char. I wasn't sure how tolower() reacts on numeric values. The addition of Py_CHARMASK is a good idea. It's a shame stringobject.c doesn't expose its non locale version of tolower() and isalnum(). > Maybe adding a global variable, let's say _Py_Codecs_Ready, could be > used to notify PyUnicode_DecodeFSDefault that it can use > PyUnicode_Decode, instead of relying only on the built-in codecs. That > would be much simpler than changing boostrapping process. It still f... up the file names of Python modules that were loaded before the codecs are ready to use. What do you think about this (pseudocode): #if defined(MS_WINDOWS) && defined(HAVE_USABLE_WCHAR_T) return PyUnicode_DecodeMBCS(string, "replace) #elif defined(__APPLE__) return PyUnicode_DecodeUTF8(string, "replace) #else * set __file__ and co_filename to a preliminary value with PyUnicode_DecodeUTF8(string, "replace") * Store a pointer and original string in a linked list struct Py_FixFileNames { struct Py_FixFileNames next; char string; PyObject unicode; }; After the codecs and Py_FileSystemDefaultEncoding are set up in Py_InitializeEx() check the value. If the fs default encoding isn't UTF-8 redo all encodings with the correct encoding. * Free the linked list * From now on use the codecs package, PyUnicode_Decode() and Py_FileSystemDefaultEncoding for every filename #endif Add comments to See Python/bltinmodule.c Py_FileSystemDefaultEncoding and Objects/unicodeobject.c unicode_default_encoding that PyUnicode_DecodeFSDefault depends on the values. It ain't beautiful and fast but we definitely get the right file names after boot strapping and fair file names during boot strapping. Christian
msg56415 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2007-10-14 15:23
Only a few modules are involved in the bootstrap. The filename is mostly used to display in the traceback. There is already a fallback in the traceback-printing code that tries to look through sys.path for a file matching the module if it can't open the filename from the code object. So I suggest to do something much simpler -- if the default encoding isn't set yet, use ASCII + replace, and leave it at that -- don't bother fixing these things later. Another thought: when installed, the built-in modules are compiled to byte code by compileall.py, which runs late enough so that the filesystem encoding is known and everything can be done right. I have a question for Alexandre related to frozen.c -- why is there a mode line with an encoding involved in freezing hello.py?
msg56417 - (view)	Author: Alexandre Vassalotti (alexandre.vassalotti) *	Date: 2007-10-14 16:00
> I have a question for Alexandre related to frozen.c -- why is there a > mode line with an encoding involved in freezing hello.py? For some reason which I don't know about, freeze.py tries to read all the modules accessible from sys.path: # collect all modules of the program dir = os.path.dirname(scriptfile) path[0] = dir mf = modulefinder.ModuleFinder(path, debug, exclude, replace_paths) The problem is the imp module, which modulefinder uses, does not detect the encoding of the files from the mode-line. This causes TextIOWrapper to crash when it tries to read modules using an encoding other than ASCII or UTF-8. Here an example: >>> import imp >>> imp.find_module('heapq')[0].read() Traceback (most recent call last): ... UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1428-1430: invalid data I probably should open a seperate issue in the tracker for this, though.
msg56419 - (view)	Author: Alexandre Vassalotti (alexandre.vassalotti) *	Date: 2007-10-14 16:17
I thought of another way to implement PyUnicode_DecodeFSDefault. If Py_FileSystemDefaultEncoding is set, decode with the codecs module, otherwise use UTF-8 + replace. This works because when Py_FileSystemDefaultEncoding is initialized at the end of Py_InitializeEx(), the codecs module is ready to be used. Here's what it looks like: PyObject* PyUnicode_DecodeFSDefault(const char s) { Py_ssize_t size = (Py_ssize_t)strlen(s); / During the early bootstrapping process, Py_FileSystemDefaultEncoding can be undefined. If it is case, decode using UTF-8. The following assumes that Py_FileSystemDefaultEncoding is set to a built-in encoding during the bootstrapping process where the codecs aren't ready yet. */ if (Py_FileSystemDefaultEncoding) { return PyUnicode_Decode(s, size, Py_FileSystemDefaultEncoding, "replace"); } else { return PyUnicode_DecodeUTF8(s, size, "replace"); } } It is not perfect, since the extra function calls in the codecs module causes test_profile and test_doctest to fail. However, I think this is much simpler that the previous versions.
msg56420 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2007-10-14 16:22
Alexandre Vassalotti wrote: > Alexandre Vassalotti added the comment: > > I thought of another way to implement PyUnicode_DecodeFSDefault. If > Py_FileSystemDefaultEncoding is set, decode with the codecs module, > otherwise use UTF-8 + replace. This works because when > Py_FileSystemDefaultEncoding is initialized at the end of > Py_InitializeEx(), the codecs module is ready to be used. Here's what > it looks like: That's almost what I had in mind but with two exceptions for __APPLE__ and MS_WINDOWS. For both OS the value of Py_FileSystemDefaultEncoding is hard coded to UTF-8 / MBCS. In your code the method would use PyUnicode_Decode() too early but you can work around the problem with two #if defined. Christian
msg56422 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2007-10-14 17:06
Changes since updated_file_fsenc-5.patch: * Fix for hard coded FS default encoding on Apple and Windows * Added two notes to unicode_default_encoding and Py_FileSystemDefaultEncoding
msg56425 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2007-10-15 00:13
This looks promising. I'm working on the freeze issue. Once I get that working I'll check this in. Thanks Alexandre and Christian for all your hard work!!!
msg56426 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2007-10-15 00:27
> The problem is the imp module, which modulefinder uses, does not > detect the encoding of the files from the mode-line. This causes > TextIOWrapper to crash when it tries to read modules using an encoding > other than ASCII or UTF-8. Here an example: > > >>> import imp > >>> imp.find_module('heapq')[0].read() > Traceback (most recent call last): > ... > UnicodeDecodeError: 'utf8' codec can't decode bytes in position > 1428-1430: invalid data I can't reproduce this. Can you open a separate issue?
msg56427 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2007-10-15 00:34
>> UnicodeDecodeError: 'utf8' codec can't decode bytes in position >> 1428-1430: invalid data > > I can't reproduce this. Can you open a separate issue? It breaks for me with the same error message on Ubuntu Linux, i386, UCS-4 build and locale de_DE.UTF-8. (<io.TextIOWrapper object at 0xb7c9158c>, '/home/heimes/dev/python/py3k/Lib/heapq.py', ('.py', 'U', 1)) >>> imp.find_module('heapq')[0].read() Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/home/heimes/dev/python/py3k/Lib/io.py", line 1224, in read res += decoder.decode(self.buffer.read(), True) File "/home/heimes/dev/python/py3k/Lib/codecs.py", line 291, in decode (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1428-1430: invalid data >>> imp.find_module('heapq')[0].readline() '# -- coding: Latin-1 --\n' >>> imp.find_module('heapq')[0].encoding 'UTF-8'
msg56428 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2007-10-15 00:36
> This looks promising. I'm working on the freeze issue. Once I get that > working I'll check this in. Thanks Alexandre and Christian for all > your hard work!!! You're welcome. Does the patch qualify me for Misc/ACKS? :) I'm going to work on the basestring patch after you have committed the changes. Christian
msg56429 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2007-10-15 00:57
I found two minor bugs in the fix. In Modules/posixmodule.c the tmpnam() and tempnam() methods return a PyString instance. Please change line 5373 and 5431 to use PyUnicode_DecodeFSDefault(). Index: Modules/posixmodule.c =================================================================== --- Modules/posixmodule.c (Revision 58461) +++ Modules/posixmodule.c (Arbeitskopie) @@ -5370,7 +5370,7 @@ #endif if (name == NULL) return PyErr_NoMemory(); - result = PyString_FromString(name); + result = PyUnicode_DecodeFSDefault(name); free(name); return result; } @@ -5428,7 +5428,7 @@ Py_XDECREF(err); return NULL; } - return PyString_FromString(buffer); + return PyUnicode_DecodeFSDefault(buffer); } #endif
msg56430 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2007-10-15 01:31
> > This looks promising. I'm working on the freeze issue. Once I get that > > working I'll check this in. Thanks Alexandre and Christian for all > > your hard work!!! > > You're welcome. Does the patch qualify me for Misc/ACKS? :) Yes, and also Alexandre. :-) > I'm going to work on the basestring patch after you have committed the > changes. The commit may have to wait a bit; I'm finding some problems on OSX when the directory in which I'm building has a unicode character in it. (Now, svn has severe problems with this as well, and so far it looks like all the tests run; but still, it's disturbing.)
msg56432 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2007-10-15 02:53
Committed revision 58466. Fingers crossed.
msg56482 - (view)	Author: Alexandre Vassalotti (alexandre.vassalotti) *	Date: 2007-10-16 01:22
I wrote in msg56419: > It is not perfect, since the extra function calls in the codecs module > causes test_profile and test_doctest to fail. How this was resolved?
msg56483 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2007-10-16 01:25
>> It is not perfect, since the extra function calls in the codecs module >> causes test_profile and test_doctest to fail. > > How this was resolved? It's not resolved yet.

History
Date	User	Action	Args
2022-04-11 14:56:27	admin	set	github: 45613
2007-10-16 01:25:31	christian.heimes	set	messages: + msg56483
2007-10-16 01:22:46	alexandre.vassalotti	set	messages: + msg56482
2007-10-15 02:53:16	gvanrossum	set	status: open -> closed resolution: accepted messages: + msg56432
2007-10-15 01:31:29	gvanrossum	set	messages: + msg56430
2007-10-15 00:57:19	christian.heimes	set	messages: + msg56429
2007-10-15 00:36:23	christian.heimes	set	messages: + msg56428
2007-10-15 00:34:28	christian.heimes	set	messages: + msg56427
2007-10-15 00:27:17	gvanrossum	set	messages: + msg56426
2007-10-15 00:13:33	gvanrossum	set	messages: + msg56425
2007-10-14 17:06:25	christian.heimes	set	files: + updated_file_fsenc-6.patch messages: + msg56422
2007-10-14 16:22:53	christian.heimes	set	messages: + msg56420
2007-10-14 16:17:39	alexandre.vassalotti	set	files: + updated_file_fsenc-5.patch messages: + msg56419
2007-10-14 16:00:22	alexandre.vassalotti	set	messages: + msg56417
2007-10-14 15:23:43	gvanrossum	set	messages: + msg56415
2007-10-14 15:05:27	christian.heimes	set	messages: + msg56414
2007-10-14 04:35:06	alexandre.vassalotti	set	files: + updated_file_fsenc-4.patch messages: + msg56408
2007-10-14 03:35:23	gvanrossum	set	messages: + msg56406
2007-10-14 03:06:47	christian.heimes	set	messages: + msg56405
2007-10-14 02:15:19	gvanrossum	set	messages: + msg56404
2007-10-14 01:42:29	alexandre.vassalotti	set	files: + updated_file_fsenc-3.patch messages: + msg56402
2007-10-14 01:32:10	alexandre.vassalotti	set	files: + updated_file_fsenc-2.patch messages: + msg56401
2007-10-14 01:02:25	gvanrossum	link	issue1264 superseder
2007-10-14 01:00:41	gvanrossum	set	messages: + msg56398
2007-10-14 00:57:17	christian.heimes	set	messages: + msg56397
2007-10-14 00:51:49	gvanrossum	set	messages: + msg56396
2007-10-14 00:43:59	christian.heimes	set	messages: + msg56394
2007-10-14 00:35:54	gvanrossum	set	messages: + msg56393
2007-10-13 23:27:30	christian.heimes	set	messages: + msg56391
2007-10-13 22:20:02	alexandre.vassalotti	set	messages: + msg56390
2007-10-13 22:10:19	alexandre.vassalotti	set	files: + updated_file_fsenc.patch
2007-10-13 22:10:06	alexandre.vassalotti	set	nosy: + alexandre.vassalotti messages: + msg56389
2007-10-13 21:19:06	gvanrossum	set	messages: + msg56388
2007-10-13 20:49:32	gvanrossum	set	assignee: gvanrossum nosy: + gvanrossum
2007-10-12 14:09:43	loewis	set	keywords: + patch
2007-10-12 13:41:38	christian.heimes	create