Title: 3.0.1 crashes in unicode path
Components: Interpreter Core, Unicode, Windows
Versions: Python 3.0, Python 3.1
Status: closed Resolution: fixed
Assigned To:
Nosy List: miwa, ocean-city, pitrou
Created on 2009-02-15 11:26 by miwa, last changed 2022-04-11 14:56 by admin. This issue is now closed.

msg82150 - Author: Musashi Tamura (miwa) Date: 2009-02-15 11:26
When the current path contains non-ASCII characters, Python 3.0.1 fails with a UnicodeDecodeError when importing a compiled module.
This does not happen on Python 3.0; it is new in 3.0.1.

Detailed Situation:
OS: win2000
The current directory path contains Japanese characters.
./ contains only a statement "import b".
./ is empty.
> python
(nothing happens, but b.pyc is created)
> python
Traceback (most recent call last):
  File "", line 1, in <module>
    import b
UnicodeDecodeError: 'utf8' codec can't decode byte 0x82 in position 3: unexpected code byte
msg82152 - Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2009-02-15 13:54
Quick observation: this bug was introduced in r68363.

	newname = PyUnicode_FromString(pathname);

pathname is MBCS-encoded on Windows, but PyUnicode_FromString assumes it is UTF-8.
msg82153 - Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2009-02-15 14:03
Here is a patch.
msg82154 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-02-15 14:21
Gasp. Sorry for the bug.
Should PyUnicode_CompareWithASCIIString() be replaced with something
else as well?
msg82159 - Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2009-02-15 15:36
I'm not sure; even my patch might not be correct.

In my VC6 debugger I confirmed that pathname in
update_compiled_module(PyCodeObject *co, char *pathname)
is indeed MBCS-encoded.

But its caller, load_source_module, passes the same pathname to PyErr_Format:

	if (fstat(fileno(fp), &st) != 0) {
		PyErr_Format(PyExc_RuntimeError,
			     "unable to get file status from '%s'",
			     pathname);
		return NULL;
	}

I've looked into the PyErr_Format code, and it seems %s assumes UTF-8. Anyway,
it's difficult to tell whether a given char* is UTF-8 or in the filesystem encoding. :-(
msg82160 - Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2009-02-15 16:29
I tracked it down and found that this MBCS path is set in Python/import.c (line 1394):

	if (PyUnicode_Check(v)) {
		/* Encode the str v to bytes in the filesystem encoding (MBCS on Windows). */
		v = PyUnicode_AsEncodedString(v,
		    Py_FileSystemDefaultEncoding, NULL);
		if (v == NULL)
			return NULL;
	}

This was introduced in r64126 to fix the segfault mentioned in
issue1342. I don't understand why that segfault happened, but I feel this
issue is part of a bigger problem (issue3080).
msg82164 - Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2009-02-15 18:26
>Should PyUnicode_CompareWithASCIIString() be replaced with something
>else as well?

I hope the revised patch fixes this too. There seems to be no function that
compares a unicode object with a filesystem-encoded string, so I moved the
unicode object creation before the comparison. This might increase overhead a bit.

Issue3080 is a big issue, so this is a minimal solution for this one. I
confirmed that it passed.
msg83110 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-03-03 23:53
I cannot say anything except that the patch looks ok. If it doesn't make
anything worse and solves the present problem, I guess you can commit it.
msg83113 - Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2009-03-04 01:58
Thanks, fixed in r70157 (py3k) and r70158 (release30-maint).
