msg166300 - (view) |
Author: Atsuo Ishimoto (ishimoto) * |
Date: 2012-07-24 15:24 |
test_posixpath.PosixCommonTest.test_nonascii_abspath fails on Japanese edition
of Windows.
If a file name was invalid byte character, os.chdir() raises UnicodeDecodeError()
instead of WindowsError.
This is a byte-api issue on Windows, so we may be able to simply skip this test.
======================================================================
ERROR: test_nonascii_abspath (test.test_posixpath.PosixCommonTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\cygwin\home\ishimoto\src\cpython\lib\test\test_genericpath.py", line
318, in test_nonascii_abspath
with support.temp_cwd(b'\xe7w\xf0'):
File "C:\cygwin\home\ishimoto\src\cpython\lib\contextlib.py", line 48, in __en
ter__
return next(self.gen)
File "C:\cygwin\home\ishimoto\src\cpython\lib\test\support.py", line 614, in t
emp_cwd
os.chdir(path)
UnicodeDecodeError: 'mbcs' codec can't decode bytes in position 0--1: No mapping
for the Unicode character exists in the target code page.
----------------------------------------------------------------------
|
msg166302 - (view) |
Author: Atsuo Ishimoto (ishimoto) * |
Date: 2012-07-24 15:51 |
changed name of test method. In the old patch, new test method shadowed
existing one.
|
msg166350 - (view) |
Author: Atsuo Ishimoto (ishimoto) * |
Date: 2012-07-25 02:37 |
I'm sorry, I generated a patch in wrong direction.
|
msg166362 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2012-07-25 08:03 |
The following change is a major change on how Python handles undecodable filenames on Windows:
- return PyUnicode_DecodeMBCS(s, size, NULL);
+ return PyUnicode_DecodeMBCS(s, size, "surrogateescape");
I disagree with this change, Python should not generate surrogates *on Windows*.
By the way, there is also os.fsdecode(), it has the same behaviour than PyUnicode_DecodeFSDefault() and PyUnicode_DecodeFSDefaultAndSize() (it uses the "strict" error handler on Windows).
|
msg166365 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2012-07-25 08:17 |
> If a file name was invalid byte character, os.chdir() raises
> UnicodeDecodeError() instead of WindowsError.
I realized that the problem is in the error handling: raising the OSError fails with a UnicodeDecodeError because PyErr_SetFromWindowsErrWithFilename() calls PyUnicode_DecodeFSDefault(), whereas the filename is not decodable. If you want to change something, it should be PyErr_SetFromWindowsErrWithFilename(). We may use PyObject_Repr() or PyObject_ASCII() for example.
--
See also the issue #13374: "The Windows bytes API has been deprecated in the os module. Use Unicode filenames instead of bytes filenames to not depend on the ANSI code page anymore and to support any filename."
|
msg166366 - (view) |
Author: Atsuo Ishimoto (ishimoto) * |
Date: 2012-07-25 08:33 |
Yes, I know #13374, that's why I wrote
> This is a byte-api issue on Windows, so we may be able to simply skip
> this test.
Do you think we need a patch to avoid UnicodeDecodeError raised?
Or should we change test to skip this?
|
msg166373 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2012-07-25 09:19 |
> Do you think we need a patch to avoid UnicodeDecodeError raised?
> Or should we change test to skip this?
It's a bug, the test should not be skipped. You should get an OSError
because the chdir() failed, not an UnicodeDecodeError.
|
msg166374 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-07-25 09:23 |
IMO, it is ok to skip the test on Windows; it was apparently targeted for Unix.
If we preserve it, we should pick a file name (on Windows) which is encodable in the file system encoding.
|
msg166375 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-07-25 09:25 |
As for your patch: you are missing the point of the test. The file name is assumed to be valid, despite it not being in the file system encoding.
|
msg166429 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2012-07-25 22:01 |
decode_filename_mbcs.patch uses the "replace" error handler to decode
the filename on Windows. It should solve the issue.
|
msg166430 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-07-25 22:32 |
haypo: how is this meant to fix the bug? Won't it now cause a WindowsError, when a successful operation is expected?
|
msg166441 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2012-07-25 23:23 |
> haypo: how is this meant to fix the bug?
> Won't it now cause a WindowsError, when a successful
> operation is expected?
Oh, I was referring to the new test proposed in the attached patch (issue15441.patch):
+ def test_chdir_invalid_filename(self):
+ self.assertRaises(WindowsError, os.chdir, b'\xe7w\xf0')
os.chdir() in a non existent directory with a bytes name should raise an OSError, not a UnicodeDecodeError.
--
About the original issue: it looks like mkdir(bytes) decodes internally the directory name and ignore undecodable bytes. On Windows 7, mkdir(b"\xe7w\xf0") creates a directory called "\u8f42" (b"\xe7w", b"\xf0" suffix has been dropped). It is not possible to change the directory to "b"\xe7w\xf0", but it works with "b"\xe7w" or "\u8f42".
There are 2 issues:
* On Windows, os.chdir(bytes) should not raise a UnicodeDecodeError on the directory does not exist
* test_nonascii_abspath() can be skipped on Windows if os.fsdecode(b"\xe7w\xf0") fails, or b"\xe7w" name should be used instead
My patch is not the best solution because it looses information (if the filename contains undecodable bytes). I realized that OSError.filename is not necessary a str, bytes is also accepted. win32_error_object() can be used. The following patch pass the original bytes object to OSError constructor instead:
diff -r 43ae2a243eca Modules/posixmodule.c
--- a/Modules/posixmodule.c Thu Jul 26 00:47:15 2012 +0200
+++ b/Modules/posixmodule.c Thu Jul 26 01:19:14 2012 +0200
@@ -1138,11 +1138,10 @@ static PyObject *
path_error(char *function_name, path_t *path)
{
#ifdef MS_WINDOWS
- if (path->narrow)
- return win32_error(function_name, path->narrow);
- if (path->wide)
- return win32_error_unicode(function_name, path->wide);
- return win32_error(function_name, NULL);
+ return PyErr_SetExcFromWindowsErrWithFilenameObject(
+ PyExc_OSError,
+ 0,
+ path->object);
#else
return path_posix_error(function_name, path);
#endif
(sorry, I failed to attach a patch, I have an issue with my file chooser...)
|
msg166445 - (view) |
Author: Atsuo Ishimoto (ishimoto) * |
Date: 2012-07-26 00:42 |
Here's another try:
In this patch:
- skip test_nonascii_abspath() test since it fails on some code pages.
- Added a test to reproduce bug on latin code pages.
- Use repr(filename) only if decode failed. This is more backward-compatible and does not lose any information.
|
msg166446 - (view) |
Author: Atsuo Ishimoto (ishimoto) * |
Date: 2012-07-26 00:45 |
martin: while os.mkdir(b'\xe7w\xf0') succeeds, but strangely enough, subsequent os.chdir(b'\xe7w\xf0') fails.
This is not a bug in Python, but Windows issue.
|
msg166452 - (view) |
Author: Atsuo Ishimoto (ishimoto) * |
Date: 2012-07-26 02:18 |
Updated patch again.
In this patch, byte object is passed as argument to WindowsError as Victor's patch.
I prefer to fix PyErr_SetFromWindowsErrWithFilename() over path_error(), because PyErr_SetFromWindowsErrWithFilename() is public API, so someone may use PyErr_SetFromWindowsErrWithFilename() for their own extension modules.
|
msg166471 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2012-07-26 10:53 |
+ @unittest.skipIf(sys.platform == 'win32',
+ "Win32 can fail cwd() with invalid utf8 name")
def test_nonascii_abspath(self):
You should not always skip the test on Windows: the filename is decodable in code pages other than cp932. It would be better to add the following code at the beginning of test_nonascii_abspath():
name = b'\xe7w\xf0'
if sys.platform == 'win32':
try:
os.fsdecode(name)
except UnicodeDecodeError:
self.skipTest("the filename %a is not decodable from the ANSI code page (%s)" % (name, sys.getfilesystemencoding()))
Note: Windows does not use UTF-8 for ANSI or OEM code pages, except if you change it manually.
+ batfile = """
+chcp 932
+{exe} {scriptname}
+chcp {codepage}
+"""
chcp does only change the OEM code page, whereas Python uses the ANSI code page for sys.getfilesystemencoding().
It is possible to change the ANSI code page of the current thread (CP_THREAD_ACP) using SetThreadLocale(), but it doesn't help because Python uses the global ANSI code page (CP_ACP). I don't think that changing the CP_THREAD_ACP code page does change the CP_ACP code page of child processes.
Changing the ANSI code page manually is possible in the Control Panel, but it requires to reboot Windows.
--
Your patch expects that "os.mkdir(b'\xe7w\xf0'); os.chdir(b'\xe7w\xf0')" works whereas I tested manually in Python, and it doesn't work because Windows creates a directory called "\u8f42" (b'\xe7w'), see my previous message (msg166441). At least with a NTFS filesystem on Windows 7.
--
Your last patch tries to decode the bytes filename from the filesystem encoding, or uses repr(filename). I may be better to keep the bytes filenames unchanged in OSError.filename, instead of using repr(). But it sounds like a good idea to patch all PyErr_Set*WithFilename(..., char*) functions. My patch for path_error() avoids the creation of a temporary bytes objets.
--
test_support.temp_cwd(b'\xe7w\xf0') test was added by the changeset ebdc2aa730c0 and is related to the issue #3426. I'm not sure that it was really expected to test b'\xe7w\xf0', because a previous test was using u'\xe7w\xf0' :
- # Issue 3426: check that abspath retuns unicode when the arg is unicode
- # and str when it's str, with both ASCII and non-ASCII cwds
- for cwd in (u'cwd', u'\xe7w\xf0'):
We may use b'\xe7w' instead of b'\xe7w\xf0' if b'\xe7w\xf0' cannot be decoded.
--
Attached patch win32_bytes_filename.patch tries to solve both issues: the test and UnicodeDecodeError on raising the OSError.
I tries to decode the bytes filename from the FS encoding, or keeps it unchanged (as bytes). As Python 2 does with os.listdir(unicode).
|
msg166482 - (view) |
Author: Atsuo Ishimoto (ishimoto) * |
Date: 2012-07-26 14:18 |
> chcp does only change the OEM code page, whereas Python uses the ANSI code page for sys.getfilesystemencoding().
Sorry, I should have investigated the code more carefully.
> Attached patch win32_bytes_filename.patch tries to solve both issues: the test and UnicodeDecodeError on raising the OSError.
Looks good to me. Thank you!
|
msg166509 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-07-26 19:54 |
I still would prefer if only one issue at a time gets fixed, in particular if the two issues require independent changes. This issue is about test_nonascii_abspath failing on the Japanese edition of Windows (see the first sentence of the first message from Atsuo)
If you absolutely must fix the other issue right away also, it needs a test case.
|
msg166653 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2012-07-28 12:51 |
> I still would prefer if only one issue at a time gets fixed, in particular if the two issues require independent changes.
Sorry, you are right: I created the issue #15478 for the general fix, and we will use this issue to fix test_genericpath.
This issue can be fixed in Python 3.3, whereas #15478 will have to wait for Python 3.4 beause it's an major change and may break a lot of code. It's better to wait the next release to test such change correctly.
|
msg167157 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2012-08-01 18:07 |
New changeset 3edc71ed19e7 by Victor Stinner in branch 'default':
Issue #15441: Skip test_nonascii_abspath() of test_genericpath on Windows
http://hg.python.org/cpython/rev/3edc71ed19e7
|
msg167158 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2012-08-01 18:10 |
This issue (the test) should be fixed, see the issue #15478 for the real fix.
|
msg174379 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2012-10-31 22:01 |
New changeset b3434c1ae503 by Victor Stinner in branch 'default':
Issue #15441, #15478: Reenable test_nonascii_abspath() on Windows
http://hg.python.org/cpython/rev/b3434c1ae503
|
msg174380 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2012-10-31 22:03 |
> If a file name was invalid byte character,
> os.chdir() raises UnicodeDecodeError() instead of WindowsError.
I believe this case is not handled correctly in Python 3.4 (version under development) thanks to my work on issue #15478.
Thanks for the report Atsuo Ishimoto! Don't hesitate to report other similar issue ;-)
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:33 | admin | set | github: 59646 |
2012-10-31 22:03:47 | vstinner | set | messages:
+ msg174380 |
2012-10-31 22:01:34 | python-dev | set | messages:
+ msg174379 |
2012-08-01 18:10:35 | vstinner | set | status: open -> closed resolution: fixed messages:
+ msg167158
|
2012-08-01 18:07:51 | python-dev | set | nosy:
+ python-dev messages:
+ msg167157
|
2012-07-28 12:51:17 | vstinner | set | messages:
+ msg166653 |
2012-07-26 19:54:13 | loewis | set | messages:
+ msg166509 |
2012-07-26 14:18:24 | ishimoto | set | messages:
+ msg166482 |
2012-07-26 10:53:31 | vstinner | set | files:
+ win32_bytes_filename.patch nosy:
+ flox messages:
+ msg166471
|
2012-07-26 02:18:25 | ishimoto | set | files:
+ issue15441_3.patch
messages:
+ msg166452 |
2012-07-26 00:45:50 | ishimoto | set | messages:
+ msg166446 |
2012-07-26 00:42:57 | ishimoto | set | files:
+ issue15441_2.patch
messages:
+ msg166445 |
2012-07-25 23:23:37 | vstinner | set | messages:
+ msg166441 |
2012-07-25 22:32:47 | loewis | set | messages:
+ msg166430 |
2012-07-25 22:01:17 | vstinner | set | files:
+ decode_filename_mbcs.patch
messages:
+ msg166429 |
2012-07-25 09:25:26 | loewis | set | messages:
+ msg166375 |
2012-07-25 09:23:58 | loewis | set | messages:
+ msg166374 |
2012-07-25 09:19:50 | vstinner | set | messages:
+ msg166373 |
2012-07-25 08:33:55 | ishimoto | set | messages:
+ msg166366 |
2012-07-25 08:17:34 | vstinner | set | messages:
+ msg166365 |
2012-07-25 08:03:17 | vstinner | set | nosy:
+ loewis, tim.golden messages:
+ msg166362
|
2012-07-25 02:37:44 | ishimoto | set | files:
+ issue15441.patch
messages:
+ msg166350 |
2012-07-25 02:33:26 | ishimoto | set | files:
- test_nonascii_abspath_2.patch |
2012-07-24 18:04:09 | pitrou | set | nosy:
+ vstinner
stage: patch review |
2012-07-24 15:51:58 | ishimoto | set | files:
- test_nonascii_abspath.patch |
2012-07-24 15:51:46 | ishimoto | set | files:
+ test_nonascii_abspath_2.patch
messages:
+ msg166302 |
2012-07-24 15:24:07 | ishimoto | create | |