Issue 15441: test_posixpath fails on Japanese edition of Windows

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/59646

classification

Title:	test_posixpath fails on Japanese edition of Windows
Type:	behavior	Stage:	patch review
Components:	Unicode, Windows	Versions:	Python 3.3

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	ezio.melotti, flox, ishimoto, loewis, python-dev, tim.golden, vstinner
Priority:	normal	Keywords:	patch

Created on 2012-07-24 15:24 by ishimoto, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
issue15441.patch	ishimoto, 2012-07-25 02:37		review
decode_filename_mbcs.patch	vstinner, 2012-07-25 22:01		review
issue15441_2.patch	ishimoto, 2012-07-26 00:42		review
issue15441_3.patch	ishimoto, 2012-07-26 02:18		review
win32_bytes_filename.patch	vstinner, 2012-07-26 10:53		review

Messages (23)
msg166300 - (view)	Author: Atsuo Ishimoto (ishimoto) *	Date: 2012-07-24 15:24
test_posixpath.PosixCommonTest.test_nonascii_abspath fails on Japanese edition of Windows. If a file name was invalid byte character, os.chdir() raises UnicodeDecodeError() instead of WindowsError. This is a byte-api issue on Windows, so we may be able to simply skip this test. ====================================================================== ERROR: test_nonascii_abspath (test.test_posixpath.PosixCommonTest) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\cygwin\home\ishimoto\src\cpython\lib\test\test_genericpath.py", line 318, in test_nonascii_abspath with support.temp_cwd(b'\xe7w\xf0'): File "C:\cygwin\home\ishimoto\src\cpython\lib\contextlib.py", line 48, in __en ter__ return next(self.gen) File "C:\cygwin\home\ishimoto\src\cpython\lib\test\support.py", line 614, in t emp_cwd os.chdir(path) UnicodeDecodeError: 'mbcs' codec can't decode bytes in position 0--1: No mapping for the Unicode character exists in the target code page. ----------------------------------------------------------------------
msg166302 - (view)	Author: Atsuo Ishimoto (ishimoto) *	Date: 2012-07-24 15:51
changed name of test method. In the old patch, new test method shadowed existing one.
msg166350 - (view)	Author: Atsuo Ishimoto (ishimoto) *	Date: 2012-07-25 02:37
I'm sorry, I generated a patch in wrong direction.
msg166362 - (view)	Author: STINNER Victor (vstinner) *	Date: 2012-07-25 08:03
The following change is a major change on how Python handles undecodable filenames on Windows: - return PyUnicode_DecodeMBCS(s, size, NULL); + return PyUnicode_DecodeMBCS(s, size, "surrogateescape"); I disagree with this change, Python should not generate surrogates on Windows. By the way, there is also os.fsdecode(), it has the same behaviour than PyUnicode_DecodeFSDefault() and PyUnicode_DecodeFSDefaultAndSize() (it uses the "strict" error handler on Windows).
msg166365 - (view)	Author: STINNER Victor (vstinner) *	Date: 2012-07-25 08:17
> If a file name was invalid byte character, os.chdir() raises > UnicodeDecodeError() instead of WindowsError. I realized that the problem is in the error handling: raising the OSError fails with a UnicodeDecodeError because PyErr_SetFromWindowsErrWithFilename() calls PyUnicode_DecodeFSDefault(), whereas the filename is not decodable. If you want to change something, it should be PyErr_SetFromWindowsErrWithFilename(). We may use PyObject_Repr() or PyObject_ASCII() for example. -- See also the issue #13374: "The Windows bytes API has been deprecated in the os module. Use Unicode filenames instead of bytes filenames to not depend on the ANSI code page anymore and to support any filename."
msg166366 - (view)	Author: Atsuo Ishimoto (ishimoto) *	Date: 2012-07-25 08:33
Yes, I know #13374, that's why I wrote > This is a byte-api issue on Windows, so we may be able to simply skip > this test. Do you think we need a patch to avoid UnicodeDecodeError raised? Or should we change test to skip this?
msg166373 - (view)	Author: STINNER Victor (vstinner) *	Date: 2012-07-25 09:19
> Do you think we need a patch to avoid UnicodeDecodeError raised? > Or should we change test to skip this? It's a bug, the test should not be skipped. You should get an OSError because the chdir() failed, not an UnicodeDecodeError.
msg166374 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2012-07-25 09:23
IMO, it is ok to skip the test on Windows; it was apparently targeted for Unix. If we preserve it, we should pick a file name (on Windows) which is encodable in the file system encoding.
msg166375 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2012-07-25 09:25
As for your patch: you are missing the point of the test. The file name is assumed to be valid, despite it not being in the file system encoding.
msg166429 - (view)	Author: STINNER Victor (vstinner) *	Date: 2012-07-25 22:01
decode_filename_mbcs.patch uses the "replace" error handler to decode the filename on Windows. It should solve the issue.
msg166430 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2012-07-25 22:32
haypo: how is this meant to fix the bug? Won't it now cause a WindowsError, when a successful operation is expected?
msg166441 - (view)	Author: STINNER Victor (vstinner) *	Date: 2012-07-25 23:23
> haypo: how is this meant to fix the bug? > Won't it now cause a WindowsError, when a successful > operation is expected? Oh, I was referring to the new test proposed in the attached patch (issue15441.patch): + def test_chdir_invalid_filename(self): + self.assertRaises(WindowsError, os.chdir, b'\xe7w\xf0') os.chdir() in a non existent directory with a bytes name should raise an OSError, not a UnicodeDecodeError. -- About the original issue: it looks like mkdir(bytes) decodes internally the directory name and ignore undecodable bytes. On Windows 7, mkdir(b"\xe7w\xf0") creates a directory called "\u8f42" (b"\xe7w", b"\xf0" suffix has been dropped). It is not possible to change the directory to "b"\xe7w\xf0", but it works with "b"\xe7w" or "\u8f42". There are 2 issues: * On Windows, os.chdir(bytes) should not raise a UnicodeDecodeError on the directory does not exist * test_nonascii_abspath() can be skipped on Windows if os.fsdecode(b"\xe7w\xf0") fails, or b"\xe7w" name should be used instead My patch is not the best solution because it looses information (if the filename contains undecodable bytes). I realized that OSError.filename is not necessary a str, bytes is also accepted. win32_error_object() can be used. The following patch pass the original bytes object to OSError constructor instead: diff -r 43ae2a243eca Modules/posixmodule.c --- a/Modules/posixmodule.c Thu Jul 26 00:47:15 2012 +0200 +++ b/Modules/posixmodule.c Thu Jul 26 01:19:14 2012 +0200 @@ -1138,11 +1138,10 @@ static PyObject * path_error(char function_name, path_t path) { #ifdef MS_WINDOWS - if (path->narrow) - return win32_error(function_name, path->narrow); - if (path->wide) - return win32_error_unicode(function_name, path->wide); - return win32_error(function_name, NULL); + return PyErr_SetExcFromWindowsErrWithFilenameObject( + PyExc_OSError, + 0, + path->object); #else return path_posix_error(function_name, path); #endif (sorry, I failed to attach a patch, I have an issue with my file chooser...)
msg166445 - (view)	Author: Atsuo Ishimoto (ishimoto) *	Date: 2012-07-26 00:42
Here's another try: In this patch: - skip test_nonascii_abspath() test since it fails on some code pages. - Added a test to reproduce bug on latin code pages. - Use repr(filename) only if decode failed. This is more backward-compatible and does not lose any information.
msg166446 - (view)	Author: Atsuo Ishimoto (ishimoto) *	Date: 2012-07-26 00:45
martin: while os.mkdir(b'\xe7w\xf0') succeeds, but strangely enough, subsequent os.chdir(b'\xe7w\xf0') fails. This is not a bug in Python, but Windows issue.
msg166452 - (view)	Author: Atsuo Ishimoto (ishimoto) *	Date: 2012-07-26 02:18
Updated patch again. In this patch, byte object is passed as argument to WindowsError as Victor's patch. I prefer to fix PyErr_SetFromWindowsErrWithFilename() over path_error(), because PyErr_SetFromWindowsErrWithFilename() is public API, so someone may use PyErr_SetFromWindowsErrWithFilename() for their own extension modules.
msg166471 - (view)	Author: STINNER Victor (vstinner) *	Date: 2012-07-26 10:53
+ @unittest.skipIf(sys.platform == 'win32', + "Win32 can fail cwd() with invalid utf8 name") def test_nonascii_abspath(self): You should not always skip the test on Windows: the filename is decodable in code pages other than cp932. It would be better to add the following code at the beginning of test_nonascii_abspath(): name = b'\xe7w\xf0' if sys.platform == 'win32': try: os.fsdecode(name) except UnicodeDecodeError: self.skipTest("the filename %a is not decodable from the ANSI code page (%s)" % (name, sys.getfilesystemencoding())) Note: Windows does not use UTF-8 for ANSI or OEM code pages, except if you change it manually. + batfile = """ +chcp 932 +{exe} {scriptname} +chcp {codepage} +""" chcp does only change the OEM code page, whereas Python uses the ANSI code page for sys.getfilesystemencoding(). It is possible to change the ANSI code page of the current thread (CP_THREAD_ACP) using SetThreadLocale(), but it doesn't help because Python uses the global ANSI code page (CP_ACP). I don't think that changing the CP_THREAD_ACP code page does change the CP_ACP code page of child processes. Changing the ANSI code page manually is possible in the Control Panel, but it requires to reboot Windows. -- Your patch expects that "os.mkdir(b'\xe7w\xf0'); os.chdir(b'\xe7w\xf0')" works whereas I tested manually in Python, and it doesn't work because Windows creates a directory called "\u8f42" (b'\xe7w'), see my previous message (msg166441). At least with a NTFS filesystem on Windows 7. -- Your last patch tries to decode the bytes filename from the filesystem encoding, or uses repr(filename). I may be better to keep the bytes filenames unchanged in OSError.filename, instead of using repr(). But it sounds like a good idea to patch all PyErr_SetWithFilename(..., char) functions. My patch for path_error() avoids the creation of a temporary bytes objets. -- test_support.temp_cwd(b'\xe7w\xf0') test was added by the changeset ebdc2aa730c0 and is related to the issue #3426. I'm not sure that it was really expected to test b'\xe7w\xf0', because a previous test was using u'\xe7w\xf0' : - # Issue 3426: check that abspath retuns unicode when the arg is unicode - # and str when it's str, with both ASCII and non-ASCII cwds - for cwd in (u'cwd', u'\xe7w\xf0'): We may use b'\xe7w' instead of b'\xe7w\xf0' if b'\xe7w\xf0' cannot be decoded. -- Attached patch win32_bytes_filename.patch tries to solve both issues: the test and UnicodeDecodeError on raising the OSError. I tries to decode the bytes filename from the FS encoding, or keeps it unchanged (as bytes). As Python 2 does with os.listdir(unicode).
msg166482 - (view)	Author: Atsuo Ishimoto (ishimoto) *	Date: 2012-07-26 14:18
> chcp does only change the OEM code page, whereas Python uses the ANSI code page for sys.getfilesystemencoding(). Sorry, I should have investigated the code more carefully. > Attached patch win32_bytes_filename.patch tries to solve both issues: the test and UnicodeDecodeError on raising the OSError. Looks good to me. Thank you!
msg166509 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2012-07-26 19:54
I still would prefer if only one issue at a time gets fixed, in particular if the two issues require independent changes. This issue is about test_nonascii_abspath failing on the Japanese edition of Windows (see the first sentence of the first message from Atsuo) If you absolutely must fix the other issue right away also, it needs a test case.
msg166653 - (view)	Author: STINNER Victor (vstinner) *	Date: 2012-07-28 12:51
> I still would prefer if only one issue at a time gets fixed, in particular if the two issues require independent changes. Sorry, you are right: I created the issue #15478 for the general fix, and we will use this issue to fix test_genericpath. This issue can be fixed in Python 3.3, whereas #15478 will have to wait for Python 3.4 beause it's an major change and may break a lot of code. It's better to wait the next release to test such change correctly.
msg167157 - (view)	Author: Roundup Robot (python-dev)	Date: 2012-08-01 18:07
New changeset 3edc71ed19e7 by Victor Stinner in branch 'default': Issue #15441: Skip test_nonascii_abspath() of test_genericpath on Windows http://hg.python.org/cpython/rev/3edc71ed19e7
msg167158 - (view)	Author: STINNER Victor (vstinner) *	Date: 2012-08-01 18:10
This issue (the test) should be fixed, see the issue #15478 for the real fix.
msg174379 - (view)	Author: Roundup Robot (python-dev)	Date: 2012-10-31 22:01
New changeset b3434c1ae503 by Victor Stinner in branch 'default': Issue #15441, #15478: Reenable test_nonascii_abspath() on Windows http://hg.python.org/cpython/rev/b3434c1ae503
msg174380 - (view)	Author: STINNER Victor (vstinner) *	Date: 2012-10-31 22:03
> If a file name was invalid byte character, > os.chdir() raises UnicodeDecodeError() instead of WindowsError. I believe this case is not handled correctly in Python 3.4 (version under development) thanks to my work on issue #15478. Thanks for the report Atsuo Ishimoto! Don't hesitate to report other similar issue ;-)

History
Date	User	Action	Args
2022-04-11 14:57:33	admin	set	github: 59646
2012-10-31 22:03:47	vstinner	set	messages: + msg174380
2012-10-31 22:01:34	python-dev	set	messages: + msg174379
2012-08-01 18:10:35	vstinner	set	status: open -> closed resolution: fixed messages: + msg167158
2012-08-01 18:07:51	python-dev	set	nosy: + python-dev messages: + msg167157
2012-07-28 12:51:17	vstinner	set	messages: + msg166653
2012-07-26 19:54:13	loewis	set	messages: + msg166509
2012-07-26 14:18:24	ishimoto	set	messages: + msg166482
2012-07-26 10:53:31	vstinner	set	files: + win32_bytes_filename.patch nosy: + flox messages: + msg166471
2012-07-26 02:18:25	ishimoto	set	files: + issue15441_3.patch messages: + msg166452
2012-07-26 00:45:50	ishimoto	set	messages: + msg166446
2012-07-26 00:42:57	ishimoto	set	files: + issue15441_2.patch messages: + msg166445
2012-07-25 23:23:37	vstinner	set	messages: + msg166441
2012-07-25 22:32:47	loewis	set	messages: + msg166430
2012-07-25 22:01:17	vstinner	set	files: + decode_filename_mbcs.patch messages: + msg166429
2012-07-25 09:25:26	loewis	set	messages: + msg166375
2012-07-25 09:23:58	loewis	set	messages: + msg166374
2012-07-25 09:19:50	vstinner	set	messages: + msg166373
2012-07-25 08:33:55	ishimoto	set	messages: + msg166366
2012-07-25 08:17:34	vstinner	set	messages: + msg166365
2012-07-25 08:03:17	vstinner	set	nosy: + loewis, tim.golden messages: + msg166362
2012-07-25 02:37:44	ishimoto	set	files: + issue15441.patch messages: + msg166350
2012-07-25 02:33:26	ishimoto	set	files: - test_nonascii_abspath_2.patch
2012-07-24 18:04:09	pitrou	set	nosy: + vstinner stage: patch review
2012-07-24 15:51:58	ishimoto	set	files: - test_nonascii_abspath.patch
2012-07-24 15:51:46	ishimoto	set	files: + test_nonascii_abspath_2.patch messages: + msg166302
2012-07-24 15:24:07	ishimoto	create