This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: test_posixpath fails on Japanese edition of Windows
Type: behavior Stage: patch review
Components: Unicode, Windows Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: ezio.melotti, flox, ishimoto, loewis, python-dev, tim.golden, vstinner
Priority: normal Keywords: patch

Created on 2012-07-24 15:24 by ishimoto, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
issue15441.patch ishimoto, 2012-07-25 02:37 review
decode_filename_mbcs.patch vstinner, 2012-07-25 22:01 review
issue15441_2.patch ishimoto, 2012-07-26 00:42 review
issue15441_3.patch ishimoto, 2012-07-26 02:18 review
win32_bytes_filename.patch vstinner, 2012-07-26 10:53 review
Messages (23)
msg166300 - (view) Author: Atsuo Ishimoto (ishimoto) * Date: 2012-07-24 15:24
test_posixpath.PosixCommonTest.test_nonascii_abspath fails on Japanese edition
of Windows.

If a file name was invalid byte character, os.chdir() raises UnicodeDecodeError()
instead of WindowsError.

This is a byte-api issue on Windows, so we may be able to simply skip this test.

======================================================================
ERROR: test_nonascii_abspath (test.test_posixpath.PosixCommonTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\cygwin\home\ishimoto\src\cpython\lib\test\test_genericpath.py", line
318, in test_nonascii_abspath
    with support.temp_cwd(b'\xe7w\xf0'):
  File "C:\cygwin\home\ishimoto\src\cpython\lib\contextlib.py", line 48, in __en
ter__
    return next(self.gen)
  File "C:\cygwin\home\ishimoto\src\cpython\lib\test\support.py", line 614, in t
emp_cwd
    os.chdir(path)
UnicodeDecodeError: 'mbcs' codec can't decode bytes in position 0--1: No mapping
for the Unicode character exists in the target code page.

----------------------------------------------------------------------
msg166302 - (view) Author: Atsuo Ishimoto (ishimoto) * Date: 2012-07-24 15:51
changed name of test method. In the old patch, new test method shadowed 
existing one.
msg166350 - (view) Author: Atsuo Ishimoto (ishimoto) * Date: 2012-07-25 02:37
I'm sorry, I generated a patch in wrong direction.
msg166362 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-07-25 08:03
The following change is a major change on how Python handles undecodable filenames on Windows:

-    return PyUnicode_DecodeMBCS(s, size, NULL);
+    return PyUnicode_DecodeMBCS(s, size, "surrogateescape");

I disagree with this change, Python should not generate surrogates *on Windows*.

By the way, there is also os.fsdecode(), it has the same behaviour than PyUnicode_DecodeFSDefault() and PyUnicode_DecodeFSDefaultAndSize() (it uses the "strict" error handler on Windows).
msg166365 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-07-25 08:17
> If a file name was invalid byte character, os.chdir() raises
> UnicodeDecodeError() instead of WindowsError.

I realized that the problem is in the error handling: raising the OSError fails with a UnicodeDecodeError because PyErr_SetFromWindowsErrWithFilename() calls PyUnicode_DecodeFSDefault(), whereas the filename is not decodable. If you want to change something, it should be PyErr_SetFromWindowsErrWithFilename(). We may use PyObject_Repr() or PyObject_ASCII() for example.

--

See also the issue #13374: "The Windows bytes API has been deprecated in the os module. Use Unicode filenames instead of bytes filenames to not depend on the ANSI code page anymore and to support any filename."
msg166366 - (view) Author: Atsuo Ishimoto (ishimoto) * Date: 2012-07-25 08:33
Yes, I know #13374, that's why I wrote

> This is a byte-api issue on Windows, so we may be able to simply skip 
> this test.

Do you think we need a patch to avoid UnicodeDecodeError raised? 
Or should we change test to skip this?
msg166373 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-07-25 09:19
> Do you think we need a patch to avoid UnicodeDecodeError raised?
> Or should we change test to skip this?

It's a bug, the test should not be skipped. You should get an OSError
because the chdir() failed, not an UnicodeDecodeError.
msg166374 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-07-25 09:23
IMO, it is ok to skip the test on Windows; it was apparently targeted for Unix.

If we preserve it, we should pick a file name (on Windows) which is encodable in the file system encoding.
msg166375 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-07-25 09:25
As for your patch: you are missing the point of the test. The file name is assumed to be valid, despite it not being in the file system encoding.
msg166429 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-07-25 22:01
decode_filename_mbcs.patch uses the "replace" error handler to decode
the filename on Windows. It should solve the issue.
msg166430 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-07-25 22:32
haypo: how is this meant to fix the bug? Won't it now cause a WindowsError, when a successful operation is expected?
msg166441 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-07-25 23:23
> haypo: how is this meant to fix the bug?
> Won't it now cause a WindowsError, when a successful
> operation is expected?

Oh, I was referring to the new test proposed in the attached patch (issue15441.patch):

+    def test_chdir_invalid_filename(self):
+        self.assertRaises(WindowsError, os.chdir, b'\xe7w\xf0')

os.chdir() in a non existent directory with a bytes name should raise an OSError, not a UnicodeDecodeError.

--

About the original issue: it looks like mkdir(bytes) decodes internally the directory name and ignore undecodable bytes. On Windows 7, mkdir(b"\xe7w\xf0") creates a directory called "\u8f42" (b"\xe7w", b"\xf0" suffix has been dropped). It is not possible to change the directory to "b"\xe7w\xf0", but it works with "b"\xe7w" or "\u8f42".

There are 2 issues:

 * On Windows, os.chdir(bytes) should not raise a UnicodeDecodeError on the directory does not exist
 * test_nonascii_abspath() can be skipped on Windows if os.fsdecode(b"\xe7w\xf0") fails, or b"\xe7w" name should be used instead

My patch is not the best solution because it looses information (if the filename contains undecodable bytes). I realized that OSError.filename is not necessary a str, bytes is also accepted. win32_error_object() can be used. The following patch pass the original bytes object to OSError constructor instead:


diff -r 43ae2a243eca Modules/posixmodule.c
--- a/Modules/posixmodule.c     Thu Jul 26 00:47:15 2012 +0200
+++ b/Modules/posixmodule.c     Thu Jul 26 01:19:14 2012 +0200
@@ -1138,11 +1138,10 @@ static PyObject *
 path_error(char *function_name, path_t *path)
 {
 #ifdef MS_WINDOWS
-    if (path->narrow)
-        return win32_error(function_name, path->narrow);
-    if (path->wide)
-        return win32_error_unicode(function_name, path->wide);
-    return win32_error(function_name, NULL);
+    return PyErr_SetExcFromWindowsErrWithFilenameObject(
+                PyExc_OSError,
+                0,
+                path->object);
 #else
     return path_posix_error(function_name, path);
 #endif

(sorry, I failed to attach a patch, I have an issue with my file chooser...)
msg166445 - (view) Author: Atsuo Ishimoto (ishimoto) * Date: 2012-07-26 00:42
Here's another try:

In this patch:

- skip test_nonascii_abspath() test since it fails on some code pages.

- Added a test to reproduce bug on latin code pages.

- Use repr(filename) only if decode failed. This is more backward-compatible and does not lose any information.
msg166446 - (view) Author: Atsuo Ishimoto (ishimoto) * Date: 2012-07-26 00:45
martin: while os.mkdir(b'\xe7w\xf0') succeeds, but strangely enough, subsequent os.chdir(b'\xe7w\xf0') fails.

This is not a bug in Python, but Windows issue.
msg166452 - (view) Author: Atsuo Ishimoto (ishimoto) * Date: 2012-07-26 02:18
Updated patch again.

In this patch, byte object is passed as argument to WindowsError as Victor's patch.

I prefer to fix PyErr_SetFromWindowsErrWithFilename() over path_error(), because PyErr_SetFromWindowsErrWithFilename() is public API, so someone may use PyErr_SetFromWindowsErrWithFilename() for their own extension modules.
msg166471 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-07-26 10:53
+    @unittest.skipIf(sys.platform == 'win32',
+        "Win32 can fail cwd() with invalid utf8 name")
     def test_nonascii_abspath(self):

You should not always skip the test on Windows: the filename is decodable in code pages other than cp932. It would be better to add the following code at the beginning of test_nonascii_abspath():

name = b'\xe7w\xf0'
if sys.platform == 'win32':
  try:
    os.fsdecode(name)
  except UnicodeDecodeError:
    self.skipTest("the filename %a is not decodable from the ANSI code page (%s)" % (name, sys.getfilesystemencoding()))

Note: Windows does not use UTF-8 for ANSI or OEM code pages, except if you change it manually.

+        batfile = """
+chcp 932
+{exe} {scriptname}
+chcp {codepage}
+"""

chcp does only change the OEM code page, whereas Python uses the ANSI code page for sys.getfilesystemencoding().

It is possible to change the ANSI code page of the current thread (CP_THREAD_ACP) using SetThreadLocale(), but it doesn't help because Python uses the global ANSI code page (CP_ACP). I don't think that changing the CP_THREAD_ACP code page does change the CP_ACP code page of child processes.

Changing the ANSI code page manually is possible in the Control Panel, but it requires to reboot Windows.

--

Your patch expects that "os.mkdir(b'\xe7w\xf0'); os.chdir(b'\xe7w\xf0')" works whereas I tested manually in Python, and it doesn't work because Windows creates a directory called "\u8f42" (b'\xe7w'), see my previous message (msg166441). At least with a NTFS filesystem on Windows 7.

--

Your last patch tries to decode the bytes filename from the filesystem encoding, or uses repr(filename). I may be better to keep the bytes filenames unchanged in OSError.filename, instead of using repr(). But it sounds like a good idea to patch all PyErr_Set*WithFilename(..., char*) functions. My patch for  path_error() avoids the creation of a temporary bytes objets.

--

test_support.temp_cwd(b'\xe7w\xf0') test was added by the changeset ebdc2aa730c0 and is related to the issue #3426. I'm not sure that it was really expected to test b'\xe7w\xf0', because a previous test was using u'\xe7w\xf0' :

-        # Issue 3426: check that abspath retuns unicode when the arg is unicode
-        # and str when it's str, with both ASCII and non-ASCII cwds
-        for cwd in (u'cwd', u'\xe7w\xf0'):

We may use b'\xe7w' instead of b'\xe7w\xf0' if b'\xe7w\xf0' cannot be decoded.

--

Attached patch win32_bytes_filename.patch tries to solve both issues: the test and UnicodeDecodeError on raising the OSError.

I tries to decode the bytes filename from the FS encoding, or keeps it unchanged (as bytes). As Python 2 does with os.listdir(unicode).
msg166482 - (view) Author: Atsuo Ishimoto (ishimoto) * Date: 2012-07-26 14:18
> chcp does only change the OEM code page, whereas Python uses the ANSI code page for sys.getfilesystemencoding().

Sorry, I should have investigated the code more carefully.

> Attached patch win32_bytes_filename.patch tries to solve both issues: the test and UnicodeDecodeError on raising the OSError.

Looks good to me. Thank you!
msg166509 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-07-26 19:54
I still would prefer if only one issue at a time gets fixed, in particular if the two issues require independent changes. This issue is about test_nonascii_abspath failing on the Japanese edition of Windows (see the first sentence of the first message from Atsuo)

If you absolutely must fix the other issue right away also, it needs a test case.
msg166653 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-07-28 12:51
> I still would prefer if only one issue at a time gets fixed, in particular if the two issues require independent changes.

Sorry, you are right: I created the issue #15478 for the general fix, and we will use this issue to fix test_genericpath.

This issue can be fixed in Python 3.3, whereas #15478 will have to wait for Python 3.4 beause it's an major change and may break a lot of code. It's better to wait the next release to test such change correctly.
msg167157 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-08-01 18:07
New changeset 3edc71ed19e7 by Victor Stinner in branch 'default':
Issue #15441: Skip test_nonascii_abspath() of test_genericpath on Windows
http://hg.python.org/cpython/rev/3edc71ed19e7
msg167158 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-08-01 18:10
This issue (the test) should be fixed, see the issue #15478 for the real fix.
msg174379 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-10-31 22:01
New changeset b3434c1ae503 by Victor Stinner in branch 'default':
Issue #15441, #15478: Reenable test_nonascii_abspath() on Windows
http://hg.python.org/cpython/rev/b3434c1ae503
msg174380 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-10-31 22:03
> If a file name was invalid byte character,
> os.chdir() raises UnicodeDecodeError() instead of WindowsError.

I believe this case is not handled correctly in Python 3.4 (version under development) thanks to my work on issue #15478.

Thanks for the report Atsuo Ishimoto! Don't hesitate to report other similar issue ;-)
History
Date User Action Args
2022-04-11 14:57:33adminsetgithub: 59646
2012-10-31 22:03:47vstinnersetmessages: + msg174380
2012-10-31 22:01:34python-devsetmessages: + msg174379
2012-08-01 18:10:35vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg167158
2012-08-01 18:07:51python-devsetnosy: + python-dev
messages: + msg167157
2012-07-28 12:51:17vstinnersetmessages: + msg166653
2012-07-26 19:54:13loewissetmessages: + msg166509
2012-07-26 14:18:24ishimotosetmessages: + msg166482
2012-07-26 10:53:31vstinnersetfiles: + win32_bytes_filename.patch
nosy: + flox
messages: + msg166471

2012-07-26 02:18:25ishimotosetfiles: + issue15441_3.patch

messages: + msg166452
2012-07-26 00:45:50ishimotosetmessages: + msg166446
2012-07-26 00:42:57ishimotosetfiles: + issue15441_2.patch

messages: + msg166445
2012-07-25 23:23:37vstinnersetmessages: + msg166441
2012-07-25 22:32:47loewissetmessages: + msg166430
2012-07-25 22:01:17vstinnersetfiles: + decode_filename_mbcs.patch

messages: + msg166429
2012-07-25 09:25:26loewissetmessages: + msg166375
2012-07-25 09:23:58loewissetmessages: + msg166374
2012-07-25 09:19:50vstinnersetmessages: + msg166373
2012-07-25 08:33:55ishimotosetmessages: + msg166366
2012-07-25 08:17:34vstinnersetmessages: + msg166365
2012-07-25 08:03:17vstinnersetnosy: + loewis, tim.golden
messages: + msg166362
2012-07-25 02:37:44ishimotosetfiles: + issue15441.patch

messages: + msg166350
2012-07-25 02:33:26ishimotosetfiles: - test_nonascii_abspath_2.patch
2012-07-24 18:04:09pitrousetnosy: + vstinner

stage: patch review
2012-07-24 15:51:58ishimotosetfiles: - test_nonascii_abspath.patch
2012-07-24 15:51:46ishimotosetfiles: + test_nonascii_abspath_2.patch

messages: + msg166302
2012-07-24 15:24:07ishimotocreate