Author vstinner
Recipients ezio.melotti, flox, ishimoto, loewis, tim.golden, vstinner
Date 2012-07-26.10:53:30
Message-id <1343300011.87.0.0624148929911.issue15441@psf.upfronthosting.co.za>
In-reply-to
Content
+    @unittest.skipIf(sys.platform == 'win32',
+        "Win32 can fail cwd() with invalid utf8 name")
     def test_nonascii_abspath(self):

You should not always skip the test on Windows: the filename is decodable in code pages other than cp932. It would be better to add the following code at the beginning of test_nonascii_abspath():

name = b'\xe7w\xf0'
if sys.platform == 'win32':
    try:
        os.fsdecode(name)
    except UnicodeDecodeError:
        self.skipTest("the filename %a is not decodable from the ANSI "
                      "code page (%s)" % (name, sys.getfilesystemencoding()))

Note: Windows does not use UTF-8 for the ANSI or OEM code pages, unless you change it manually.

+        batfile = """
+chcp 932
+{exe} {scriptname}
+chcp {codepage}
+"""

chcp only changes the OEM code page, whereas Python uses the ANSI code page for sys.getfilesystemencoding().
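To see the difference between these code pages on a given machine, something like the following sketch can be used. It is only an illustration and not part of any patch: windows_code_pages() is a hypothetical helper name, and it calls the Win32 functions GetACP(), GetOEMCP() and GetConsoleOutputCP() through ctypes. On other platforms it returns None.

```python
import ctypes
import sys


def windows_code_pages():
    """Return (ansi, oem, console_output) code page numbers.

    Hypothetical helper for illustration; returns None when not on Windows.
    """
    if sys.platform != 'win32':
        return None
    kernel32 = ctypes.windll.kernel32
    # GetACP(): system ANSI code page (CP_ACP).
    # GetOEMCP(): system OEM code page.
    # GetConsoleOutputCP(): code page of the attached console (0 if none).
    return (kernel32.GetACP(),
            kernel32.GetOEMCP(),
            kernel32.GetConsoleOutputCP())


pages = windows_code_pages()
if pages is None:
    print("not running on Windows; nothing to inspect")
else:
    print("ANSI=%d OEM=%d console output=%d" % pages)
    print("filesystem encoding: %s" % sys.getfilesystemencoding())
```

Running this before and after chcp in the same console should show whether the ANSI value is affected. Note that recent Python versions may report a filesystem encoding on Windows that is not the ANSI code page.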

It is possible to change the ANSI code page of the current thread (CP_THREAD_ACP) using SetThreadLocale(), but it doesn't help because Python uses the global ANSI code page (CP_ACP). I don't think that changing the CP_THREAD_ACP code page changes the CP_ACP code page of child processes.

Changing the ANSI code page manually is possible in the Control Panel, but it requires rebooting Windows.

--

Your patch expects "os.mkdir(b'\xe7w\xf0'); os.chdir(b'\xe7w\xf0')" to work, but I tested it manually in Python and it doesn't: Windows creates a directory called "\u8f42" (b'\xe7w'); see my previous message (msg166441). At least that is what happens with an NTFS filesystem on Windows 7.
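The decoding side of this can be checked on any platform with Python's own cp932 codec. This is only a sketch: it assumes the codec behaves like the Windows code page 932 for these bytes, and it says nothing about what the filesystem itself does with the trailing byte.

```python
name = b'\xe7w\xf0'

# Strict decoding of the whole name: the trailing 0xf0 is a lead byte
# with nothing after it, so the sequence is incomplete in cp932.
try:
    name.decode('cp932')
    full_name_decodable = True
except UnicodeDecodeError:
    full_name_decodable = False

# The first two bytes form one complete double-byte character
# (reported as "\u8f42" in the message above).
prefix = name[:2].decode('cp932')

print("full name decodable:", full_name_decodable)
print("prefix %a decodes to %a" % (name[:2], prefix))
```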

--

Your last patch tries to decode the bytes filename from the filesystem encoding, or falls back to repr(filename). It may be better to keep the bytes filename unchanged in OSError.filename instead of using repr(). But patching all PyErr_Set*WithFilename(..., char*) functions sounds like a good idea. My patch for path_error() avoids the creation of a temporary bytes object.

--

The test_support.temp_cwd(b'\xe7w\xf0') test was added by changeset ebdc2aa730c0 and is related to issue #3426. I'm not sure it was really intended to test b'\xe7w\xf0', because the previous test used u'\xe7w\xf0':

-        # Issue 3426: check that abspath retuns unicode when the arg is unicode
-        # and str when it's str, with both ASCII and non-ASCII cwds
-        for cwd in (u'cwd', u'\xe7w\xf0'):

We may use b'\xe7w' instead of b'\xe7w\xf0' if b'\xe7w\xf0' cannot be decoded.
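A rough sketch of that fallback, written against an explicit encoding so it can be tried anywhere. first_decodable() is a hypothetical helper, not something in the test suite; in the real test the encoding would be sys.getfilesystemencoding().

```python
def first_decodable(candidates, encoding):
    """Return the first bytes name that decodes strictly, or None.

    Hypothetical helper for illustration only.
    """
    for name in candidates:
        try:
            name.decode(encoding)
        except UnicodeDecodeError:
            continue
        return name
    return None


candidates = (b'\xe7w\xf0', b'\xe7w')
# With cp932 the full name is rejected and the shorter one is used;
# with cp1252 the full name is already decodable.
print(first_decodable(candidates, 'cp932'))
print(first_decodable(candidates, 'cp1252'))
```

If no candidate is decodable, the helper returns None and the test could still call skipTest().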

--

Attached patch win32_bytes_filename.patch tries to solve both issues: the test and UnicodeDecodeError on raising the OSError.

It tries to decode the bytes filename from the FS encoding, or keeps it unchanged (as bytes), as Python 2 does with os.listdir(unicode).
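In Python terms, the behaviour described here is roughly the following. The actual patch is C code; this is only a sketch of the decode-or-keep idea, and filename_for_oserror() is a made-up name.

```python
def filename_for_oserror(filename, encoding):
    """Decode a bytes filename if possible, otherwise keep it as bytes.

    Hypothetical Python illustration of the logic, not the patch itself.
    """
    if not isinstance(filename, bytes):
        # Already a str filename: nothing to decode.
        return filename
    try:
        return filename.decode(encoding)
    except UnicodeDecodeError:
        # Undecodable: keep the original bytes instead of failing
        # or substituting repr(filename).
        return filename


print(repr(filename_for_oserror(b'\xe7w\xf0', 'cp932')))
print(repr(filename_for_oserror(b'abc', 'cp932')))
```

The point of keeping the bytes object is that raising the OSError can no longer fail with UnicodeDecodeError, and the caller still gets the exact name that was passed in.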