Message 166471 - Python tracker


Author	vstinner
Recipients	ezio.melotti, flox, ishimoto, loewis, tim.golden, vstinner
Date	2012-07-26.10:53:30
Message-id	<1343300011.87.0.0624148929911.issue15441@psf.upfronthosting.co.za>

Content

+    @unittest.skipIf(sys.platform == 'win32',
+        "Win32 can fail cwd() with invalid utf8 name")
     def test_nonascii_abspath(self):

You should not always skip the test on Windows: the filename is decodable in code pages other than cp932. It would be better to add the following code at the beginning of test_nonascii_abspath():

name = b'\xe7w\xf0'
if sys.platform == 'win32':
    try:
        os.fsdecode(name)
    except UnicodeDecodeError:
        self.skipTest("the filename %a is not decodable "
                      "from the ANSI code page (%s)"
                      % (name, sys.getfilesystemencoding()))

Note: Windows does not use UTF-8 for the ANSI or OEM code pages unless you change it manually.
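
To illustrate why the skip should be conditional rather than unconditional, here is a minimal sketch that checks the same bytes name against two explicit codecs standing in for the ANSI code page. The helper name `is_decodable` is hypothetical and not part of any patch on this issue; the real test would rely on os.fsdecode() as shown above.

```python
def is_decodable(name, encoding):
    """Return True if the bytes filename decodes strictly in `encoding`."""
    try:
        name.decode(encoding)  # the default error handler is 'strict'
    except UnicodeDecodeError:
        return False
    return True


name = b'\xe7w\xf0'
# cp932 (Japanese): the trailing 0xF0 is a lead byte with no trail byte.
print(is_decodable(name, 'cp932'))
# cp1252 (Western European): each of the three bytes maps to a character.
print(is_decodable(name, 'cp1252'))
# Dropping the trailing byte leaves a complete cp932 sequence.
print(is_decodable(b'\xe7w', 'cp932'))
```

Only the cp932 case with the full three-byte name fails to decode, so a test that skips on every Windows machine would lose coverage on all the other code pages.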

+        batfile = """
+chcp 932
+{exe} {scriptname}
+chcp {codepage}
+"""

chcp only changes the OEM code page, whereas Python uses the ANSI code page for sys.getfilesystemencoding().

It is possible to change the ANSI code page of the current thread (CP_THREAD_ACP) using SetThreadLocale(), but that doesn't help because Python uses the global ANSI code page (CP_ACP). I don't think that changing the CP_THREAD_ACP code page changes the CP_ACP code page of child processes.

Changing the ANSI code page manually is possible in the Control Panel, but it requires rebooting Windows.
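
The distinction between these code pages can be inspected from Python. The following is a rough sketch, assuming the Win32 functions GetACP(), GetOEMCP() and GetConsoleOutputCP() are reached through ctypes; the function name `windows_code_pages` is made up for illustration. It is an assumption here that chcp acts on the console code page, which normally starts out equal to the OEM code page. On platforms other than Windows the function simply returns None.

```python
import sys


def windows_code_pages():
    """Return the ANSI, OEM and console code pages, or None off Windows."""
    if sys.platform != 'win32':
        return None
    import ctypes  # ctypes.windll only exists on Windows
    kernel32 = ctypes.windll.kernel32
    return {
        'ansi': kernel32.GetACP(),        # global ANSI code page (CP_ACP)
        'oem': kernel32.GetOEMCP(),       # system OEM code page
        # Console output code page; assumed to be what chcp modifies.
        'console_output': kernel32.GetConsoleOutputCP(),
    }


print(windows_code_pages())
# Note: since Python 3.6 this reports 'utf-8' on Windows by default
# (PEP 529), unlike the Python versions discussed in this message.
print(sys.getfilesystemencoding())
```

Running this before and after chcp in the same console should show whether the 'ansi' value is affected by the batch file approach.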

--

Your patch expects "os.mkdir(b'\xe7w\xf0'); os.chdir(b'\xe7w\xf0')" to work, but when I tested it manually in Python it did not work, because Windows creates a directory called "\u8f42" (b'\xe7w'); see my previous message (msg166441). That is at least the case with an NTFS filesystem on Windows 7.

--

Your last patch tries to decode the bytes filename from the filesystem encoding, or uses repr(filename). It may be better to keep the bytes filename unchanged in OSError.filename instead of using repr(). But it sounds like a good idea to patch all PyErr_Set*WithFilename(..., char*) functions. My patch for path_error() avoids the creation of a temporary bytes object.

--

The test_support.temp_cwd(b'\xe7w\xf0') test was added by changeset ebdc2aa730c0 and is related to issue #3426. I'm not sure that testing b'\xe7w\xf0' was really intended, because a previous test was using u'\xe7w\xf0':

-        # Issue 3426: check that abspath retuns unicode when the arg is unicode
-        # and str when it's str, with both ASCII and non-ASCII cwds
-        for cwd in (u'cwd', u'\xe7w\xf0'):

We may use b'\xe7w' instead of b'\xe7w\xf0' if b'\xe7w\xf0' cannot be decoded.

--

Attached patch win32_bytes_filename.patch tries to solve both issues: the test, and the UnicodeDecodeError when raising the OSError.

It tries to decode the bytes filename from the FS encoding, or keeps it unchanged (as bytes), as Python 2 does with os.listdir(unicode).
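
The attached patch is C code; the following Python sketch only illustrates the fallback idea under stated assumptions. The function name `filename_for_oserror` and its `encoding` parameter are hypothetical, and explicit codec names are used in place of the real filesystem encoding so the behaviour can be reproduced on any platform.

```python
import sys


def filename_for_oserror(name, encoding=None):
    """Decode a bytes filename if possible, else return it unchanged."""
    if not isinstance(name, bytes):
        return name  # already a str, nothing to do
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    try:
        return name.decode(encoding)  # strict decoding
    except UnicodeDecodeError:
        return name  # keep the raw bytes rather than repr(name)


# Decodable in cp1252: a str would be stored as the filename.
print(repr(filename_for_oserror(b'\xe7w\xf0', 'cp1252')))
# Not decodable in cp932: the bytes object is kept as-is.
print(repr(filename_for_oserror(b'\xe7w\xf0', 'cp932')))
```

Keeping the bytes object means the caller can still pass the filename stored on the exception back to os functions, which would not be possible with a repr() string.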
