classification
Title: test_listdir is failing on ubuntu with WSL
Type:
Stage:
Components: Tests
Versions: Python 3.9
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: BTaskaya, Matthias Braun, eryksun, steve.dower
Priority: normal
Keywords:

Created on 2019-10-12 05:06 by BTaskaya, last changed 2020-03-17 02:41 by Matthias Braun.

Messages (3)
msg354518 - Author: Batuhan Taskaya (BTaskaya) * (Python committer) Date: 2019-10-12 05:06
Run tests sequentially
0:00:00 load avg: 0.52 [1/1] test_os
test test_os failed -- Traceback (most recent call last):
  File "/home/isidentical/cpython/Lib/test/test_os.py", line 2059, in test_listdir
    self.assertEqual(found, expected)
AssertionError: Items in the first set but not the second:
'@test_12966_tmp-�'
Items in the second set but not the first:
'@test_12966_tmp-\udcff'

test_os failed

== Tests result: FAILURE ==

1 test failed:
    test_os

Total duration: 2 sec 587 ms
Tests result: FAILURE


System:
 - Ubuntu 18.04 bionic
 - x86_64 Linux 4.4.0-18362-Microsoft
msg354550 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-10-12 19:22
The test assumes that Unix filesystems store names as arbitrary sequences of bytes, with only ASCII slash and null reserved. Windows NTFS stores names as arbitrary sequences of 16-bit words, with many reserved ASCII characters including \/:*?<>"| and control characters 0x00-0x1F. WSL implements a UTF-8 filesystem encoding over this by transcoding bytes from UTF-8 to UTF-16LE and escaping reserved characters (excepting slash and null) as sequences that begin with "#" (e.g. "<#" -> "#003C#0023"). The latter is only visible from Windows in the distro's "LocalState\rootfs" tree.
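
The "#" escaping described above can be sketched in Python. This is only an illustration under assumptions: the helper name `wsl_escape` is hypothetical, the reserved set is taken from the characters listed in this message plus "#" itself, and the four-digit uppercase hex form is inferred from the single example given. The real WSL implementation may differ.

```python
# Hypothetical sketch of the "#" escaping described above; not WSL's actual code.
# Assumed reserved set: the characters listed in this message (minus slash and
# null), the control characters 0x01-0x1F, and "#" itself.
RESERVED = set('\\:*?<>"|#') | {chr(c) for c in range(0x01, 0x20)}


def wsl_escape(name: str) -> str:
    """Escape each reserved character as "#" followed by four hex digits."""
    return ''.join(
        '#%04X' % ord(ch) if ch in RESERVED else ch
        for ch in name
    )


print(wsl_escape('<#'))       # #003C#0023
print(wsl_escape('a/b.txt'))  # a/b.txt
```

Slash is left alone in this sketch because the message says it is excepted from escaping.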

This scheme fails for TESTFN_UNDECODABLE. Bytes that can't be transcoded to UTF-16LE are replaced by the replacement character U+FFFD. For example:

    >>> import os
    >>> n = b'\xff'
    >>> open(n, 'w').close()
    >>> os.listdir(b'.')
    [b'\xef\xbf\xbd']
    >>> hex(ord(os.listdir('.')[0]))
    '0xfffd'

WSL could address this by abandoning their current "#" escaping approach to instead translate all reserved and undecodable bytes to the U+DC00-U+DCFF surrogate range, like Python's "surrogateescape" error handler. The Windows API could even support this with a new flag for MultiByteToWideChar and WideCharToMultiByte.
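
The mismatch in the test failure can be reproduced without a filesystem by decoding the same byte two ways. This is a minimal sketch, assuming the name reaches Python as UTF-8 bytes; the file name is illustrative. Python's "surrogateescape" handler maps the undecodable byte 0xFF to U+DCFF and back again, while substituting U+FFFD loses the original byte.

```python
raw = b'@test_tmp-\xff'  # illustrative name ending in an undecodable byte

# surrogateescape maps 0xFF to the lone surrogate U+DCFF and round-trips.
escaped = raw.decode('utf-8', 'surrogateescape')
print(ascii(escaped))                                     # '@test_tmp-\udcff'
print(escaped.encode('utf-8', 'surrogateescape') == raw)  # True

# Substituting U+FFFD is lossy: the original byte cannot be recovered.
replaced = raw.decode('utf-8', 'replace')
print(ascii(replaced))                                    # '@test_tmp-\ufffd'
print(replaced.encode('utf-8'))                           # b'@test_tmp-\xef\xbf\xbd'
```

The first form is what the test expects from os.listdir(); the second matches what WSL returns in the session above.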
msg364385 - Author: Matthias Braun (Matthias Braun) * Date: 2020-03-17 02:41
I believe my suggested pull request in https://bugs.python.org/issue39986 may solve this issue as a side effect because we no longer list the root directory but a temporary directory with controlled filenames.
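
As a rough sketch of that idea (not the code from the linked pull request), a test can create a temporary directory, populate it with names it chose itself, and compare the listing against exactly those names. The file names below are illustrative.

```python
import os
import tempfile

expected = {'alpha.txt', 'beta.txt'}  # illustrative, ASCII-only names

with tempfile.TemporaryDirectory() as tmpdir:
    for name in expected:
        # Create an empty file for each controlled name.
        open(os.path.join(tmpdir, name), 'w').close()
    # Only the files created above can appear in the listing.
    found = set(os.listdir(tmpdir))

print(found == expected)  # True
```

Because the directory starts empty, stray files such as a leftover undecodable test name cannot leak into the comparison.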
History
Date                 User            Action  Args
2020-03-17 02:41:17  Matthias Braun  set     nosy: + Matthias Braun; messages: + msg364385
2019-10-12 19:22:20  eryksun         set     nosy: + eryksun; messages: + msg354550
2019-10-12 05:06:32  BTaskaya        create