classification
Title: shutil.rmtree fails on non ascii filenames
Type:
Stage: resolved
Components: Library (Lib), Windows
Versions: Python 2.7
process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To:
Nosy List: Steffen Kampmann, jaraco, paul.moore, serhiy.storchaka, steve.dower, tim.golden, vstinner, zach.ware
Priority: normal
Keywords:

Created on 2015-07-20 08:40 by Steffen Kampmann, last changed 2016-12-21 19:48 by jaraco. This issue is now closed.

Messages (9)
msg246971 - Author: Steffen Kampmann (Steffen Kampmann) Date: 2015-07-20 08:40
I run Python 2.7 on Windows 7, and shutil.rmtree fails to remove files with non-ASCII filenames:

  File "C:\Users\skampmann\AppData\Local\Continuum\Anaconda\lib\shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "C:\Users\skampmann\AppData\Local\Continuum\Anaconda\lib\shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "C:\Users\skampmann\AppData\Local\Continuum\Anaconda\lib\shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "C:\Users\skampmann\AppData\Local\Continuum\Anaconda\lib\shutil.py", line 252, in rmtree
    onerror(os.remove, fullname, sys.exc_info())
  File "C:\Users\skampmann\AppData\Local\Continuum\Anaconda\lib\shutil.py", line 250, in rmtree
    os.remove(fullname)
WindowsError: [Error 2] Das System kann die angegebene Datei nicht finden: 'H:\\ihre_perso\xa6\xeanlichen_Zugangsdaten600.jpg'

(The German error text reads: "The system cannot find the file specified".)

Please let me know if I can help with anything.
msg246973 - Author: Tim Golden (tim.golden) * (Python committer) Date: 2015-07-20 08:48
Can you confirm whether it also fails if you pass in a unicode string? For example:

shutil.rmtree(u"filename.txt")
msg271661 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2016-07-30 03:46
I've confirmed the issue. It does indeed only occur if the string passed to rmtree is bytes. I discovered this during my investigation of https://github.com/cherrypy/cherrypy/issues/1467. The following script will replicate the failure on Windows systems on Python 2 and Python 3, but not on other operating systems:

---
# encoding: utf-8

from __future__ import unicode_literals

import os
import shutil

os.mkdir('temp')

with open('temp/Слава Україні.html', 'w'):
    pass

print(os.listdir(b'temp')[0])

shutil.rmtree(b'temp')
---

The error on Python 2.7 is this:

????? ???????.html
Traceback (most recent call last):
  File "C:\Users\jaraco\p\cherrypy\issue-1467.py", line 15, in <module>
    shutil.rmtree(b'temp')
  File "C:\Program Files\Python27\lib\shutil.py", line 252, in rmtree
    onerror(os.remove, fullname, sys.exc_info())
  File "C:\Program Files\Python27\lib\shutil.py", line 250, in rmtree
    os.remove(fullname)
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: 'temp\\????? ???????.html'

This issue might be related to issue25911 or issue24230 or issue18713 or issue16656 or issue9820 and probably others.


It's not obvious to me browsing through those tickets why Windows should behave differently when a bytestring is passed to listdir. Perhaps I'll delve into those tickets in more depth.
msg271664 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-07-30 04:55
See also issue16700.

On Windows there are two sets of APIs: Unicode and bytes. File names are stored as Unicode (UTF-16) in modern filesystems and are encoded to bytes by the system for the bytes API. Unfortunately, this encoding is lossy. Windows tries to find the closest equivalent if a character is not encodable in the current code page (for example, it drops diacritics) and silently replaces it with "?" if it can't find anything appropriate. We can't do anything about this from the Python side except use the Unicode API.
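
The lossy conversion described above can be approximated in pure Python. This is only a sketch under stated assumptions: the `cp1252` codec stands in for whatever ANSI code page is active on the system, and the `'replace'` error handler stands in for the Windows conversion, which may additionally apply best-fit mappings (such as dropping diacritics) that this sketch does not reproduce.

```python
# encoding: utf-8

# File name from the reproduction script earlier in this issue.
name = u'Слава Україні.html'

# Assumption: cp1252 plays the role of the active ANSI code page.
# Characters with no mapping in that code page are replaced by "?".
lossy = name.encode('cp1252', 'replace')

# Decoding the result does not give the original name back, so the
# bytes name no longer identifies the file on disk.
round_tripped = lossy.decode('cp1252')

print(lossy)
print(round_tripped == name)
```

Once the name has been reduced to question marks, no bytes-based call can refer to the real file, which is consistent with the `os.remove` failure reported above.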
msg271666 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-07-30 06:22
Use Unicode on Python 3, it will work on all platforms. Problem solved :-)
msg271699 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2016-07-30 17:09
I agree. I was able to apply a fairly simple fix to setuptools to address the failure (https://github.com/pypa/setuptools/commit/857949575022946cc60c7cd1d0d088246d3f7540).

I suggest closing this ticket as won't fix.
msg283707 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2016-12-20 19:14
I'm afraid I need to re-open this issue.

Although passing unicode names to rmtree fixes the issue on Windows systems, it causes problems on Linux systems where LC_ALL=C. Consider this script:

#################################
# encoding: utf-8

from __future__ import unicode_literals

import os
import shutil

os.mkdir('temp')

with open('temp/Слава Україні.html'.encode('utf-8'), 'w'):
    pass

print(os.listdir(b'temp')[0])

shutil.rmtree('temp')
#################################

Invoked thus, a UnicodeDecodeError occurs:

vagrant@trusty:/vagrant$ LC_ALL=C python2.7 issue24672.py 
Слава Україні.html
Traceback (most recent call last):
  File "issue24672.py", line 15, in <module>
    shutil.rmtree('temp')
  File "/usr/lib/python2.7/shutil.py", line 241, in rmtree
    fullname = os.path.join(path, name)
  File "/usr/lib/python2.7/posixpath.py", line 80, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 1: ordinal not in range(128)
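
The failing step in `posixpath.join` can be isolated. On Python 2, concatenating a unicode path with a byte-string name makes the interpreter decode the bytes implicitly with the ASCII codec. The sketch below performs that decode explicitly so it behaves the same on Python 2 and Python 3; the variable names are illustrative only.

```python
# encoding: utf-8

# os.listdir returns byte-string names here, because the unicode path
# could not be used to decode them under LC_ALL=C.
name = u'Слава Україні.html'.encode('utf-8')

# posixpath.join effectively evaluates u'temp' + '/' + name, which on
# Python 2 decodes '/' + name as ASCII. Do that decode explicitly.
try:
    (b'/' + name).decode('ascii')
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

The first byte of the UTF-8 encoded name is 0xd0, and it sits at position 1 after the slash, which matches the traceback above.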


This is the same error seen when trying to rmtree an extraction of Sphinx (a package containing a file name with an offending non-ASCII character):

vagrant@trusty:/vagrant$ wget 'https://files.pythonhosted.org/packages/b2/d5/bb4bf7fbc2e6b85d1e3832716546ecd434632d9d434a01efe87053fe5f25/Sphinx-1.5.1.tar.gz' -O - | tar xz 
--2016-12-20 19:07:21--  https://files.pythonhosted.org/packages/b2/d5/bb4bf7fbc2e6b85d1e3832716546ecd434632d9d434a01efe87053fe5f25/Sphinx-1.5.1.tar.gz
Resolving files.pythonhosted.org (files.pythonhosted.org)... 151.101.33.63
Connecting to files.pythonhosted.org (files.pythonhosted.org)|151.101.33.63|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4397246 (4.2M) [binary/octet-stream]
Saving to: ‘STDOUT’

100%[========================================================>] 4,397,246   2.06MB/s   in 2.0s   

2016-12-20 19:07:23 (2.06 MB/s) - written to stdout [4397246/4397246]

vagrant@trusty:/vagrant$ LC_ALL=C python2.7 -c "import shutil; shutil.rmtree(u'Sphinx-1.5.1')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.7/shutil.py", line 241, in rmtree
    fullname = os.path.join(path, name)
  File "/usr/lib/python2.7/posixpath.py", line 80, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcc in position 8: ordinal not in range(128)


Is the solution to call rmtree with unicode on Windows, but with bytes when on Python 2 and Linux? What else can be done?
msg283710 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-12-20 20:39
Lib/posixpath.py needs a huge amount of work to behave correctly for either bytes or Unicode paths. I don't know why Lib/ntpath.py is okay here, but the code is different so I suspect it just happens to not need the same conversion.

Switching for each platform is probably the only way, unless you find someone willing to go through and make Unicode paths viable on Python 2.7 (this came up earlier today on one of the lists).
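
A minimal sketch of what switching per platform could look like follows. The helper name `path_for_rmtree` and its parameters are hypothetical; this is not the change made in setuptools, only an illustration of the idea. It assumes the path itself can be converted with the filesystem encoding.

```python
import sys

PY2 = sys.version_info[0] == 2
TEXT_TYPE = type(u"")  # unicode on Python 2, str on Python 3


def path_for_rmtree(path, platform=sys.platform, py2=PY2):
    """Return path as text or bytes, whichever suits the platform.

    Hypothetical helper: text on Windows and on Python 3, bytes on
    Python 2 elsewhere, following the discussion in this issue.
    """
    encoding = sys.getfilesystemencoding() or "utf-8"
    want_text = platform.startswith("win") or not py2
    if want_text and isinstance(path, bytes):
        return path.decode(encoding)
    if not want_text and isinstance(path, TEXT_TYPE):
        return path.encode(encoding)
    return path
```

A caller would then write `shutil.rmtree(path_for_rmtree('temp'))`. The `platform` and `py2` parameters exist only so the choice can be exercised without changing interpreters.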
msg283776 - Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2016-12-21 19:48
In https://github.com/pypa/setuptools/issues/706, I've addressed this additional concern.
History
Date | User | Action | Args
2016-12-21 19:48:45 | jaraco | set | status: open -> closed; resolution: wont fix; messages: + msg283776
2016-12-20 20:39:54 | steve.dower | set | messages: + msg283710
2016-12-20 19:14:03 | jaraco | set | status: closed -> open; resolution: wont fix -> (no value); messages: + msg283707
2016-07-30 19:46:23 | r.david.murray | set | status: open -> closed; stage: resolved; resolution: wont fix; versions: - Python 3.5, Python 3.6
2016-07-30 17:09:15 | jaraco | set | messages: + msg271699
2016-07-30 06:22:00 | vstinner | set | messages: + msg271666
2016-07-30 04:55:52 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg271664
2016-07-30 03:46:24 | jaraco | set | nosy: + jaraco; title: shutil.rmtree failes on non ascii filenames -> shutil.rmtree fails on non ascii filenames; messages: + msg271661; versions: + Python 3.5, Python 3.6
2015-07-20 08:48:48 | tim.golden | set | messages: + msg246973
2015-07-20 08:47:06 | serhiy.storchaka | set | nosy: + paul.moore, vstinner, tim.golden, zach.ware, steve.dower; components: + Windows
2015-07-20 08:40:07 | Steffen Kampmann | create |