classification
Title: shutil.copy2 fails with destination filenames
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 2.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: Adam.Matan, ezio.melotti
Priority: normal Keywords:

Created on 2011-04-02 14:25 by Adam.Matan, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (8)
msg132799 - (view) Author: Adam Matan (Adam.Matan) Date: 2011-04-02 14:25
shutil.copy2(file, dest) fails when dest has unicode characters:

[2011-04-02 17:19:54 adam@adam-laptop ~/personal :) ]$ python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import glob
>>> import shutil
>>> files=glob.glob('*.ods')
>>> for file in files:
...     shutil.copy2(file, 'א') # This works, but:
...
>>> for file in files:
...     shutil.copy2(file, u'א')
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib/python2.6/shutil.py", line 98, in copy2
    dst = os.path.join(dst, os.path.basename(src))
  File "/usr/lib/python2.6/posixpath.py", line 70, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd7 in position 1: ordinal not in range(128)


See discussion here: http://stackoverflow.com/questions/5523373/python-how-to-move-a-file-with-unicode-filename-to-a-unicode-folder/5523385#5523385
msg132800 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-04-02 14:54
The problem here is that you are mixing byte strings and unicode.
glob.glob('*.ods') returns a list of byte strings, so in shutil.copy2(file, 'א') you are passing two byte strings and everything works fine.
In shutil.copy2(file, u'א') instead, file is a byte string, and u'א' is unicode, so the copy fails.
If you want to use u'א', you can pass u'*.ods' to glob.glob() in order to get a list of unicode strings from there too.
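
A minimal sketch of that suggestion, assuming the names on disk are encoded with the file system encoding. The helpers to_text and copy_matching are hypothetical names used for illustration and are not part of shutil or glob. The idea is to decode every byte string once, at the boundary, so that shutil.copy2() only ever sees unicode:

```python
import glob
import os
import shutil
import sys


def to_text(path, encoding=None):
    """Return path as a unicode string, decoding byte strings explicitly."""
    if isinstance(path, bytes):
        # Assumption: names on disk use the file system encoding.
        return path.decode(encoding or sys.getfilesystemencoding())
    return path


def copy_matching(pattern, dest):
    """Copy every file matching pattern into the directory dest."""
    dest = to_text(dest)
    copied = []
    # A unicode pattern makes glob.glob() return unicode names too.
    for name in glob.glob(to_text(pattern)):
        shutil.copy2(name, dest)
        copied.append(os.path.join(dest, os.path.basename(name)))
    return copied
```

With these helpers, copy_matching('*.ods', u'א') never mixes the two string types, because both the pattern and the destination are unicode before they reach glob and shutil.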
msg132801 - (view) Author: Adam Matan (Adam.Matan) Date: 2011-04-02 15:10
Don't you think that shutil should be able to handle mixed data types, for example byte string as file name and unicode destination directory? This is, in my opinion, a very common scenario.

Would you consider converting all arguments to Unicode?
msg132802 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-04-02 15:10
I should have added that both 'א' and u'א' work as long as the file names in the list returned by glob() are ASCII: if u'א' is used they are simply coerced to unicode.
If there are non-ASCII file names in the glob() list, the copy fails because Python 2 tries to coerce the name to unicode, decoding it implicitly with the 'ascii' codec, and that fails.
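
The coercion described here can be reproduced explicitly. The sketch below spells out the decode step that Python 2 applies when a byte string meets a unicode string. The names implicit_coerce and can_mix_with_unicode are hypothetical and exist only for illustration:

```python
def implicit_coerce(byte_name):
    """Decode a byte string with the 'ascii' codec, as the coercion does."""
    return byte_name.decode('ascii')


def can_mix_with_unicode(byte_name):
    """Return True if byte_name survives the ASCII decode."""
    try:
        implicit_coerce(byte_name)
    except UnicodeDecodeError:
        # Any byte >= 0x80 (such as 0xd7 in UTF-8 encoded Hebrew) lands here.
        return False
    return True
```

Under this sketch, can_mix_with_unicode(b'report.ods') is True, while can_mix_with_unicode(b'\xd7\x90.ods') is False, which matches the UnicodeDecodeError in the original report.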
msg132808 - (view) Author: Adam Matan (Adam.Matan) Date: 2011-04-02 20:19
Don't you think it should be changed in Python 2.x, so that the ASCII filename will be automatically converted to Unicode?
msg132809 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-04-02 20:22
The ASCII filename is already converted to unicode, but in your case the program was most likely failing with some non-ASCII filename.
msg132810 - (view) Author: Adam Matan (Adam.Matan) Date: 2011-04-02 20:25
Do you think it should be fixed at the module level?
msg132812 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-04-02 20:34
Mixing byte and unicode strings should always be avoided, because the implicit coercion to unicode works only if the byte strings contain only ASCII, and fails otherwise.
Several modules -- including shutil, glob, and os.path -- have APIs that work with both byte and unicode strings, but fail when you mix the two:
>>> os.path.join('א', 'א')  # both byte strings -- works
'\xd7\x90/\xd7\x90'
>>> os.path.join(u'א', u'א')  # both unicode -- works
u'\u05d0/\u05d0'
>>> os.path.join('a', u'א')  # mixed, ASCII-only byte string -- works
u'a/\u05d0'

>>> os.path.join(u'א', 'א')  # mixed, non-ASCII -- fails
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/posixpath.py", line 70, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd7 in position 1: ordinal not in range(128)
>>> os.path.join('א', u'א')  # mixed, non-ASCII -- fails
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/posixpath.py", line 70, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd7 in position 0: ordinal not in range(128)
>>>
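
One way to avoid the failing cases in this session is to decode every byte string explicitly before joining. This is a minimal sketch, assuming the byte strings are UTF-8 encoded as in the session. The helper join_text is hypothetical and is not an os.path function. It uses posixpath so the separator matches the POSIX session shown:

```python
import posixpath


def join_text(*parts):
    """Join path components after decoding any byte strings as UTF-8."""
    # Assumption: byte-string components are UTF-8 encoded.
    decoded = [p.decode('utf-8') if isinstance(p, bytes) else p
               for p in parts]
    return posixpath.join(*decoded)
```

With this helper, join_text(u'א', b'\xd7\x90') and join_text(b'\xd7\x90', u'א') both return u'\u05d0/\u05d0' instead of raising, because no implicit ASCII decode ever takes place.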
History
Date                 User          Action  Args
2022-04-11 14:57:15  admin         set     github: 55950
2011-04-02 20:34:00  ezio.melotti  set     messages: + msg132812
2011-04-02 20:25:36  Adam.Matan    set     messages: + msg132810
2011-04-02 20:22:26  ezio.melotti  set     messages: + msg132809
2011-04-02 20:19:27  Adam.Matan    set     messages: + msg132808
2011-04-02 15:10:32  ezio.melotti  set     messages: + msg132802
2011-04-02 15:10:22  Adam.Matan    set     messages: + msg132801
2011-04-02 14:54:13  ezio.melotti  set     status: open -> closed
                                           resolution: not a bug
                                           messages: + msg132800
                                           stage: test needed -> resolved
2011-04-02 14:29:18  ezio.melotti  set     nosy: + ezio.melotti
                                           type: behavior
                                           components: + Library (Lib), - Extension Modules
                                           stage: test needed
2011-04-02 14:25:53  Adam.Matan    create