This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: os.path.expanduser does not use the system encoding
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 2.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: plgarcia, r.david.murray, vstinner
Priority: normal Keywords:

Created on 2013-06-09 10:51 by plgarcia, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (6)
msg190850 - Author: Pascal Garcia (plgarcia) Date: 2013-06-09 10:51
The user name contains accented characters on Windows.

This error occurs when calling expanduser("~"):

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 10: ordinal not in range(128)

ASCII is the default encoding, as reported by sys.getdefaultencoding().
If I switch on the "Enable to support locale" block in site.py, the default encoding becomes cp1252 and the function works.

expanduser should decode the path returned by the system using the system encoding (perhaps the one given by locale.getdefaultlocale()) instead of the default encoding, which should be the target encoding.

I believe some other functions may be affected by this problem as well.
I detected the problem on Windows (XP and 7), but I believe it may also happen on Linux.
msg190853 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-06-09 13:47
I could not reproduce this error on Linux with python2.7.
msg190854 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-06-09 13:48
Also, it would be helpful if you could show a full traceback, since there can be spurious sources of Unicode errors on Windows depending on how you execute your code.
msg190858 - Author: Pascal Garcia (plgarcia) Date: 2013-06-09 15:34
Here are two logs: one with the default site.py, which sets the default encoding to ascii, and the other with it set to utf8.
You can see that the home directory name contains accents: Pépé. No insult intended to anybody but this stupid computer :)

When I force the encoding given by locale.getdefaultlocale(), the function works, but after calling expanduser I still need an explicit decode(locale.getdefaultlocale()[1]); otherwise the string cannot be used to build file paths.

==> With ASCII:

C:\Users\pépé>D:\DevelopmentWorkspaces\SCOLASYNC\ScolaSyncNG\scolasync-ng\src\scolasync.py
Traceback (most recent call last):
  File "D:\DevelopmentWorkspaces\SCOLASYNC\ScolaSyncNG\scolasync-ng\src\scolasync.py", line 329, in <module>
    run()
  File "D:\DevelopmentWorkspaces\SCOLASYNC\ScolaSyncNG\scolasync-ng\src\scolasync.py", line 206, in run
    globaldef.initDefs(wd, force)
  File "D:\DevelopmentWorkspaces\SCOLASYNC\ScolaSyncNG\scolasync-ng\src\globaldef.py", line 80, in initDefs
    wrkdir= os.path.expanduser(u"~"+os.sep)
  File "C:\Python27\lib\ntpath.py", line 301, in expanduser
    return userhome + path[i:]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 10: ordinal not in range(128)

==> With UTF8:

C:\Users\pépé>D:\DevelopmentWorkspaces\SCOLASYNC\ScolaSyncNG\scolasync-ng\src\scolasync.py
Traceback (most recent call last):
  File "D:\DevelopmentWorkspaces\SCOLASYNC\ScolaSyncNG\scolasync-ng\src\scolasync.py", line 329, in <module>
    run()
  File "D:\DevelopmentWorkspaces\SCOLASYNC\ScolaSyncNG\scolasync-ng\src\scolasync.py", line 206, in run
    globaldef.initDefs(wd, force)
  File "D:\DevelopmentWorkspaces\SCOLASYNC\ScolaSyncNG\scolasync-ng\src\globaldef.py", line 80, in initDefs
    wrkdir= os.path.expanduser(u"~"+os.sep)
  File "C:\Python27\lib\ntpath.py", line 301, in expanduser
    return userhome + path[i:]
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 10: invalid continuation byte
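
Both tracebacks stop on the same byte. The Python 3 sketch below assumes the home directory path is `C:\Users\pépé` encoded as cp1252 (an assumption based on the report that cp1252 works). Under that assumption `é` is byte 0xe9 at index 10, which ASCII rejects because it is above 127, and which UTF-8 rejects because a lead byte of 0xe9 must be followed by continuation bytes. The helper name `first_decode_error` is hypothetical.

```python
# Hypothetical raw home directory: C:\Users\pépé encoded with cp1252 (assumption).
RAW_HOME = "C:\\Users\\pépé".encode("cp1252")


def first_decode_error(raw: bytes, encoding: str):
    """Return (position, offending byte as hex, reason), or None if decoding succeeds."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc.start, hex(raw[exc.start]), exc.reason
    return None


for codec in ("ascii", "utf-8", "cp1252"):
    print(codec, first_decode_error(RAW_HOME, codec))
```

Running it prints `(10, '0xe9', 'ordinal not in range(128)')` for ascii, `(10, '0xe9', 'invalid continuation byte')` for utf-8, and `None` for cp1252, matching the position reported in both tracebacks.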
msg190861 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-06-09 16:05
On Linux as well, this fails:

  os.path.expanduser(u'~' + os.sep)

But this works:

  os.path.expanduser('~' + os.sep)

Counterintuitive, to say the least.  The reason is that the value of the HOME environment variable is read as a byte string, but when that byte string value is added to the unicode u'~/', unicode coercion attempts to decode the byte string as an ASCII string, which fails.
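
Python 3 no longer performs this implicit decode, so the Python 2 behaviour has to be written out to be seen. The sketch below is a hypothetical emulation, not a real API: `py2_concat` stands in for the `userhome + path[i:]` line in ntpath.py, `default_encoding` plays the role of `sys.getdefaultencoding()`, and the cp1252-encoded HOME value is an assumption.

```python
def py2_concat(userhome: bytes, rest: str, default_encoding: str = "ascii") -> str:
    """Emulate Python 2's `userhome + path[i:]` when userhome is a byte string
    and path[i:] is unicode: the bytes are implicitly decoded with the
    default encoding before the two are joined."""
    return userhome.decode(default_encoding) + rest


home = b"/home/p\xe9p\xe9"  # hypothetical HOME value, cp1252-encoded

try:
    py2_concat(home, "/")
except UnicodeDecodeError as exc:
    print("ascii coercion fails:", exc.reason)  # ordinal not in range(128)

print(py2_concat(home, "/", default_encoding="cp1252"))  # /home/pépé/

# Python 3 itself refuses to mix the two types instead of guessing an encoding.
try:
    home + "/"
except TypeError as exc:
    print("Python 3:", exc)
```

Passing `default_encoding="cp1252"` mirrors the reporter's site.py change: the implicit decode then succeeds, but only because the default encoding happens to match the encoding of the bytes.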

So, you must manipulate paths as byte strings in Python 2, decoding them yourself with the appropriate codec if needed.

This is handled automatically in Python 3, using the default encoding as you suggest (plus the surrogateescape error handler to handle undecodable bytes on Linux/Unix). Fixes like this are a large part of the purpose of Python 3.

So, in Python 2, this is working as expected.
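
The `surrogateescape` handler mentioned above can be shown in a few lines of Python 3. A minimal sketch, assuming a UTF-8 filesystem encoding and hypothetical path bytes that are not valid UTF-8; on POSIX, `os.fsdecode` and `os.fsencode` apply the same handler together with the filesystem encoding.

```python
raw = b"/home/p\xe9p\xe9"  # hypothetical path bytes; not valid UTF-8

# Decoding with surrogateescape never raises: each undecodable byte 0xNN
# (0x80-0xFF) becomes the lone surrogate code point U+DCNN.
text = raw.decode("utf-8", "surrogateescape")
print(repr(text))  # '/home/p\udce9p\udce9'

# Encoding with the same handler gives back the original bytes exactly.
assert text.encode("utf-8", "surrogateescape") == raw
```

Because the round trip is lossless, such a string can be handed back to the `os` functions and still refer to the same file.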
msg190863 - Author: Pascal Garcia (plgarcia) Date: 2013-06-09 16:36
Sorry for this error.
Thanks for the solution.

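Here is the code as I modified it (Python 2):

    import locale
    import os

    wrkdir = os.path.expanduser("~" + os.sep)   # a byte string in Python 2
    loc = locale.getdefaultlocale()             # (language code, encoding)
    if loc[1]:
        encoding = loc[1]
        wrkdir = wrkdir.decode(encoding)        # now a unicode string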

I need to decode the string explicitly if I want to use it and have the following statement, a bit further on, work:
    os.path.join(wrkdir, u"Tango\\")
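
For comparison, a Python 3 sketch of the same two steps. Here `os.path.expanduser` called with a `str` already returns a `str`, so neither the `locale` lookup nor the explicit `decode` is needed. The function names are hypothetical, and the trailing separator of `u"Tango\\"` is dropped in favour of a plain `os.path.join`.

```python
import os


def tango_dir() -> str:
    # Python 3: expanduser(str) returns str; the OS value has already been
    # decoded (with surrogateescape on POSIX), so it can be joined with text.
    wrkdir = os.path.expanduser("~")
    return os.path.join(wrkdir, "Tango")


def tango_dir_from_bytes() -> str:
    # If a bytes path is obtained elsewhere, os.fsdecode applies the same
    # decoding that Python 3 uses for str paths.
    wrkdir = os.fsdecode(os.path.expanduser(b"~"))
    return os.path.join(wrkdir, "Tango")


print(tango_dir())
```

Both functions should yield the same string, since Python 3 decodes the home directory with the filesystem encoding either way.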

Encoding is a very good reason to move to Python 3, and if I didn't have other constraints, I would have done it long ago.

For this particular case, I think the function should return strings decoded with the default encoding, and the programmer should not need to know what happens underneath to apply the right decode.

But it works, thanks again.
Pascal
History
Date                 User            Action  Args
2022-04-11 14:57:46  admin           set     github: 62371
2013-06-09 16:45:15  r.david.murray  set     status: open -> closed
                                             type: behavior
                                             resolution: not a bug
2013-06-09 16:36:56  plgarcia        set     status: closed -> open
                                             type: behavior -> (no value)
                                             resolution: not a bug -> (no value)
                                             messages: + msg190863
2013-06-09 16:05:47  r.david.murray  set     status: open -> closed
                                             type: behavior
                                             messages: + msg190861
                                             resolution: not a bug
                                             stage: resolved
2013-06-09 15:34:32  plgarcia        set     messages: + msg190858
2013-06-09 13:48:18  r.david.murray  set     messages: + msg190854
2013-06-09 13:47:19  r.david.murray  set     nosy: + vstinner, r.david.murray
                                             messages: + msg190853
2013-06-09 10:51:10  plgarcia        create