classification
Title: os.path.expanduser breaks when using unicode character in the username
Type: behavior
Stage:
Components: Windows
Versions: Python 2.7
process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To:
Nosy List: Arkady “KindDragon” Shapkin, eric.araujo, ezio.melotti, flox, mandel, santoso.wijaya, vstinner
Priority: normal
Keywords:

Created on 2011-10-18 11:34 by mandel, last changed 2016-04-09 00:50 by Arkady “KindDragon” Shapkin. This issue is now closed.

Files
File name | Uploaded | Description
expanduser.py | mandel, 2011-10-18 11:34 | Example of using the win api to expand the user.
Messages (5)
msg145798 - Author: Manuel de la Pena (mandel) Date: 2011-10-18 11:34
During our development we have experienced the following:

If you have a user on your Windows machine with a name that uses Japanese characters, such as “雄鳥お人好し”, you will see the following on your system:

* The Windows Shell will show the path correctly, that is: “C:\Users\雄鳥お人好し”
* cmd.exe will show: “C:\Users\??????”
* All the environment variables will be wrong: they will contain question marks, like the path shown in cmd.exe

The above is a problem because the implementation of expanduser in ntpath.py uses the environment variables to expand the path, which means that in this case the returned path will be wrong.

I have attached a small example of how to get the user profile path (~) on Windows using SHGetFolderPathW or SHGetKnownFolderPath to fix the issue.
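
For readers without the attachment, the approach can be sketched roughly as follows. This is not the attached expanduser.py; it is a minimal illustration for application code, assuming the CSIDL_PROFILE shell folder is the right target. The helper name profile_dir and the non-Windows fallback are hypothetical additions so the sketch runs anywhere.

```python
import ctypes
import os
import sys

CSIDL_PROFILE = 0x0028   # shell folder ID of the user's profile directory
SHGFP_TYPE_CURRENT = 0   # request the current path, not the default one
MAX_PATH = 260


def profile_dir():
    """Return the user's profile directory as a Unicode string.

    Hypothetical helper: on Windows it asks the shell through the wide
    (Unicode) API instead of reading the ANSI environment variables.
    """
    if sys.platform != "win32":
        # Fallback for other platforms; not part of the Windows workaround.
        return os.path.expanduser(u"~")
    buf = ctypes.create_unicode_buffer(MAX_PATH)
    # HRESULT SHGetFolderPathW(hwnd, csidl, hToken, dwFlags, pszPath)
    hresult = ctypes.windll.shell32.SHGetFolderPathW(
        None, CSIDL_PROFILE, None, SHGFP_TYPE_CURRENT, buf)
    if hresult != 0:
        raise OSError(
            "SHGetFolderPathW failed: HRESULT 0x%08X" % (hresult & 0xFFFFFFFF))
    return buf.value


print(repr(profile_dir()))
```

Because the path is requested through the wide API, it never passes through the ANSI code page, which is where the question marks come from.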

PS: I don't know whether this issue also occurs on Python 3.
msg145819 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-10-18 15:53
On POSIX, Python 3 works correctly if my home dir is /tmp/éric, and Python 2.7 returns a UTF-8-encoded (not locale-encoded!) byte string.

For Windows, a patch would probably need to add a private function to the _nt module (in C): ctypes is too dangerous to be used in the standard library.
msg146450 - Author: Santoso Wijaya (santoso.wijaya) * Date: 2011-10-26 18:32
Unicode environment variables work properly in Python 3.x on Windows, too, because the convertenviron() function in posixmodule.c uses extern _wenviron and PyUnicode_FromWideChar() in Python 3.x. In Python 2.7, convertenviron() uses extern environ and PyString_FromString*().
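
A possible application-level workaround on Python 2.7 is to read the wide environment directly through the Win32 function GetEnvironmentVariableW. The sketch below is an illustration only and has not been tested against the reporter's setup; the helper name getenv_unicode and the non-Windows fallback are hypothetical. On Python 2 the name must be passed as a unicode string (for example u"USERPROFILE") so that ctypes hands it over as a wide string.

```python
import ctypes
import os
import sys


def getenv_unicode(name):
    """Return the value of environment variable `name` as text, or None."""
    if sys.platform != "win32":
        # Fallback so the sketch runs on other platforms
        # (on Python 2 this branch returns a byte string).
        return os.environ.get(name)
    kernel32 = ctypes.windll.kernel32
    # With a NULL buffer the call returns the required size in characters,
    # including the terminating null, or 0 if the variable does not exist.
    size = kernel32.GetEnvironmentVariableW(name, None, 0)
    if size == 0:
        return None
    buf = ctypes.create_unicode_buffer(size)
    kernel32.GetEnvironmentVariableW(name, buf, size)
    return buf.value


print(repr(getenv_unicode(u"USERPROFILE")))
```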
msg146460 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-10-26 22:50
Python 2 uses byte strings. If characters are not encodable to the ANSI code page, Windows replaces them with question marks. See issue #13247 for another example (in Python 3 when explicitly using the bytes API). To be able to support characters not encodable to the ANSI code page, you have to use Unicode *everywhere*.
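
The effect can be imitated with Python's own codecs. In the sketch below, cp1252 is an assumed stand-in for the machine's ANSI code page, and the "replace" error handler only emulates the substitution that Windows performs itself; the path is the one from the original report.

```python
# -*- coding: utf-8 -*-
# Assumption: cp1252 stands in for the machine's ANSI code page.
ANSI_CODE_PAGE = "cp1252"

home = u"C:\\Users\\雄鳥お人好し"

# "replace" substitutes "?" for every character the code page cannot encode.
as_bytes = home.encode(ANSI_CODE_PAGE, "replace")

# Decoding the bytes again cannot bring the original characters back.
round_trip = as_bytes.decode(ANSI_CODE_PAGE)

print(repr(as_bytes))
print(round_trip == home)
```

Each of the six Japanese characters becomes one question mark, so the information is lost before Python ever sees the value.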

Because Python 2 doesn't have access to the Unicode environment and uses bytes in most cases, I don't think that we can fix this issue in Python 2.

I am closing this issue because it would require too much work to fix in Python 2, whereas it already works in Python 3. Moving to Python 3 is the best solution to this issue.
msg263052 - Author: Arkady “KindDragon” Shapkin (Arkady “KindDragon” Shapkin) Date: 2016-04-09 00:50
At least Python 2.7 should return the path encoded in the locale.getpreferredencoding() encoding.
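
One way to read this suggestion, sketched for application code: decode the byte string that Python 2.7 returns using locale.getpreferredencoding(). The helper name expanduser_text is hypothetical, and the sketch cannot recover characters that Windows has already replaced with question marks.

```python
import locale
import os


def expanduser_text(path):
    """Expand ~ and return text rather than bytes (hypothetical helper)."""
    result = os.path.expanduser(path)
    if isinstance(result, bytes):
        # On Python 2, str is bytes: decode with the preferred encoding.
        # On Python 3 a str argument already yields text and is left alone.
        result = result.decode(locale.getpreferredencoding(), "replace")
    return result


print(repr(expanduser_text("~")))
```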
History
Date | User | Action | Args
2016-04-09 00:50:26 | Arkady “KindDragon” Shapkin | set | nosy: + Arkady “KindDragon” Shapkin; messages: + msg263052
2011-10-26 22:50:03 | vstinner | set | status: open -> closed; resolution: wont fix; messages: + msg146460
2011-10-26 18:32:52 | santoso.wijaya | set | nosy: + santoso.wijaya; messages: + msg146450
2011-10-26 04:30:43 | ezio.melotti | set | nosy: + ezio.melotti
2011-10-18 15:53:36 | eric.araujo | set | nosy: + vstinner, eric.araujo; messages: + msg145819
2011-10-18 12:16:31 | flox | set | nosy: + flox; title: os.path.expanduser brakes when using unicode character in the username -> os.path.expanduser breaks when using unicode character in the username
2011-10-18 11:34:28 | mandel | create |