os.path.expanduser breaks when using unicode character in the username #57416

mandel · 2011-10-18T11:34:29Z

BPO	13207
Nosy	@vstinner, @ezio-melotti, @merwok, @florentx
Files	expanduser.py: Example of using the win api to expand the user.

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2011-10-26.22:50:03.427>
created_at = <Date 2011-10-18.11:34:28.554>
labels = ['type-bug', 'OS-windows']
title = 'os.path.expanduser breaks when using unicode character in the username'
updated_at = <Date 2016-04-09.00:50:26.637>
user = 'https://bugs.python.org/mandel'

bugs.python.org fields:

activity = <Date 2016-04-09.00:50:26.637>
actor = 'Arkady “KindDragon” Shapkin'
assignee = 'none'
closed = True
closed_date = <Date 2011-10-26.22:50:03.427>
closer = 'vstinner'
components = ['Windows']
creation = <Date 2011-10-18.11:34:28.554>
creator = 'mandel'
dependencies = []
files = ['23442']
hgrepos = []
issue_num = 13207
keywords = []
message_count = 5.0
messages = ['145798', '145819', '146450', '146460', '263052']
nosy_count = 7.0
nosy_names = ['vstinner', 'ezio.melotti', 'eric.araujo', 'flox', 'santoso.wijaya', 'mandel', 'Arkady “KindDragon” Shapkin']
pr_nums = []
priority = 'normal'
resolution = 'wont fix'
stage = None
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue13207'
versions = ['Python 2.7']

mandel · 2011-10-18T11:34:28Z

During our development we have experienced the following:

If you have a user on your Windows machine whose name uses Japanese characters, like “雄鳥お人好し”, you will see the following on your system:

The Windows Shell will show the path correctly, that is: “C:\Users\雄鳥お人好し”
cmd.exe will show: “C:\Users\??????”
All the environment variables that contain the user name will be wrong: they hold the same question marks that cmd.exe shows.

This is a problem because the implementation of expanduser in ntpath.py uses the environment variables to expand the path, so in this case the returned path will be wrong.

I have attached a small example of how to get the user profile path (~) on Windows using SHGetFolderPathW or SHGetKnownFolderPath to fix the issue.

PS: I don't know if this issue also occurs on python 3.
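For readers without access to the attachment, the SHGetFolderPathW approach mentioned above could look roughly like the sketch below. This is not the attached expanduser.py; the names `get_profile_dir` and `expand_user` are made up for illustration, and the code only handles a bare leading `~` (not `~otheruser`). It calls the wide-character shell API through ctypes, so it is meant for application code rather than the standard library. On non-Windows platforms it simply falls back to `os.path.expanduser`.

```python
import os
import sys

CSIDL_PROFILE = 0x0028  # shell folder ID of the user's profile directory
MAX_PATH = 260          # buffer size (in characters) expected by SHGetFolderPathW


def get_profile_dir():
    """Return the current user's profile directory as a Unicode string."""
    if sys.platform != "win32":
        # Assumption: outside Windows the regular implementation is good enough.
        return os.path.expanduser(u"~")

    import ctypes
    from ctypes import wintypes

    SHGetFolderPathW = ctypes.windll.shell32.SHGetFolderPathW
    SHGetFolderPathW.argtypes = [
        wintypes.HWND,    # hwndOwner (reserved, NULL)
        ctypes.c_int,     # nFolder (CSIDL value)
        wintypes.HANDLE,  # hToken (NULL means the current user)
        wintypes.DWORD,   # dwFlags (0 = current path)
        wintypes.LPWSTR,  # pszPath (output buffer)
    ]

    buf = ctypes.create_unicode_buffer(MAX_PATH)
    hresult = SHGetFolderPathW(None, CSIDL_PROFILE, None, 0, buf)
    if hresult != 0:  # anything other than S_OK is treated as a failure
        raise OSError("SHGetFolderPathW failed with HRESULT 0x%08X"
                      % (hresult & 0xFFFFFFFF))
    return buf.value


def expand_user(path, home=None):
    """Expand a leading '~' to the profile directory (current user only)."""
    if path != u"~" and not path.startswith((u"~/", u"~\\")):
        return path  # nothing to expand, or '~otheruser' (not handled here)
    if home is None:
        home = get_profile_dir()
    return home + path[1:]
```

Because the path never passes through the ANSI environment block, the Unicode user name should survive intact. The optional `home` argument exists only to make the expansion logic easy to exercise without touching the Windows API.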

merwok · 2011-10-18T15:53:36Z

On POSIX, Python 3 works correctly if my home dir is /tmp/éric, and Python 2.7 returns a UTF-8-encoded (not locale-encoded!) bytes string.

For Windows, a patch would probably need to add a private function to the _nt module (in C): ctypes is too dangerous to be used in the standard library.

santosowijaya · 2011-10-26T18:32:52Z

Unicode environment variables work properly in Python 3.x on Windows, too, because the convertenviron() function in posixmodule.c uses extern _wenviron and PyUnicode_FromWideChar() in Python 3.x. In Python 2.7, convertenviron() uses extern environ and PyString_FromString*().

vstinner · 2011-10-26T22:50:03Z

Python 2 uses byte strings. If characters are not encodable to the ANSI code page, Windows replaces them with question marks. See issue bpo-13247 for another example (in Python 3, when explicitly using the bytes API). To be able to support characters that are not encodable to the ANSI code page, you have to use Unicode *everywhere*.
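The replacement described above can be approximated in pure Python. The sketch below is only an illustration: the real conversion is done by Windows, not by Python's codecs, and cp1252 is an assumed ANSI code page (the actual one depends on the system locale). The helper name `ansi_view` is hypothetical.

```python
# -*- coding: utf-8 -*-

# Assumption: a Western European system whose ANSI code page is cp1252.
ASSUMED_ANSI_CODE_PAGE = "cp1252"


def ansi_view(text, code_page=ASSUMED_ANSI_CODE_PAGE):
    """Round-trip text through a code page, replacing unencodable characters."""
    return text.encode(code_page, "replace").decode(code_page)


home = u"C:\\Users\\雄鳥お人好し"
lossy = ansi_view(home)

print(lossy)          # C:\Users\??????
print(lossy == home)  # False: the original name cannot be recovered
```

Once the name has been reduced to question marks, no later decoding step can restore it, which is why the value has to be fetched through a Unicode API in the first place.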

Because Python 2 doesn't have access to the Unicode environment and uses bytes in most cases, I don't think that we can fix this issue in Python 2.

I am closing this issue because it would require too much work to fix in Python 2, whereas it already works in Python 3. Moving to Python 3 is the best solution to this issue.

ArkadyKindDragonShapkin · 2016-04-09T00:50:27Z

At the very least, Python 2.7 should return the path encoded in the locale.getpreferredencoding() encoding.

mandel mannequin added the OS-windows and type-bug labels Oct 18, 2011

florentx mannequin changed the title ~~os.path.expanduser brakes when using unicode character in the username~~ os.path.expanduser breaks when using unicode character in the username Oct 18, 2011

vstinner closed this as completed Oct 26, 2011

ezio-melotti transferred this issue from another repository Apr 10, 2022
