os.path.expanduser breaks when using unicode character in the username #57416
Comments
During our development we have experienced the following: if you have a user on your Windows machine with a name that uses Japanese characters, like “雄鳥お人好し”, you will have the following in your system:
The above is a problem because the implementation of expanduser in ntpath.py uses the environment variables to expand the path, which means that in this case the returned path will be wrong. I have attached a small example of how to get the user profile path (~) on Windows using SHGetFolderPathW or SHGetKnownFolderPath to fix the issue. PS: I don't know if this issue also occurs on Python 3.
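The attachment mentioned above is not reproduced here. As a rough sketch of the idea for application code, the profile directory can be requested from the shell API through ctypes instead of being read from the environment. The function name `user_profile_dir` is hypothetical; `CSIDL_PROFILE` and the `SHGetFolderPathW` signature follow the Windows SDK. On non-Windows platforms the sketch simply falls back to `os.path.expanduser`.

```python
import os
import sys

CSIDL_PROFILE = 0x0028  # shell folder ID for the user's profile directory
MAX_PATH = 260          # buffer size, in characters, expected by SHGetFolderPathW


def user_profile_dir():
    """Return the current user's profile directory as text (hypothetical helper)."""
    if sys.platform != "win32":
        # Not on Windows: nothing to work around, use the normal lookup.
        return os.path.expanduser("~")

    import ctypes
    from ctypes import wintypes

    shget = ctypes.windll.shell32.SHGetFolderPathW
    shget.argtypes = [
        wintypes.HWND,    # hwndOwner (reserved, pass NULL)
        ctypes.c_int,     # csidl
        wintypes.HANDLE,  # hToken (NULL means the current user)
        wintypes.DWORD,   # dwFlags (0 means the folder's current path)
        wintypes.LPWSTR,  # pszPath (output buffer of MAX_PATH characters)
    ]
    shget.restype = ctypes.c_long  # HRESULT, 0 is S_OK

    buf = ctypes.create_unicode_buffer(MAX_PATH)
    hr = shget(None, CSIDL_PROFILE, None, 0, buf)
    if hr != 0:
        raise OSError("SHGetFolderPathW failed with HRESULT 0x%08X" % (hr & 0xFFFFFFFF))
    # The wide-character API returns the path without any ANSI conversion.
    return buf.value


print(user_profile_dir())
```

Because the wide-character entry point is used, the result never passes through the ANSI code page, which is the step that loses the Japanese characters.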
On POSIX, Python 3 works correctly if my home dir is /tmp/éric, and Python 2.7 returns a UTF-8-encoded (not locale-encoded!) byte string. For Windows, a patch would probably need to add a private function to the _nt module (in C): ctypes is too dangerous to be used in the standard library.
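The POSIX observation can be checked with a short Python 3 sketch. It assumes a UTF-8 capable locale and temporarily overrides `HOME` with `unittest.mock.patch.dict`, so no real /tmp/éric directory is needed; `posixpath` is used directly so that the POSIX implementation is exercised regardless of platform.

```python
import os
import posixpath
from unittest import mock

# Temporarily point HOME at a non-ASCII path (assumes a UTF-8 capable locale).
with mock.patch.dict(os.environ, {"HOME": "/tmp/éric"}):
    expanded = posixpath.expanduser("~")
    nested = posixpath.expanduser("~/projects")

print(expanded)
print(nested)
```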
Unicode environment vars work properly in Python 3.x on Windows, too, because the convertenviron() function in posixmodule.c uses extern _wenviron and PyUnicode_FromWideChar() in Python 3.x. In Python 2.7, convertenviron() uses extern environ and PyString_FromString*().
Python 2 uses byte strings. If characters are not encodable to the ANSI code page, Windows replaces them with question marks. See issue bpo-13247 for another example (in Python 3, when explicitly using the bytes API). To support characters that are not encodable to the ANSI code page, you have to use Unicode *everywhere*. Because Python 2 doesn't have access to the Unicode environment and uses bytes in most cases, I don't think we can fix this issue in Python 2. I am closing this issue because fixing it in Python 2 would require too much work, whereas it already works in Python 3. Moving to Python 3 is the best solution to this issue.
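The question-mark replacement can be approximated in Python 3 with a plain codec call. This is only an illustration, not the conversion Windows itself performs: cp1252 is assumed as the ANSI code page, and Python's `errors="replace"` handler stands in for the system's default-character substitution.

```python
username = "雄鳥お人好し"

# cp1252 cannot represent any of these characters, so each one becomes "?".
ansi_bytes = username.encode("cp1252", errors="replace")
print(ansi_bytes)  # b'??????'

# Decoding again cannot recover the original name: the information is gone.
roundtrip = ansi_bytes.decode("cp1252")
print(roundtrip == username)  # False
```

Once the name has been reduced to question marks, no later decoding step can restore it, which is why the fix has to avoid the bytes API altogether.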
At the very least, Python 2.7 should return the path encoded in the locale.getpreferredencoding() encoding.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.