os.path.expanduser breaks when using unicode character in the username #57416

Closed
mandel mannequin opened this issue Oct 18, 2011 · 5 comments
Labels
OS-windows type-bug An unexpected behavior, bug, or error

Comments

mandel mannequin commented Oct 18, 2011

BPO 13207
Nosy @vstinner, @ezio-melotti, @merwok, @florentx
Files
  • expanduser.py: Example of using the win api to expand the user.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2011-10-26.22:50:03.427>
    created_at = <Date 2011-10-18.11:34:28.554>
    labels = ['type-bug', 'OS-windows']
    title = 'os.path.expanduser breaks when using unicode character in the username'
    updated_at = <Date 2016-04-09.00:50:26.637>
    user = 'https://bugs.python.org/mandel'

    bugs.python.org fields:

    activity = <Date 2016-04-09.00:50:26.637>
    actor = 'Arkady “KindDragon” Shapkin'
    assignee = 'none'
    closed = True
    closed_date = <Date 2011-10-26.22:50:03.427>
    closer = 'vstinner'
    components = ['Windows']
    creation = <Date 2011-10-18.11:34:28.554>
    creator = 'mandel'
    dependencies = []
    files = ['23442']
    hgrepos = []
    issue_num = 13207
    keywords = []
    message_count = 5.0
    messages = ['145798', '145819', '146450', '146460', '263052']
    nosy_count = 7.0
    nosy_names = ['vstinner', 'ezio.melotti', 'eric.araujo', 'flox', 'santoso.wijaya', 'mandel', 'Arkady “KindDragon” Shapkin']
    pr_nums = []
    priority = 'normal'
    resolution = 'wont fix'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue13207'
    versions = ['Python 2.7']

    mandel mannequin commented Oct 18, 2011

    During our development we have experienced the following:

    If a user on your Windows machine has a name that uses Japanese characters, such as “雄鳥お人好し”, you will see the following on your system:

    • The Windows Shell will show the path correctly, that is: “C:\Users\雄鳥お人好し”
    • cmd.exe will show: “C:\Users\??????”
    • All the environment variables will be wrong: their values match what cmd.exe shows.

    The above is a problem because the expanduser implementation in ntpath.py builds the path from these environment variables, so in this case the returned path will be wrong.
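For reference, the environment lookup in Python 2.7's ntpath.expanduser can be sketched roughly as follows. This is a simplified reconstruction, not the stdlib source: `home_from_env` is a hypothetical name, it takes the environment as a plain mapping so it can run on any platform, and it ignores the `~user` form. It checks HOME, then USERPROFILE, then HOMEDRIVE plus HOMEPATH, so whatever those variables contain (including “??????”) ends up in the result.

```python
import ntpath


def home_from_env(env):
    """Return the home directory the way Python 2.7's ntpath.expanduser
    looked it up (simplified: the ~user form is not handled).

    `env` is any mapping of environment variable names to values.
    Returns None when no candidate variable is set.
    """
    if "HOME" in env:
        return env["HOME"]
    if "USERPROFILE" in env:
        return env["USERPROFILE"]
    if "HOMEPATH" not in env:
        return None  # expanduser would return the path unchanged here
    # Fall back to HOMEDRIVE + HOMEPATH, e.g. "C:" + "\\Users\\name".
    return ntpath.join(env.get("HOMEDRIVE", ""), env["HOMEPATH"])


# The result is only as good as the variables: a mangled USERPROFILE
# is returned verbatim.
print(home_from_env({"USERPROFILE": "C:\\Users\\??????"}))
```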

    I have attached a small example of how to get the user profile path (~) on Windows using SHGetFolderPathW or SHGetKnownFolderPath to fix the issue.
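The attached expanduser.py is not reproduced in this thread. A minimal ctypes sketch of the same idea, meant for application code rather than the standard library, might look like this. It assumes shell32 exports SHGetFolderPathW and that CSIDL_PROFILE (0x0028) is the folder ID for the profile directory; `user_profile_dir` is a hypothetical name, and on non-Windows platforms it simply falls back to os.path.expanduser.

```python
import ctypes
import os
import sys

CSIDL_PROFILE = 0x0028   # the user's profile folder, e.g. C:\Users\<name>
SHGFP_TYPE_CURRENT = 0   # current (not default) location of the folder
MAX_PATH = 260


def user_profile_dir():
    """Return the current user's profile directory as a text string.

    On Windows this asks the shell through the wide-character API instead
    of reading environment variables. Elsewhere it falls back to
    os.path.expanduser("~").
    """
    if sys.platform != "win32":
        return os.path.expanduser("~")

    from ctypes import wintypes  # only needed (and meaningful) on Windows

    get_folder_path = ctypes.windll.shell32.SHGetFolderPathW
    get_folder_path.argtypes = [wintypes.HWND, ctypes.c_int, wintypes.HANDLE,
                                wintypes.DWORD, wintypes.LPWSTR]
    get_folder_path.restype = ctypes.c_long  # HRESULT

    buf = ctypes.create_unicode_buffer(MAX_PATH)
    hresult = get_folder_path(None, CSIDL_PROFILE, None,
                              SHGFP_TYPE_CURRENT, buf)
    if hresult < 0:  # FAILED(hr): negative HRESULTs signal an error
        raise ctypes.WinError(hresult)
    return buf.value
```

Because the path comes back in a wide-character buffer, it never passes through the ANSI code page, which is where the question marks come from.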

    PS: I don't know whether this issue also occurs on Python 3.

    mandel added the OS-windows and type-bug labels Oct 18, 2011
    florentx changed the title “os.path.expanduser brakes when using unicode character in the username” to “os.path.expanduser breaks when using unicode character in the username” Oct 18, 2011
    merwok commented Oct 18, 2011

    On POSIX, Python 3 works correctly if my home dir is /tmp/éric, and Python 2.7 returns a UTF-8-encoded (not locale-encoded!) bytes string.
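On POSIX, posixpath.expanduser takes the home directory from the HOME environment variable when it is set, so the Python 3 behaviour can be reproduced by pointing HOME at a non-ASCII path for the duration of a call. A small sketch, where `expand_with_home` is a hypothetical helper and the filesystem encoding is assumed to be able to represent “é” (as it can under UTF-8):

```python
import os
import posixpath


def expand_with_home(home, path="~/notes.txt"):
    """Expand `path` with posixpath.expanduser while HOME is set to `home`.

    The previous value of HOME (or its absence) is restored afterwards.
    """
    saved = os.environ.get("HOME")
    os.environ["HOME"] = home
    try:
        return posixpath.expanduser(path)
    finally:
        if saved is None:
            del os.environ["HOME"]
        else:
            os.environ["HOME"] = saved
```

Calling `expand_with_home("/tmp/éric")` yields a str path under /tmp/éric, with the accented character intact.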

    For Windows, a patch would probably need to add a private function to the _nt module (in C): ctypes is too dangerous to be used in the standard library.

    santosowijaya mannequin commented Oct 26, 2011

    Unicode environment variables work properly in Python 3.x on Windows, too, because the convertenviron() function in posixmodule.c uses the extern _wenviron and PyUnicode_FromWideChar() in Python 3.x. In Python 2.7, convertenviron() uses the extern environ and PyString_FromString*().
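The same difference is visible from Python code, assuming Python 3.2 or later: os.supports_bytes_environ reports whether the native environment is bytes. It is False on Windows, where os.environ is built from the wide-character environment, and True on POSIX, where os.environb also exposes the raw bytes. `environ_kind` is a hypothetical helper.

```python
import os


def environ_kind():
    """Describe how this Python 3 interpreter exposes the environment."""
    # os.environ values are always str in Python 3; on Windows they come
    # from the wide-character environment, so no ANSI conversion is involved.
    sample = next(iter(os.environ.values()), "")
    return {
        "values_are_str": isinstance(sample, str),
        # True where the native environment is bytes (POSIX), which is also
        # where os.environb is available; False on Windows.
        "bytes_view_available": os.supports_bytes_environ,
    }


print(environ_kind())
```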

    @vstinner

    Python 2 uses byte strings. If characters are not encodable to the ANSI code page, Windows replaces them with question marks. See issue bpo-13247 for another example (in Python 3, when explicitly using the bytes API). To support characters that are not encodable to the ANSI code page, you have to use Unicode *everywhere*.
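The question marks can be imitated in Python 3 by encoding a Unicode path with errors="replace", which is roughly what happens when a byte-string API forces it through the ANSI code page. This is a simulation, not the Windows conversion itself; it assumes cp1252 (a Western European ANSI code page) as the default. Under cp932, the Japanese ANSI code page, these particular characters are encodable and survive the round trip. `to_ansi_bytes` is a hypothetical helper.

```python
def to_ansi_bytes(text, codepage="cp1252"):
    """Encode text for a byte-string API, replacing each character the
    code page cannot represent with b"?" (as errors="replace" does)."""
    return text.encode(codepage, errors="replace")


profile = "C:\\Users\\雄鳥お人好し"

# Six unencodable characters become six question marks; the original
# name cannot be recovered from these bytes.
mangled = to_ansi_bytes(profile)
print(mangled.decode("cp1252"))

# With the Japanese ANSI code page the same name survives a round trip.
print(to_ansi_bytes(profile, "cp932").decode("cp932") == profile)
```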

    Because Python 2 doesn't have access to the Unicode environment and uses bytes in most cases, I don't think that we can fix this issue in Python 2.

    I am closing this issue because fixing it in Python 2 would require too much work, whereas it already works in Python 3. Moving to Python 3 is the best solution to this issue.

    ArkadyKindDragonShapkin mannequin commented Apr 9, 2016

    At least Python 2.7 should return the path in the locale.getpreferredencoding() encoding.

    ezio-melotti transferred this issue from another repository Apr 10, 2022