classification
Title: venv activate.bat is UTF-8 encoded but uses current console codepage
Type: behavior Stage: resolved
Components: Library (Lib), Unicode, Windows Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: steve.dower Nosy List: Jac0, eryksun, ezio.melotti, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Priority: normal Keywords: patch

Created on 2017-12-22 10:36 by Jac0, last changed 2018-02-20 02:18 by steve.dower. This issue is now closed.

Pull Requests
URL Status Linked
PR 5757 merged steve.dower, 2018-02-19 18:11
PR 5765 merged miss-islington, 2018-02-20 01:25
PR 5766 merged miss-islington, 2018-02-20 01:26
Messages (6)
msg308931 - Author: Jaakko Roponen (Jac0) Date: 2017-12-22 10:36
Let's say I have a folder c:\test-ä on Windows.

Now if I run: py -m venv env
and activate: env\scripts\activate
and check: where python

the result is incorrectly just: C:\Users\Username\AppData\Local\Programs\Python\Python36\python.exe

If I run: path 
the result is: PATH=C:\test-ä\env\Scripts;...

So clearly the encoding is broken for the folder name.

I can fix this by changing the character encoding of activate.bat to OEM-US (codepage 437) and then replacing "test-├ż" with "test-ä".
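The following minimal Python sketch shows what seems to be happening. activate.bat stores the path as UTF-8 bytes, and CMD decodes those bytes using the console's OEM codepage. The sketch assumes cp437 (OEM-US). The exact garbage depends on which OEM codepage is active, so it need not match the "├ż" quoted above. `read_as_console` is a hypothetical helper and is not part of venv.

```python
# Sketch: simulate CMD reading a UTF-8 encoded batch line under an OEM codepage.
# Assumption: the console uses cp437 (OEM-US); other OEM codepages may map the
# same bytes to different characters.

def read_as_console(script_bytes: bytes, codepage: str) -> str:
    """Decode batch-file bytes the way CMD would under the given codepage."""
    return script_bytes.decode(codepage)

line = 'set "VIRTUAL_ENV=C:\\test-ä\\env"'
utf8_bytes = line.encode("utf-8")  # "ä" becomes the two bytes 0xC3 0xA4

print(read_as_console(utf8_bytes, "cp437"))  # set "VIRTUAL_ENV=C:\test-├ñ\env"
print(read_as_console(utf8_bytes, "utf-8"))  # set "VIRTUAL_ENV=C:\test-ä\env"
```

Each of the two UTF-8 bytes becomes a separate OEM character. That is why PATH ends up pointing at a folder that does not exist.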

If I now activate and run: where python
the result is (as it should be): 
C:\test-ä\env\Scripts\python.exe
C:\Users\Username\AppData\Local\Programs\Python\Python36\python.exe

By running: path
I get: PATH=C:\test-ä\env\Scripts;...

So it looks good here as well.

I suspect that whatever creates the activate.bat file is writing it with the wrong character encoding. If this is somehow platform-specific, the venv documentation could include a guide on how to fix it.
msg308934 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2017-12-22 13:24
The CMD shell decodes batch scripts using the attached console's output codepage, which defaults to OEM. OTOH, venv writes the replacement values for the template activate.bat as UTF-8 (codepage 65001), which is correct and should not be downgraded to OEM. 
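This small Python sketch shows why re-encoding the file to OEM is not a general fix. An OEM codepage covers only a small character set, while UTF-8 covers all of Unicode. cp437 stands in for the console's OEM codepage, and `can_encode` is a hypothetical helper.

```python
# Sketch: which venv paths could be written to activate.bat in an OEM codepage?
# Assumption: cp437 (OEM-US) stands in for the console's OEM codepage.

def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text is representable in codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True

paths = ["C:\\test-ä\\env", "C:\\test-ł\\env"]

for p in paths:
    print(p, "cp437:", can_encode(p, "cp437"), "utf-8:", can_encode(p, "utf-8"))
# C:\test-ä\env cp437: True utf-8: True
# C:\test-ł\env cp437: False utf-8: True
```

A folder name containing "ł" cannot be written to the file in cp437 at all. Keeping the file in UTF-8 and switching the console codepage avoids that dead end.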

Instead, the batch script could temporarily switch the console to codepage 65001, then restore the previous codepage at the end. For example:

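    @echo off
    rem Save the current console codepage (chcp prints e.g. "Active code page: 437")
    for /f "tokens=2 delims=:" %%a in ('"%SystemRoot%\System32\chcp.com"') do (
        set "CODEPAGE=%%a"
    )
    rem Switch to UTF-8 so the remaining lines of this file are decoded correctly
    "%SystemRoot%\System32\chcp.com" 65001 > nul

    rem [rest of script]

    :END
    rem Restore the saved codepage; placed after :END so "goto END" still restores it
    "%SystemRoot%\System32\chcp.com" %CODEPAGE% > nul
    set "CODEPAGE="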
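This rough Python model shows how the `for /f` line extracts the codepage. It assumes chcp prints a single line of the form "Active code page: 437" (the label text is localized, so the sketch relies only on the colon). `for_f_token` is a hypothetical helper that approximates CMD's tokenizer rather than reproducing it.

```python
# Sketch: approximate `for /f "tokens=N delims=..."` tokenizing of one line.
# Assumption: chcp output is a single line such as "Active code page: 437".

def for_f_token(line: str, delims: str, token: int) -> str:
    """Split on any delimiter character, treat runs of delimiters as one,
    and return the 1-based token. CMD would skip the loop body entirely if
    the token did not exist; this sketch returns '' instead."""
    for d in delims:
        line = line.replace(d, "\0")
    fields = [f for f in line.split("\0") if f]  # drop empty fields
    return fields[token - 1] if token <= len(fields) else ""

saved = for_f_token("Active code page: 437", delims=":", token=2)
print(repr(saved))  # ' 437'
print(f"chcp.com {saved} > nul")  # the restore command, with the saved value substituted
```

Only ":" is a delimiter here, so the saved value keeps its leading space. On the restore line, that space becomes extra whitespace before the number.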
msg312356 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-02-19 16:57
Eryk's solution seems to be best, so I'll add that.
msg312389 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-02-20 01:25
New changeset 6240917b773b52f8883387b9e3a5f327a4372068 by Steve Dower in branch 'master':
bpo-32409: Ensures activate.bat can handle Unicode contents (GH-5757)
https://github.com/python/cpython/commit/6240917b773b52f8883387b9e3a5f327a4372068
msg312390 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-02-20 01:45
New changeset a3d6c1b23b8a49b5003fdbd115d3598fe3d4c4bf by Steve Dower (Miss Islington (bot)) in branch '3.7':
bpo-32409: Ensures activate.bat can handle Unicode contents (GH-5765)
https://github.com/python/cpython/commit/a3d6c1b23b8a49b5003fdbd115d3598fe3d4c4bf
msg312393 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-02-20 02:18
New changeset 8e149ff481acbb3889c825b8bf7b10aa191f09a7 by Steve Dower (Miss Islington (bot)) in branch '3.6':
bpo-32409: Ensures activate.bat can handle Unicode contents (GH-5766)
https://github.com/python/cpython/commit/8e149ff481acbb3889c825b8bf7b10aa191f09a7
History
Date                 User            Action  Args
2018-02-20 02:18:45  steve.dower     set     messages: + msg312393
2018-02-20 01:45:58  steve.dower     set     status: open -> closed
                                             resolution: fixed
                                             stage: patch review -> resolved
2018-02-20 01:45:06  steve.dower     set     messages: + msg312390
2018-02-20 01:26:32  miss-islington  set     pull_requests: + pull_request5545
2018-02-20 01:25:42  miss-islington  set     pull_requests: + pull_request5544
2018-02-20 01:25:40  miss-islington  set     pull_requests: + pull_request5543
2018-02-20 01:25:29  steve.dower     set     messages: + msg312389
2018-02-19 18:11:12  steve.dower     set     keywords: + patch
                                             stage: needs patch -> patch review
                                             pull_requests: + pull_request5535
2018-02-19 16:57:10  steve.dower     set     assignee: steve.dower
                                             messages: + msg312356
                                             versions: + Python 3.8
2017-12-22 13:24:49  eryksun         set     title: venv activation doesn't work, if project is in a Windows folder that has latin-1 supplement characters (such as ä,ö,å) in its path -> venv activate.bat is UTF-8 encoded but uses current console codepage
                                             components: + Library (Lib), Unicode, Windows, - Extension Modules
                                             nosy: + ezio.melotti, eryksun, paul.moore, tim.golden, vstinner, zach.ware, steve.dower
                                             versions: + Python 3.7
                                             messages: + msg308934
                                             stage: needs patch
2017-12-22 10:36:15  Jac0            create