classification
Title: .pth files cannot contain folders with utf-8 names
Type: behavior Stage:
Components: Unicode, Windows Versions: Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: einaren, ezio.melotti, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Priority: normal Keywords:

Created on 2018-02-23 13:03 by einaren, last changed 2018-03-05 17:53 by steve.dower.

Messages (2)
msg312635 - Author: Einar Fredriksen (einaren) Date: 2018-02-23 13:03
Add "G:\русский язык" to a .pth file and start Python. It fails with:

--------------
Failed to import the site module
Traceback (most recent call last):
  File "C:\Program Files\ROXAR\RMS dev_release\windows-amd64-vc_14_0-release\bin\lib\site.py", line 546, in <module>
    main()
  File "C:\Program Files\ROXAR\RMS dev_release\windows-amd64-vc_14_0-release\bin\lib\site.py", line 532, in main
    known_paths = addusersitepackages(known_paths)
  File "C:\Program Files\ROXAR\RMS dev_release\windows-amd64-vc_14_0-release\bin\lib\site.py", line 287, in addusersitepackages
    addsitedir(user_site, known_paths)
  File "C:\Program Files\ROXAR\RMS dev_release\windows-amd64-vc_14_0-release\bin\lib\site.py", line 209, in addsitedir
    addpackage(sitedir, name, known_paths)
  File "C:\Program Files\ROXAR\RMS dev_release\windows-amd64-vc_14_0-release\bin\lib\site.py", line 165, in addpackage
    for n, line in enumerate(f):
  File "C:\Program Files\ROXAR\RMS dev_release\windows-amd64-vc_14_0-release\bin\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 8: character maps to <undefined>
----------------

This might very well have side effects, but adding "encoding='utf-8'" to the open() call in addpackage() in site.py seems to fix the issue for me.
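
For illustration, the decode failure in the traceback can be reproduced without touching site.py. This is a minimal sketch, assuming the .pth file was saved as UTF-8 and the default encoding is cp1252; the helper name try_decode is made up for this example.

```python
# Bytes a UTF-8 editor would write for the path from the report.
raw = "G:\\русский язык".encode("utf-8")


def try_decode(data, encoding):
    """Hypothetical helper: return (text, None) on success, (None, error) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


# cp1252 has no character assigned to byte 0x81, so decoding stops there.
text, err = try_decode(raw, "cp1252")
if err is not None:
    print(f"cp1252 failed on byte {raw[err.start]:#04x} at position {err.start}")

# The same bytes decode cleanly once UTF-8 is requested explicitly.
text, err = try_decode(raw, "utf-8")
print("utf-8 decoded OK:", err is None)
```

The first print reports byte 0x81 at position 8, which matches the UnicodeDecodeError above: the third Cyrillic letter is encoded as the two bytes 0xD1 0x81.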
msg313273 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-03-05 17:53
Yes, it'll have significant side effects. The default file encoding on Windows is your configured code page (1252, in your case), and there's no good way around that default. The easiest immediate fix is to re-encode that file yourself.

Perhaps what we could do instead is allow the first line of a .pth file to be a coding comment? Then site.py can reopen the file with the specified encoding.
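
A rough sketch of how that could look, not an actual patch to site.py. The helper name read_pth_lines and the reuse of the PEP 263 comment pattern for .pth files are assumptions made for illustration only.

```python
import locale
import re

# PEP 263 style pattern; applying it to .pth files is an assumption here.
_CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def read_pth_lines(path):
    """Hypothetical helper: return the lines of a .pth file as str.

    If the first line carries a coding comment, that encoding is used;
    otherwise fall back to the locale default, as a plain open() does.
    """
    # Peek at the first line in binary mode so no decoding happens yet.
    with open(path, "rb") as f:
        first = f.readline()
    match = _CODING_RE.match(first)
    if match:
        encoding = match.group(1).decode("ascii")
    else:
        encoding = locale.getpreferredencoding(False)
    # Reopen in text mode with the chosen encoding.
    with open(path, encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]
```

Existing files without a coding comment would keep being read with the configured code page, so their behavior would not change. Since addpackage() already skips lines that start with "#", the coding comment itself would not be treated as a path.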

(FWIW, when I added the ._pth file, I explicitly made it UTF-8. But it had no history at that time so it was safe to do so.)
History
Date                 User            Action  Args
2018-03-05 17:53:23  steve.dower     set     messages: + msg313273
2018-03-05 01:20:25  r.david.murray  set     nosy: + paul.moore, tim.golden, zach.ware, steve.dower
                                             components: + Windows
2018-02-23 13:03:18  einaren         create