This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: utf-8 codec error when pip uninstalling a package which has files containing unicode filename on Windows
Type: crash Stage: resolved
Components: Windows Versions: Python 3.9
process
Status: closed Resolution: third party
Dependencies: Superseder:
Assigned To: Nosy List: HaujetZhao, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Priority: normal Keywords:

Created on 2020-11-24 16:26 by HaujetZhao, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (3)
msg381753 - Author: 赵豪杰 (HaujetZhao) * Date: 2020-11-24 16:26
When `pip install package_name` installs a package, it generates an `installed-files.txt` file that records the files the package contains.

When updating or uninstalling the package, pip reads `installed-files.txt` and then deletes the old files listed in it.

The problem occurs if the installed package contains files whose names include non-ASCII characters, such as `文件` ("file").

In China (I don't know about other places), for historical reasons the default Windows system codec is `gbk`, so `installed-files.txt` is also written with the `gbk` codec when a package is installed.

When updating or uninstalling, however, pip reads `installed-files.txt` with the `utf-8` codec. Because the file contains non-ASCII characters encoded with `gbk`, decoding fails:

```
  File "d:\users\haujet\appdata\local\programs\python\python39\lib\site-packages\pip\_vendor\pkg_resources\__init__.py", line 1424, in get_metadata
    return value.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb8 in position 343: invalid start byte in installed-files.txt file at path: d:\users\haujet\appdata\local\programs\python\python39\lib\site-packages\Markdown_Toolbox-0.0.8-py3.9.egg-info\installed-files.txt
```
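The mismatch can be reproduced without pip at all. The following is a minimal sketch, assuming a line of `installed-files.txt` (the file name `docs/文件.md` is a hypothetical example) was written with the `gbk` codec and is later read back as `utf-8`:

```python
# A hypothetical line of installed-files.txt containing a non-ASCII file name.
recorded = "docs/文件.md\n"

# Bytes as they would be written on a system whose default codec is gbk.
raw = recorded.encode("gbk")

try:
    # The reading side assumes UTF-8, which these bytes are not.
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print(f"utf-8 decoding failed: {exc}")

# Decoding with the codec that was actually used for writing recovers the text.
print(raw.decode("gbk") == recorded)
```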

I hate that default `gbk` system codec, but this setting is fixed on Windows.

So my suggestion is to wrap the failing point in a `try`/`except`: if decoding `installed-files.txt` with the `utf-8` codec fails, let the `gbk` codec have a go.
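The suggested fallback could look roughly like the sketch below. `decode_installed_files` is a hypothetical helper, not pip's or `pkg_resources`' actual API, and hard-coding `gbk` as the fallback codec is an assumption taken directly from the suggestion above.

```python
def decode_installed_files(raw: bytes, fallback: str = "gbk") -> str:
    """Decode installed-files.txt content, trying UTF-8 first.

    Hypothetical helper: if the bytes are not valid UTF-8, retry with the
    fallback codec (gbk by default, as proposed in the report).
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume it was written with the system codec.
        return raw.decode(fallback)


text = "docs/文件.md\n"  # hypothetical record line
from_utf8 = decode_installed_files(text.encode("utf-8"))
from_gbk = decode_installed_files(text.encode("gbk"))
print(from_utf8 == from_gbk == text)
```

This is a heuristic: some `gbk` byte sequences also happen to be valid UTF-8 and would be silently mis-decoded rather than falling through to the `except` branch, which is why writing the file in a single known encoding is the more robust fix.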

Or, a more fundamental solution: when pip writes text files, it should strictly use the `utf-8` codec instead of the default system codec.
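The second suggestion amounts to passing an explicit encoding instead of relying on the locale default: in Python 3.9, `open()` called without `encoding=` uses `locale.getpreferredencoding(False)`, which is `cp936` (an alias of `gbk`) on a Simplified Chinese Windows system. A minimal sketch, where `write_installed_files` and `read_installed_files` are hypothetical helper names rather than pip internals:

```python
import tempfile
from pathlib import Path


def write_installed_files(record: Path, paths: list[str]) -> None:
    # Always write UTF-8, regardless of the system's default codec.
    record.write_text("".join(p + "\n" for p in paths), encoding="utf-8")


def read_installed_files(record: Path) -> list[str]:
    # Read back with the same explicit codec that was used for writing.
    return record.read_text(encoding="utf-8").splitlines()


with tempfile.TemporaryDirectory() as tmp:
    record = Path(tmp) / "installed-files.txt"
    write_installed_files(record, ["docs/文件.md", "__init__.py"])
    print(read_installed_files(record))
```

Because both sides name the codec explicitly, the result no longer depends on which locale the package was installed or uninstalled under.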
msg381754 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-11-24 16:27
Please report the issue to https://github.com/pypa/pip

pip is not part of the Python stdlib.
msg381756 - Author: 赵豪杰 (HaujetZhao) * Date: 2020-11-24 16:40
got it.
History
Date | User | Action | Args
2022-04-11 14:59:38 | admin | set | github: 86619
2020-11-24 16:40:57 | HaujetZhao | set | nosy: paul.moore, vstinner, tim.golden, zach.ware, steve.dower, HaujetZhao; messages: + msg381756
2020-11-24 16:27:44 | vstinner | set | status: open -> closed; nosy: + vstinner; messages: + msg381754; resolution: third party; stage: resolved
2020-11-24 16:26:27 | HaujetZhao | create |