When you install a package with `pip install package_name`, an `installed-files.txt` file is generated that records the files the package contains.
When updating or uninstalling the package, pip reads `installed-files.txt` and then deletes the old files.
If the installed package contains files whose names include non-ASCII characters, such as `文件`, a problem occurs.
In China (I don't know about other places), for historical reasons, the default system codec on Windows is `gbk`, so `installed-files.txt` is also written with the `gbk` codec when a package is installed.
When updating or uninstalling, pip reads `installed-files.txt` with the `utf-8` codec. Since the file contains non-ASCII characters that were encoded as `gbk`, this fails with an error:
```
File "d:\users\haujet\appdata\local\programs\python\python39\lib\site-packages\pip\_vendor\pkg_resources\__init__.py", line 1424, in get_metadata
return value.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb8 in position 343: invalid start byte in installed-files.txt file at path: d:\users\haujet\appdata\local\programs\python\python39\lib\site-packages\Markdown_Toolbox-0.0.8-py3.9.egg-info\installed-files.txt
```
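The mismatch can be reproduced outside pip. This is a minimal sketch, assuming a made-up file name `文件/readme.txt` that was written with `gbk` and is then read as `utf-8`. The exact byte and position in the error message will differ from the traceback above.

```python
# Hypothetical file name, encoded the way a gbk-locale Windows would write it.
raw = "文件/readme.txt\n".encode("gbk")

try:
    raw.decode("utf-8")  # what the reading side attempts
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print(exc)  # reports the offending byte and its position
```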
I hate that default `gbk` system codec, but this setting is fixed on Windows.
So my suggestion is to add a `try`/`except` at the point of the error: if reading `installed-files.txt` with the `utf-8` codec fails, fall back to the `gbk` codec.
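Roughly, the fallback could look like the sketch below. `decode_metadata` and its `fallback` parameter are hypothetical names for illustration, not pip's or `pkg_resources`' actual code. The default of `gbk` only reflects my system, and something like `locale.getpreferredencoding(False)` could be passed instead.

```python
def decode_metadata(raw: bytes, fallback: str = "gbk") -> str:
    """Decode metadata bytes as UTF-8, retrying with a fallback codec.

    Hypothetical helper: the name and the default fallback are assumptions.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # The file was probably written with the system codec instead.
        return raw.decode(fallback)


# A gbk-encoded entry is recovered; a UTF-8 entry is unaffected.
legacy = decode_metadata("文件/readme.txt\n".encode("gbk"))
modern = decode_metadata("文件/readme.txt\n".encode("utf-8"))
```

One caveat: a `gbk` byte sequence that happens to be valid `utf-8` would be decoded incorrectly without raising an error, so this is only a workaround.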
Alternatively, a more fundamental solution is for pip to always write text files with the `utf-8` codec instead of the default system codec.
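For the writing side, the idea is just to pass `encoding="utf-8"` explicitly instead of relying on the locale default. A minimal sketch, where `write_installed_files` is a hypothetical helper and not the function pip actually uses:

```python
import os
import tempfile


def write_installed_files(path, names):
    """Write one file name per line, always as UTF-8 (hypothetical helper)."""
    # Without encoding=..., open() uses the locale codec (gbk on my system).
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for name in names:
            f.write(name + "\n")


# Round trip: the bytes on disk decode as UTF-8 whatever the system codec is.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "installed-files.txt")
    write_installed_files(target, ["文件/readme.txt", "pkg/__init__.py"])
    with open(target, "rb") as f:
        round_trip = f.read().decode("utf-8").splitlines()
```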