classification
Title: distutils doesn't correctly read UTF-8 content from config files
Type: behavior Stage:
Components: Distutils, Library (Lib), Windows Versions: Python 3.7, Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: delivrance, dstufft, eric.araujo, paul.moore, steve.dower, tim.golden, vlcinsky, zach.ware
Priority: normal Keywords:

Created on 2017-12-05 16:46 by delivrance, last changed 2018-01-25 13:43 by vlcinsky.

Files
File name Uploaded Description
setup.cfg delivrance, 2017-12-05 17:10
setup.py delivrance, 2017-12-05 17:10
Pull Requests
URL Status Linked
PR 4727 open delivrance, 2017-12-05 16:46
Messages (5)
msg307668 - (view) Author: Dan (delivrance) Date: 2017-12-05 16:46
On Windows, distutils doesn't correctly read UTF-8 content from config files (setup.cfg).

Seems like the issue is located on the line reading the files via the ConfigParser; simply adding 'encoding="UTF-8"' as argument fixes the problem for me: https://github.com/python/cpython/pull/4727

On Linux it seems to be working fine.
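
A minimal sketch of the difference, assuming the Windows default text encoding is cp1252 (it depends on the system locale) and using a throwaway temporary file rather than the attached setup.cfg. The helper name `read_author` is made up for illustration and is not part of distutils:

```python
import configparser
import os
import tempfile

# Sample content with non-ASCII characters, similar to the reported metadata.
CFG_TEXT = "[metadata]\ndescription = délivrance\nauthor = Dan Tès\n"


def read_author(path, encoding):
    """Parse the file with an explicit encoding and return the author field."""
    parser = configparser.ConfigParser()
    parser.read(path, encoding=encoding)
    return parser["metadata"]["author"]


with tempfile.TemporaryDirectory() as tmp:
    cfg_path = os.path.join(tmp, "setup.cfg")
    # Write the config as UTF-8 bytes, the way most editors would save it.
    with open(cfg_path, "w", encoding="utf-8") as fh:
        fh.write(CFG_TEXT)

    # cp1252 stands in for the locale default on a Western European Windows.
    as_cp1252 = read_author(cfg_path, "cp1252")
    # Explicit UTF-8, as proposed in the pull request.
    as_utf8 = read_author(cfg_path, "utf-8")

print(repr(as_cp1252))
print(repr(as_utf8))
```

On Linux the locale default is usually already UTF-8, which would explain why the problem does not show up there.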
msg307669 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2017-12-05 16:57
Can you give an example setup.cfg file, setup.py command and the full error message?
msg307670 - (view) Author: Dan (delivrance) Date: 2017-12-05 17:13
I've attached the files.

Run using 'python setup.py sdist'.
The resulting PKG-INFO will contain incorrect data:

Summary: dÃ©livrance
Author: Dan TÃ¨s

The expected output is:

Summary: délivrance
Author: Dan Tès
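
For illustration, this kind of corruption is what results when UTF-8 bytes are decoded with a single-byte code page. The sketch below assumes cp1252 as the wrong code page; `misdecode` and `repair` are hypothetical helper names, not distutils functions:

```python
def misdecode(text, wrong="cp1252"):
    """Encode as UTF-8, then decode with the wrong code page."""
    return text.encode("utf-8").decode(wrong)


def repair(garbled, wrong="cp1252"):
    """Undo the misdecoding; works only if every byte survived the round trip."""
    return garbled.encode(wrong).decode("utf-8")


for value in ("délivrance", "Dan Tès"):
    garbled = misdecode(value)
    # Each accented letter becomes two characters after the bad decode.
    print(value, "->", garbled, "->", repair(garbled))
```

Each non-ASCII character takes two bytes in UTF-8, so it turns into two characters when read as cp1252.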
msg307860 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2017-12-08 17:53
`metadata` in setup.cfg is not supported directly by distutils.  Can you provide a setup.py script that shows the problem without setuptools?
msg310674 - (view) Author: Jan Vlcinsky (vlcinsky) Date: 2018-01-25 13:43
The fix should go in

https://github.com/python/cpython/blob/2812d3d99287c50bab99625d7240bcf1c2e32369/Lib/distutils/dist.py#L406

where `parser.read(filename)`

should be changed to `parser.read(filename, encoding="utf-8")`

This assumes that setup.cfg is UTF-8 encoded, which I think is the correct assumption.

Alternative assumptions are (and I do not find them good):

- assume the file is encoded the same way as the current console (this is not deterministic and is the direct cause of this issue)
- let the user specify the encoding somewhere (this requires an extra step and does not bring any value)
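
To see why relying on the environment is not deterministic: as far as I can tell, when no `encoding` is passed, `ConfigParser.read` hands `None` to `open()`, which then falls back to the locale's preferred encoding (strictly speaking the locale setting rather than the console code page). A small sketch for checking that default on a given machine; `default_text_encoding` is a hypothetical helper name:

```python
import locale
import sys


def default_text_encoding():
    """Return the encoding open() falls back to when none is given."""
    return locale.getpreferredencoding(False)


# The result differs between machines, e.g. a cp125x code page on many
# Windows setups and UTF-8 on most Linux ones.
print("default text encoding:", default_text_encoding())
# Python 3.7+ also has a UTF-8 mode that overrides the locale default.
print("UTF-8 mode enabled:", bool(getattr(sys.flags, "utf8_mode", 0)))
```

Running this on the Windows and Linux machines involved should show two different defaults, which matches the behaviour reported above.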
History
Date User Action Args
2018-01-25 13:43:29 vlcinsky set nosy: + vlcinsky; messages: + msg310674
2017-12-08 17:53:48 eric.araujo set messages: + msg307860
2017-12-05 17:13:50 delivrance set messages: + msg307670
2017-12-05 17:10:58 delivrance set files: + setup.py
2017-12-05 17:10:52 delivrance set files: + setup.cfg
2017-12-05 16:57:45 eric.araujo set versions: - Python 3.4, Python 3.5, Python 3.8
2017-12-05 16:57:32 eric.araujo set messages: + msg307669
2017-12-05 16:46:26 delivrance create