Issue 42325: UnicodeDecodeError executing ./setup.py during build


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/86491

classification

Title:	UnicodeDecodeError executing ./setup.py during build
Type:		Stage:
Components:	Build	Versions:	Python 3.10

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	skip.montanaro, yan12125
Priority:	normal	Keywords:

Created on 2020-11-11 13:57 by skip.montanaro, last changed 2022-04-11 14:59 by admin.

Files
File name	Uploaded	Description
m17n-X.h	skip.montanaro, 2020-11-11 13:57

Messages (2)
msg380761	Author: Skip Montanaro (skip.montanaro) *	Date: 2020-11-11 13:57
I recently replaced Ubuntu 20.04 with Manjaro 20.2. In the process my Python builds broke in the sharedmods target of the Makefile. The tail end of the traceback is:

  File "/home/skip/src/python/cpython/./setup.py", line 246, in grep_headers_for
    if function in f.read():
  File "/home/skip/src/python/cpython/Lib/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position 1600: invalid start byte

The grep_headers_for() function in setup.py appeared to be the culprit, so I added a print statement to its loop:

def grep_headers_for(function, headers):
    for header in headers:
        print("*", header, file=sys.stderr)
        with open(header, 'r') as f:
            if function in f.read():
                return True
    return False

which printed these lines:

* /usr/include/umfpack_report_perm.h
* /usr/include/dbstl_dbc.h
* /usr/include/itclTclIntStubsFcn.h
* /usr/include/dbstl_vector.h
* /usr/include/cholmod_blas.h
* /usr/include/amd.h
* /usr/include/m17n-X.h

Sure enough, that m17n-X.h file (attached) doesn't contain utf-8 (my environment's encoding). According to the Emacs coding cookie at the end, the file is euc-japan encoded.

Would simply catching the exception in grep_headers_for() be the correct way to deal with this?
msg380956	Author: (yan12125) *	Date: 2020-11-14 05:01
I got a similar issue on Arch Linux - see issue42351.
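
For illustration only, here is a minimal sketch of two ways grep_headers_for() could tolerate headers that are not valid UTF-8, as asked in msg380761. It is not taken from this issue and is not claimed to be the change made in CPython. The explicit encoding="utf-8" argument and the name grep_headers_for_skipping are assumptions made for the example; the original code relies on the locale's default encoding.

```python
def grep_headers_for(function, headers):
    """Return True if `function` appears in any of the header files."""
    for header in headers:
        # errors="surrogateescape" turns undecodable bytes into lone
        # surrogate code points instead of raising UnicodeDecodeError,
        # so ASCII identifiers can still be matched in files that use
        # another encoding (assumption: the name searched for is ASCII).
        with open(header, "r", encoding="utf-8",
                  errors="surrogateescape") as f:
            if function in f.read():
                return True
    return False


def grep_headers_for_skipping(function, headers):
    """Hypothetical variant: catch the exception and skip the header."""
    for header in headers:
        try:
            with open(header, "r", encoding="utf-8") as f:
                if function in f.read():
                    return True
        except UnicodeDecodeError:
            # The whole file is ignored, even if it declares `function`.
            continue
    return False
```

The two variants behave differently on a header such as m17n-X.h: catching the exception silently ignores everything in that file, while the surrogateescape error handler still lets the search see its ASCII content. Which trade-off is appropriate for setup.py is a judgment call this sketch does not settle.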

History
Date	User	Action	Args
2022-04-11 14:59:38	admin	set	github: 86491
2020-11-14 05:01:23	yan12125	set	nosy: + yan12125 messages: + msg380956
2020-11-11 13:57:47	skip.montanaro	create