This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: UnicodeDecodeError executing ./setup.py during build
Components: Build
Versions: Python 3.10
process
Status: open
Nosy List: skip.montanaro, yan12125
Priority: normal

Created on 2020-11-11 13:57 by skip.montanaro, last changed 2022-04-11 14:59 by admin.

Files
m17n-X.h (uploaded by skip.montanaro, 2020-11-11 13:57)
Messages (2)
msg380761 - Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2020-11-11 13:57
I recently replaced Ubuntu 20.04 with Manjaro 20.2. In the process, my Python builds started failing in the Makefile's sharedmods target. The tail end of the traceback is:

  File "/home/skip/src/python/cpython/./setup.py", line 246, in grep_headers_for
    if function in f.read():
  File "/home/skip/src/python/cpython/Lib/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position 1600: invalid start byte
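
The error itself can be reproduced outside the build. In text mode, open() decodes with the locale's preferred encoding (UTF-8 here), and 0xb4 lies in the range 0x80-0xbf that UTF-8 reserves for continuation bytes, so it can never start a character. The sketch below uses made-up bytes rather than the real contents of m17n-X.h, and `first_utf8_error` is a hypothetical helper, not part of setup.py:

```python
def first_utf8_error(data: bytes):
    """Return (reason, offset) of the first UTF-8 decoding error, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.reason, exc.start
    return None


# Hypothetical header bytes; 0xb4 matches the continuation pattern 10xxxxxx,
# so UTF-8 rejects it as the first byte of a character.
print(first_utf8_error(b"/* \xb4\xc1 */"))  # ('invalid start byte', 3)
print(first_utf8_error(b"int f(void);"))    # None
```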

The grep_headers_for() function in setup.py appeared to be the culprit, so I added a print statement to its loop:

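import sys  # already imported at the top of setup.py; repeated so the excerpt stands alone


def grep_headers_for(function, headers):
    for header in headers:
        print("***", header, file=sys.stderr)  # debug: name each header before reading it
        with open(header, 'r') as f:
            if function in f.read():
                return True
    return False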

which printed these lines:

*** /usr/include/umfpack_report_perm.h
*** /usr/include/dbstl_dbc.h
*** /usr/include/itclTclIntStubsFcn.h
*** /usr/include/dbstl_vector.h
*** /usr/include/cholmod_blas.h
*** /usr/include/amd.h
*** /usr/include/m17n-X.h

Sure enough, that m17n-X.h file (attached) isn't valid UTF-8, which is my environment's encoding. According to the Emacs coding cookie at the end, the file is euc-japan encoded. Would simply catching the exception in grep_headers_for() be the correct way to deal with this?
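One possible alternative to catching the exception: skipping an undecodable header could hide a declaration the build is actually looking for, whereas decoding with errors="surrogateescape" turns each undecodable byte into a lone surrogate instead of raising, so ASCII identifiers still match. Below is a minimal sketch of that variant, assuming the names being searched for are plain ASCII C identifiers; the header contents and function names in the demo are hypothetical:

```python
import os
import tempfile


def grep_headers_for(function, headers):
    """Return True if `function` appears in any of `headers`.

    Undecodable bytes become lone surrogates (errors="surrogateescape")
    instead of raising UnicodeDecodeError, so an ASCII identifier can
    still be found in a header written in EUC-JP or another encoding.
    """
    for header in headers:
        # UTF-8 is made explicit here for reproducibility; the original
        # code relied on the locale's preferred encoding.
        with open(header, "r", encoding="utf-8", errors="surrogateescape") as f:
            if function in f.read():
                return True
    return False


if __name__ == "__main__":
    # Hypothetical header: an ASCII declaration followed by non-UTF-8 bytes.
    with tempfile.NamedTemporaryFile("wb", suffix=".h", delete=False) as tmp:
        tmp.write(b"int some_function(void);\n/* \xb4\xc1 */\n")
    try:
        print(grep_headers_for("some_function", [tmp.name]))     # True
        print(grep_headers_for("missing_function", [tmp.name]))  # False
    finally:
        os.remove(tmp.name)
```

Reading the file in binary mode and testing `function.encode() in f.read()` would work just as well; surrogateescape simply keeps the str comparison of the original code.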
msg380956 - Author: (yan12125) * Date: 2020-11-14 05:01
I ran into a similar issue on Arch Linux; see issue42351.
History
Date | User | Action | Args
2022-04-11 14:59:38 | admin | set | github: 86491
2020-11-14 05:01:23 | yan12125 | set | nosy: + yan12125, messages: + msg380956
2020-11-11 13:57:47 | skip.montanaro | create |