classification
Title: Setup.py: UnicodeDecodeError in grep_headers_for
Type: compile error
Stage: resolved
Components: Build
Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: epaine, ronaldoussoren, yan12125
Priority: normal
Keywords: patch

Created on 2020-11-13 17:59 by epaine, last changed 2020-11-14 15:09 by epaine. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 23279 merged ronaldoussoren, 2020-11-14 11:20
Messages (6)
msg380913 - (view) Author: E. Paine (epaine) * Date: 2020-11-13 17:59
When compiling the master branch (i.e. running 'make'), I get a UnicodeDecodeError as follows:
Traceback (most recent call last):
  File "/home/elisha/Documents/Python/cp0/cpython/./setup.py", line 2619, in <module>
    main()
  File "/home/elisha/Documents/Python/cp0/cpython/./setup.py", line 2589, in main
    setup(# PyPI Metadata (PEP 301)
  File "/home/elisha/Documents/Python/cp0/cpython/Lib/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/home/elisha/Documents/Python/cp0/cpython/Lib/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/home/elisha/Documents/Python/cp0/cpython/Lib/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/home/elisha/Documents/Python/cp0/cpython/Lib/distutils/command/build.py", line 135, in run
    self.run_command(cmd_name)
  File "/home/elisha/Documents/Python/cp0/cpython/Lib/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/home/elisha/Documents/Python/cp0/cpython/Lib/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/home/elisha/Documents/Python/cp0/cpython/Lib/distutils/command/build_ext.py", line 340, in run
    self.build_extensions()
  File "/home/elisha/Documents/Python/cp0/cpython/./setup.py", line 471, in build_extensions
    self.detect_modules()
  File "/home/elisha/Documents/Python/cp0/cpython/./setup.py", line 1825, in detect_modules
    self.detect_ctypes()
  File "/home/elisha/Documents/Python/cp0/cpython/./setup.py", line 2205, in detect_ctypes
    if grep_headers_for('ffi_prep_cif_var', ffi_headers):
  File "/home/elisha/Documents/Python/cp0/cpython/./setup.py", line 246, in grep_headers_for
    if function in f.read():
  File "/home/elisha/Documents/Python/cp0/cpython/Lib/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 4210: invalid start byte

The problematic file it is trying to read is /usr/include/OMX_Other.h which is part of the libomxil-bellagio package (a copy of this package can be downloaded from https://www.archlinux.org/packages/extra/x86_64/libomxil-bellagio/download/). More specifically, there are several characters in the comments which cannot be decoded correctly (the first of these is on line 93).
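The failure can be reproduced without the actual package. The following is a minimal sketch, not the real header: the file contents are a hypothetical stand-in, and it assumes the stray byte is a Windows-1252 style right single quote (0x92) inside a comment, which matches the byte reported in the traceback but has not been confirmed against OMX_Other.h.

```python
import os
import tempfile

# Hypothetical stand-in for a header whose comment contains byte 0x92,
# which is a continuation byte in UTF-8 and so cannot start a character.
data = b"/* don\x92t care */\nint ffi_prep_cif_var(void);\n"

with tempfile.NamedTemporaryFile(suffix=".h", delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

try:
    # Strict UTF-8 decoding, as in the failing build.
    with open(path, "r", encoding="utf-8") as f:
        f.read()
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    # exc.object holds the bytes being decoded; exc.start is the offset
    # of the first offending byte.
    bad_byte = exc.object[exc.start]
    print(f"cannot decode byte {bad_byte:#x}: {exc.reason}")
finally:
    os.unlink(path)
```

The header only has to sit in a directory that gets scanned; it does not need to have anything to do with libffi for the build to fail.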

The fix is very simple: just add errors='replace' to line 244 of setup.py (I cannot see this having any ill effects).
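As a rough illustration of that suggestion, here is a sketch of what a tolerant grep_headers_for might look like. The function body is reconstructed from the traceback and is an assumption, not a copy of setup.py; the explicit encoding="utf-8" and the temporary header file are also added here purely to make the example self-contained.

```python
import os
import tempfile


def grep_headers_for(function, headers):
    """Return True if `function` appears in any of the given header files."""
    for header in headers:
        # errors="replace" turns undecodable bytes into U+FFFD instead of
        # raising UnicodeDecodeError; ASCII identifiers are unaffected.
        with open(header, "r", encoding="utf-8", errors="replace") as f:
            if function in f.read():
                return True
    return False


# Hypothetical header with a non-UTF-8 byte (0x92) in a comment.
with tempfile.NamedTemporaryFile(suffix=".h", delete=False) as tmp:
    tmp.write(b"/* vendor\x92s note */\nint ffi_prep_cif_var(void);\n")
    header_path = tmp.name

try:
    found = grep_headers_for("ffi_prep_cif_var", [header_path])
    missing = grep_headers_for("no_such_symbol", [header_path])
finally:
    os.unlink(header_path)

print(found, missing)
```

Because the search target is a plain ASCII function name, replacing the undecodable bytes should not change whether it is found.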

I couldn't find who to nosy for this so apologies about that.
msg380955 - (view) Author: Chih-Hsuan Yen (yan12125) * Date: 2020-11-14 04:59
I can also confirm the issue on our Arch Linux server [1]. The problematic file is also /usr/include/OMX_Other.h.

Looks like it is a regression from https://github.com/python/cpython/pull/22855 (https://bugs.python.org/issue41100). Ronald Oussoren, would you mind having a look?

[1] https://build.archlinuxcn.org/~imlonghao/log/python-git/2020-11-14T01:17:02.html
msg380965 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2020-11-14 09:34
That's annoying. A quick workaround is to patch setup.py:grep_headers_for and add "encoding='latin1'" to the arguments of open.

I'll look into a better fix later this weekend.
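The reason the latin1 workaround cannot fail is that Latin-1 assigns a character to every possible byte value, so decoding never raises. A small sketch of that property follows; the sample bytes are hypothetical and only stand in for a header with a stray 0x92 byte.

```python
# Latin-1 maps each byte 0x00-0xFF to the code point with the same number,
# so any byte sequence decodes without error.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")
round_trip = decoded.encode("latin-1") == all_bytes

# Hypothetical header fragment with a byte that is invalid in UTF-8.
sample = b"/* don\x92t care */ int ffi_prep_cif_var(void);"
text = sample.decode("latin-1")

# ASCII identifiers decode identically under Latin-1 and UTF-8,
# so a substring search for a function name still works.
found = "ffi_prep_cif_var" in text
print(len(decoded), round_trip, found)
```

The decoded comment text may not be what the header's author intended, but that does not matter for a substring search on an ASCII name.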
msg380971 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2020-11-14 11:20
I've created a PR. Could you please check if that fixes the problem?
msg380978 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2020-11-14 15:07
New changeset 7a27c7ed4b2b45bb9ea27d3f5c4f423495d6e939 by Ronald Oussoren in branch 'master':
bpo-42351: Avoid error when opening header with non-UTF8 encoding (GH-23279)
https://github.com/python/cpython/commit/7a27c7ed4b2b45bb9ea27d3f5c4f423495d6e939
msg380979 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2020-11-14 15:08
Thanks for testing!
History
Date User Action Args
2020-11-14 15:09:22 epaine set status: open -> closed
2020-11-14 15:08:32 ronaldoussoren set resolution: fixed
    messages: + msg380979
    stage: patch review -> resolved
2020-11-14 15:07:51 ronaldoussoren set messages: + msg380978
2020-11-14 11:20:51 ronaldoussoren set messages: + msg380971
2020-11-14 11:20:07 ronaldoussoren set keywords: + patch
    stage: patch review
    pull_requests: + pull_request22173
2020-11-14 09:34:25 ronaldoussoren set messages: + msg380965
2020-11-14 04:59:24 yan12125 set nosy: + ronaldoussoren, yan12125
    messages: + msg380955
2020-11-13 17:59:24 epaine create