This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: import _curses fails because of UnicodeDecodeError('utf8' codec can't decode byte 0xb5 ...') on ARM Ubuntu 3.x
Type:
Stage:
Components: Library (Lib), Unicode
Versions: Python 3.3

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: barry, brett.cannon, ezio.melotti, serhiy.storchaka, vstinner
Priority: normal
Keywords: patch

Created on 2011-12-10 13:47 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
curses_getkey_utf8_surrogateescape.patch serhiy.storchaka, 2012-10-16 19:05
curses_getkey_latin1.patch serhiy.storchaka, 2012-10-16 19:05
curses_getkey_locale.patch serhiy.storchaka, 2012-10-16 19:05
Messages (13)
msg149155 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-10 13:47
http://www.python.org/dev/buildbot/all/builders/ARM%20Ubuntu%203.x/builds/143/steps/test/logs/stdio
---
test test_curses crashed -- Traceback (most recent call last):
  File "/var/lib/buildbot/buildarea/3.x.warsaw-ubuntu-arm/build/Lib/test/regrtest.py", line 1214, in runtest_inner
    the_package = __import__(abstest, globals(), locals(), [])
  File "/var/lib/buildbot/buildarea/3.x.warsaw-ubuntu-arm/build/Lib/test/test_curses.py", line 23, in <module>
    curses = import_module('curses')
  File "/var/lib/buildbot/buildarea/3.x.warsaw-ubuntu-arm/build/Lib/test/support.py", line 105, in import_module
    return importlib.import_module(name)
  File "/var/lib/buildbot/buildarea/3.x.warsaw-ubuntu-arm/build/Lib/importlib/__init__.py", line 123, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/var/lib/buildbot/buildarea/3.x.warsaw-ubuntu-arm/build/Lib/importlib/_bootstrap.py", line 840, in _gcd_import
    loader.load_module(name)
  File "/var/lib/buildbot/buildarea/3.x.warsaw-ubuntu-arm/build/Lib/importlib/_bootstrap.py", line 466, in load_module
    return self._load_module(fullname)
  File "/var/lib/buildbot/buildarea/3.x.warsaw-ubuntu-arm/build/Lib/importlib/_bootstrap.py", line 170, in decorated
    return fxn(self, module, *args, **kwargs)
  File "/var/lib/buildbot/buildarea/3.x.warsaw-ubuntu-arm/build/Lib/importlib/_bootstrap.py", line 371, in _load_module
    exec(code_object, module.__dict__)
  File "/var/lib/buildbot/buildarea/3.x.warsaw-ubuntu-arm/build/Lib/curses/__init__.py", line 13, in <module>
    from _curses import *
  File "/var/lib/buildbot/buildarea/3.x.warsaw-ubuntu-arm/build/Lib/importlib/_bootstrap.py", line 190, in inner
    return method(self, name, *args, **kwargs)
  File "/var/lib/buildbot/buildarea/3.x.warsaw-ubuntu-arm/build/Lib/importlib/_bootstrap.py", line 126, in wrapper
    module = fxn(*args, **kwargs)
  File "/var/lib/buildbot/buildarea/3.x.warsaw-ubuntu-arm/build/Lib/importlib/_bootstrap.py", line 139, in wrapper
    module = fxn(self, *args, **kwargs)
  File "/var/lib/buildbot/buildarea/3.x.warsaw-ubuntu-arm/build/Lib/importlib/_bootstrap.py", line 574, in load_module
    return imp.load_dynamic(fullname, self._path)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb5 in position 0: invalid start byte
---

The locale encoding is UTF-8 (LANG=en_US.UTF-8), Python directory is /var/lib/buildbot/buildarea/3.x.warsaw-ubuntu-arm/build.

Full header:
---
make buildbottest TESTOPTS= TESTPYTHONOPTS= TESTTIMEOUT=3600
 in dir /var/lib/buildbot/buildarea/3.x.warsaw-ubuntu-arm/build (timeout 3900 secs)
 watching logfiles {}
 argv: ['make', 'buildbottest', 'TESTOPTS=', 'TESTPYTHONOPTS=', 'TESTTIMEOUT=3600']
 environment:
  HOME=/var/lib/buildbot
  LANG=en_US.UTF-8
  LOGNAME=buildbot
  MAIL=/var/mail/buildbot
  PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
  PWD=/var/lib/buildbot/buildarea/3.x.warsaw-ubuntu-arm/build
  SHELL=/bin/sh
  SPEECHD_PORT=6678
  TERM=linux
  USER=buildbot
  XDG_SESSION_COOKIE=559ce1e92f80d0de13b3936e4c1475f9-29.919136-1746652091
 closing stdin
 using PTY: False
---

In Unicode, U+00B5 is µ (micro sign).
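
The error message can be reproduced in isolation. The following is a minimal sketch, not taken from the buildbot log: it assumes the offending data is the single byte 0xB5 and shows why that byte is a micro sign in Latin-1 but an "invalid start byte" in UTF-8.

```python
# Illustration only: the single byte below is an assumption based on the
# error message; the real string produced on the buildbot is unknown.
raw = b"\xb5"

# In Latin-1 every byte maps to the code point with the same number,
# so 0xB5 decodes to U+00B5 (micro sign).
as_latin1 = raw.decode("latin-1")

# In UTF-8, 0xB5 (binary 10110101) is a continuation byte and cannot start
# a character, so strict decoding fails at position 0.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    utf8_error = exc
else:
    utf8_error = None

print(ascii(as_latin1))
print(utf8_error)
```

So the failing string most likely starts with a byte that was meant to be read in a single-byte encoding, or is not text at all.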

Adding Barry, the owner of the buildbot, to the nosy list. Also adding Brett, even though I don't think that the problem comes from importlib.
msg149156 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-10 13:51
The build of the module already failed for the same reason (compilation and linking succeed, but the import check afterwards raises the same error):

building '_curses' extension
gcc -pthread -fPIC -Wno-unused-result -g -O0 -Wall -Wstrict-prototypes -DHAVE_NCURSESW=1 -I/usr/include/ncursesw -IInclude -I. -I./Include -I/usr/include/arm-linux-gnueabi -I/usr/local/include -I/var/lib/buildbot/buildarea/3.x.warsaw-ubuntu-arm/build -c /var/lib/buildbot/buildarea/3.x.warsaw-ubuntu-arm/build/Modules/_cursesmodule.c -o build/temp.linux-armv7l-3.3-pydebug/var/lib/buildbot/buildarea/3.x.warsaw-ubuntu-arm/build/Modules/_cursesmodule.o
gcc -pthread -shared build/temp.linux-armv7l-3.3-pydebug/var/lib/buildbot/buildarea/3.x.warsaw-ubuntu-arm/build/Modules/_cursesmodule.o -L/usr/lib/arm-linux-gnueabi -L/usr/local/lib -lncursesw -o build/lib.linux-armv7l-3.3-pydebug/_curses.cpython-33dm.so
*** WARNING: importing extension "_curses" failed with <class 'UnicodeDecodeError'>: 'utf8' codec can't decode byte 0xb5 in position 0: invalid start byte
msg149158 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-10 13:58
The problem may come from the name of a curses key, as returned by keyname(). PyInit__curses() gets the names of all keys (KEY_MIN..KEY_MAX).
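
One way to check this guess would be to scan the key range for names that are not valid UTF-8. The following is a rough sketch under assumptions: `find_undecodable_key_names` is a hypothetical helper, not part of curses, and the name lookup is passed in as a parameter so the logic can be tried without a terminal. On a machine where `_curses` imports, one could pass `curses.keyname` with `curses.KEY_MIN` and `curses.KEY_MAX` (after `curses.initscr()`, which the function appears to require); on the affected buildbot the import itself fails, so the C-level `keyname()` would have to be queried some other way.

```python
def find_undecodable_key_names(keyname, first, last):
    """Return (code, raw_name) pairs whose name is not valid UTF-8.

    `keyname` is any callable mapping an int key code to bytes (or None).
    """
    bad = []
    for code in range(first, last + 1):
        name = keyname(code)
        if not name:
            continue  # no name for this key code
        try:
            name.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((code, name))
    return bad


# Stand-in lookup table for demonstration; these names are made up.
fake_names = {256: b"KEY_MIN", 257: b"KEY_BREAK", 258: b"\xb5KEY"}
print(find_undecodable_key_names(fake_names.get, 256, 260))
```

With the made-up table above, only key code 258 is reported, because its name starts with the byte 0xB5.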
msg149201 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2011-12-11 00:53
Fails in exactly the same way when built from my shell account using current hg head. Does not fail on the same version of the OS on amd64.
msg149266 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-12 01:15
@Barry: can you try to get a trace using gdb? Start python in gdb, set a breakpoint on PyErr_SetObject, continue, run the Python command "import _curses", and get the gdb traceback (or continue if the error is not the UTF-8 error).
msg149314 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2011-12-12 15:30
This fails for me on OS X Snow Leopard using LLVM 3.0.

And I agree with your initial guess, Victor: I don't see how importlib could possibly be the issue here since it's using load_dynamic() and not loading some Python source itself.
msg171756 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-10-01 23:01
What is the status of this issue? Is anyone able to reproduce it? If not, I would like to close it.
msg171759 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-10-01 23:17
I can't, so setting to pending so that if no one speaks up the issue will close.
msg173065 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-16 17:56
Barry, can you reproduce it? The issue is obviously in the PyUnicode_FromString() call in PyCursesWindow_GetKey(). We can try the latin1 encoding, the locale encoding, or UTF-8 with the surrogateescape error handler.
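
To compare the three options, here is a small sketch at the Python level. It is not the content of any patch (those would change C code); it only shows what each decoding would produce for a hypothetical key name consisting of the byte 0xB5 from the error message.

```python
import locale

raw = b"\xb5"  # hypothetical non-ASCII key name, taken from the error message

# Option 1: Latin-1 never fails; each byte becomes the same-numbered code point.
latin1_name = raw.decode("latin-1")

# Option 2: locale encoding; the result depends on the environment.
# With LANG=en_US.UTF-8 a strict decode would fail, so errors="replace" is
# used here only to keep the demonstration from raising.
locale_name = raw.decode(locale.getpreferredencoding(False), errors="replace")

# Option 3: UTF-8 with surrogateescape maps an undecodable byte 0xNN to the
# lone surrogate U+DCNN, which encodes back to the original byte.
escaped_name = raw.decode("utf-8", errors="surrogateescape")
round_trip = escaped_name.encode("utf-8", errors="surrogateescape")

print(ascii(latin1_name), ascii(locale_name), ascii(escaped_name), round_trip)
```

Latin-1 gives a printable but possibly wrong character, the locale encoding gives an environment-dependent result, and surrogateescape gives an unprintable string that preserves the original byte.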
msg173070 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-16 19:05
Here are three patches. I think any one of them should fix the issue. The question is which solution returns the most suitable result for non-ASCII key names.
msg173099 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2012-10-16 21:22
I cannot reproduce it with Python 3.3 hg head on my ARM buildbot. _curses builds and imports just fine now.
msg175770 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-11-17 16:46
Should we just close this, Barry?
msg175819 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-11-17 22:40
> The issue obviously in PyUnicode_FromString() call from PyCursesWindow_GetKey().
> We can try latin1 encoding, locale encoding or utf-8 with surrogateescape error handler.

PyUnicode_FromString() uses the UTF-8 decoder. I don't think that curses uses any non-ASCII name for a key. If we get a name which is not decodable from UTF-8, I bet that we have a more serious issue than the encoding. I prefer not to change the encoding for getkey (UTF-8 is just fine).

It looks like nobody has seen this issue for months, even after the final release of Python 3.3. I'm closing this issue. It doesn't contain any useful information, and it would be easy to reopen it or open a new issue.
History
Date User Action Args
2022-04-11 14:57:24 admin set github: 57781
2012-11-17 22:40:13 vstinner set status: open -> closed; resolution: fixed; messages: + msg175819
2012-11-17 16:46:05 brett.cannon set status: pending -> open; messages: + msg175770
2012-10-24 09:18:14 serhiy.storchaka set status: open -> pending
2012-10-16 21:22:12 barry set messages: + msg173099
2012-10-16 19:05:39 serhiy.storchaka set files: + curses_getkey_utf8_surrogateescape.patch, curses_getkey_latin1.patch, curses_getkey_locale.patch; keywords: + patch; messages: + msg173070
2012-10-16 17:56:50 serhiy.storchaka set status: pending -> open; nosy: + serhiy.storchaka; messages: + msg173065
2012-10-01 23:17:36 brett.cannon set status: open -> pending; messages: + msg171759
2012-10-01 23:01:59 vstinner set messages: + msg171756
2011-12-12 15:30:36 brett.cannon set messages: + msg149314
2011-12-12 01:15:50 vstinner set messages: + msg149266
2011-12-11 00:53:17 barry set messages: + msg149201
2011-12-10 13:58:13 vstinner set messages: + msg149158
2011-12-10 13:51:33 vstinner set messages: + msg149156
2011-12-10 13:47:28 vstinner create