
classification
Title: Multiple type confusions in unicode error handlers
Type: crash
Stage: resolved
Components: Extension Modules, Interpreter Core, Unicode
Versions: Python 3.4, Python 3.5, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: serhiy.storchaka
Nosy List: Arfrever, christian.heimes, doerwalter, ezio.melotti, lemburg, pkt, python-dev, serhiy.storchaka, vstinner
Priority: normal
Keywords: patch

Created on 2015-05-01 14:14 by pkt, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name  Uploaded
poc_unicode_errors.py  pkt, 2015-05-01 14:14
codecs_error_handlers_issubclass.patch  serhiy.storchaka, 2015-05-02 11:43
codecs_error_handlers_issubclass_2.patch  serhiy.storchaka, 2015-05-02 12:45
codecs_error_handlers_issubclass_3.patch  serhiy.storchaka, 2015-05-04 13:21
Messages (9)
msg242319 - (view) Author: paul (pkt) Date: 2015-05-01 14:14
# Breakpoint 1, PyUnicodeEncodeError_GetEnd (exc=<X at remote 0x405730e4>, end=0xbf9e8f7c) at Objects/exceptions.c:1643
# 1643        PyObject *obj = get_unicode(((PyUnicodeErrorObject *)exc)->object,
# (gdb) s
# get_unicode (attr=<unknown at remote 0x8c6a120>, name=0x82765ea "object") at Objects/exceptions.c:1516
# 1516        if (!attr) {
# (gdb) print *attr
# $4 = {_ob_next = 0xfefefefe, _ob_prev = 0xfefefefe, ob_refcnt = -16843010, ob_type = 0xfefefefe}
# (gdb) c
# Continuing.
# 
# Program received signal SIGSEGV, Segmentation fault.
# 0x080bc7d9 in get_unicode (attr=<unknown at remote 0x8cbe250>, name=0x82765ea "object") at Objects/exceptions.c:1521
# 1521        if (!PyUnicode_Check(attr)) {
#
# Type confusion. The IsInstance check is ineffective because of a custom
# __getattribute__ method. The contents of a string instance are interpreted as
# an exception object.
msg242391 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-02 11:42
Here is a simpler reproducer:

import codecs

class X(str):
    __class__ = UnicodeEncodeError

codecs.ignore_errors(X())

The problem is that PyObject_IsInstance() is fooled by the custom __class__ attribute, so the builtin error handlers then treat the error object as if it had the UnicodeEncodeError layout, which it does not.

Here is a patch that fixes the issue by calling PyObject_IsSubclass() on exc->ob_type instead of calling PyObject_IsInstance() on exc.
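
The difference between the two checks can be sketched in pure Python. This is only an illustration of the idea, not the C patch: isinstance() falls back to the object's __class__ attribute, while a check on type(obj) looks at the real type only. The helper name real_type_check is hypothetical. The codecs.ignore_errors() call assumes an interpreter that already contains the fix; on an unpatched interpreter that call can crash.

```python
import codecs


class X(str):
    # Lie about the class: X is really a str subclass.
    __class__ = UnicodeEncodeError


def real_type_check(obj, cls):
    """Hypothetical helper: test the real type, ignoring a faked __class__."""
    return issubclass(type(obj), cls)


x = X()
fooled = isinstance(x, UnicodeEncodeError)       # falls back to x.__class__
honest = real_type_check(x, UnicodeEncodeError)  # looks at type(x) only

# Assumes a fixed interpreter: the handler rejects the object with TypeError
# instead of reading it as if it had the exception layout.
try:
    codecs.ignore_errors(x)
    rejected = False
except TypeError:
    rejected = True

print(fooled, honest, rejected)
```

On an interpreter with the fix this should print True False True: the isinstance() check is fooled, the real-type check is not, and the builtin handler refuses the object.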
msg242393 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2015-05-02 12:37
The patch does indeed fix the segmentation fault. However, the exception message looks confusing:

   TypeError: don't know how to handle UnicodeEncodeError in error callback
msg242395 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-02 12:45
Here is a patch that makes the error message consistent with the type checking.
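
A pure-Python analogue may make the point about the message clearer: if the check is made against the real type, the message should name the real type as well. The sketch below is an illustration under that assumption, not the patch itself. The function ignore_sketch and the registered handler name "ignore_sketch" are hypothetical.

```python
import codecs

_UNICODE_ERRORS = (UnicodeEncodeError, UnicodeDecodeError, UnicodeTranslateError)


def ignore_sketch(exc):
    """Hypothetical handler that skips the offending input, like 'ignore'."""
    real_type = type(exc)
    if not issubclass(real_type, _UNICODE_ERRORS):
        # Report the real type, so the message matches what was checked.
        raise TypeError("don't know how to handle %s in error callback"
                        % real_type.__name__)
    # Empty replacement, resume after the offending range.
    return ("", exc.end)


codecs.register_error("ignore_sketch", ignore_sketch)


class X(str):
    # Faked class, as in the reproducer.
    __class__ = UnicodeEncodeError


# Normal use: the non-ASCII character is dropped.
encoded = "a\xe9b".encode("ascii", "ignore_sketch")

# Faked object: rejected, and the message names X, not UnicodeEncodeError.
try:
    ignore_sketch(X())
    message = None
except TypeError as err:
    message = str(err)
```

With this shape, the faked object produces a message that mentions X rather than UnicodeEncodeError, which is the consistency the second patch is after.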
msg242397 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2015-05-02 13:05
Looks much better. However, shouldn't:

   exc->ob_type->tp_name

be:

   Py_TYPE(exc)->tp_name

(although many spots in the source still use ob_type->tp_name)?
msg242398 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-02 13:10
Py_TYPE() is necessary when the argument is not of type PyObject* (e.g. PyUnicodeObject*).
msg242556 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-04 13:21
Also fixed the handling of errors from PyObject_IsSubclass() (issue24115) in the _codecs module.
msg243476 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-05-18 13:11
New changeset 547bc11e3357 by Serhiy Storchaka in branch '2.7':
Issue #24102: Fixed exception type checking in standard error handlers.
https://hg.python.org/cpython/rev/547bc11e3357

New changeset 68eaa9409818 by Serhiy Storchaka in branch '3.4':
Issue #24102: Fixed exception type checking in standard error handlers.
https://hg.python.org/cpython/rev/68eaa9409818

New changeset 510819e5855e by Serhiy Storchaka in branch 'default':
Issue #24102: Fixed exception type checking in standard error handlers.
https://hg.python.org/cpython/rev/510819e5855e
msg243477 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-18 13:13
Greg Ewing suggested using PyObject_TypeCheck() (http://permalink.gmane.org/gmane.comp.python.devel/153216).
History
Date  User  Action  Args
2022-04-11 14:58:16  admin  set  github: 68290
2015-05-18 13:13:38  serhiy.storchaka  set  versions: + Python 2.7
2015-05-18 13:13:20  serhiy.storchaka  set  status: open -> closed; resolution: fixed; messages: + msg243477; stage: patch review -> resolved
2015-05-18 13:11:26  python-dev  set  nosy: + python-dev; messages: + msg243476
2015-05-04 13:21:06  serhiy.storchaka  set  files: + codecs_error_handlers_issubclass_3.patch; messages: + msg242556
2015-05-03 06:53:59  Arfrever  set  nosy: + Arfrever
2015-05-02 13:10:50  serhiy.storchaka  set  messages: + msg242398
2015-05-02 13:05:28  doerwalter  set  messages: + msg242397
2015-05-02 12:45:18  serhiy.storchaka  set  files: + codecs_error_handlers_issubclass_2.patch; messages: + msg242395
2015-05-02 12:37:16  doerwalter  set  messages: + msg242393
2015-05-02 11:43:19  serhiy.storchaka  set  files: + codecs_error_handlers_issubclass.patch; keywords: + patch
2015-05-02 11:42:23  serhiy.storchaka  set  assignee: serhiy.storchaka; messages: + msg242391; stage: needs patch -> patch review
2015-05-02 04:52:15  serhiy.storchaka  set  nosy: + ezio.melotti, lemburg, doerwalter, vstinner, serhiy.storchaka; components: + Interpreter Core, Unicode
2015-05-01 14:15:37  christian.heimes  set  nosy: + christian.heimes; stage: needs patch; components: + Extension Modules; versions: + Python 3.5
2015-05-01 14:14:06  pkt  create