
classification
Title: crasher in str(Exception())
Type: crash
Stage: resolved
Components: Interpreter Core
Versions: Python 3.1, Python 3.2, Python 2.7, Python 2.6
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: eric.smith
Nosy List: Trundle, arigo, doerwalter, eric.smith, ezio.melotti
Priority: high
Keywords: patch

Created on 2009-11-12 10:50 by arigo, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
issue7309-1.patch eric.smith, 2009-11-14 22:25
Messages (18)
msg95159 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2009-11-12 10:50
The __str__ method of some exception classes reads attributes without
typechecking them.  Alternatively, the issue could be that the user is
allowed to set the value of these attributes directly, without
typecheck.  The typechecking is only done when we create the exception,
but not later.  Example:

>>> u=UnicodeTranslateError(u'x', 1, 5, 'bah')
>>> u.reason = 0x345345345345345345  
>>> str(u)
"can't translate characters in position 1-4: E\x03"

The 'E\x03' comes from PyString_AS_STRING(reason).  By playing enough it
is probably possible to come up with a real crasher.
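
A minimal Python 3 sketch of the reproduction above, assuming an interpreter that already contains the fix from this issue (on an unpatched build the same steps may print garbage or crash). The helper name `describe_after_mutation` is made up for illustration and is not part of any patch.

```python
def describe_after_mutation(reason):
    """Build a UnicodeTranslateError, overwrite .reason, and format it."""
    # Python 3 spelling of the report's u'x'; start=1 and end=5 as in the report.
    err = UnicodeTranslateError("x", 1, 5, "bah")
    # Assignment is not type checked: any object is accepted here.
    err.reason = reason
    # Assumption: on a fixed interpreter __str__ converts reason with str()
    # instead of reading it as raw string memory.
    return str(err)


message = describe_after_mutation(0x345345345345345345)
print(message)
```

On a fixed build the message should end with the decimal form of the integer rather than stray bytes such as 'E\x03'.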
msg95225 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2009-11-14 02:06
I don't know if this is a real problem. If someone who wants to crash
someone else's program is able to do something like 'u.reason =
somethingweird', there are already more serious problems to solve.
I don't see why someone would want to do that in their own program either.

So, even assuming that PyString_AS_STRING() might indeed crash when some
weird arg is passed, the problem should be fixed there rather than by
typechecking all the args before calling it (i.e. even if you fix it
for Exceptions, there are probably several other places where you can
set arbitrary things that will be passed to PyString_AS_STRING() anyway).

That said, I played with it and tried to set u.reason with a number of
things (including big numbers and strings, Unicode chars outside the
BMP, builtin types, functions, modules) and str(u) either returned an
empty string or a random sequence of bytes (like your 'E\x03'), but it
didn't crash.

Unless you can find a way to make it crash, I'd close this as 'invalid'.
msg95228 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2009-11-14 06:24
After further investigation I found out that PyString_AS_STRING() is
the macro form of PyString_AsString() but without error checking (so
there's nothing to fix there, possibly just replace that call with
PyString_AsString if it turns out to be a real problem).
I think that what I said in the first paragraph of my previous message
is still true though, and that might be the reason why they preferred
PyString_AS_STRING() in the first place.
msg95230 - (view) Author: Andreas Stührk (Trundle) * Date: 2009-11-14 10:18
Crashes reliably with a segfault in Python 3.1.1.

Fixing the setter so that one can only set strings and not arbitrary 
objects is possibly the best solution.
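
The setter-based idea can be sketched in pure Python with a property. This is only an illustration of the approach: `CheckedReasonError` is a hypothetical class, not the C implementation in exceptions.c.

```python
class CheckedReasonError(Exception):
    """Hypothetical exception whose 'reason' attribute only accepts str."""

    def __init__(self, reason):
        super().__init__(reason)
        self.reason = reason  # goes through the checked setter below

    @property
    def reason(self):
        return self._reason

    @reason.setter
    def reason(self, value):
        # Reject wrong types at assignment time, so __str__ can trust the value.
        if not isinstance(value, str):
            raise TypeError("reason must be str, not " + type(value).__name__)
        self._reason = value

    def __str__(self):
        return "error: " + self._reason


err = CheckedReasonError("bah")
try:
    err.reason = 0x345345345345345345
    rejected = False
except TypeError:
    rejected = True
print(rejected, str(err))
```

With this design the bad assignment fails immediately and the previous value stays in place.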
msg95232 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2009-11-14 10:49
I'm not sure why reason should be restricted to a string. This patch
(against trunk) just converts reason to a string when str() is called.
I'll add tests and fix the other places in exceptions.c where similar
shortcuts are taken without checking, if there's agreement on the approach.
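
The alternative described here, converting at formatting time, might look like this in pure Python. `LenientReasonError` is a hypothetical stand-in for the C code, and the message format is copied from the example output earlier in this issue.

```python
class LenientReasonError(Exception):
    """Hypothetical exception that accepts any object as 'reason'."""

    def __init__(self, start, end, reason):
        super().__init__(start, end, reason)
        self.start = start
        self.end = end
        self.reason = reason

    def __str__(self):
        # Convert when formatting; never assume the attribute is still a string.
        reason_text = str(self.reason)
        return "can't translate characters in position %d-%d: %s" % (
            self.start, self.end - 1, reason_text)


err = LenientReasonError(1, 5, "bah")
err.reason = 837  # non-string assigned after construction
text = str(err)
print(text)
```

The attribute stays freely writable; only the formatting step is made defensive.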
msg95238 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2009-11-14 12:29
Note that on Py2.6, when, for example, a string is assigned to u.start
and u.end a TypeError is raised, and the value is then set to -1:
>>> u=UnicodeTranslateError(u'x', 1, 5, 'bah')
>>> u.start = 'foo'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: an integer is required
>>> u.end = 'bar'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: an integer is required
>>> str(u)
"can't translate characters in position -1--2: bah"
>>> u.start, u.end
(-1, -1)

It is possible to change the values by assigning an int (or even a float,
which is then converted to int).

On py3k the behavior is different; as Trundle said, it segfaults easily,
and trying to change the value of u.start and u.end raises a different
error:
>>> u.start = 'foo'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: Objects/longobject.c:441: bad argument to internal function


Also note that on both versions there's no range check on these values
either; it's easy to get a segfault by doing this:
>>> u = UnicodeTranslateError(u'x', 1, 5, 'bah')
>>> u.start = 2**30
>>> u.end = 2**30+1
>>> str(u)

(if the range covers only one character, Python will try to read it and display it)
msg95251 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2009-11-14 18:45
The patch that is (hopefully) attached is a first, incomplete cut just
for demonstration purposes. I still need to cover all of the cases where
PyString_AS_STRING is called without type checking. Also, as Ezio
points out, start and end are used to index an array without type
checking. I'll fix that as well.

The patch is against trunk.
msg95252 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2009-11-14 18:57
The same problem (u.start and u.end) also affects the other UnicodeError
exceptions (namely UnicodeEncodeError and UnicodeDecodeError).

Py2.4 and 2.5 don't seem to segfault with the example I provided.
msg95261 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2009-11-14 22:19
Another patch against trunk which deals with:

UnicodeEncodeError: reason and encoding
UnicodeDecodeError: reason and encoding
UnicodeTranslateError: reason

Still needs tests. Also, the unchecked use of start and end needs to be
addressed. I'm working on that.
msg95265 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2009-11-14 22:55
Tests need to cover issues like:

# assigning a non-string to e.object
e = UnicodeDecodeError("", "", 0, 1, "")
e.object = None
print str(e)

# start and end out of range
e = UnicodeDecodeError("", "", 0, 1, "")
e.start = 1000
e.end = 1001
print str(e)

For all cases of UnicodeXXXError with start and end, the code has a
special case for end = start+1. Invalid start/end tests need to have
end==start+1, end>start+1, end<start+1.

I'm not sure what the functions should do when start and end are out of
range.
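
A possible Python 3 shape for such tests, as a sketch only: it assumes a patched interpreter and checks nothing beyond "str() returns a string without crashing", since the thread has not settled what the message should say. The class and method names are invented here. The `e.object = None` case is deliberately left out because its expected behaviour is not defined in this discussion.

```python
import unittest


class MutatedUnicodeErrorTest(unittest.TestCase):
    def make_errors(self):
        # Python 3 constructors: encode errors carry str, decode errors carry bytes.
        return [
            UnicodeEncodeError("ascii", "x", 0, 1, "bah"),
            UnicodeDecodeError("ascii", b"x", 0, 1, "bah"),
            UnicodeTranslateError("x", 0, 1, "bah"),
        ]

    def test_non_string_reason(self):
        for err in self.make_errors():
            err.reason = 42
            self.assertIsInstance(str(err), str)

    def test_start_and_end_out_of_range(self):
        # Covers end == start+1, end > start+1 and end < start+1.
        for start, end in [(1000, 1001), (1000, 1005), (1000, 1000)]:
            for err in self.make_errors():
                err.start = start
                err.end = end
                self.assertIsInstance(str(err), str)


# Run the suite explicitly so the module can also be imported without exiting.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(MutatedUnicodeErrorTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```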
msg95277 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2009-11-15 08:49
> I'm not sure what the functions should do when start and end are
> out of range.

I think the best approach would be to prevent these values from being out of
range in the first place. 
All the args should be checked when the instance is created (to avoid
things like UnicodeTranslateError(None, 2**30, 2**30+1, 'bah')) and
then, if possible, the attributes should be set as read-only.
I don't see any valid reason to change them anyway.
msg95309 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2009-11-15 21:28
I agree there's not much value in making the attributes read/write, but
it looks like all of the exceptions allow it, so I don't really want to
make these exceptions the only ones that are different.
msg95335 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2009-11-16 09:49
>> I'm not sure what the functions should do when start and end are
>> out of range.
>
> I think the best approach would be to prevent these values to be out of
> range in the first place. 

The start and end values should be clipped, just like normal slices in
Python do:

>>> ""[2**30:2**30+1]
''
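
In Python terms the clipping rule is the one `slice.indices()` implements. A small sketch, where `clip_range` is a hypothetical helper name and not something from the patch:

```python
def clip_range(start, end, length):
    """Clip (start, end) to a sequence of the given length, as slicing does."""
    # slice.indices() applies the same clamping rules as obj[start:end].
    clipped_start, clipped_end, _step = slice(start, end).indices(length)
    return clipped_start, clipped_end


print(clip_range(2**30, 2**30 + 1, 0))  # (0, 0)
print(clip_range(1, 5, 1))              # (1, 1)
```

An empty clipped range (start equal to end) then simply means there is no character to display.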

> I agree there's not much value in making the attributes read/write,
> but it looks like all of the exceptions allow it, so I don't really
> want to make these exceptions the only ones that are different.

Exception attributes *must* be read/write, because the codecs create an
exception object once, and then use this exception object to
communicate multiple errors to the callback. PEP 293 states: "Should
further encoding errors occur, the encoder is allowed to reuse the
exception object for the next call to the callback."
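
For context, this is roughly how a PEP 293 error callback consumes those attributes in Python 3. The handler name "issue7309.demo" and the function `record_and_replace` are made up for this sketch; whether the encoder actually reuses one exception object between calls is up to the codec and is not checked here.

```python
import codecs

seen = []  # (start, end, reason) for every call to the callback


def record_and_replace(exc):
    """Replace each unencodable range with '?' and remember what was reported."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    seen.append((exc.start, exc.end, exc.reason))
    # Return the replacement text and the position where encoding resumes.
    return ("?" * (exc.end - exc.start), exc.end)


codecs.register_error("issue7309.demo", record_and_replace)

encoded = "a\u20acb\u20ac\u20acc".encode("ascii", "issue7309.demo")
print(encoded)
print(seen)
```

Each call sees its own start and end values, which is why the codec needs to be able to update them on an existing exception object.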
msg95336 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2009-11-16 09:52
Thanks, Walter. I'll finish my patch, then.
msg100034 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-02-24 14:28
Fixed:

trunk: r78418
release26-maint: r78419

Still working on porting to py3k and release31-maint.
msg100037 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2010-02-24 15:22
On 24.02.10 15:28, Eric Smith wrote:

> Eric Smith <eric@trueblade.com> added the comment:
> 
> Fixed:
> 
> trunk: r78418
> release26-maint: r78419
> 
> Still working on porting to py3k and release31-maint.

A much better solution would IMHO be to forbid setting the encoding,
object and reason attributes to objects of the wrong type in the first
place. Unfortunately this would require an extension to PyMemberDef for
the T_OBJECT and T_OBJECT_EX types.
msg100038 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-02-24 15:24
> A much better solution would IMHO be to forbid setting the encoding,
> object and reason attributes to objects of the wrong type in the first
> place. Unfortunately this would require an extension to PyMemberDef for
> the T_OBJECT and T_OBJECT_EX types.

Agreed that's a better approach. But I wanted to get the fix in for 2.6 
and 3.1.

You can open another issue if you'd like. I'm going to close this one as 
soon as I get the crash fixed.
msg100040 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-02-24 15:54
Fixed:

py3k: r78420
release31-maint: r78421
History
Date                 User          Action  Args
2022-04-11 14:56:54  admin         set     github: 51558
2010-02-24 15:54:26  eric.smith    set     status: open -> closed; resolution: fixed; messages: + msg100040; stage: patch review -> resolved
2010-02-24 15:24:38  eric.smith    set     messages: + msg100038
2010-02-24 15:22:03  doerwalter    set     messages: + msg100037
2010-02-24 14:28:04  eric.smith    set     messages: + msg100034
2009-11-16 09:52:50  eric.smith    set     messages: + msg95336
2009-11-16 09:49:56  doerwalter    set     nosy: + doerwalter; messages: + msg95335
2009-11-15 21:28:24  eric.smith    set     messages: + msg95309
2009-11-15 08:49:40  ezio.melotti  set     messages: + msg95277
2009-11-14 22:55:14  eric.smith    set     messages: + msg95265
2009-11-14 22:25:47  eric.smith    set     files: - issue7309.patch
2009-11-14 22:25:41  eric.smith    set     files: + issue7309-1.patch
2009-11-14 22:20:09  eric.smith    set     files: - issue7309.patch
2009-11-14 22:19:55  eric.smith    set     files: + issue7309.patch; messages: + msg95261
2009-11-14 18:57:53  ezio.melotti  set     messages: + msg95252
2009-11-14 18:45:41  eric.smith    set     messages: - msg95235
2009-11-14 18:45:32  eric.smith    set     files: + issue7309.patch; keywords: + patch; messages: + msg95251
2009-11-14 12:29:06  ezio.melotti  set     messages: + msg95238
2009-11-14 11:04:48  eric.smith    set     messages: + msg95235
2009-11-14 10:59:35  eric.smith    set     messages: - msg95234
2009-11-14 10:59:29  eric.smith    set     messages: - msg95233
2009-11-14 10:56:40  eric.smith    set     messages: + msg95234
2009-11-14 10:54:37  eric.smith    set     messages: + msg95233
2009-11-14 10:49:20  eric.smith    set     priority: low -> high; type: crash; assignee: eric.smith; versions: + Python 2.6, Python 3.2; nosy: + eric.smith; messages: + msg95232; stage: test needed -> patch review
2009-11-14 10:18:01  Trundle       set     nosy: + Trundle; messages: + msg95230; versions: + Python 3.1
2009-11-14 06:24:32  ezio.melotti  set     messages: + msg95228
2009-11-14 02:06:44  ezio.melotti  set     priority: low; nosy: + ezio.melotti; messages: + msg95225; stage: test needed
2009-11-12 10:50:48  arigo         create