This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: '%o' % user-defined instance
Type: crash Stage: resolved
Components: Interpreter Core Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: amaury.forgeotdarc, arigo, benjamin.peterson, eric.araujo, eric.smith, francismb, python-dev, serhiy.storchaka
Priority: normal Keywords: needs review, patch

Created on 2011-02-07 18:46 by arigo, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
issue11145.patch serhiy.storchaka, 2014-10-04 19:36 review
issue11145_2.patch serhiy.storchaka, 2014-11-17 12:32 review
issue11145_3.patch serhiy.storchaka, 2014-11-17 19:35 review
issue11145_3_simpler.patch serhiy.storchaka, 2014-11-17 19:36 Slightly simpler and slower review
issue11145_4.patch serhiy.storchaka, 2014-11-23 20:06 review
issue11145_5.patch serhiy.storchaka, 2016-11-30 09:47 review
Messages (32)
msg128148 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2011-02-07 18:46
The expression  '%o' % x,  where x is a user-defined instance, usually ignores a user-defined __oct__() method.  I suppose that's fine; assuming this is the expected behavior, then the present issue is about the "usually" in my previous sentence.

If 'x' is an instance of a subclass of 'long', then the __oct__() is called.  It must be specifically a 'long' -- not, say, 'int' or 'str' or 'object'.  Moreover, there is a test in test_format.py (class Foobar) that checks that if this __oct__() returns a non-string, then we get a nice TypeError.  That's already strange -- why is __oct__() called in the first place?  But trying out more I get (CPython 2.7.1):

>>> class X(long):
...   def __oct__(self):
...     return 'abc'
... 
>>> '%o' % X()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: ../trunk/Objects/stringobject.c:4035: bad argument to internal function
msg128152 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2011-02-07 19:09
And this code will crash a debug build:

>>> class X(long):
...     def __oct__(self):
...         return 'foo'.upper()
...
>>> '%o' % X()
Assertion failed: buf[sign] == '0', file ..\..\Objects\stringobject.c, line 4059
msg128196 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-02-08 22:46
The doc for %o only says it returns the “signed octal value”; __oct__ is not a magic method (i.e. it’s not a name with special meaning), converting with oct() actually uses __index__.
msg128207 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2011-02-09 10:49
Eric: that's wrong, it is a magic method.  See for example "__oct__" in Objects/typeobject.c.  I'm not sure I understand why you would point this out, though.  A "SystemError: bad argument to internal function" or an "Assertion failed" are both bugs anyway.
msg128211 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2011-02-09 14:03
It's a magic method in 2.7 but not 3.x.
msg228460 - (view) Author: Francis MB (francismb) * Date: 2014-10-04 14:07
Just updating the type to 'behavior'.
 
I can still reproduce this issue:

$ python2.7
Python 2.7.8 (default, Sep  9 2014, 22:08:43) 
[GCC 4.9.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> class Y(long):
...   def __oct__(self):
...     return 'abc'
... 
>>> '%o' % Y()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: ../Objects/stringobject.c:4045: bad argument to internal function
>>>
msg228466 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-10-04 17:26
Here is a patch which fixes this issue. Now ValueError with relevant message is raised instead of SystemError or crash.
msg228478 - (view) Author: Francis MB (francismb) * Date: 2014-10-04 19:04
Do have I overseen the patch? or may be doing something wrong?
or isn't anything uploaded?
msg228486 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-10-04 19:36
Oh, sorry. I again forgot to upload a patch.
msg231271 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-17 08:12
Could anyone please make a review of the patch?
msg231275 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2014-11-17 09:47
It's seriously obscure to call a user-defined __oct__ method and then mangle the resulting string in ways that only make sense if the __oct__ method returned something reasonable.

The patch is probably a little more complicated than it could be.  For example, I don't understand the special cases for `llen <= 1`: I don't think `PyString_FromStringAndSize(NULL, 1)` can return an existing object.  And if you cut off the final 'L' you don't replace it with '\0' any longer, which could in theory break some callers expecting to see a null-terminated string here.  PyErr_Format() should use `%.200s` for `tp_name`.
msg231279 - (view) Author: Francis MB (francismb) * Date: 2014-11-17 10:47
I'm not sure if it's relevant but in the patch you changed
previous 'assert(check)' with 'if (not check) goto error'.
But the new patch code adds 'assert(len == 0 || Py_REFCNT(r1) == 1);'

Just curious, is there a reason why couldn't be in the same way?
msg231281 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-17 12:32
Thank you for your review Arigo.

Here is updated patch.

Note that the patch also fixes a reference leak if llen > INT_MAX.
msg231282 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-17 12:36
> Just curious, is there a reason why couldn't be in the same way?

Old asserts depend on user code, new asserts check internal consistency.
msg231287 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2014-11-17 16:30
+        if (Py_REFCNT(result) == 1)
+            buf[len] = '\0';

...and if the refcount is not equal to 1, then too bad, we won't null-terminate the string and hope that nobody crashes because of this.??
msg231288 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-17 16:50
If the refcount is not equal to 1, we will copy the content to new null-
terminated string.
msg231289 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2014-11-17 17:33
Ah, sorry.  Ok.  Now a different issue: the user-defined function can return an interned string.  If it has a refcount of 1, _PyString_FormatLong() will mutate it.  Then when we DECREF it, string_dealloc() will not find it any more in the interned dict and crash with a fatal error.

Note that I'm mostly having fun finding holes in delicate logic, like mutating strings in-place.  It would be much more simple to either (1) stop calling the user-defined functions and behave similarly to most other built-in types; or (2) stop trying to mutate that poor string in-place and always just create a new one. :-)
msg231292 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-17 19:35
Good catch! Here is a patch which fixes this issue too.

> (1) stop calling the user-defined functions and behave similarly to most other built-in types;

This is done in 3.x.

> (2) stop trying to mutate that poor string in-place and always just create a new one.

The patch can be simplified by removing 6 lines. But this will slow down integer formatting by few (about 2-7) percents in worst case. Not a big deal.
msg231415 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2014-11-20 09:19
If buf contains "-00" and the type is 'o', then it will be modified in-place even if the string is shared.
msg231417 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-20 09:58
Heh, it's getting really funny.
msg231577 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-23 20:06
Here is new patch. It first split string on areas: numnondigits (sign+"0x" if F_ALT is not set), skipped ("0x" if F_ALT is set), numdigits and optional "L" suffix, and then construct new string either in-place (if the string is not shared and result fits in original string) or in new string. It uses not more allocations than current code and should not add overhead for common cases.
msg234850 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-01-27 20:51
Hey, Armin. Do you want to continue bug hunting?
msg234855 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2015-01-27 22:17
Sorry, your patch still contains similar issues.  I postponed continuing to bounce it back and forth, but it seems that someone else needs to take my place now.
msg236333 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-20 21:39
Sorry, I didn't find any issues with the last patch. Could you please point on them?
msg243055 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-13 08:20
I there are no objection's, I'll commit the last patch.
msg243089 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015-05-13 14:18
Armin indicated in his last comment that the patch still has multiple issues.

Are there tests to catch the issues he previously found?  That seems the best method to verify that the current (and future) patches don't break 2.7.
msg243103 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-13 16:57
It is easy to find a bug than reproduce it.
msg282031 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-29 19:17
I read the code multiple times but still don't see any issues with the last path. If anybody know issues with it, please point on them. Otherwise I'll commit the patch.
msg282066 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2016-11-30 09:18
I reviewed your patch again.  It does look good after all: I find only one issue---it seems I implied there were several ones but I can't find more.  The issue is that PyString_AsString(result) will succeed if 'result' is a unicode.  Then you call PyString_GET_SIZE(result), which gives nonsense for unicode objects.
msg282069 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-30 09:47
Good catch Armin! Thank you for your review.

Updated patch uses PyString_AsStringAndSize() and adds a check that result is exact str before changing it in-place.
msg282070 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2016-11-30 09:51
Looks ok now!
msg282145 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-12-01 08:27
New changeset adb296e4bcaa by Serhiy Storchaka in branch '2.7':
Issue #11145: Fixed miscellaneous issues with C-style formatting of types
https://hg.python.org/cpython/rev/adb296e4bcaa
History
Date User Action Args
2022-04-11 14:57:12adminsetgithub: 55354
2016-12-01 08:47:56serhiy.storchakasetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2016-12-01 08:27:45python-devsetnosy: + python-dev
messages: + msg282145
2016-11-30 09:51:10arigosetmessages: + msg282070
2016-11-30 09:47:59serhiy.storchakasetfiles: + issue11145_5.patch

messages: + msg282069
2016-11-30 09:18:55arigosetmessages: + msg282066
2016-11-29 19:17:59serhiy.storchakasetassignee: serhiy.storchaka
messages: + msg282031
2015-07-21 07:14:43ethan.furmansetnosy: - ethan.furman
2015-05-13 16:57:20serhiy.storchakasetmessages: + msg243103
2015-05-13 14:18:26ethan.furmansetmessages: + msg243089
2015-05-13 08:20:31serhiy.storchakasetmessages: + msg243055
2015-02-20 21:39:22serhiy.storchakasetmessages: + msg236333
2015-01-28 18:09:00ethan.furmansetnosy: + ethan.furman
2015-01-27 22:17:33arigosetmessages: + msg234855
2015-01-27 20:51:47serhiy.storchakasetmessages: + msg234850
2014-11-23 20:06:59serhiy.storchakasetfiles: + issue11145_4.patch

messages: + msg231577
2014-11-20 09:58:42serhiy.storchakasetmessages: + msg231417
2014-11-20 09:19:47arigosetmessages: + msg231415
2014-11-17 19:36:30serhiy.storchakasetfiles: + issue11145_3_simpler.patch
2014-11-17 19:35:06serhiy.storchakasetfiles: + issue11145_3.patch

messages: + msg231292
2014-11-17 17:33:45arigosetmessages: + msg231289
2014-11-17 16:50:42serhiy.storchakasetmessages: + msg231288
2014-11-17 16:30:13arigosetmessages: + msg231287
2014-11-17 12:36:14serhiy.storchakasetmessages: + msg231282
2014-11-17 12:32:29serhiy.storchakasetfiles: + issue11145_2.patch

messages: + msg231281
2014-11-17 10:47:04francismbsetmessages: + msg231279
2014-11-17 09:47:52arigosetmessages: + msg231275
2014-11-17 08:12:55serhiy.storchakasetkeywords: + needs review

messages: + msg231271
2014-10-04 19:36:32serhiy.storchakasetfiles: + issue11145.patch
keywords: + patch
messages: + msg228486
2014-10-04 19:04:47francismbsetmessages: + msg228478
2014-10-04 17:26:31serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg228466

type: behavior -> crash
stage: patch review
2014-10-04 14:07:49francismbsettype: behavior

messages: + msg228460
nosy: + francismb
2011-02-09 14:03:04benjamin.petersonsetnosy: arigo, amaury.forgeotdarc, eric.smith, benjamin.peterson, eric.araujo
messages: + msg128211
2011-02-09 10:58:40pitrousetnosy: + benjamin.peterson
2011-02-09 10:49:56arigosetnosy: arigo, amaury.forgeotdarc, eric.smith, eric.araujo
messages: + msg128207
2011-02-08 22:46:55eric.araujosetnosy: + eric.araujo
messages: + msg128196
2011-02-07 20:44:50eric.smithsetnosy: + eric.smith
2011-02-07 19:09:12amaury.forgeotdarcsetnosy: + amaury.forgeotdarc
messages: + msg128152
2011-02-07 18:46:33arigocreate