New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace %.100s by %s in PyErr_Format(): the arbitrary limit of 500 bytes is outdated #55042
Comments
Guido created the "convenience function" PyErr_Format() 13 years ago (r7580) with a naive implementation using char buffer[500] and vsprintf(). He added a comment: /* Caller is responsible for limiting the format */ Ok, that was true 13 years ago. But today Python uses dynamically allocated buffers, and so we can simply replace %.100s ("%\.[0-9]+s" regex) by %s. Anyway, PyUnicode_FromFormatV() doesn't support precision for %s and %U formats :-) |
use_format.patch is a patch to avoid PyOS_snprintf() with a fixed buffer: use directly PyUnicode_FromFormat(), PyErr_Format() or PySys_FormatStderr() instead. This patch is just a draft, it should be tested :-) There are other calls to PyOS_snprintf() that can be replaced. |
Woops, I didn't want to do it, but I already commited the simple part of this issue (U format, eg. %.100U): r87753. |
I always thought that one of the reasons for specifying the length was in case a pointer pointed to garbage: at least you'd be limiting how much trash was printed. But maybe that's just my imagination and there is no such reason. |
I agree with Eric. Limiting field width when formatting error messages is a good safety measure. It is not only the amount of garbage that can be spitted in error logs, if garbage is not null-terminated, you may get a SEGV instead of a somewhat meaningful error message. |
Note that it is a good idea to avoid memory allocation during exception processing. Think of out of memory errors: how would you report those if you need more memory to format the error message than the failed operation? |
eric> I always thought that one of the reasons for specifying the length No, it's because of the old (removed) 500 bytes limit. There are different types of %s arguments:
belopolsky> Limiting field width when formatting error messages Can you give me at least one example? I think that it is very unlikely, or just impossible. But if I do replace "%.100s" by "%s" in all the code base, I accept to check each time that the argument is null-terminated and/or not controlable by the user. And maybe keep "%.100s" if I am not sure. belopolsky> Think of out of memory errors: ... PyErr_NoMemory() doesn't use PyErr_Format(), it's message is fixed, and the exception object is preallocated. |
On Thu, Jan 6, 2011 at 3:49 PM, STINNER Victor <report@bugs.python.org> wrote:
How do you know? I, for one, did use %.100s when formatting strings ..
I am concerned about conditions that are impossible in a valid
As long as we have ctypes, everything is controllable by user. Again,
Just keep "%.100s" always. It does not hurt and it is less work for you. :-)
Indeed, there are no PyErr_Format(PyExc_MemoryError, ..) in the code Code cleanup projects like this are rightfully frowned upon because |
Yes, %.100s may avoid a crash after the buffer overflow on the string formatting, but it may quickly crash on another instruction. I don't think that you should limit the error message length to protect Python against buffer overflow. A buffer overflow can corrupt everything, and we should fix the buffer overflow instead :-)
You have debuggers like gdb for that. |
belopolsky> Limiting field width when formatting error messages is The problem is that PyUnicode_FromFormatV() doesn't support %.100s format (it ignores .100). If we truncate at 100 bytes, it may truncate an UTF-8 string in the middle of a character :-/ And I don't want to implement it because I don't like arbitrary limitations in Python. |
see also bpo-7330, I'm implementing "%.100s" in that issue. |
I am working with Python since 5 years (and on Python since 3 years): I never seen any garbage string in error messages. My concern is to not truncate strings in error messages anymore. I consider that it is more important than using a hack to workaround some crashs. I call %.100s a hack because on a buffer overflow, other attributes are overwritten, and I think that the other attributes are more important than tp_name and the program will crash quickly. Try a buffer overflow on PyLong_Type and check how much time does it take to crash Python: I bet that the program will crash before formatting any exception.
This instruction comes from math.fsum() which allocate a new list with the size of the input size (not exactly, but it is based on the size of the input list). Python may fail to allocate an huge list, but it doesn't mean that it will be unable to allocate memory for the short string "math.fsum partials". Anyway, if the allocation of "math.fsum partials" string fail, another MemoryError (without any message) will be used. So it doesn't really matter.
Same than before: Python should be able to allocate the new error object, and if it fails: it will use MemoryError anyway.
I consider that there is no more good reason to truncate strings, I want to see the full error message! |
History of PyErr_Format():
belopolsky>> Limiting field width when formatting error messages me> Can you give me at least one example? I think that it is very Python allocates a dynamic buffer since r17159 (10 years ago), and the strings were *never* truncated just because %.100s format was never supported (it is interpreted as %s). If you still consider that %.100s protects is a good solution against crashes: you have to realize that Python doesn't truncate strings since 10 years, and nobody complained. -- The situation is changing because Ray Allen wrote a patch implementing %.100s: bpo-7330. I would like to decide what to do with %.100s in error messages before commiting bpo-7330. -- Eric and Alexander: do you still consider that %.100s is important in error messages? Or do you know agree that we can replace them with %s? |
I don't feel strongly one way or the other about it. I was just relaying the reason I heard when I asked the question about why it's done the way it is. I suggest mentioning that you're going to commit this change on python-dev, since this seems like an issue that people might care about without knowing about its existence in the bug tracker. |
New changeset 74d3dc78f0db by Victor Stinner in branch 'default': |
New changeset 513bab5cfb18 by Victor Stinner in branch 'default': New changeset 4c2135930882 by Victor Stinner in branch 'default': |
My stronger argument was that %.100s was never supported. I am wrong: Python 2 is only to truncate strings in PyErr_Format() (PyString_FromFormatV() supports precision for %s format). As I am the only one wanting to changing that, and I don't have any argument anymore, I accept to leave the code unchanged, but bpo-7330 must be fixed. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: