Inconsistency in overflow error messages of integer argument #60192

serhiy-storchaka · 2012-09-20T18:51:10Z

BPO	15988
Nosy	@terryjreedy, @mdickinson, @ezio-melotti, @serhiy-storchaka, @orenmn
PRs	bpo-15988: make various overflow messages more consistent and helpful #668
Dependencies	bpo-15989: Possible integer overflow of PyLong_AsLong() results bpo-28298: can't set big int-like objects to items in array 'Q', 'L' and 'I' bpo-29816: Get rid of C limitation for shift count in right shift
Files	issue15988_ver1.diff: proposed patches diff file - ver1 testPatches.py: an ugly script that tests my patches PyLong_As_and_PyArg_Parse_patchVer1.diff formats_patchVer1.diff array_patchVer1.diff hashlib_lzma_pickle_patchVer1.diff time_and_re_patchVer1.diff curses_stat_callproc_abstract_patchVer1.diff mmap_posix_socket_select_patchVer1.diff

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/serhiy-storchaka'
closed_at = None
created_at = <Date 2012-09-20.18:51:09.840>
labels = ['interpreter-core', 'type-feature', '3.7']
title = 'Inconsistency in overflow error messages of integer argument'
updated_at = <Date 2017-09-28.06:45:38.532>
user = 'https://github.com/serhiy-storchaka'

bugs.python.org fields:

activity = <Date 2017-09-28.06:45:38.532>
actor = 'Oren Milman'
assignee = 'serhiy.storchaka'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core']
creation = <Date 2012-09-20.18:51:09.840>
creator = 'serhiy.storchaka'
dependencies = ['15989', '28298', '29816']
files = ['45107', '45108', '46733', '46736', '46738', '46739', '46740', '46741', '46742']
hgrepos = []
issue_num = 15988
keywords = ['patch']
message_count = 29.0
messages = ['170827', '170950', '170979', '219609', '270474', '270478', '270479', '278738', '278742', '289337', '289355', '289653', '289679', '289684', '289691', '289693', '289695', '289799', '289814', '289833', '289842', '289843', '289846', '289850', '289852', '289853', '299735', '300218', '303192']
nosy_count = 5.0
nosy_names = ['terry.reedy', 'mark.dickinson', 'ezio.melotti', 'serhiy.storchaka', 'Oren Milman']
pr_nums = ['668']
priority = 'low'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue15988'
versions = ['Python 3.7']

serhiy-storchaka · 2012-09-20T18:51:09Z

PyArg_ParseTuple raises inconsistent overflow error messages for small integer formats. For example:

>>> import _testcapi
>>> _testcapi.getargs_b(100)
100
>>> _testcapi.getargs_b(1000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: unsigned byte integer is greater than maximum
>>> _testcapi.getargs_b(-1000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: unsigned byte integer is less than minimum
>>> _testcapi.getargs_b(100000000000000000000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long
>>> _testcapi.getargs_b(-100000000000000000000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long

On platforms with 32-bit int and 64-bit long there will be more such cases.

terryjreedy · 2012-09-22T01:40:54Z

The reason for the different messages is clear -- there are different internal tests (I presume in different functions), each of which raises its own error. What do you propose to change?

The only thing I can imagine is that when the bytes converter calls the Python int to c long function, it somehow intercepts the error return and rewrites the message for consistency. Is this your proposal?

serhiy-storchaka · 2012-09-22T09:42:59Z

Use PyLong_AsLongAndOverflow instead PyLong_AsLong in PyArg_ParseTuple and in some other functions.
Unify an integer overflow error messages in different functions.

terryjreedy · 2014-06-02T17:27:47Z

I reconsidered this in the light of bpo-21559. getargs_b requires an integer of type int in range(256). A non-int properly raises TypeError.

>>> from _testcapi import getargs_b as gb
>>> gb(1.0)
Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    gb(1.0)
TypeError: integer argument expected, got float
>>> import fractions
>>> gb(fractions.Fraction(1, 1))
Traceback (most recent call last):
  File "<pyshell#12>", line 1, in <module>
    gb(fractions.Fraction(1, 1))
TypeError: an integer is required (got type Fraction)

An out-of-range int should, it seems to me, just raise ValueError("int %d not in range(256)" % n). Verification of the range:

>>> gb(255)
255
>>> gb(256)
Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    gb(256)
OverflowError: unsigned byte integer is greater than maximum
>>> gb(0)
0
>>> gb(-1)
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    gb(-1)
OverflowError: unsigned byte integer is less than minimum

The last message is wrong or contradictory. An unsigned (non-negative) int cannot be less than 0.

orenmn · 2016-07-15T09:57:48Z

I would be happy to write a patch for this issue, if you don't mind.

Terry, as you were the one to commit the patch for bpo-21559, I guess you are OK with keeping the OverflowError.
Also, I agree that the error message for '_testcapi.getargs_b(-1)' (and for similar cases) could be improved. IMHO, the wording in PyLong_AsLong is quite clear, and also fits here, e.g. 'Python int too small to convert to C unsigned byte' for '_testcapi.getargs_b(-1)'.

ISTM that implementing the patch by doing the following is quite straightforward:

replacing PyLong_AsLong with PyLong_AsLongAndOverflow (as Serhiy suggested) in cases 'b', 'h' and 'i', in convertsimple in getargs.c
changing some error messages in convertsimple
changing PyLong_AsLong and _PyLong_AsInt so that they too would give more helpful error messages (according to the long-lived suggestions of '/* XXX: could be cute and give a different message for overflow == -1 */')

Serhiy, what did you mean by 'Unify an integer overflow error messages in different functions.'?

serhiy-storchaka · 2016-07-15T11:31:56Z

You plan looks good, Oren.

I meant what you are planning in the last two items. There are also specific functions for converting to platform-depending integer types (size_t, time_t, uid_t, etc). Two most common patterns: "Python int too large to convert to C long" and "signed short integer is less/greater than maximum". Grep all code for raising OverflowError.

terryjreedy · 2016-07-15T12:00:57Z

At this point in time, I have nothing further to add. So follow Serhiy's advice.

orenmn · 2016-10-15T23:43:43Z

Sorry for taking so long to come up with a patch..

------------ proposed changes ------------
1. in Python/getargs.c in seterror, the following patches:
- add a parameter 'p_format_code' - a pointer to the format code
whose conversion failed.
- replace the error message of the OverflowError with
'{fname}() argument out of range' in case all of the following are
true:
* PyErr_Occurred() and the exception is an OverflowError.
* The PyArg_* function received a format ending with a ':'
(i.e. seterror's fname argument is not NULL).
* The error occurred during a conversion to a C integer type
which might overflow, i.e. one of the format codes 'bhilLn'.

  With these patches, inconsistent messages could often be fixed by merely
  appending ':<fname>' to the format argument in a call to a PyArg_*
  function.

  Furthermore, Some inconsistent messages are actually fixed by these
  patches alone.
  That is because there are already many calls to PyArg_* functions with a
  format ending with ':<fname>'. Some of them are followed by more
  checks of the parsed arguments, which might result in raising an
  OverflowError/ValueError with an inconsistent error message.
  (e.g. in [Modules/itertoolsmodule.c](https://github.com/python/cpython/blob/main/Modules/itertoolsmodule.c), product_new already calls
  PyArg_ParseTupleAndKeywords with the format "|n:product", so with these
  patches, in case we do 'itertools.product('a', repeat=1 << 1000)', the
  error message would be 'product() argument out of range').

  Also, ISTM this patch is helpful, regardless of this issue (bpo-15988).
  In case a PyArg_* function raises an OverflowError (because a conversion
  to some C integer type failed), knowing which function called the
  PyArg_* function would probably prove more helpful to a user (than
  knowing which C type Python failed to convert to, and whether the
  conversion failed because the number to convert was too large or too
  small).

  I decided to put the patch inside seterror, because (aside from its
  name,) it is always called after a failure of convertitem (which is the
  only caller of convertsimple).

2. in various files:
    - change some OverflowError/ValueError messages for more clarity, e.g.
      replace 'unsigned byte integer is greater than maximum' with
      'Python int too large to convert to C unsigned char'.
    - add code to "catch" OverflowError/ValueError errors, to prevent
      raising inconsistent error messages
    - append ':<fname>' to formats passed to PyArg_* functions, to utilize
      the first proposed change (to automatically "catch"
      OverflowError/ValueError errors).

3. in [Lib/tests](https://github.com/python/cpython/blob/main/Lib/tests) - I was already writing tests for my patches, so I guessed
  I should add some of them (the basic boundary checks) to [Lib/tests](https://github.com/python/cpython/blob/main/Lib/tests) (in
  case they didn't already exist). I ran into some issues here:
    - test_pickle - I didn't manage to create an int large enough to test
      my patch for save_long in [Modules/_pickle.c](https://github.com/python/cpython/blob/main/Modules/_pickle.c) (and so I didn't add any
      test to test_pickle).
    - test_stat - I didn't find a way to determine the size of mode_t on
      the current platform, except for Windows (so the test I added is
      incomprehensive).

4. in [Objects/longobject.c](https://github.com/python/cpython/blob/main/Objects/longobject.c), make some messages more helpful (as mentioned
  in the last message I posted here).

------------ diff ------------
The proposed patches diff file is attached.

------------ tests ------------
I wrote an ugly script to test my patches, and ran it on my Windows 10 and
Ubuntu 16.04 64-bit VM. The script is attached, but it might fail on another
platform.

Note that the script I wrote has (among others) a test for
'select.devpoll'. I couldn't run it, as I don't have a Solaris machine,
but the test is identical to that of 'select.poll', so it should run
smoothly on a Solaris machine. Theoretically.
(Each test in my script verifies the exception's error message, and not only
which exception was raised, so ISTM these kind of tests don't fit in
Lib/tests. Please let me know if you think otherwise.)

In addition, I ran 'python_d.exe -m test -j0' (on my 64-bit Windows 10 and
Ubuntu VM) with and without the patches, and got quite the same output. (That
also means my new tests in various files in Lib/tests have passed.)

##############################################################################

Note that the inconsistent messages my patches fix are only those I found
while going over each 'PyExc_OverflowError' in the codebase (in *.h and *.c
files).
However, I would probably find another bunch of inconsistent messages when I
go over each 'PyExc_ValueError' in the codebase.

I would be happy to do that, but I am afraid I won't have time to (at least)
until next June.

serhiy-storchaka · 2016-10-16T06:08:34Z

Thank you Oren. I'll make a detailed review in next few days.

serhiy-storchaka · 2017-03-10T06:24:42Z

The patch no longer applied cleanly (in particular to arraymodule.c). Oren, could you please update it?

orenmn · 2017-03-10T11:33:20Z

of course :)

but some time has passed, so i would re-check stuff, and run tests
etc., so it would probably take me some time.

serhiy-storchaka · 2017-03-15T09:07:17Z

I just have started reviewing the PR and it looks great!

serhiy-storchaka · 2017-03-15T16:00:21Z

I think that raising OverflowError exceptions should be avoided if this is possible. If some function requires non-negative integer and raises ValueError for small negative integers and OverflowError for large negative integers, it is desirable to make it raising ValueError for all negative integers. If some function truncates integer arguments to specified limit it shouldn't raise OverflowError for large positive integers, but truncate them to that limit.

orenmn · 2017-03-15T18:03:34Z

now that you opened bpo-29816 and bpo-29819, should I remove from the PR changes in the following functions, and related tests?

_io_BytesIO_truncate_impl
_io_StringIO_truncate_impl
long_rshift
long_lshift

orenmn · 2017-03-15T20:27:21Z

also, IMHO, we should open an issue named 'raise ValueError instead of OverflowError when a negative int is invalid', and leave for that
new issue some the changes (which you commented about in the PR).

I didn't want to open it without your approval.
here is what I thought should be the contents of the issue (feel
free to edit my draft as you wish and open the issue yourself,
if you think we should open it):

In various C functions, a negative int argument is invalid. in most of these
functions, ValueError is raised for a negative int, or at least for a small
negative int (i.e. a negative int that fits in some signed integer C
type).

as Serhiy mentioned in http://bugs.python.org/issue15988#msg289679 and in
code review comments in #668, it might
be that a ValueError should always be raised in such functions, upon receiving
a negative int (even upon receiving a big negative int, such as (-1 << 1000)).

the following functions were identified as relevant only while working on
bpo-15988, so there are probably (a lot?) more relevant functions.

functions where OverflowError is raised only for big negative ints:
- Objects/bytesobject.c - bytes_new
- Objects/bytearrayobject.c - bytearray_init
- functions that would be fixed as part of bpo-29819?
* Modules/_io/stringio.c - _io_StringIO_truncate_impl
* Modules/_io/bytesio.c - _io_BytesIO_truncate_impl

functions where OverflowError is raised for any negative int (so a change would
probably break some users code):
- Objects/longobject.c:
* PyLong_AsUnsignedLong
* PyLong_AsSize_t
- Modules/_ctypes/_ctypes.c - PyCArrayType_new

Note: while patching these functions, one should also make sure that the
patched functions produce consistent error messages, as explained in bpo-15988
(in which it was decided to leave these functions (which currently produce
inconsistent error messages) to this issue.)

serhiy-storchaka · 2017-03-15T20:39:32Z

I agree. It would be better to remove from the PR changes that can be covered by bpo-29816, bpo-29819 and other changes that changes visible behavior and open one or several new issues for them. Maybe one meta-issue and few concrete issues?

Such methods as read() also shouldn't raise OverflowError for negative argument.

orenmn · 2017-03-15T20:45:07Z

I am sorry, but my 2nd term starts on Monday, so I am short on time,
and feel like I would be lucky to even finish working on PR 668.

because of that, and because you obviously have a much better
understanding of this issue than me, I would be happy if you could open
the new issues.

serhiy-storchaka · 2017-03-18T09:11:28Z

it is hard to discuss such large patch on GitHub since diffs and comments to all files are shown on one page. Rietveld is more appropriate since it shows only one file at the time. I suggest you Oren to provide the next version of your patch as a patch for reviewing on Rietveld and make PRs only for ready patches.

Even after withdrawing some changes or extracting them as separate issues the patch can be too large. I suggest to commit first the part that relates to PyArg_Parse* ang PyLong_As*.

orenmn · 2017-03-18T14:19:41Z

Here is a patch including only changes related to PyLong_As* and PyArg_Parse*
functions.