Issue 1003935: xrange overflows

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/40693

classification

Title:	xrange overflows
Type:	enhancement	Stage:
Components:	Interpreter Core	Versions:

process

Status:	closed	Resolution:	out of date
Dependencies:		Superseder:
Assigned To:	rhettinger	Nosy List:	connelly, hfuru, loewis, rhettinger, tim.peters
Priority:	normal	Keywords:

Created on 2004-08-05 13:16 by hfuru, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (14)
msg54215 - (view)	Author: Hallvard B Furuseth (hfuru)	Date: 2004-08-05 13:16
These restrictions are undocumented both in the xrange doc string and in the reference manual (Info node 'XRange Type'): >>> xrange(maxint, maxint + 10) Traceback (most recent call last): File "<stdin>", line 1, in ? OverflowError: long int too large to convert to int >>> xrange(-100, maxint) Traceback (most recent call last): File "<stdin>", line 1, in ? OverflowError: xrange() result has too many items I hope the overflows below are bugs and not features. It works if 3/-3 is replaced with 1/-1: >>> xrange(0, maxint, 3) Traceback (most recent call last): File "<stdin>", line 1, in ? OverflowError: integer addition >>> xrange(0, -maxint, -3) Traceback (most recent call last): File "<stdin>", line 1, in ? OverflowError: integer addition Python installation: Python 2.3.3 (#1, May 25 2004, 20:22:36) [GCC 3.2.3] on sunos5 Type "help", "copyright", "credits" or "license" for more information. >>> from sys import maxint >>> "%x" % maxint '7fffffff'
msg54216 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2004-08-05 13:49
Logged In: YES user_id=80475 Do you have a real use case for this? Do any real apps need to loop over more than sys.maxint integers? It would be ashamed to muckup the high performance implementation for something that does not arise in practice.
msg54217 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2004-08-05 15:12
Logged In: YES user_id=21627 The OverflowErrors are definitely intentional, and by design. xrange() is documented as working the same way as range(), so any error in the documentation is an error in the range() documentation. Reclassifying this as a documentation bug.
msg54218 - (view)	Author: Hallvard B Furuseth (hfuru)	Date: 2004-08-05 15:14
Logged In: YES user_id=726647 > Do you have a real use case for this? For the 'hopefully bugs' variants, yes: #1: Loop forever: for i in xrange(x, sys.maxint, y) That's a lot faster than i = x while True: ... i += y #2: 'loop until optional upper bound': def some_loop(start, end = sys.maxint): for i in xrange(start, end, whatever()) > Do any real apps need to loop over more than > sys.maxint integers? The answer may be yes nowadays. Even my old Solaris can find primes up to maxint/2 in just 2 hours. That's a loop over maxint/4 integers. Though the remaining 3/4 come slower:-) Still, I expect variants of the above code would be less uncommon, like some_loop(-100). > It would be ashamed to muckup the high > performance implementation for something that > does not arise in practice. I hope you do not mean xrange(0, maxint, 3). If you mean xrange(-100, maxint): Maybe xrange could be split in several types (fast and slower) and the xrange() operator would return one of these, similar to how int() now can return long?
msg54219 - (view)	Author: Hallvard B Furuseth (hfuru)	Date: 2004-08-05 16:42
Logged In: YES user_id=726647 The only reason I can see that xrange(0, maxint, 3) gives integer overflow is that repr() returns 'xrange(first, last + step, step)', where last + step would be too large. I suggest repr() is changed to return 'xrange(first, last + step/abs(step), step)'.
msg54220 - (view)	Author: Tim Peters (tim.peters) *	Date: 2004-08-06 05:10
Logged In: YES user_id=31435 It looks like there are two bugs here. One is that the "integer addition" detail doesn't make sense, since the user isn't doing any integer addition here (sorry, no, repr() is irrelevant to this). Second, it shouldn't be complaining in the last two cases at alll. If the numbers truly were out of range, then rangeobject.c's range_new() would have raised a "too many items" exception. Note: >>> from sys import maxint as m >>> xrange(0, m, 2) Traceback (most recent call last): File "<stdin>", line 1, in ? OverflowError: integer addition >>> xrange(-m, m, 2) xrange(-2147483647, 2147483647, 2) >>> The second xrange() there contains twice as many items as the first one, but doesn't complain. It's code in PyRange_New () that's making the bogus complaint, and I can't figure out what it thinks it's doing. The code in get_len_of_range() is correct. The code in PyRange_New() is both overly permissive (e.g., it silently lets "(len - 1) * step" overflow), and overly restrictive (e.g, I can't see why it should matter if "last > (PyInt_GetMax () - step))" -- the only thing in that specific branch that should matter is whether the integer addition "start + (len - 1) * step" overflowed (which it isn't checking for correctly, even assuming the multiplication didn't overflow). The obvious fix for xrange() is to speed range_new() by throwing away its call to the broken PyRange_New(). range_new() is already doing a precise job of checking for "too big", and already knows everything it needs to construct the right rangeobject. That would leave the PyRange_New() API call with broken overflow checking, but it's not called from anywhere else in the core.
msg54221 - (view)	Author: Tim Peters (tim.peters) *	Date: 2004-08-08 07:20
Logged In: YES user_id=31435 Changed range_new() to stop using PyRange_New(). This fixes a variety of bogus errors. Documented that xrange() is intended to be simple and fast, and that CPython restricts its arguments, and length of its result sequence, to native C longs. Added some tests that failed before the patch, and repaired a test that relied on a bogus OverflowError getting raised. Doc/lib/libfuncs.tex; new revision: 1.171 Lib/test/test_xrange.py; new revision: 1.2 Objects/rangeobject.c; new revision: 2.53
msg54222 - (view)	Author: Hallvard B Furuseth (hfuru)	Date: 2004-08-08 18:09
Logged In: YES user_id=726647 Here is why repr() is relevant - though the error message was certainly weird. With the latest CVS version: >>> xrange(maxint-1, maxint, 2) xrange(2147483646, -2147483648, 2) Which is why I suggested to return last+step/abs(step) as the 2nd argument.
msg54223 - (view)	Author: Connelly (connelly)	Date: 2004-12-06 07:04
Logged In: YES user_id=1039782 Yes, I run into this bug all the time. For example, I may want to generate all strings of length n: BINARY_CHARSET = ''.join([chr(i) for i in range(256)]) def all_strs(n, charset=BINARY_CHARSET): m = len(charset) for i in xrange(mn): yield ''.join([charset[i//mj%m] for j in range(n)]) This is correct algorithmically, but fails due to the buggy Python xrange() function. So I end up writing my own irange () function. Other cases where I've run into this problem: Sieving for primes starting at a given large prime, checking for integer solutions to an equation starting with a large integer, and searching very large integer sets. itertools.count and itertools.repeat are similarly annoying. Often I want to search the list of all positive integers starting with 1, and have to use an awkward while loop to do this, or write my own replacement for itertools.count. There should be little performance hit for apps which use xrange(n), where n fits in the system's integer size. xrange () can just return an iterator which returns system ints, and the only performance penalty is an extra if statement in the call to xrange (and there is no performance penalty in the iterator's next() method). Finally, this move appears consistent with Python's values, ie simplicity, long ints supported with no extra effort, avoid gotchas for newbies, no special cases, etc.
msg54224 - (view)	Author: Tim Peters (tim.peters) *	Date: 2005-07-10 21:19
Logged In: YES user_id=31435 Unassigned myself, since I don't plan to do anything else here.
msg54225 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2005-07-10 23:11
Logged In: YES user_id=80475 Will look at it to see if there is something simple that can be done. These were design decisions -- xrange() and count() are supposed to be simple and fast -- with other tools being preferred if you are pushing beyond the limit of normal use cases. IOW, it is not worth crippling their performance just because someone has discovered cute ways of using sys.maxint.
msg54226 - (view)	Author: Hallvard B Furuseth (hfuru)	Date: 2005-07-11 10:16
Logged In: YES user_id=726647 What is "cute" about using maxint as an xrange() limit, and why would it impact its performance to return its 2nd parameter as endvalue +/- 1 (an int) instead of trying to return endvalue + step (which can be long)?
msg54227 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2005-07-11 22:24
Logged In: YES user_id=21627 Using xrange for an infinite loop qualifies as "cute" = "obviously straining for effect". The most natural way of an infinite loop is "while True". There are certainly other ways to express an infinite loop (like reading from /dev/zero, or writing an unbounded recursion), but arguing that xrange is "much faster" is obviously straining for effect: If the loop is infinite, how can it be much faster? And why does it matter if it is? (in my measurements, it is 30% faster, counting the time for a given number of iterations).
msg68592 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2008-06-22 22:51
This was fixed for Py3.0. No interest was shown in backporting.

History
Date	User	Action	Args
2022-04-11 14:56:06	admin	set	github: 40693
2008-06-22 22:51:15	rhettinger	set	status: open -> closed resolution: out of date messages: + msg68592
2004-08-05 13:16:50	hfuru	create