Issue2690
Created on 2008-04-25 15:32 by belopolsky, last changed 2008-08-01 10:15 by ncoghlan.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | Remove |
| range-length.diff | belopolsky, 2008-04-25 15:31 | |||
| anyrange.patch | amaury.forgeotdarc, 2008-04-25 19:56 | |||
| range-sequence.diff | belopolsky, 2008-04-28 19:04 | patch against revision 62564 | ||
| Messages | |||
|---|---|---|---|
| msg65786 (view) | Author: Alexander Belopolsky (belopolsky) | Date: 2008-04-25 15:31 | |
Attached patch makes range objects precompute their length on creation. This speeds up indexing and len at the expense of a small increase in range object size. The main benefit, however is that unsupported length > sys.maxsize is detected early and confusing OverflowError from len(r) or r[i] is avoided. See discussion starting at http://mail.python.org/pipermail/python- 3000/2008-April/013225.html . |
|||
| msg65793 (view) | Author: Mark Dickinson (marketdickinson) | Date: 2008-04-25 16:55 | |
So with this patch, range(10**100) produces an OverflowError: is that
right?
I don't much like this aspect of the change: there are uses for
for i in range(ridiculously_large_number):
...
if condition_that_occurs_early_in_practice:
break
|
|||
| msg65798 (view) | Author: Alexander Belopolsky (belopolsky) | Date: 2008-04-25 17:32 | |
On Fri, Apr 25, 2008 at 12:55 PM, Mark Dickinson <report@bugs.python.org> wrote: .. > I don't much like this aspect of the change: there are uses for > > for i in range(ridiculously_large_number): For this application, I would use "for i in itertools.count():" instead. The only caveat is that while count() lets you specify the start, it does not provide for a step. If that is a problem, I would rather add step to count(). |
|||
| msg65802 (view) | Author: Mark Dickinson (marketdickinson) | Date: 2008-04-25 18:47 | |
I guess there needs to be a decision on whether to make range objects of length >= PY_SSIZE_T_MAX illegal; perhaps more discussion on python-dev would be worthwhile? I can see three options, besides leaving things as they are: (1) make large ranges illegal, as with this patch (2) make large ranges legal, but don't allow indexing with indices larger than PY_SSIZE_T_MAX. (3) allow large ranges *and* large indices. Option 3 seems to me like the ideal from the users' point of view, but I'm not sure whether it's easy/possible to implement it given that sq_item receives a Py_ssize_t for the index. Option 2 seems messy: half of one thing and half of the other, but I think it would be easy to implement. This is what I'd personally prefer if Option 3 isn't feasible. If Option 1 is indeed the preferred option, then the patch looks good to me, and works for me on OS X 10.5. (Minor nitpick: it introduces some extra tab characters.) Whatever happens, we probably also need a documentation update explaining the limitations on range. |
|||
| msg65803 (view) | Author: Alexander Belopolsky (belopolsky) | Date: 2008-04-25 19:23 | |
Option (3) would require changing both sq_item and sq_length signatures, which is likely to have a large negative impact on performance. Option (2) would either require a change for the sq_length signature, or leave the problem of having valid range objects for which applying len() would produce an OverflowError. What are the use cases for ranges with length greater than maxsize? Note that in 2.x all arguments to length are limited to 32 bit integers (even on 64-bit platforms) and the main reason to support long start/stop/step in 3.0 is because 2.x range() supports them. On the other hand, since 2.x range() produces lists, it is limited in length to a fraction of sys.maxsize. Therefore none of the current uses of either range or xrange require support of long length. |
|||
| msg65807 (view) | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) | Date: 2008-04-25 19:56 | |
I am currently working on a patch that allows large ranges and large indices. The trick is to define tp_as_mapping->mp_subscript. Then range_item() is rarely used, only by functions calling directly the PySequence_* functions, instead of the abstract PyObject_*. There is still a limit with len(), which seems bound by the size_t limit. Most of the tests in test_builtin were re-enabled. I join the current version of the patch. I'm still working on further simplifications, and maybe supporting slices on ranges... Note: I found more useful to store a "range->end" member, which is the multiple of "step" just beyond the "stop" limit. |
|||
| msg65810 (view) | Author: Mark Dickinson (marketdickinson) | Date: 2008-04-25 20:37 | |
> What are the use cases for ranges with length greater than maxsize?
Yeah---I'm a bit short on use cases. The one that originally bit me with
Python 2.x was when I was doing a search for a quadratic non-residue
modulo a largeish prime;
for i in range(1, p):
if (i_is_a_nonresidue_modulo_p):
break
Here p might be a 200-digit prime number, and the situation is that half
the integers between 1 and p-1 are 'quadratic residues', while the other
half are 'quadratic nonresidues'; in practice the residues and
nonresidues are mixed up fairly well, so the first nonresidue shows up
pretty quickly, but there's no known small upper bound on when the first
nonresidue appears.
Of course, it's not hard to rewrite this with a while loop instead; it
would just be a bit annoying if that were necessary, when the code above
is so clear and direct, and the one obvious way to do it (TM).
I'd also note that it's not completely out of the question that something
like range(10**10) would be useful on a 32-bit machine: a long-running
process might easily go through 10**10 iterations of something.
I agree it's a bit strange to have a semi-functional range object, that
you can iterate over but not take the length of.
|
|||
| msg65812 (view) | Author: Alexander Belopolsky (belopolsky) | Date: 2008-04-25 20:57 | |
On Fri, Apr 25, 2008 at 4:37 PM, Mark Dickinson <report@bugs.python.org> wrote: .. > for i in range(1, p): > if (i_is_a_nonresidue_modulo_p): > break > > Here p might be a 200-digit prime number, and the situation is that half > the integers between 1 and p-1 are 'quadratic residues', while the other > half are 'quadratic nonresidues'; in practice the residues and > nonresidues are mixed up fairly well, so the first nonresidue shows up > pretty quickly, but there's no known small upper bound on when the first > nonresidue appears. Hmm, AFAIKT there is always at least one non-residue between 1 and p and therefore you can just write for i in itertools.count(1): if (i_is_a_nonresidue_modulo_p): break maybe with an additional check for p > 1. |
|||
| msg65825 (view) | Author: Mark Dickinson (marketdickinson) | Date: 2008-04-25 23:16 | |
> Hmm, AFAIKT there is always at least one non-residue between 1 and p > and therefore you can just write > > for i in itertools.count(1): > if (i_is_a_nonresidue_modulo_p): > break > > maybe with an additional check for p > 1. Sure. It's just uglier that way. :-) And I feel it would be mildly annoying not to be able to use the obvious tool for the job, for subtle reasons. It's also a potential source of bugs: one might write such code using range and only discover later that it fails unexpectedly for large inputs. These really aren't serious objections---just mild preferences. I'll stop being disruptive now :) |
|||
| msg65835 (view) | Author: Nick Coghlan (ncoghlan) | Date: 2008-04-26 08:16 | |
Given that range() produced a list in the 2.x series (hence limited to available memory), and xrange() uses int internally for its values (and hence couldn't even cope with short ranges with values greater than sys.maxint). So my preference is to mimic the 2.x range's behaviour in this case by raising an overflow error if the sequence is too long. (From Python 2.5.1) >>> range(2**99, 2**100) Traceback (most recent call last): File "<stdin>", line 1, in <module> OverflowError: range() result has too many items >>> range(2**99, 2**99+5) [633825300114114700748351602688L, 633825300114114700748351602689L, 633825300114114700748351602690L, 633825300114114700748351602691L, 633825300114114700748351602692L] >>> xrange(2**99, 2**99+5) Traceback (most recent call last): File "<stdin>", line 1, in <module> OverflowError: long int too large to convert to int |
|||
| msg65926 (view) | Author: Alexander Belopolsky (belopolsky) | Date: 2008-04-28 19:03 | |
I've implemented range slicing and x in range(..) in range-sequence.diff and registered range with the Sequence ABC. |
|||
| msg65937 (view) | Author: Alexander Belopolsky (belopolsky) | Date: 2008-04-28 21:47 | |
Reviewing my own patch (range-sequence.diff), I've realized that it is being a bit too clever in handling x in range(..) where x is not an integer. It seems that upon a failed PyLong_Check, range_contains should just do a linear search. This is easy to implement, but I will wait for more feedback before posting further changes. |
|||
| msg65942 (view) | Author: Facundo Batista (facundobatista) | Date: 2008-04-28 23:21 | |
My 2 cents: for me is more useful to have a unbound range() than to be able to do a len() on it. For me, range() should mimic a number generator: no limit, no length. |
|||
| msg65956 (view) | Author: Alexander Belopolsky (belopolsky) | Date: 2008-04-29 04:38 | |
> For me, range() should mimic a number generator: no limit, no length. That's itertools.count() |
|||
| msg65963 (view) | Author: Nick Coghlan (ncoghlan) | Date: 2008-04-29 11:40 | |
It also isn't what range() and xrange() are used for now in 2.x. range() returns an actual list, hence is limited to sequences that fit in a reasonable amount of memory, and xrange() doesn't support values greater than sys.maxint at all (as it uses C ints for its internal storage of the start, stop and step values). With itertools.count() available for the unbounded iterator case, I think making range() mimic its 2.x counterpart as closely as possible (without the memory inefficiency) will be quite valuable. |
|||
| msg65981 (view) | Author: Facundo Batista (facundobatista) | Date: 2008-04-29 21:05 | |
Fair enough, specially if in the documentation of range() we put "if you want a unbound, no limit, number generator, use itertools.count()" (or something well written in english ;) ). Thanks! |
|||
| msg69884 (view) | Author: Antoine Pitrou (pitrou) | Date: 2008-07-17 16:00 | |
Has a resolution been made on this? |
|||
| msg70525 (view) | Author: Guido van Rossum (gvanrossum) | Date: 2008-07-31 17:39 | |
On the issue of whether len(range()) should be allowed to be > sys.maxsize, I think this should be allowed. I think in the future we should change the __len__ protocol to allow unbounded lengths. Even today, I think range(10**100).__len__() should return 10**100 rather than raising an OverflowError, even if len(range(10**100)) raises OverflowError. I also think ranges should be introspectable, exposing their start, stop and step values just like slice objects. Probably all those changes are for post 3.0. |
|||
| msg70548 (view) | Author: Nick Coghlan (ncoghlan) | Date: 2008-08-01 10:15 | |
Guido, does that mean we can apply this patch to get the cleaner list-like behaviour for 3.0, and then work on the sq_item/sq_len fixes for 3.1 as a separate issue? |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2008-08-01 10:15:33 | ncoghlan | set | messages: + msg70548 |
| 2008-07-31 17:39:31 | gvanrossum | set | nosy:
+ gvanrossum messages: + msg70525 |
| 2008-07-17 16:00:29 | pitrou | set | nosy:
+ pitrou messages: + msg69884 |
| 2008-04-29 21:06:03 | facundobatista | set | messages: + msg65981 |
| 2008-04-29 11:40:19 | ncoghlan | set | messages: + msg65963 |
| 2008-04-29 04:38:26 | belopolsky | set | messages: + msg65956 |
| 2008-04-28 23:21:54 | facundobatista | set | nosy:
+ facundobatista messages: + msg65942 |
| 2008-04-28 21:47:57 | belopolsky | set | messages: + msg65937 |
| 2008-04-28 19:04:43 | belopolsky | set | files:
+ range-sequence.diff messages: + msg65926 |
| 2008-04-26 08:16:41 | ncoghlan | set | nosy:
+ ncoghlan messages: + msg65835 |
| 2008-04-25 23:16:43 | marketdickinson | set | messages: + msg65825 |
| 2008-04-25 20:57:44 | belopolsky | set | messages: + msg65812 |
| 2008-04-25 20:37:35 | marketdickinson | set | messages: + msg65810 |
| 2008-04-25 19:57:02 | amaury.forgeotdarc | set | files:
+ anyrange.patch nosy: + amaury.forgeotdarc messages: + msg65807 |
| 2008-04-25 19:24:01 | belopolsky | set | messages: + msg65803 |
| 2008-04-25 18:48:01 | marketdickinson | set | messages: + msg65802 |
| 2008-04-25 17:32:07 | belopolsky | set | messages: + msg65798 |
| 2008-04-25 16:55:19 | marketdickinson | set | nosy:
+ marketdickinson messages: + msg65793 |
| 2008-04-25 15:32:00 | belopolsky | create | |