classification
Title: "IndexError: tuple index out of range" should include the requested index and tuple length
Type: enhancement
Stage: resolved
Components: Interpreter Core
Versions: Python 3.5
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: Add index attribute to IndexError (View: 18162)
Assigned To: rhettinger
Nosy List: BreamoreBoy, berker.peksag, cool-RR, eric.araujo, ezio.melotti, josh.r, r.david.murray, rhettinger, scoder, serhiy.storchaka, terry.reedy
Priority: low
Keywords:

Created on 2014-07-03 12:33 by cool-RR, last changed 2014-07-09 06:52 by cool-RR. This issue is now closed.

Messages (21)
msg222168 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-07-03 12:33
Ditto for lists and any other place this could be applicable.
msg222216 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-07-03 21:21
Why?  Is there any known use case?

The IndexError exception is commonly used for control flow.  Slowing down the instantiation to add an index that no one really needs would be a waste.  This exception has been around for 20+ years -- if there were an actual need, we would have known by now.  To my eyes, this appears to be gratuitous feature creep.
msg222222 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-07-03 21:39
Raymond: I do take your point about performance, and I understand that if this results in a performance problem, then that's a good argument to not include this feature.

But I'm baffled as to why you're asking me regarding this feature "Why? Is there any known use case?" 

Why do we have an exception text like "TypeError: f() takes 1 positional argument but 3 were given" instead of "TypeError: f() takes a different number of arguments than you tried to give it"? Why do we have "ImportError: No module named 'foobas'" instead of "ImportError: No such module"? Do I need to spell out the reason and all the different scenarios in which these exception texts are super useful?

Why does an operation like `{}[42]` result in `KeyError: 42` instead of `KeyError: Key doesn't exist`? And now that I think about it, KeyError is used for control flow too, so why is it okay for it to contain the value, but on `IndexError` it's a no-no?
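
For reference, the asymmetry described above can be reproduced with a short sketch. The helper name `describe` is made up for this illustration; the messages shown are the ones CPython produces for a failed dict lookup and a failed tuple index.

```python
def describe(container, key):
    """Return (exception type name, message) for a failed subscript, or None on success."""
    try:
        container[key]
    except (KeyError, IndexError) as exc:
        return type(exc).__name__, str(exc)
    return None


# KeyError carries the missing key; IndexError carries only a fixed string.
key_error = describe({}, 42)
index_error = describe((1, 2, 3, 4), 7)

print(key_error)    # ('KeyError', '42')
print(index_error)  # ('IndexError', 'tuple index out of range')
```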
msg222230 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2014-07-03 22:19
TypeError also should be more specific because it can occur for a multitude of reasons; along with stuff like AttributeError, it's one of those exceptions that could arise from multiple causes on a single line of code, none of them obvious. For the specific case of passing the wrong number of arguments, that's usually a result of programmer error, not input errors to a valid program, so it's not a case worth optimizing. For control flow uses of TypeError (e.g. using duck typing to choose code paths), performance loss is a valid point.

ImportError requires disambiguation for similar reasons (since a single import statement could error because the target module is missing, or because one of its dependencies fails to import; you need to be able to tell the difference, and a line number doesn't help). Beyond that, there is little cost to making fully detailed ImportErrors; if you're risking ImportErrors in your program's hot paths, something is wrong with your design.

As for needing a use case: Every feature starts at minus 100 points (ref: http://blogs.msdn.com/b/ericgu/archive/2004/01/12/57985.aspx ). There is a limited amount of development and testing resources, more code means more opportunity for errors and increased disk and memory usage, etc.

I agree that KeyError is a relevant comparison, though you'd be surprised how much cheaper indexing a sequence is relative to dictionary access; hashing and collision chaining are usually tripling or quadrupling the work relative to a simple integer lookup in a sequence. The more expensive the non-exceptional operation, the less you need to worry about the expense of the exceptional case, simply because the code was never going to run that quickly anyway.
msg222232 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-07-03 22:30
Josh... The reason I gave all these examples of where Python gives detailed error messages, is not so you'd explain the obvious reason, which is that it makes it easier to debug, figure out what's wrong with your program, and fix it. The reason I gave these examples is that I'm baffled by this attitude that I see all the time on python-dev, where people are asking me questions when the answers are obvious to all of us.

I suggested adding more information to the "IndexError: tuple index out of range", like "IndexError: tuple only has 4 elements, can't access element number 7." It should be obvious to any programming novice why this is helpful: Because it makes it much easier to figure out what your program is doing wrong, so you could fix it. (I feel silly having to spell it out in company of so many experienced developers who all understand this basic fact.)

I'm very frustrated that a core Python developer would ask me "Why?  Is there any known use case?" on such a no-brainer suggestion.

Now, if you want to make the performance argument, that's acceptable. But it's very, very frustrating that people on python-dev, who are very experienced developers, need to be explained the virtue of very simple and obvious features. (This happens many times, not just on this ticket.) I'm baffled as to why they do this.
msg222235 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014-07-03 22:47
If a programmer can't work out from "IndexError: tuple index out of range" what is going on they should give up programming.  Personally I'd close this now with resolution "complete waste of core dev time".
msg222236 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-07-03 22:51
Mark, again I'm finding myself saying things that are obvious to all of us: You can figure out that "tuple index out of range" means you asked for an item bigger than the size of the tuple, but it might be very helpful for debugging to say the number of item that you asked for and the size of the tuple. For example, maybe it'll say "IndexError: tuple only has 0 elements, can't access element number 1" and you'd be like, "hey, this tuple is empty, it's supposed to have stuff, so the bug is that it's empty", or alternatively it might say "IndexError: tuple only has 10 elements, can't access element number 732426563" and then you'd say "oh, there's a bug in the code that says which number of item I want, this number is very likely wrong for my use case".

Was the above paragraph not quite obvious? Can't we all think of many different examples where you'd want to have that information? Why do I really have to go over these things?
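
A minimal pure-Python sketch of the proposed message follows. It assumes a hypothetical `VerboseTuple` subclass and helper `message_for`; it is only an illustration of the wording suggested above, not how the change would be made in CPython's C implementation.

```python
class VerboseTuple(tuple):
    """Hypothetical tuple subclass whose IndexError reports the length and the index."""

    def __getitem__(self, index):
        try:
            return super().__getitem__(index)
        except IndexError:
            # Re-raise with the details the plain message leaves out.
            raise IndexError(
                "tuple only has %d elements, can't access element number %r"
                % (len(self), index)
            ) from None


def message_for(seq, index):
    """Return the IndexError message produced by seq[index], or None if it succeeds."""
    try:
        seq[index]
    except IndexError as exc:
        return str(exc)
    return None


empty_case = message_for(VerboseTuple(), 1)
huge_index_case = message_for(VerboseTuple(range(10)), 732426563)
print(empty_case)       # tuple only has 0 elements, can't access element number 1
print(huge_index_case)  # tuple only has 10 elements, can't access element number 732426563
```

The first message points at an unexpectedly empty tuple, the second at a bad index computation, which is the distinction the plain message cannot make.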
msg222237 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014-07-03 22:55
Ram I won't be making any more comments as it's quite clear to me that you have no empathy at all with the core devs.
msg222238 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2014-07-03 22:58
As I said, the main reason is that every feature has to start at minus 100 points. It's not that your idea is bad, it's that it has to be sufficiently good to warrant the risks that come with any code churn, no matter how small. "Simple and obvious" does not mean "easy and risk free". I'll admit, aside from the performance concerns, this would be a relatively easy change; but "relatively easy" and "relatively safe" still means "using resources that could go towards other features" and "potentially dangerous".

For debugging, you always have the option of wrapping the broken code in try/except and logging the values you're interested in (with or without reraising). If you believe that's insufficient, please, submit a patch (with tests), or find someone who is willing to do so. Otherwise, you have to accept that other people don't always share your beliefs about what is worth their time to improve; telling them they're wrong for disagreeing doesn't help.
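
The try/except workaround mentioned above could look roughly like this. The function name `get_item` and the log wording are assumptions made for the sketch; only the standard `logging` module is used.

```python
import logging

logger = logging.getLogger(__name__)


def get_item(seq, index):
    """Index into seq, logging the index and length before re-raising on failure."""
    try:
        return seq[index]
    except IndexError:
        # Record the values the built-in message does not include.
        logger.error(
            "index %r out of range for sequence of length %d", index, len(seq)
        )
        raise  # keep the original exception and traceback
```

Calling `get_item((1, 2, 3, 4), 7)` logs the index and length and then propagates the original IndexError unchanged.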
msg222239 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-07-03 23:08
Josh, I agree with most of what you're saying. (Except the tips about debugging are not helpful, the point is to get the information as quickly as possible without having to make code modifications if possible.)

I can totally understand a reaction of "Your idea is helpful because it makes it easier to debug problems in your code quickly, which is a very important thing, but we prefer not to implement it because it'll introduce a considerable performance penalty." (Or use the -100 points argument which is depressing but valid.) 

But when people (like Raymond and Mark here) are not even acknowledging the obvious advantages of the suggestion before shooting it down, that is really frustrating, and next time someone on python-dev writes a blog post "why don't more people contribute to CPython", they should be looking at behavior like this.
msg222240 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-07-03 23:09
The development team is not monolithic, and we are all people, with differing opinions (and Mark is Mark).  As has been pointed out, there is people-load for development and maintenance associated with any change, so a mature project has a natural tendency toward conservatism.

That said, we do improve error messages.  The important point is the one Raymond made originally.  Perhaps someone will be interested enough to develop a patch and produce some benchmarks.  As a low priority item it is unlikely any of the core team will do so.
msg222242 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-07-03 23:13
David, as a more generalized solution: Do you think it's possible to make some kind of mechanism in Python that would change the way that an exception is constructed based on whether it's used for control flow or not? I know that it's a bit far-fetched, but if we could figure out a way to do that, it'll free us from having to serve two masters at the same time (one of them being clear error messages, the other being fast times to create an exception.)

That way we could make the exceptions have very helpful messages when a person will see them, but keep them fast when a person won't. It's a shot in the dark but if someone has an idea for how to do it, that'd be cool. Another possibility is to make the -O flag do this switch, though there are problems with that too.
msg222333 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-07-05 02:28
I have several times found exception messages to be under-informative, and I am pretty sure this is one of them. The obvious use case is so I don't have to insert a print statement, which I can only do if the error is in Python code, to get the essential info that Python could have told me but didn't.

For for loops, StopIteration has mostly taken over the flow control job that IndexError did and only occasionally still does.
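
As an illustration of the older flow-control role: a class that defines `__getitem__` but not `__iter__` is still iterable, because Python calls `__getitem__` with 0, 1, 2, ... until IndexError is raised. The class `Countdown` below is a made-up example.

```python
class Countdown:
    """Old-style sequence: iterable through __getitem__ alone."""

    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        if index < 0 or index >= self.start:
            # This IndexError is what ends the for loop below.
            raise IndexError(index)
        return self.start - index


values = []
for value in Countdown(3):
    values.append(value)

print(values)  # [3, 2, 1]
```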
msg222334 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2014-07-05 03:20
Mark, could you please not phrase your messages as if you were speaking for the whole core team, and be more friendly with other contributors (or reply less)?
msg222337 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-07-05 06:27
Ram, do you want to provide a patch and benchmarks?
msg222338 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2014-07-05 06:41
See also issue 18162.
msg222346 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-07-05 10:32
Serhiy: Unfortunately I don't program in C, so I can't implement this.
msg222352 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2014-07-05 13:14
The feature request sounds reasonable to me, unless someone proves that there are major (performance) issues.  However, since this has already been reported in #18162, I'm going to close it as a duplicate.

@Raymond
> The IndexError exception is commonly used for control flow.

Can you provide an example?  IME I rarely catch IndexErrors, and I usually use LBYL before accessing a random index.
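
For readers comparing the two styles, here is a small sketch of LBYL (check the length first) next to the try/except style that uses IndexError for control flow. Both function names are made up for the illustration.

```python
def first_or_default_lbyl(seq, default=None):
    """Look before you leap: test the length, so no exception is ever created."""
    return seq[0] if len(seq) > 0 else default


def first_or_default_eafp(seq, default=None):
    """Try the access and treat IndexError as the 'empty' signal."""
    try:
        return seq[0]
    except IndexError:
        return default
```

Only the second version would be affected by any extra cost in creating an IndexError.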

> Slowing down the instantiation to add an index that no one really needs
> would be a waste.   This exception has been around for 20+ years -- if
> there were an actual need, we would have known by now.

I'm not sure the cost of adding the index is comparable with the cost of the whole instantiation.  Regarding the request itself see #1534607 and #18162.

@Ram
> Another possibility is to make the -O flag do this switch,
> though there are problems with that too.

-1

> Unfortunately I don't program in C, so I can't implement this.

It might be easier than you think.  You just need to find where the exception is defined and see what other exceptions like KeyError do.
Then you copy the code and adjust it until it doesn't segfault anymore and the tests pass.
msg222364 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2014-07-05 14:12
"you'd be surprised how much cheaper indexing a sequence is relative to dictionary access"

This is a bit off-topic (and I realise that this ticket is closed now), but the difference isn't really all that large:

$ python3.4 -m timeit -s 'seq = list(range(1000)); d = {n:n for n in seq}' 'seq[100]'
10000000 loops, best of 3: 0.0263 usec per loop

$ python3.4 -m timeit -s 'seq = list(range(1000)); d = {n:n for n in seq}' 'd[100]'
10000000 loops, best of 3: 0.0285 usec per loop

$ python3.4 -m timeit -s 'seq = list(range(1000)); d = {"test%d"%n:n for n in seq}' 'd["test34"]'
10000000 loops, best of 3: 0.0317 usec per loop

Especially hashing strings is usually faster than you might expect, because the hash value is cached and strings that get hashed once tend to get hashed again later.

Note that KeyError doesn't do any exception message formatting on creation. It only includes the bare key, which is pretty quick, especially if the key is already a string.

In comparison, instantiating an exception takes almost three times as long:

$ python3 -m timeit -s 'K=KeyError' 'K("test")'
10000000 loops, best of 3: 0.0779 usec per loop
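
The same kind of measurement can be scripted with the `timeit` module, which also makes it easy to add the case under discussion: an exception whose message is formatted at creation time. This is a rough sketch. The helper `per_call_usec` and the message wording are assumptions, and the numbers depend on the machine and Python version, so none are shown.

```python
import timeit


def per_call_usec(stmt, setup="pass", number=100_000, repeat=3):
    """Best-of-`repeat` time per execution of stmt, in microseconds."""
    best = min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))
    return best / number * 1e6


# Plain dict lookup, as in the command-line runs above.
lookup = per_call_usec(
    "d['test34']", setup="d = {'test%d' % n: n for n in range(1000)}"
)
# Creating an exception from a ready-made argument.
bare = per_call_usec("K('test')", setup="K = KeyError")
# Creating an exception with a message formatted on the spot (hypothetical wording).
formatted = per_call_usec(
    "I('index %d out of range for tuple of length %d' % (i, n))",
    setup="I = IndexError; i = 732426563; n = 10",
)

print("dict lookup:        %.4f usec" % lookup)
print("bare exception:     %.4f usec" % bare)
print("formatted message:  %.4f usec" % formatted)
```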

We once had the case in Cython where dropping the instantiation of StopIteration at the end of generator execution gave a serious performance boost (more than 40% faster for short running generator expressions in the nqueens benchmark), but the same is less likely to apply to IndexError, which normally indicates a bug and not control flow. I lean towards agreeing with Terry that usability beats performance here.
msg222597 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2014-07-09 02:16
Looking at a single lookup performed over and over isn't going to get you a very good benchmark. If your keys are constantly reused, most of the losses won't show themselves. A more fair comparison I've used before is the difference between using the bytes object produced by bytes.maketrans as the mapping object for str.translate vs. using the dictionary produced by str.maketrans. That gets you the dynamically generated lookups that don't hit the dict optimizations for repeatedly looking up the same key, don't predictably access the same memory that never leaves the CPU cache, etc.

Check the timing data I submitted with #21118; the exact same translation applied to the same input strings, with the only difference being whether the table is bytes or dict, takes nearly twice as long using a dict as it does using a bytes object. And the bytes object isn't actually being used efficiently here; str.translate isn't optimized for the buffer protocol or anything, so it's constantly retrieving the cached small ints; a tuple might be even faster by avoiding that minor additional cost.
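
A sketch of that comparison, under the assumption that the input is ASCII text so both tables give the same result. The sample text and repetition counts are arbitrary, and no timings are shown because they vary by machine and Python version.

```python
import timeit

# 256-byte table: indexing it with a character ordinal yields the new ordinal.
bytes_table = bytes.maketrans(b"abc", b"xyz")
# Dict table: {97: 120, 98: 121, 99: 122}; other characters raise KeyError internally.
dict_table = str.maketrans("abc", "xyz")

text = "abcabc hello cab " * 1000

via_bytes = text.translate(bytes_table)
via_dict = text.translate(dict_table)

bytes_time = min(timeit.repeat(lambda: text.translate(bytes_table), number=20, repeat=3))
dict_time = min(timeit.repeat(lambda: text.translate(dict_table), number=20, repeat=3))

print("same result:", via_bytes == via_dict)
print("bytes table: %.6f s" % bytes_time)
print("dict table:  %.6f s" % dict_time)
```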
msg222600 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-07-09 06:52
Thanks for the information about timing, Stefan and Josh. That is good to know regardless of this ticket :)
History
Date | User | Action | Args
2014-07-09 06:52:50 | cool-RR | set | messages: + msg222600
2014-07-09 02:16:20 | josh.r | set | messages: + msg222597
2014-07-05 14:12:55 | scoder | set | nosy: + scoder; messages: + msg222364
2014-07-05 13:14:49 | ezio.melotti | set | status: open -> closed; superseder: Add index attribute to IndexError; nosy: + ezio.melotti; messages: + msg222352; resolution: duplicate; stage: resolved
2014-07-05 10:32:46 | cool-RR | set | messages: + msg222346
2014-07-05 06:41:12 | berker.peksag | set | nosy: + berker.peksag; messages: + msg222338
2014-07-05 06:27:19 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg222337
2014-07-05 03:20:07 | eric.araujo | set | nosy: + eric.araujo; messages: + msg222334
2014-07-05 02:28:01 | terry.reedy | set | nosy: + terry.reedy; messages: + msg222333
2014-07-03 23:13:33 | cool-RR | set | messages: + msg222242
2014-07-03 23:09:40 | r.david.murray | set | nosy: + r.david.murray; messages: + msg222240
2014-07-03 23:08:08 | cool-RR | set | messages: + msg222239
2014-07-03 22:58:27 | josh.r | set | messages: + msg222238
2014-07-03 22:55:38 | BreamoreBoy | set | messages: + msg222237
2014-07-03 22:51:13 | cool-RR | set | messages: + msg222236
2014-07-03 22:47:29 | BreamoreBoy | set | type: behavior -> enhancement; messages: + msg222235; nosy: + BreamoreBoy
2014-07-03 22:30:47 | cool-RR | set | messages: + msg222232
2014-07-03 22:19:44 | josh.r | set | nosy: + josh.r; messages: + msg222230
2014-07-03 21:39:19 | cool-RR | set | messages: + msg222222
2014-07-03 21:21:53 | rhettinger | set | priority: normal -> low; nosy: + rhettinger; messages: + msg222216; assignee: rhettinger
2014-07-03 12:33:14 | cool-RR | create |