classification
Title: Should str.format allow negative indexes when used for __getitem__ access?
Type: enhancement Stage: needs patch
Components: Documentation Versions: Python 3.4
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: terry.reedy Nosy List: Ilya Kamenshchikov, Todd.Rovito, docs@python, eric.araujo, eric.smith, flox, gosella, kisielk, marco.buttu, mark.dickinson, mrabarnett, rhettinger, terry.reedy
Priority: normal Keywords: easy, patch

Created on 2010-02-17 23:54 by eric.smith, last changed 2019-07-13 10:17 by Ilya Kamenshchikov.

Files
File name Uploaded Description
format_negative_indexes-2.7.diff gosella, 2010-06-18 19:50 patch against trunk
format_negative_indexes-3.2.diff gosella, 2010-06-18 19:52 patch against 3.2
format_no_fields_with_negative_indexes-2.7.diff gosella, 2010-06-25 18:48 Don't allow negative fields
7951NegativeIndexesForStringFormat3dot4.patch Todd.Rovito, 2013-04-20 03:06 review
Messages (30)
msg99482 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-02-17 23:54
It surprised me that this doesn't work:
>>> "{0[-1]}".format('fox')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: string indices must be integers

I was expecting it to be equivalent to:

>>> "{0[2]}".format('fox')
'x'

I don't think there's any particular reason this doesn't work. It would, however, break the following code:

>>> "{0[-1]}".format({'-1':'foo'})
'foo'

But note that this doesn't work currently:

>>> "{0[1]}".format({'1':'foo'})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 1
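
The two failures above follow from how the format mini-language parses the text inside the brackets: a run of digits is handed to __getitem__ as an int, and anything else is handed over as a str. The sketch below makes that visible. `ShowKey` is a hypothetical helper invented for illustration, and the results assume the CPython behavior reported in this issue.

```python
class ShowKey:
    """Hypothetical helper: echoes the type and value of the key it receives."""

    def __getitem__(self, key):
        return f"{type(key).__name__}:{key!r}"


probe = ShowKey()
digits = "{0[1]}".format(probe)     # all digits: passed to __getitem__ as an int
negative = "{0[-1]}".format(probe)  # leading '-': passed as the string '-1'
print(digits)    # int:1
print(negative)  # str:'-1'
```

This is why "{0[-1]}" fails on a str or list (a string key is rejected) but succeeds on a dict keyed by '-1'.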
msg99553 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2010-02-19 01:43
On a related note, this doesn't work either:

>>> "{-1}".format("x", "y", "z")
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    "{-1}".format("x", "y", "z")
KeyError: '-1'

It could return "z".

It also rejects a leading '+', but that would be optional anyway.
msg107766 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-06-13 23:49
Closed issue 8985 as a duplicate of this; merging nosy lists.
msg107776 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-14 10:57
I (reluctantly) agree it's surprising that "{0[-1]}".format(args) fails.  And I suppose that if it were allowed then it would also make sense to consider "{-1}".format(*args) as well, in order to preserve the equivalence between "{n}".format(*args) and "{0[n]}".format(args).  And then:

>>> "{-0}".format(*['calvin'], **{'-0': 'hobbes'})
'hobbes'

would presumably produce 'calvin' instead of 'hobbes'...

On '+': if "{0[-1]}" were allowed, I'm not sure whether the "+1" in "{0[+1]}".format(...) should also be interpreted as a list index.  I don't really see the value of doing so apart from syntactic consistency: there are very few other places in Python that I'm aware of that accept -<one-or-more-digits> but not +<one-or-more-digits>.

FWIW, my overall feeling is that the current rules are simple and adequate, and there's no great need to add this complication.

I do wonder, though:

How complicated would it be to make "{0[1]}".format({'1':'foo'}) a bit magical?  That is, have the format method pass an integer to __getitem__ if the corresponding format argument is a sequence, and a string argument if it's a mapping (not sure what the criterion would be for distinguishing).  Is this too much magic?  Is it feasible implementation-wise?

I don't think it's do-able for simple rather than compound field names: e.g.,  "{0}".format(*args, **kwargs), since there we've got both a sequence *and* a dict, so it's not clear whether to look at args[0] or kwargs['0']. (Unless either args or kwargs is empty, perhaps.)  This is all getting a bit python-ideas'y, though.
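
A rough sketch of the "magical" lookup floated above, to make the open question concrete. The function name `lookup` and the choice of `collections.abc.Mapping` as the criterion are assumptions for illustration only; nothing like this exists in str.format.

```python
from collections.abc import Mapping


def lookup(container, index_text):
    """Hypothetical dispatch: str keys for mappings, int indexes otherwise."""
    if isinstance(container, Mapping):
        # Mappings get the raw text, so {'1': 'foo'} can be reached.
        return container[index_text]
    # Everything else is assumed to be a sequence and gets an int;
    # int() raises ValueError if the text is not an integer.
    return container[int(index_text)]


print(lookup({'1': 'foo'}, '1'))  # foo
print(lookup(['a', 'b'], '1'))    # b
print(lookup('fox', '-1'))        # x
```

The sketch also shows where the magic breaks down: a dict with integer keys, such as {1: 'foo'}, would now raise KeyError, because the mapping branch always passes a string.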

BTW, I notice that PEP 3101's "Simple field names are either names or numbers [...] if names, they must be valid Python identifiers" isn't actually true:

>>> "{in-valid #identifier}".format(**{'in-valid #identifier': 42})
'42'

Though I don't have a problem with this;  indeed, I think this is preferable to checking for a valid identifier.
msg107781 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-06-14 11:23
Addressing just the last part of Mark's message right now:

The PEP goes on to say:
    Implementation note: The implementation of this proposal is
    not required to enforce the rule about a simple or dotted name
    being a valid Python identifier.  ...

I rely on getattr lookup failing for dotted names, but for simple names there's no check at all. I agree it's desirable to leave this behavior.
msg107792 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2010-06-14 15:30
Re: msg107776.

If it looks like an integer (i.e., can be converted to an integer by 'int') then it's positional, otherwise it's a key. An optimisation is to perform a quick check upfront to see whether it starts like an integer.
msg107793 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-14 15:32
Matthew:

would that include allowing whitespace, then?

>>> int('\t\n+56')
56
msg107801 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2010-06-14 17:02
That's a good question. :-)

Possibly just an optional sign followed by one or more digits.

Another possibility that occurs to me is for it to default to positional if it looks like an integer, but allow quoting to force it to be a key:

>>> "{0}".format("foo", **{"0": "bar"})
'foo'
>>> "{'0'}".format("foo", **{"0": "bar"})
'bar'

Or is that taking it too far?
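
The rule proposed in this message (an optional sign followed by one or more digits is positional, and quoting forces a key) could be sketched as follows. The function name `classify` and the returned tuples are hypothetical; this is not how str.format parses field names today.

```python
import re

# Stricter than int(): no surrounding whitespace, ASCII digits only.
_SIGNED_INT = re.compile(r"[+-]?[0-9]+")


def classify(field_name):
    """Hypothetical classifier for a simple field name."""
    # A quoted name is always a key, even if it looks like an integer.
    if len(field_name) >= 2 and field_name[0] == field_name[-1] and field_name[0] in "'\"":
        return ("key", field_name[1:-1])
    if _SIGNED_INT.fullmatch(field_name):
        return ("positional", int(field_name))
    return ("key", field_name)


print(classify("-1"))       # ('positional', -1)
print(classify("'0'"))      # ('key', '0')
print(classify("\t\n+56"))  # ('key', '\t\n+56')
```

Unlike a bare int() test, the regular expression rejects the whitespace case raised in msg107793.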
msg107811 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-06-14 21:20
I can see the point of allowing negative indices for consistency's sake, but is there really any practical problem that's currently causing people hardship that this would solve?

As for the rest of it, I think it's just not worth the additional burden on CPython and other implementations.
msg107845 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2010-06-15 01:25
Your original:

    "{0[-1]}".format('fox')

is a worse gotcha than:

    "{-1}".format('fox')

because you're much less likely to want to do the latter.

It's one of those things that it would be nice to have fixed, or we could just add a warning to the documentation that it _might_ be fixed in the future, so people shouldn't rely on the current behaviour. :-)
msg108132 - (view) Author: Germán L. Osella Massa (gosella) Date: 2010-06-18 19:50
I finally managed to get the time to finish the patch that allows negative indexes inside square brackets so now they work with the same semantics as in a python expression:

>>> '{0[-1]}'.format(['abc', 'def'])
'def'
>>> '{0[-2]}'.format(['abc', 'def'])
'abc'
>>> '{0[-1][-1]}'.format(['abc', ['def']])
'def'

They work with auto-numbered fields too:
>>> '{[-1]}'.format(['abc', 'def'])
'def'

Also, a positive sign is now accepted as part of a valid integer:

>>> '{0[+1]}'.format(['abc', 'def'])
'def'

As a bonus, negative indexes are also allowed to refer to positional arguments:

>>> '{-1}'.format('abc', 'def')
'def'
>>> '{-2}'.format('abc', 'def')
'abc'

I'm attaching a patch against trunk. I added some tests for this functionality in test_str.py.

By the way, this code doesn't work anymore:

>>> "{[-1]}".format({'-1': 'X'})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: -1L

But now it behaves in the same way as:
>>> "{[1]}".format({'1': 'X'})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 1L

I didn't attempt to ignore whitespace when trying to parse the index as an integer (so that "{ 0 }" could be treated as "{0}" and "{ 0 [ 1 ] }" as "{0[1]}") because I'm not sure if this behavior is desirable.
msg108133 - (view) Author: Germán L. Osella Massa (gosella) Date: 2010-06-18 19:55
I forgot to mention that I also made a patch against py3k (was the same code).
msg108143 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-06-18 21:38
Perhaps this ought to be discussed on python-ideas or python-dev for a bit.  It is not entirely clear that this is a GoodThingToDo(tm) nor is it clear that we want other Python implementations to have to invest the same effort.

The spirit of the language freeze suggests that we shouldn't add this unless we really need it.  The goal was to let other implementations catch up, not to add to their list of incompatibilities.
msg108144 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-06-18 21:47
I agree with Raymond. I'm not convinced it allows you to write any code that you can't currently write, and I'm fairly sure it violates the moratorium. Implementing this would clearly put a burden on other implementations.

Marking as "after moratorium".
msg108472 - (view) Author: Kamil Kisiel (kisielk) Date: 2010-06-23 18:40
While I agree this functionality isn't strictly necessary, I think it makes sense from a semantic point of view. I ran into this issue today while writing some code and I simply expected the negative syntax to work, given that the format string syntax otherwise very closely resembles standard array and attribute access.

It would be nice to see this make it in eventually for consistency's sake.
msg108617 - (view) Author: Germán L. Osella Massa (gosella) Date: 2010-06-25 18:48
Well, using negative indexes for fields can be thought of as a new feature with all the consequences mentioned before, BUT negative indexes for accessing elements from a sequence, IMHO, is something that anyone would expect to work. That's why at first I thought it was a bug and I filed an issue about it.

The code that parses the fields and the indexes is the same, so when I changed it to accept negative indexes, it worked for both cases. I'm attaching a patch that checks if a negative index is used in a field and reverts to the old behavior in that case, allowing only negative indexes for accessing sequences ( "{-1}" will raise KeyError because it will be treated as '-1').

Perhaps in this way this issue could be partially fixed.
msg113447 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-08-09 18:42
I believe this is covered by the PEP3003 3.2 change moratorium.
msg113620 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-08-11 19:41
Fixing-up str formatting idiosyncrasies does not fall under the moratorium and is helpful in getting 3.x to be usable.

That being said, I'm not convinced that this is actually a helpful feature.  Not all objects supporting __getitem__ offer support for negative indexing.  Also, there's a case to be made that using negative indices in a formatting string is an anti-pattern, causing more harm than good.
msg113624 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2010-08-11 20:01
I agree with Kamil and Germán. I would've expected negative indexes for sequences to work. Negative indexes for fields is a different matter.
msg115981 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-09-09 23:55
After more thought, I'm -1 on this.  "Consistency" is a weak argument in favor of this.  We need to be more use-case driven, and there is no evidence that this is needed.  Also, there is a reasonable concern that using negative indices in a format string would be a bad design pattern that should not be encouraged by the language.  And, there is a maintenance burden (just getting it right in the first place; having other implementations try to get it right; and a burden to custom formatters to have to support negative indices).

I do think we have a documentation issue.  This thread shows a number of experienced Python programmers who get "surprised" or perceive "consistency issues" perhaps because there isn't a clear mental picture of Python's layer structure (separation of concerns) and where the responsibility lies for supporting negative indices.

For the record, here are a few notes on where negative index handling fits into the hierarchy:

Negative index support is not guaranteed by the collections.Sequence ABC nor by the grammar (see the "subscript" rule in Grammar/Grammar).  It does not appear in opcode handling (see BINARY_SUBSCR in Python/ceval.c) nor in the top abstract layer (see PyObject_GetItem() in abstract.c).  Instead, the support for slicing and negative index handling appears at the concrete layer (see listindex() in Objects/listobject.c for example).

We do guarantee negative index handling for builtin sequences and their subclasses (as long as they don't override __getitem__), and we provide a fast path for their execution (via an intermediate abstract layer function, PySequence_GetItem() in Objects/abstract.c), but other sequence-like objects are free to make their own decisions about slices and negative indices at the concrete layer.

Knowing this, a person should not be "surprised" when one sequence has support for negative indices or slicing and another does not.  The choice belongs to the implementer of the concrete class, not to the caller of "a[x]".  There is no "consistency" issue here.
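
To illustrate the point that negative index handling belongs to the concrete class, here is a minimal sketch of a sequence-like object that simply does not support it. `Squares` is a hypothetical class made up for this example.

```python
class Squares:
    """Hypothetical sequence-like object with no meaningful 'end'."""

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("Squares indexes must be integers")
        if index < 0:
            # There is no last element to count back from.
            raise IndexError("Squares does not support negative indexes")
        return index * index


print(Squares()[3])       # 9
print([0, 1, 4, 9][-1])   # 9, because list implements negative indexes itself

try:
    Squares()[-1]
except IndexError as exc:
    print("IndexError:", exc)
```

Both objects are reached through the same a[x] syntax; only the list chooses to interpret -1 as "last element".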

IOW, we're not required to implement negative slice handling and are free to decide whether it is a good idea or not for the use-case of string formatting.  There is some question about whether it is a bad practice for people to use negative indices for string formatting.  If so, that would be a reason not to do it.  And if available, it would only work for builtin sequences, but not sequence-like items in general.  There is also a concern about placing a burden on other implementations of Python (to match what we do in CPython) and on placing a burden on people writing their own custom formatters (to mimic builtin formatters as closely as possible).  If so, those would be reasons not to do it.
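
For a sense of what supporting negative indices would ask of a custom formatter, here is a minimal sketch built on string.Formatter. The class name `NegativeIndexFormatter` and its simplified field-name parsing are assumptions made for illustration; it does not reproduce the builtin parser exactly and does not handle auto-numbered fields such as "{[-1]}".

```python
import re
import string


class NegativeIndexFormatter(string.Formatter):
    """Hypothetical Formatter that passes signed integer indexes as ints."""

    _FIRST = re.compile(r"[^.\[]*")                   # leading argument name
    _STEP = re.compile(r"\.([^.\[]+)|\[([^\]]+)\]")   # .attr or [index]
    _UNSIGNED = re.compile(r"[0-9]+")
    _SIGNED = re.compile(r"-?[0-9]+")

    def get_field(self, field_name, args, kwargs):
        first = self._FIRST.match(field_name).group(0)
        # Digits select a positional argument, anything else a keyword.
        key = int(first) if self._UNSIGNED.fullmatch(first) else first
        obj = self.get_value(key, args, kwargs)
        for attr, index in self._STEP.findall(field_name[len(first):]):
            if attr:
                obj = getattr(obj, attr)
            elif self._SIGNED.fullmatch(index):
                obj = obj[int(index)]   # the only departure from str.format
            else:
                obj = obj[index]
        return obj, key


fmt = NegativeIndexFormatter()
print(fmt.format("{0[-1]} and {names[-2]}", ["abc", "def"], names=("Dennis", "Guido")))
# def and Dennis
```

Even this small sketch has to re-implement field-name parsing, which gives some idea of the maintenance cost mentioned above.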

my-two-cents,

Raymond
msg116244 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-09-12 23:02
Thank you for the detailed argument, Raymond.  I’m +1 on turning this into a doc bug.
msg187404 - (view) Author: Todd Rovito (Todd.Rovito) * Date: 2013-04-20 03:06
Here is a simple patch that explains in the string format documentation that negative indexes and negative slices are not supported.  Perhaps more documentation needs to be created elsewhere to help explain why all collections do not need to support negative indexes and negative slices? If so please let me know and I will create it.  But I think this patch at least clarifies the use case of string format.
msg190102 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2013-05-26 17:34
Todd's patch strikes me as fine.  If something more detailed is needed I think it would be better to raise a separate issue.
msg215958 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-04-12 02:40
Either leading sign, '+' or '-', causes string interpretation, so I think 'unsigned integer' should be the term in the doc.

>>> '{0[-1]}'.format({'-1': 'neg int key'})
'neg int key'
>>> '{0[+1]}'.format({'+1': 'neg int key'})
'neg int key'
>>> '{0[+1]}'.format([1,2,3])
Traceback (most recent call last):
  File "<pyshell#16>", line 1, in <module>
    '{0[+1]}'.format([1,2,3])
TypeError: list indices must be integers, not str
msg216038 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-04-13 23:43
The doc bug is that the grammar block uses 'integer' (linked to https://docs.python.org/3/reference/lexical_analysis.html#grammar-token-integer) in
  arg_name          ::=  [identifier | integer]
  element_index     ::=  integer | index_string
when it should use 'decimalinteger' or even more exactly 'digit+'. The int() builtin uses the same relaxed rule when no base is given.
>>> 011
SyntaxError: invalid token
>>> int('011')
11
>>> '{[011]}'.format('abcdefghijlmn')
'm'

One possibility is to replace 'integer' in the grammar block with 'digit+' and perhaps leave the text alone. Another is to replace 'integer' with 'index_number', to go with 'index_string', and add the production "index_number ::= digit+". My thought for the latter is that 'index_number' would connect better with 'number' as used in the text. A further option would be to actually replace 'number' in the text with 'index_number'.


PS to Todd. As much as possible, doc content changes should be separated from re-formatting. I believe the first block of your patch is purely a re-format.
msg225505 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014-08-18 19:50
msg216038 suggests three options for the doc patch. Does anybody have a preference or a better alternative?
msg266481 - (view) Author: Marco Buttu (marco.buttu) * Date: 2016-05-27 06:45
The error message is misleading:

>>> s = '{names[-1]} loves {0[1]}'
>>> s.format(('C', 'Python'), names=('Dennis', 'Guido'))
Traceback (most recent call last):
    ...
TypeError: tuple indices must be integers or slices, not str
msg340877 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2019-04-26 02:52
A side question: where is it defined that in `{thing[0]}`, 0 will be parsed as an integer?

The PEP shows `{thing[name]}` and mentions that this is not Python but a smaller mini-language, with `name` always a string, no quotes needed or permitted.
msg340884 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2019-04-26 06:56
I'm not sure where (or if) it's defined in the Python docs, but in PEP 3101 it's in https://www.python.org/dev/peps/pep-3101/#simple-and-compound-field-names: "It should be noted that the use of 'getitem' within a format string is much more limited than its conventional usage. In the above example, the string 'name' really is the literal string 'name', not a variable named 'name'. The rules for parsing an item key are very simple. If it starts with a digit, then it is treated as a number, otherwise it is used as a string.".
msg347795 - (view) Author: Ilya Kamenshchikov (Ilya Kamenshchikov) * Date: 2019-07-13 10:17
Py3.6+ f-strings support any indexing, as they actually evaluate Python expressions.

>>> a = ['Java', 'Python']
>>> f"Hello {a[-1]}"
'Hello Python'