Should str.format allow negative indexes when used for getitem access? #52199

ericvsmith · 2010-02-17T23:54:18Z

BPO	7951
Nosy	@rhettinger, @terryjreedy, @mdickinson, @ericvsmith, @merwok, @florentx, @rovitotv, @marco-buttu, @ikamensh, @iritkatriel
Files	format_negative_indexes-2.7.diff: patch against trunk format_negative_indexes-3.2.diff: patch against 3.2 format_no_fields_with_negative_indexes-2.7.diff: Don't allow negative fields 7951NegativeIndexesForStringFormat3dot4.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/terryjreedy'
closed_at = <Date 2021-12-01.01:47:31.771>
created_at = <Date 2010-02-17.23:54:17.722>
labels = ['easy', '3.11', 'type-feature', 'docs']
title = 'Should str.format allow negative indexes when used for __getitem__ access?'
updated_at = <Date 2021-12-01.01:47:31.769>
user = 'https://github.com/ericvsmith'

bugs.python.org fields:

activity = <Date 2021-12-01.01:47:31.769>
actor = 'eric.smith'
assignee = 'terry.reedy'
closed = True
closed_date = <Date 2021-12-01.01:47:31.771>
closer = 'eric.smith'
components = ['Documentation']
creation = <Date 2010-02-17.23:54:17.722>
creator = 'eric.smith'
dependencies = []
files = ['17712', '17713', '17768', '29948']
hgrepos = []
issue_num = 7951
keywords = ['patch', 'easy']
message_count = 33.0
messages = ['99482', '99553', '107766', '107776', '107781', '107792', '107793', '107801', '107811', '107845', '108132', '108133', '108143', '108144', '108472', '108617', '113447', '113620', '113624', '115981', '116244', '187404', '190102', '215958', '216038', '225505', '266481', '340877', '340884', '347795', '407301', '407328', '407422']
nosy_count = 14.0
nosy_names = ['rhettinger', 'terry.reedy', 'mark.dickinson', 'eric.smith', 'kisielk', 'eric.araujo', 'mrabarnett', 'flox', 'docs@python', 'gosella', 'Todd.Rovito', 'marco.buttu', 'Ilya Kamenshchikov', 'iritkatriel']
pr_nums = []
priority = 'normal'
resolution = 'wont fix'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue7951'
versions = ['Python 3.11']

ericvsmith · 2010-02-17T23:54:17Z

It surprised me that this doesn't work:
>>> "{0[-1]}".format('fox')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: string indices must be integers

I was expecting it to be equivalent to:

>>> "{0[2]}".format('fox')
'x'

I don't think there's any particular reason this doesn't work. It would, however break the following code:

>>> "{0[-1]}".format({'-1':'foo'})
'foo'

But note that this doesn't work currently:

>>> "{0[1]}".format({'1':'foo'})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 1

mrabarnett · 2010-02-19T01:43:30Z

On a related note, this doesn't work either:

>>> "{-1}".format("x", "y", "z")
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    "{-1}".format("x", "y", "z")
KeyError: '-1'

It could return "z".

It also rejects a leading '+', but that would be optional anyway.

ericvsmith · 2010-06-13T23:49:48Z

Closed bpo-8985 as a duplicate of this; merging nosy lists.

mdickinson · 2010-06-14T10:57:38Z

I (reluctantly) agree it's surprising that "{0[-1]}".format(args) fails. And I suppose that if it were allowed then it would also make sense to consider "{-1}".format(*args) as well, in order to preserve the equivalence between "{n}".format(*args) and "{0[n]}".format(args). And then:

>>> "{-0}".format(*['calvin'], **{'-0': 'hobbes'})
'hobbes'

would presumably produce 'calvin' instead of 'hobbes'...

On '+': if "{0[-1]}" were allowed, I'm not sure whether the "+1" in "{0[+1]}".format(...) should also be interpreted as a list index. I don't really see the value of doing so apart from syntactic consistency: there are very few other places in Python that I'm aware of that accept -<one-or-more-digits> but not +<one-or-more-digits>.

FWIW, my overall feeling is that the current rules are simple and adequate, and there's no great need to add this complication.

I do wonder, though:

How complicated would it be to make "{0[1]}".format({'1':'foo'}) a bit magical? That is, have the format method pass an integer to __getitem__ if the corresponding format argument is a sequence, and a string argument if it's a mapping (not sure what the criterion would be for distinguishing). Is this too much magic? Is it feasible implementation-wise?

I don't think it's do-able for simple rather than compound field names: e.g., "{0}".format(args, **kwargs), since there we've got both a sequence *and a dict, so it's not clear whether to look at args[0] or kwargs['0']. (Unless either args or kwargs is empty, perhaps.) This is all getting a bit python-ideas'y, though.

BTW, I notice that PEP-3101's "Simple field names are either names or numbers [...] if names, they must be valid Python identifiers" isn't actually true:

>>> "{in-valid #identifier}".format(**{'in-valid #identifier': 42})
'42'

Though I don't have a problem with this; indeed, I think this is preferable to checking for a valid identifier.

ericvsmith · 2010-06-14T11:23:34Z

Addressing just the last part of Mark's message right now:

The PEP goes on to say:
Implementation note: The implementation of this proposal is
not required to enforce the rule about a simple or dotted name
being a valid Python identifier. ...

I rely on getattr lookup failing for dotted names, but for simple names there's no check at all. I agree it's desirable to leave this behavior.

mrabarnett · 2010-06-14T15:30:59Z

Re: msg107776.

If it looks like an integer (ie, can be converted to an integer by 'int') then it's positional, otherwise it's a key. An optimisation is to perform a quick check upfront to see whether it starts like an integer.

mdickinson · 2010-06-14T15:32:49Z

Matthew:

would that include allowing whitespace, then?

>>> int('\t\n+56')
56

mrabarnett · 2010-06-14T17:02:58Z

That's a good question. :-)

Possibly just an optional sign followed by one or more digits.

Another possibility that occurs to me is for it to default to positional if it looks like an integer, but allow quoting to force it to be a key:

>>> "{0}".format("foo", **{"0": "bar"})
'foo'
>>> "{'0'}".format("foo", **{"0": "bar"})
'bar'

Or is that taking it too far?

ericvsmith · 2010-06-14T21:20:28Z

I can see the point of allowing negative indices for a consistency point, but is there really any practical problem that's currently causing people hardship that this would solve?

As for the rest of it, I think it's just not worth the additional burden on CPython and other implementations.

mrabarnett · 2010-06-15T01:25:43Z

Your original:

"{0[-1]}".format('fox')

is a worse gotcha than:

"{-1}".format('fox')

because you're much less likely to want to do the latter.

It's one of those things that it would be nice to have fixed, or we could just add a warning to the documentation that it _might_ be fixed in the future, so people shouldn't rely on the current behaviour. :-)

gosella · 2010-06-18T19:50:38Z

I finally managed to get the time to finish the patch that allows negative indexes inside square brackets so now they work with the same semantics as in a python expression:

>>> '{0[-1]}'.format(['abc', 'def'])
'def'
>>> '{0[-2]}'.format(['abc', 'def'])
'abc'
>>> '{0[-1][-1]}'.format(['abc', ['def']])
'def'

They work auto-numbered fields too:
>>> '{[-1]}'.format(['abc', 'def'])
'def'

Also, a positive sign is now accepted as part of a valid integer:

>>> '{0[+1]}'.format(['abc', 'def'])
'def'

As a bonus, negatives indexes are also allowed to refer to positional arguments:

>>> '{-1}'.format('abc', 'def')
'def'
>>> '{-2}'.format('abc', 'def')
'abc'

I'm attaching a patch against trunk. I added some tests for this functionality in test_str.py.

By the way, this code doesn't work anymore:

>>> "{[-1]}".format({'-1': 'X'})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: -1L

But now it behaves in the same way as:
>>> "{[1]}".format({'1': 'X'})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 1L

I didn't attempt to ignore whitespaces when trying to parse the index as an integer (to allow that "{ 0 }" can be treated as "{0}" and "{0[1]}" as "{ 0 [ 1 ] }") because I'm not sure if this behavior is desirable.

gosella · 2010-06-18T19:55:15Z

I forgot to mention that I also made a patch against py3k (was the same code).

rhettinger · 2010-06-18T21:38:22Z

Perhaps this ought to be discussed on python-ideas or python-dev for a bit. It is not entirely clear that this is a GoodThingToDo(tm) nor is it clear that we want other Python implementations to have to invest the same effort.

The spirit of the language freeze suggests that we shouldn't add this unless we really need it. The goal was to let other implementations catch up, not to add to their list of incompatabilites.

ericvsmith · 2010-06-18T21:47:18Z

I agree with Raymond. I'm not convinced it allows you to write any code that you can't currently write, and I'm fairly sure it violates the moratorium. Implementing this would clearly put a burden on other implementations.

Marking as "after moratorium".

kisielk · 2010-06-23T18:40:53Z

While I agree this functionality isn't strictly necessary I think it makes sense from a semantic point of view. I ran in to this issue today while writing some code and I simply expected the negative syntax to work, given that the format string syntax otherwise very closely resembles standard array and attribute access.

It would be nice to see this make it in eventually for consistency's sake.

gosella · 2010-06-25T18:48:21Z

Well, using negative indexes for fields can be thought as a new feature with all the consequences mentioned before BUT negative indexes for accessing elements from a sequence, IMHO, is something that anyone would expected to work. That's why at first I thought it was a bug and I fill an issue about it.

The code that parses the fields and the indexes is the same, so when I change it to accept negative indexes, it worked for both cases. I'm attaching a patch that checks if a negative index is used in a field and reverts to the old behavior in that case, allowing only negative indexes for accessing sequences ( "{-1}" will raise KeyError because it will be threated as '-1').

Perhaps in this way this issue could be partially fixed.

terryjreedy · 2010-08-09T18:42:50Z

I believe this is covered by the PEP-3003 3.2 change moratorium.

rhettinger · 2010-08-11T19:41:20Z

Fixing-up str formatting idiosyncracies does not fall under the moratorium and is helpful in getting 3.x to be usable.

That being said, I'm not convinced that this is actually a helpful feature. Not all objects supporting __getitem__ offer support for negative indexing. Also, there's a case to be made that using negative indices in a formatting string is an anti-pattern, causing more harm than good.

mrabarnett · 2010-08-11T20:01:35Z

I agree with Kamil and Germán. I would've expected negative indexes for sequences to work. Negative indexes for fields is a different matter.

rhettinger · 2010-09-09T23:55:04Z

After more thought, I'm -1 on this. "Consistency" is a weak argument in favor of this. We need to be more use case drivenm and it there is no evidence that this is needed. Also, there is a reasonable concern that using negative indices in a format string would be a bad design pattern that should not be encouraged by the language. And, there is a maintenance burden (just getting it right in the first place; having other implementations try to get it right; and a burden to custom formatters to have to support negative indices).

I do think we have a documentation issue. This thread shows a number of experienced Python programmers who get "surprised" or perceive "consistency issues" perhaps because there isn't a clear mental picture of Python's layer structure (separation of concerns) and where the responsibility lies for the supporting negative indices.

For the record, here are a few notes on where negative index handling fits into the hierarchy:

Negative index support is not guaranteed by the collections.Sequence ABC nor by the grammar (see the "subscript" rule in Grammar/Grammar). It does not appear in opcode handling (see BINARY_SUBSCR in Python/ceval.c) nor in the top abstract layer (see PyObject_GetItem() in abstract.c). Instead, the support for slicing and negative index handling appears at the concrete layer (see listindex() in Objects/listobject.c for example).

We do guarantee negative index handling for builtin sequences and their subclasses (as long as they don't override __getitem__), and we provide a fast path for their execution (via an intermediate abstract layer function, PySequence_GetItem() in Objects/abstract.c), but other sequence-like objects are free to make their own decisions about slices and negative indices at the concrete layer.

Knowing this, a person should not be "surprised" when one sequence has support for negative indices or slicing and another does not. The choice belongs to the implementer of the concrete class, not to the caller of "a[x]". There is no "consistency" issue here.

IOW, we're not required to implement negative slice handling and are free to decide whether it is a good idea or not for the use-case of string formatting. There is some question about whether it is a bad practice for people to use negative indices for string formatting. If so, that would be a reason not to do it. And if available, it would only work for builtin sequences, but not sequence like items in general. There is also a concern about placing a burden on other implementations of Python (to match what we do in CPython) and on placing a burden on people writing their own custom formatters (to closely as possible mimic builtin formatters). If so, those would be reasons not to do it.

my-two-cents,

Raymond

merwok · 2010-09-12T23:02:55Z

Thank you for the detailed argument, Raymond. I’m +1 on turning this into a doc bug.

rovitotv · 2013-04-20T03:06:40Z

Here is a simple patch that simply explains negative indexes and negative slices are not supported for the string format documentation. Perhaps more documentation needs to be created else where to help explain why all collections do not need to support negative indexes and negative slices? If so please let me know and I will create it. But I think this patch at least clarifies for the use case of String format.

BreamoreBoy · 2013-05-26T17:34:51Z

Todd's patch strikes me as fine. If something more detailed is needed I think it would be better to raise a separate issue.

terryjreedy · 2014-04-12T02:40:49Z

Either leading sign, '+' or '-', cause string interpretation, so I think 'unsigned integer' should be the term in the doc.

>>> '{0[-1]}'.format({'-1': 'neg int key'})
'neg int key'
>>> '{0[+1]}'.format({'+1': 'neg int key'})
'neg int key'
>>> '{0[+1]}'.format([1,2,3])
Traceback (most recent call last):
  File "<pyshell#16>", line 1, in <module>
    '{0[+1]}'.format([1,2,3])
TypeError: list indices must be integers, not str

terryjreedy · 2014-04-13T23:43:30Z

The doc bug is that the grammar block uses 'integer' (linked to https://docs.python.org/3/reference/lexical_analysis.html#grammar-token-integer) in
  arg_name          ::=  [identifier | integer]
  element_index     ::=  integer | index_string
when it should use 'decimalinteger' or even more exactly 'digit+'. The int() builtin uses the same relaxed rule when no base is given.
>>> 011
SyntaxError: invalid token
>>> int('011')
11
>>> '{[011]}'.format('abcdefghijlmn')
'm'

One possibity is to replace 'integer' in the grammar block with 'digit+' and perhaps leave the text alone. Another is to replace 'integer' with 'index_number', to go with 'index_string, and add the production "index_number ::= digit+". My though for the latter is that 'index_number' would connect better with 'number' as used in the text. A further option would be to actually replace 'number' in the text with 'index_number'.

PS to Todd. As much as possible, doc content changes should be separated from re-formatting. I believe the first block of your patch is purely a re-format

BreamoreBoy · 2014-08-18T19:50:12Z

msg216038 suggests three options for the doc patch, does anybody have any preference or a better alternative?

marco-buttu · 2016-05-27T06:45:13Z

The error message is misleading:

>>> s = '{names[-1]} loves {0[1]}'
>>> s.format(('C', 'Python'), names=('Dennis', 'Guido'))
Traceback (most recent call last):
    ...
TypeError: tuple indices must be integers or slices, not str

merwok · 2019-04-26T02:52:28Z

A side question: where is it defined that in {thing[0]}, 0 will be parsed as an integer?

The PEP shows {thing[name]} and mentions that this is not Python but a smaller mini-language, with name always a string, no quotes needed or permitted.

ericvsmith · 2019-04-26T06:56:04Z

I'm not sure where (or if) it's defined in the Python docs, but in PEP-3101 it's in https://www.python.org/dev/peps/pep-3101/#simple-and-compound-field-names: "It should be noted that the use of 'getitem' within a format string is much more limited than its conventional usage. In the above example, the string 'name' really is the literal string 'name', not a variable named 'name'. The rules for parsing an item key are very simple. If it starts with a digit, then it is treated as a number, otherwise it is used as a string.".

ikamensh · 2019-07-13T10:17:19Z

Py3.6+ f-strings support any indexing as they actually evaluate python expressions.

>>> a = ['Java', 'Python']
>>> var = f"Hello {a[-1]}"
Hello Python

iritkatriel · 2021-11-29T17:18:19Z

Reproduced on 3.11, and the error message is a little weirder now:

>>> "{0[-1]}".format('fox')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: string indices must be integers, not 'str'

rhettinger · 2021-11-29T22:01:36Z

I recommend not adding support for negative indexing to format() for accessing positional arguments. There is almost no reason to do this because it almost always makes the format string less readable, because the number of arguments is always known in advance, and because the arguments are almost always used entirely rather than selectively.

Negative index support isn't a feature of the language. Instead, it is a feature provided on a class by class basis, if it makes sense for that class and for its use cases.

We are not obliged to provide negative index support in places where it doesn't make sense or where it makes code less readable. For example, the islice() function doesn't support negative indices because it doesn't make sense there. Likewise, the Sequence ABC doesn't require negative index support or slice support.

ericvsmith · 2021-12-01T01:47:32Z

I'm closing this as "won't fix" for the negative indexing functionality. If someone wants to open an new documentation issue (and ideally provide a PR), that would be welcome.

ericvsmith added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Feb 17, 2010

ericvsmith self-assigned this Feb 17, 2010

ericvsmith added easy type-feature A feature request or enhancement labels Feb 17, 2010

rhettinger added docs Documentation in the Doc dir and removed interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Sep 12, 2010

ericvsmith assigned docspython and unassigned ericvsmith Apr 12, 2014

terryjreedy assigned terryjreedy and unassigned docspython Apr 13, 2014

iritkatriel added the 3.11 only security fixes label Nov 29, 2021

ericvsmith closed this as completed Dec 1, 2021

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Should str.format allow negative indexes when used for getitem access? #52199

Should str.format allow negative indexes when used for getitem access? #52199

ericvsmith commented Feb 17, 2010

ericvsmith commented Feb 17, 2010

mrabarnett mannequin commented Feb 19, 2010

ericvsmith commented Jun 13, 2010

mdickinson commented Jun 14, 2010

ericvsmith commented Jun 14, 2010

mrabarnett mannequin commented Jun 14, 2010

mdickinson commented Jun 14, 2010

mrabarnett mannequin commented Jun 14, 2010

ericvsmith commented Jun 14, 2010

mrabarnett mannequin commented Jun 15, 2010

gosella mannequin commented Jun 18, 2010

gosella mannequin commented Jun 18, 2010

rhettinger commented Jun 18, 2010

ericvsmith commented Jun 18, 2010

kisielk mannequin commented Jun 23, 2010

gosella mannequin commented Jun 25, 2010

terryjreedy commented Aug 9, 2010

rhettinger commented Aug 11, 2010

mrabarnett mannequin commented Aug 11, 2010

rhettinger commented Sep 9, 2010

merwok commented Sep 12, 2010

rovitotv mannequin commented Apr 20, 2013

BreamoreBoy mannequin commented May 26, 2013

terryjreedy commented Apr 12, 2014

terryjreedy commented Apr 13, 2014

BreamoreBoy mannequin commented Aug 18, 2014

marco-buttu mannequin commented May 27, 2016

merwok commented Apr 26, 2019

ericvsmith commented Apr 26, 2019

ikamensh mannequin commented Jul 13, 2019

iritkatriel commented Nov 29, 2021

rhettinger commented Nov 29, 2021

ericvsmith commented Dec 1, 2021

Should str.format allow negative indexes when used for __getitem__ access? #52199

Should str.format allow negative indexes when used for __getitem__ access? #52199

Comments

ericvsmith commented Feb 17, 2010

ericvsmith commented Feb 17, 2010

mrabarnett mannequin commented Feb 19, 2010

ericvsmith commented Jun 13, 2010

mdickinson commented Jun 14, 2010

ericvsmith commented Jun 14, 2010

mrabarnett mannequin commented Jun 14, 2010

mdickinson commented Jun 14, 2010

mrabarnett mannequin commented Jun 14, 2010

ericvsmith commented Jun 14, 2010

mrabarnett mannequin commented Jun 15, 2010

gosella mannequin commented Jun 18, 2010

gosella mannequin commented Jun 18, 2010

rhettinger commented Jun 18, 2010

ericvsmith commented Jun 18, 2010

kisielk mannequin commented Jun 23, 2010

gosella mannequin commented Jun 25, 2010

terryjreedy commented Aug 9, 2010

rhettinger commented Aug 11, 2010

mrabarnett mannequin commented Aug 11, 2010

rhettinger commented Sep 9, 2010

merwok commented Sep 12, 2010

rovitotv mannequin commented Apr 20, 2013

BreamoreBoy mannequin commented May 26, 2013

terryjreedy commented Apr 12, 2014

terryjreedy commented Apr 13, 2014

BreamoreBoy mannequin commented Aug 18, 2014

marco-buttu mannequin commented May 27, 2016

merwok commented Apr 26, 2019

ericvsmith commented Apr 26, 2019

ikamensh mannequin commented Jul 13, 2019

iritkatriel commented Nov 29, 2021

rhettinger commented Nov 29, 2021

ericvsmith commented Dec 1, 2021

Should str.format allow negative indexes when used for getitem access? #52199

Should str.format allow negative indexes when used for getitem access? #52199