Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should str.format allow negative indexes when used for __getitem__ access? #52199

Closed
ericvsmith opened this issue Feb 17, 2010 · 33 comments
Closed
Assignees
Labels
3.11 only security fixes docs Documentation in the Doc dir easy type-feature A feature request or enhancement

Comments

@ericvsmith
Copy link
Member

BPO 7951
Nosy @rhettinger, @terryjreedy, @mdickinson, @ericvsmith, @merwok, @florentx, @rovitotv, @marco-buttu, @ikamensh, @iritkatriel
Files
  • format_negative_indexes-2.7.diff: patch against trunk
  • format_negative_indexes-3.2.diff: patch against 3.2
  • format_no_fields_with_negative_indexes-2.7.diff: Don't allow negative fields
  • 7951NegativeIndexesForStringFormat3dot4.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/terryjreedy'
    closed_at = <Date 2021-12-01.01:47:31.771>
    created_at = <Date 2010-02-17.23:54:17.722>
    labels = ['easy', '3.11', 'type-feature', 'docs']
    title = 'Should str.format allow negative indexes when used for __getitem__ access?'
    updated_at = <Date 2021-12-01.01:47:31.769>
    user = 'https://github.com/ericvsmith'

    bugs.python.org fields:

    activity = <Date 2021-12-01.01:47:31.769>
    actor = 'eric.smith'
    assignee = 'terry.reedy'
    closed = True
    closed_date = <Date 2021-12-01.01:47:31.771>
    closer = 'eric.smith'
    components = ['Documentation']
    creation = <Date 2010-02-17.23:54:17.722>
    creator = 'eric.smith'
    dependencies = []
    files = ['17712', '17713', '17768', '29948']
    hgrepos = []
    issue_num = 7951
    keywords = ['patch', 'easy']
    message_count = 33.0
    messages = ['99482', '99553', '107766', '107776', '107781', '107792', '107793', '107801', '107811', '107845', '108132', '108133', '108143', '108144', '108472', '108617', '113447', '113620', '113624', '115981', '116244', '187404', '190102', '215958', '216038', '225505', '266481', '340877', '340884', '347795', '407301', '407328', '407422']
    nosy_count = 14.0
    nosy_names = ['rhettinger', 'terry.reedy', 'mark.dickinson', 'eric.smith', 'kisielk', 'eric.araujo', 'mrabarnett', 'flox', 'docs@python', 'gosella', 'Todd.Rovito', 'marco.buttu', 'Ilya Kamenshchikov', 'iritkatriel']
    pr_nums = []
    priority = 'normal'
    resolution = 'wont fix'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue7951'
    versions = ['Python 3.11']

    @ericvsmith
    Copy link
    Member Author

    It surprised me that this doesn't work:
    >>> "{0[-1]}".format('fox')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: string indices must be integers

    I was expecting it to be equivalent to:

    >>> "{0[2]}".format('fox')
    'x'

    I don't think there's any particular reason this doesn't work. It would, however break the following code:

    >>> "{0[-1]}".format({'-1':'foo'})
    'foo'

    But note that this doesn't work currently:

    >>> "{0[1]}".format({'1':'foo'})
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    KeyError: 1

    @ericvsmith ericvsmith added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Feb 17, 2010
    @ericvsmith ericvsmith self-assigned this Feb 17, 2010
    @ericvsmith ericvsmith added easy type-feature A feature request or enhancement labels Feb 17, 2010
    @mrabarnett
    Copy link
    Mannequin

    mrabarnett mannequin commented Feb 19, 2010

    On a related note, this doesn't work either:

    >>> "{-1}".format("x", "y", "z")
    Traceback (most recent call last):
      File "<pyshell#3>", line 1, in <module>
        "{-1}".format("x", "y", "z")
    KeyError: '-1'

    It could return "z".

    It also rejects a leading '+', but that would be optional anyway.

    @ericvsmith
    Copy link
    Member Author

    Closed bpo-8985 as a duplicate of this; merging nosy lists.

    @mdickinson
    Copy link
    Member

    I (reluctantly) agree it's surprising that "{0[-1]}".format(args) fails. And I suppose that if it were allowed then it would also make sense to consider "{-1}".format(*args) as well, in order to preserve the equivalence between "{n}".format(*args) and "{0[n]}".format(args). And then:

    >>> "{-0}".format(*['calvin'], **{'-0': 'hobbes'})
    'hobbes'

    would presumably produce 'calvin' instead of 'hobbes'...

    On '+': if "{0[-1]}" were allowed, I'm not sure whether the "+1" in "{0[+1]}".format(...) should also be interpreted as a list index. I don't really see the value of doing so apart from syntactic consistency: there are very few other places in Python that I'm aware of that accept -<one-or-more-digits> but not +<one-or-more-digits>.

    FWIW, my overall feeling is that the current rules are simple and adequate, and there's no great need to add this complication.

    I do wonder, though:

    How complicated would it be to make "{0[1]}".format({'1':'foo'}) a bit magical? That is, have the format method pass an integer to __getitem__ if the corresponding format argument is a sequence, and a string argument if it's a mapping (not sure what the criterion would be for distinguishing). Is this too much magic? Is it feasible implementation-wise?

    I don't think it's do-able for simple rather than compound field names: e.g., "{0}".format(args, **kwargs), since there we've got both a sequence *and a dict, so it's not clear whether to look at args[0] or kwargs['0']. (Unless either args or kwargs is empty, perhaps.) This is all getting a bit python-ideas'y, though.

    BTW, I notice that PEP-3101's "Simple field names are either names or numbers [...] if names, they must be valid Python identifiers" isn't actually true:

    >>> "{in-valid #identifier}".format(**{'in-valid #identifier': 42})
    '42'

    Though I don't have a problem with this; indeed, I think this is preferable to checking for a valid identifier.

    @ericvsmith
    Copy link
    Member Author

    Addressing just the last part of Mark's message right now:

    The PEP goes on to say:
    Implementation note: The implementation of this proposal is
    not required to enforce the rule about a simple or dotted name
    being a valid Python identifier. ...

    I rely on getattr lookup failing for dotted names, but for simple names there's no check at all. I agree it's desirable to leave this behavior.

    @mrabarnett
    Copy link
    Mannequin

    mrabarnett mannequin commented Jun 14, 2010

    Re: msg107776.

    If it looks like an integer (ie, can be converted to an integer by 'int') then it's positional, otherwise it's a key. An optimisation is to perform a quick check upfront to see whether it starts like an integer.

    @mdickinson
    Copy link
    Member

    Matthew:

    would that include allowing whitespace, then?

    >>> int('\t\n+56')
    56

    @mrabarnett
    Copy link
    Mannequin

    mrabarnett mannequin commented Jun 14, 2010

    That's a good question. :-)

    Possibly just an optional sign followed by one or more digits.

    Another possibility that occurs to me is for it to default to positional if it looks like an integer, but allow quoting to force it to be a key:

    >>> "{0}".format("foo", **{"0": "bar"})
    'foo'
    >>> "{'0'}".format("foo", **{"0": "bar"})
    'bar'

    Or is that taking it too far?

    @ericvsmith
    Copy link
    Member Author

    I can see the point of allowing negative indices for a consistency point, but is there really any practical problem that's currently causing people hardship that this would solve?

    As for the rest of it, I think it's just not worth the additional burden on CPython and other implementations.

    @mrabarnett
    Copy link
    Mannequin

    mrabarnett mannequin commented Jun 15, 2010

    Your original:

    "{0[-1]}".format('fox')
    

    is a worse gotcha than:

    "{-1}".format('fox')
    

    because you're much less likely to want to do the latter.

    It's one of those things that it would be nice to have fixed, or we could just add a warning to the documentation that it _might_ be fixed in the future, so people shouldn't rely on the current behaviour. :-)

    @gosella
    Copy link
    Mannequin

    gosella mannequin commented Jun 18, 2010

    I finally managed to get the time to finish the patch that allows negative indexes inside square brackets so now they work with the same semantics as in a python expression:

    >>> '{0[-1]}'.format(['abc', 'def'])
    'def'
    >>> '{0[-2]}'.format(['abc', 'def'])
    'abc'
    >>> '{0[-1][-1]}'.format(['abc', ['def']])
    'def'
    
    They work auto-numbered fields too:
    >>> '{[-1]}'.format(['abc', 'def'])
    'def'

    Also, a positive sign is now accepted as part of a valid integer:

    >>> '{0[+1]}'.format(['abc', 'def'])
    'def'

    As a bonus, negatives indexes are also allowed to refer to positional arguments:

    >>> '{-1}'.format('abc', 'def')
    'def'
    >>> '{-2}'.format('abc', 'def')
    'abc'

    I'm attaching a patch against trunk. I added some tests for this functionality in test_str.py.

    By the way, this code doesn't work anymore:

    >>> "{[-1]}".format({'-1': 'X'})
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    KeyError: -1L
    
    But now it behaves in the same way as:
    >>> "{[1]}".format({'1': 'X'})
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    KeyError: 1L

    I didn't attempt to ignore whitespaces when trying to parse the index as an integer (to allow that "{ 0 }" can be treated as "{0}" and "{0[1]}" as "{ 0 [ 1 ] }") because I'm not sure if this behavior is desirable.

    @gosella
    Copy link
    Mannequin

    gosella mannequin commented Jun 18, 2010

    I forgot to mention that I also made a patch against py3k (was the same code).

    @rhettinger
    Copy link
    Contributor

    Perhaps this ought to be discussed on python-ideas or python-dev for a bit. It is not entirely clear that this is a GoodThingToDo(tm) nor is it clear that we want other Python implementations to have to invest the same effort.

    The spirit of the language freeze suggests that we shouldn't add this unless we really need it. The goal was to let other implementations catch up, not to add to their list of incompatabilites.

    @ericvsmith
    Copy link
    Member Author

    I agree with Raymond. I'm not convinced it allows you to write any code that you can't currently write, and I'm fairly sure it violates the moratorium. Implementing this would clearly put a burden on other implementations.

    Marking as "after moratorium".

    @kisielk
    Copy link
    Mannequin

    kisielk mannequin commented Jun 23, 2010

    While I agree this functionality isn't strictly necessary I think it makes sense from a semantic point of view. I ran in to this issue today while writing some code and I simply expected the negative syntax to work, given that the format string syntax otherwise very closely resembles standard array and attribute access.

    It would be nice to see this make it in eventually for consistency's sake.

    @gosella
    Copy link
    Mannequin

    gosella mannequin commented Jun 25, 2010

    Well, using negative indexes for fields can be thought as a new feature with all the consequences mentioned before BUT negative indexes for accessing elements from a sequence, IMHO, is something that anyone would expected to work. That's why at first I thought it was a bug and I fill an issue about it.

    The code that parses the fields and the indexes is the same, so when I change it to accept negative indexes, it worked for both cases. I'm attaching a patch that checks if a negative index is used in a field and reverts to the old behavior in that case, allowing only negative indexes for accessing sequences ( "{-1}" will raise KeyError because it will be threated as '-1').

    Perhaps in this way this issue could be partially fixed.

    @terryjreedy
    Copy link
    Member

    I believe this is covered by the PEP-3003 3.2 change moratorium.

    @rhettinger
    Copy link
    Contributor

    Fixing-up str formatting idiosyncracies does not fall under the moratorium and is helpful in getting 3.x to be usable.

    That being said, I'm not convinced that this is actually a helpful feature. Not all objects supporting __getitem__ offer support for negative indexing. Also, there's a case to be made that using negative indices in a formatting string is an anti-pattern, causing more harm than good.

    @mrabarnett
    Copy link
    Mannequin

    mrabarnett mannequin commented Aug 11, 2010

    I agree with Kamil and Germán. I would've expected negative indexes for sequences to work. Negative indexes for fields is a different matter.

    @rhettinger
    Copy link
    Contributor

    After more thought, I'm -1 on this. "Consistency" is a weak argument in favor of this. We need to be more use case drivenm and it there is no evidence that this is needed. Also, there is a reasonable concern that using negative indices in a format string would be a bad design pattern that should not be encouraged by the language. And, there is a maintenance burden (just getting it right in the first place; having other implementations try to get it right; and a burden to custom formatters to have to support negative indices).

    I do think we have a documentation issue. This thread shows a number of experienced Python programmers who get "surprised" or perceive "consistency issues" perhaps because there isn't a clear mental picture of Python's layer structure (separation of concerns) and where the responsibility lies for the supporting negative indices.

    For the record, here are a few notes on where negative index handling fits into the hierarchy:

    Negative index support is not guaranteed by the collections.Sequence ABC nor by the grammar (see the "subscript" rule in Grammar/Grammar). It does not appear in opcode handling (see BINARY_SUBSCR in Python/ceval.c) nor in the top abstract layer (see PyObject_GetItem() in abstract.c). Instead, the support for slicing and negative index handling appears at the concrete layer (see listindex() in Objects/listobject.c for example).

    We do guarantee negative index handling for builtin sequences and their subclasses (as long as they don't override __getitem__), and we provide a fast path for their execution (via an intermediate abstract layer function, PySequence_GetItem() in Objects/abstract.c), but other sequence-like objects are free to make their own decisions about slices and negative indices at the concrete layer.

    Knowing this, a person should not be "surprised" when one sequence has support for negative indices or slicing and another does not. The choice belongs to the implementer of the concrete class, not to the caller of "a[x]". There is no "consistency" issue here.

    IOW, we're not required to implement negative slice handling and are free to decide whether it is a good idea or not for the use-case of string formatting. There is some question about whether it is a bad practice for people to use negative indices for string formatting. If so, that would be a reason not to do it. And if available, it would only work for builtin sequences, but not sequence like items in general. There is also a concern about placing a burden on other implementations of Python (to match what we do in CPython) and on placing a burden on people writing their own custom formatters (to closely as possible mimic builtin formatters). If so, those would be reasons not to do it.

    my-two-cents,

    Raymond

    @merwok
    Copy link
    Member

    merwok commented Sep 12, 2010

    Thank you for the detailed argument, Raymond. I’m +1 on turning this into a doc bug.

    @rhettinger rhettinger added docs Documentation in the Doc dir and removed interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Sep 12, 2010
    @rovitotv
    Copy link
    Mannequin

    rovitotv mannequin commented Apr 20, 2013

    Here is a simple patch that simply explains negative indexes and negative slices are not supported for the string format documentation. Perhaps more documentation needs to be created else where to help explain why all collections do not need to support negative indexes and negative slices? If so please let me know and I will create it. But I think this patch at least clarifies for the use case of String format.

    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented May 26, 2013

    Todd's patch strikes me as fine. If something more detailed is needed I think it would be better to raise a separate issue.

    @terryjreedy
    Copy link
    Member

    Either leading sign, '+' or '-', cause string interpretation, so I think 'unsigned integer' should be the term in the doc.

    >>> '{0[-1]}'.format({'-1': 'neg int key'})
    'neg int key'
    >>> '{0[+1]}'.format({'+1': 'neg int key'})
    'neg int key'
    >>> '{0[+1]}'.format([1,2,3])
    Traceback (most recent call last):
      File "<pyshell#16>", line 1, in <module>
        '{0[+1]}'.format([1,2,3])
    TypeError: list indices must be integers, not str

    @terryjreedy
    Copy link
    Member

    The doc bug is that the grammar block uses 'integer' (linked to https://docs.python.org/3/reference/lexical_analysis.html#grammar-token-integer) in
      arg_name          ::=  [identifier | integer]
      element_index     ::=  integer | index_string
    when it should use 'decimalinteger' or even more exactly 'digit+'. The int() builtin uses the same relaxed rule when no base is given.
    >>> 011
    SyntaxError: invalid token
    >>> int('011')
    11
    >>> '{[011]}'.format('abcdefghijlmn')
    'm'

    One possibity is to replace 'integer' in the grammar block with 'digit+' and perhaps leave the text alone. Another is to replace 'integer' with 'index_number', to go with 'index_string, and add the production "index_number ::= digit+". My though for the latter is that 'index_number' would connect better with 'number' as used in the text. A further option would be to actually replace 'number' in the text with 'index_number'.

    PS to Todd. As much as possible, doc content changes should be separated from re-formatting. I believe the first block of your patch is purely a re-format

    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented Aug 18, 2014

    msg216038 suggests three options for the doc patch, does anybody have any preference or a better alternative?

    @marco-buttu
    Copy link
    Mannequin

    marco-buttu mannequin commented May 27, 2016

    The error message is misleading:

    >>> s = '{names[-1]} loves {0[1]}'
    >>> s.format(('C', 'Python'), names=('Dennis', 'Guido'))
    Traceback (most recent call last):
        ...
    TypeError: tuple indices must be integers or slices, not str

    @merwok
    Copy link
    Member

    merwok commented Apr 26, 2019

    A side question: where is it defined that in {thing[0]}, 0 will be parsed as an integer?

    The PEP shows {thing[name]} and mentions that this is not Python but a smaller mini-language, with name always a string, no quotes needed or permitted.

    @ericvsmith
    Copy link
    Member Author

    I'm not sure where (or if) it's defined in the Python docs, but in PEP-3101 it's in https://www.python.org/dev/peps/pep-3101/#simple-and-compound-field-names: "It should be noted that the use of 'getitem' within a format string is much more limited than its conventional usage. In the above example, the string 'name' really is the literal string 'name', not a variable named 'name'. The rules for parsing an item key are very simple. If it starts with a digit, then it is treated as a number, otherwise it is used as a string.".

    @ikamensh
    Copy link
    Mannequin

    ikamensh mannequin commented Jul 13, 2019

    Py3.6+ f-strings support any indexing as they actually evaluate python expressions.

    >>> a = ['Java', 'Python']
    >>> var = f"Hello {a[-1]}"
    Hello Python

    @iritkatriel
    Copy link
    Member

    Reproduced on 3.11, and the error message is a little weirder now:

    >>> "{0[-1]}".format('fox')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: string indices must be integers, not 'str'

    @iritkatriel iritkatriel added the 3.11 only security fixes label Nov 29, 2021
    @rhettinger
    Copy link
    Contributor

    I recommend not adding support for negative indexing to format() for accessing positional arguments. There is almost no reason to do this because it almost always makes the format string less readable, because the number of arguments is always known in advance, and because the arguments are almost always used entirely rather than selectively.

    Negative index support isn't a feature of the language. Instead, it is a feature provided on a class by class basis, if it makes sense for that class and for its use cases.

    We are not obliged to provide negative index support in places where it doesn't make sense or where it makes code less readable. For example, the islice() function doesn't support negative indices because it doesn't make sense there. Likewise, the Sequence ABC doesn't require negative index support or slice support.

    @ericvsmith
    Copy link
    Member Author

    I'm closing this as "won't fix" for the negative indexing functionality. If someone wants to open an new documentation issue (and ideally provide a PR), that would be welcome.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.11 only security fixes docs Documentation in the Doc dir easy type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants