classification
Title: Format parser is too permissive
Type: behavior
Stage: needs patch
Components: Documentation
Versions: Python 3.6
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: belopolsky, benjamin.peterson, docs@python, eric.araujo, eric.smith, mark.dickinson, terry.reedy
Priority: normal
Keywords:

Created on 2010-10-04 16:24 by belopolsky, last changed 2015-10-02 21:38 by belopolsky.

Messages (11)
msg117961 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-10-04 16:24
According to the Format String Syntax section [1], attribute_name must be an identifier. However, the parser does not catch a violation of this rule and happily passes non-identifier strings to __getattribute__:

>>> class X:
...    def __getattribute__(self, a): return 'foo'
... 
>>> '{.$#@}'.format(X())
'foo'
 
If this is a desirable feature, I think it should be clearly documented, because in some cases (for example, when formatted objects are proxies to database entries) passing arbitrary strings to __getattribute__ may be wasteful at best and a security hole at worst.


[1] http://docs.python.org/dev/py3k/library/string.html#format-string-syntax
msg117964 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-10-04 16:38
PEP 3101 has the following

"""
    Implementation note: The implementation of this proposal is
    not required to enforce the rule about a simple or dotted name
    being a valid Python identifier.  Instead, it will rely on the
    getattr function of the underlying object to throw an exception if
    the identifier is not legal.  The str.format() function will have
    a minimalist parser which only attempts to figure out when it is
    "done" with an identifier (by finding a '.' or a ']', or '}',
    etc.).
"""

Apparently CPython takes advantage of this note in its implementation.  Thus this is not a bug, but I think this implementation note should be added to CPython documentation.
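The "minimalist parser" from the note can be observed through string.Formatter.parse(), which yields (literal_text, field_name, format_spec, conversion) tuples. In this sketch (CPython 3 assumed), the field name '0.$#@' comes back as raw text: the parser only locates where the field ends (at '!', ':' or '}') and does not validate the attribute part.

```python
import string

# Formatter.parse() yields (literal_text, field_name, format_spec, conversion).
template = 'x={0.$#@!r:>10} y={1[a.b]}'
for literal, field, spec, conv in string.Formatter().parse(template):
    print(repr(literal), repr(field), repr(spec), repr(conv))
# 'x=' '0.$#@' '>10' 'r'
# ' y=' '1[a.b]' '' None
```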
msg117965 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-10-04 16:54
Right. It seemed like a hassle to have the str.format parser try to figure out what a valid identifier is, so it just passes it through.

I don't see this as any different from:

>>> class X:
...    def __getattribute__(self, a): return 'foo'
... 
>>> getattr(X(), '$#@')
'foo'
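A sketch of the same equivalence without a custom __getattribute__, assuming CPython 3: setattr() accepts any string as an attribute name, and a replacement field retrieves such an attribute through the same getattr() lookup. SimpleNamespace is used only as a convenient attribute container.

```python
from types import SimpleNamespace

obj = SimpleNamespace()
setattr(obj, '$#@', 42)  # setattr() accepts any string as a name

print(getattr(obj, '$#@'))    # 42
print('{0.$#@}'.format(obj))  # 42, via the same getattr() lookup
```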
msg117966 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2010-10-04 16:58
2010/10/4 Eric Smith <report@bugs.python.org>:
> Right. It seemed like a hassle to have the str.format parser try to figure out what a valid identifier is, so it just passes it through.

You can always use str.isidentifier() (I don't remember if there's a C API equivalent).
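If a caller wants the documented rule enforced today, one option is a string.Formatter subclass that checks each attribute name with str.isidentifier() before delegating to the standard get_field(). StrictFormatter is hypothetical and not part of the standard library; its regular expressions are an assumption that mirrors the splitting rule from the PEP note (an attribute name ends at the next '.' or '[', and keys inside brackets are skipped). The sketch uses explicit indexes such as {0.real}.

```python
import re
import string


class StrictFormatter(string.Formatter):
    """Hypothetical Formatter enforcing attribute_name ::= identifier."""

    def get_field(self, field_name, args, kwargs):
        # Blank out index keys such as [a.b] so dots inside them are ignored.
        without_keys = re.sub(r'\[[^\]]*\]', '[]', field_name)
        # Each attribute name runs from a '.' to the next '.' or '['.
        for attr in re.findall(r'\.([^.\[]*)', without_keys):
            if not attr.isidentifier():
                raise ValueError(
                    f'invalid attribute name {attr!r} in field {field_name!r}')
        # Valid names: fall back to the standard lookup (getattr / getitem).
        return super().get_field(field_name, args, kwargs)


fmt = StrictFormatter()
print(fmt.format('{0.real}', 3))           # 3
print(fmt.format('{0[a.b]}', {'a.b': 7}))  # 7
try:
    fmt.format('{0.$#@}', 3)
except ValueError as exc:
    print(exc)  # invalid attribute name '$#@' in field '0.$#@'
```

The check runs before any attribute lookup, so a proxy like the one described above never sees the rejected name.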
msg117967 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-10-04 17:02
Ah, but I don't need to in order to comply with the PEP!
msg117969 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-10-04 17:10
On Mon, Oct 4, 2010 at 1:02 PM, Eric Smith <report@bugs.python.org> wrote:
..
> Ah, but I don't need to in order to comply with the PEP!

This is true, and it is the reason I changed this issue from a bug to a documentation issue. I seem to remember this having been discussed before, but I cannot find the right thread. There are at least two reasons the CPython docs should mention this:

1. From the current documentation, users are likely to expect a ValueError from '{.$#@}'.format(...) rather than an AttributeError.
2. Naive proxy objects may implement __getattribute__ so that it blindly inserts the attribute name into database queries, leading to all kinds of undesired behavior.
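The first point can be checked directly. In this sketch (CPython 3 assumed; error_type is a hypothetical helper), a structurally malformed field such as {0.} is rejected by the parser with ValueError, while a non-identifier name such as $#@ is handed to getattr() and surfaces as AttributeError.

```python
def error_type(template, obj):
    """Hypothetical helper: name of the exception template.format(obj) raises."""
    try:
        template.format(obj)
    except Exception as exc:
        return type(exc).__name__
    return None


print(error_type('{0.}', object()))     # ValueError: rejected by the parser
print(error_type('{0.$#@}', object()))  # AttributeError: raised by getattr()
```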
msg117971 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-10-04 17:37
I agree it should be documented as a CPython specific behavior. I should also add a CPython specific test for it, modeled on your code (if one doesn't already exist). I'll look into it.
msg117992 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-10-05 07:21
> I seem to remember this having been discussed before, but I cannot find the right thread.

It came up in the issue 7951 discussion, I think.
msg118009 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2010-10-05 13:57
This should not be classified as an "implementation detail". Either we should document it and cause other implementations to support it or check it ourselves.
msg118011 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-10-05 14:32
I agree that it being an implementation detail is not a good thing. I think we should just document the current CPython behavior as the language standard: once parsed, any string after a dot is passed to getattr. I can't see why we should pay the penalty of validating it as an identifier when the behavior is well defined and matches my getattr example in msg117965.
msg118232 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-10-08 22:55
This is a bug report in the sense that there is a discrepancy between the grammar in the docs and the behavior. Laxness can lead to portability problems if CPython is lax compared with a normal reading of the spec and another implementation takes the spec seriously.

I agree that implementation details that lead to an exception here and not there, or vice versa, are best avoided.

For getattr:
'''
getattr(object, name[, default]) 
Return the value of the named attribute of object. name must be a string.
'''
The doc is careful to say only that name must be a string, not specifically an identifier. Given that, I suppose "attribute_name ::= identifier" should be changed to match, so that string formats in all implementations can also access non-identifier attributes.
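One caveat on this proposal: even with attribute_name widened to "any string", a replacement field cannot reach every attribute that getattr() can, because the parser still ends an attribute name at '.' or '['. A small sketch, assuming CPython 3, with SimpleNamespace as an arbitrary container:

```python
from types import SimpleNamespace

obj = SimpleNamespace()
setattr(obj, 'two words', 1)  # not an identifier, but contains no '.' or '['
setattr(obj, 'a.b', 2)        # contains '.', which the parser treats as a separator

print('{0.two words}'.format(obj))  # 1

try:
    '{0.a.b}'.format(obj)  # looks up 'a', then 'b'; never 'a.b'
except AttributeError:
    print("'a.b' is unreachable from a format string")
```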
History
Date                 User               Action  Args
2015-10-02 21:38:35  belopolsky         set     stage: needs patch
                                                versions: + Python 3.6, - Python 3.2
2010-10-08 22:55:18  terry.reedy        set     nosy: + terry.reedy
                                                messages: + msg118232
2010-10-05 14:32:45  eric.smith         set     messages: + msg118011
2010-10-05 13:57:46  benjamin.peterson  set     messages: + msg118009
2010-10-05 07:21:54  mark.dickinson     set     messages: + msg117992
2010-10-04 17:37:26  eric.smith         set     messages: + msg117971
2010-10-04 17:10:32  belopolsky         set     messages: + msg117969
2010-10-04 17:02:57  eric.smith         set     messages: + msg117967
2010-10-04 16:58:45  benjamin.peterson  set     nosy: + benjamin.peterson
                                                messages: + msg117966
2010-10-04 16:54:41  eric.smith         set     messages: + msg117965
2010-10-04 16:46:10  mark.dickinson     set     nosy: + mark.dickinson, eric.smith
2010-10-04 16:39:21  belopolsky         set     assignee: docs@python
                                                components: + Documentation, - Interpreter Core
                                                nosy: + docs@python
2010-10-04 16:38:10  belopolsky         set     messages: + msg117964
2010-10-04 16:29:00  eric.araujo        set     nosy: + eric.araujo
2010-10-04 16:24:04  belopolsky         create