object.__format__ should reject format strings #52242

Closed
ericvsmith opened this issue Feb 22, 2010 · 28 comments
Labels: interpreter-core (Objects, Python, Grammar, and Parser dirs); type-bug (An unexpected behavior, bug, or error)

Comments

@ericvsmith
Member

BPO 7994
Nosy @mdickinson, @ericvsmith, @ezio-melotti, @bitdancer, @florentx, @meadori
Files
  • issue7994-2.diff
  • issue7994-3.diff: Updated patch off of trunk
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/ericvsmith'
    closed_at = <Date 2010-09-14.17:39:15.925>
    created_at = <Date 2010-02-22.21:58:57.812>
    labels = ['interpreter-core', 'type-bug']
    title = 'object.__format__ should reject format strings'
    updated_at = <Date 2014-03-20.00:54:32.208>
    user = 'https://github.com/ericvsmith'

    bugs.python.org fields:

    activity = <Date 2014-03-20.00:54:32.208>
    actor = 'hct'
    assignee = 'eric.smith'
    closed = True
    closed_date = <Date 2010-09-14.17:39:15.925>
    closer = 'eric.smith'
    components = ['Interpreter Core']
    creation = <Date 2010-02-22.21:58:57.812>
    creator = 'eric.smith'
    dependencies = []
    files = ['16346', '16376']
    hgrepos = []
    issue_num = 7994
    keywords = ['patch']
    message_count = 28.0
    messages = ['99847', '99916', '99917', '99943', '99948', '100135', '100139', '101921', '101943', '102162', '116271', '116290', '116350', '116414', '211042', '214034', '214040', '214130', '214132', '214154', '214156', '214157', '214158', '214159', '214160', '214161', '214162', '214163']
    nosy_count = 9.0
    nosy_names = ['mark.dickinson', 'eric.smith', 'ezio.melotti', 'r.david.murray', 'flox', 'meador.inge', 'BreamoreBoy', 'python-dev', 'hct']
    pr_nums = []
    priority = 'normal'
    resolution = 'accepted'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue7994'
    versions = ['Python 2.7', 'Python 3.2']

    @ericvsmith
    Member Author

    Background:

    format(obj, fmt) eventually calls object.__format__(obj, fmt) if obj (or one of its bases) does not implement __format__. The behavior of object.__format__ is basically:

    def __format__(self, fmt):
        return str(self).__format__(fmt)

    So the caller of format() thought they were passing in a format string specific to obj, but it is interpreted as a format string for str.

    This is not correct, or at least confusing. The format string is supposed to be type specific. However in this case the object is being changed (to type str), but the format string which was to be applied to its original type is now being passed to str.
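To make the surprise concrete, here is a minimal sketch of the old fallback. `LegacyFormatMixin` and `Point` are hypothetical names invented for this illustration; the mixin only imitates the behavior described above and is not the CPython implementation.

```python
class LegacyFormatMixin:
    """Hypothetical mixin imitating the old object.__format__ fallback."""

    def __format__(self, fmt):
        # Old behavior: convert to str first, then let str interpret fmt.
        return format(str(self), fmt)


class Point(LegacyFormatMixin):
    """Hypothetical type that defines __str__ but no format spec of its own."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __str__(self):
        return "({}, {})".format(self.x, self.y)


# The caller wrote '10s' for a Point, but str is the type that interprets it.
padded = format(Point(1, 2), "10s")
print(repr(padded))
```

If `Point` later grew its own `__format__` with different codes, the same call would start failing, which is what happened with complex.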

    This is an actual problem that occurred in the migration from 3.0 -> 3.1 and from 2.6 -> 2.7 with complex. In the earlier versions, complex did not have a __format__ method, but it does in the latter versions. So this code:
    >>> format(1+1j, '10s')
    '(1+1j)    '
    worked in 2.6 and 3.0, but gives an error in 2.7 and 3.1:
    >>> format(1+1j, '10s')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: Unknown format code 's' for object of type 'complex'

    Proposal:
    object.__format__ should give an error if a non-empty format string is specified. In 2.7 and 3.2 make this a PendingDeprecationWarning, in 3.3 make it a DeprecationWarning, and in 3.4 make it an error.
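The first stage of this plan could look roughly like the sketch below. `WarnOnSpecMixin` and `Thing` are hypothetical names, and the warning text is made up for the example; the real change lives in C in the interpreter core.

```python
import warnings


class WarnOnSpecMixin:
    """Hypothetical mixin imitating the warning stage of the proposal."""

    def __format__(self, fmt):
        if fmt:
            # Non-empty spec: still works for now, but warn the caller.
            warnings.warn(
                "non-empty format string passed to a type without __format__",
                PendingDeprecationWarning,
                stacklevel=2,
            )
        return format(str(self), fmt)


class Thing(WarnOnSpecMixin):
    def __str__(self):
        return "thing"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")   # make sure nothing is filtered out
    plain = format(Thing(), "")       # empty spec: no warning
    padded = format(Thing(), ">8")    # non-empty spec: one warning

print(plain, repr(padded), [str(w.message) for w in caught])
```

Later stages would only change the warning category and finally replace the warning with an exception.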

    Modify the documentation to make this behavior clear, and let the user know that if they want this behavior they should say:

    format(str(obj), '10s')

    or the equivalent:

    "{0!s:10}".format(obj)

    That is, the conversion to str should be explicit.
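A quick check that the two explicit spellings agree, using the complex value from the example above (the variable names are only for illustration):

```python
value = 1 + 1j

# Convert explicitly, then apply a str format spec.
via_str = format(str(value), "10s")

# Same thing inside str.format(): !s converts before the spec is applied.
via_conversion = "{0!s:10}".format(value)

print(repr(via_str), repr(via_conversion))
```

Both forms hand the spec to str, so they keep working no matter what `__format__` the original type gains later.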

    @ericvsmith ericvsmith self-assigned this Feb 22, 2010
    @ericvsmith ericvsmith added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Feb 22, 2010
    @ericvsmith
    Member Author

    Proposed patch attached. I need to add tests and docs.

    @ericvsmith
    Member Author

    bpo-7994-0.diff is against trunk.

    @ericvsmith
    Member Author

    This version of the patch adds support for classic classes and adds tests. Documentation still needs to be written.

    Again, this diff is against trunk.

    If anyone wants to review this, in particular the tests that exercise PendingDeprecationWarning, that would be great.

    @ericvsmith ericvsmith removed the easy label Feb 23, 2010
    @ericvsmith
    Member Author

    Patch with Misc/NEWS.

    @meadori
    Member

    meadori commented Feb 26, 2010

    The patch looks reasonable. I built on it with the following changes:

    1. Added some extra test cases to cover Unicode format strings,
      since the code was changed to handle these as well.
    2. Changed test_builtin.py by
      s/m[0].message.message/str(w[0].message)/, since
      BaseException.message was deprecated in 2.6.

    I also have the following general comments:

    1. PEP-3101 explicitly defines the string conversion for
      object.__format__. What is the rationale behind this? Should
      we find out before making this change?
    2. I don't think the comments in 'abstract.c' and 'typeobject.c'
      explaining that the warning will eventually become an error are
      needed. I think it would be better to open separate issues for
      these migration steps as they can be tracked easier and will be
      more visible.
    3. test_unicode, test_str have cases that trigger the added
      warning. Should they be altered now or when (if) this becomes
      an error?

    @ericvsmith
    Member Author

    I haven't looked at the patch, but:

    Thanks for the additional tests. Missing unicode was definitely a mistake.

    str(w[0].message) is an improvement.

    The PEP is out of date in many respects. I think it's best to note that in the PEP and continue to keep the documentation up-to-date.

    This issue already applies to 3.3, but my plan is to remove that and create a new issue when I close this one. But I'd still like to leave the comments in place.

    I'm aware of the existing tests which trigger the warning. I think they should probably be removed, although I haven't really spent much time thinking about it.

    @ericvsmith
    Member Author

    Meador: Your patch (-3) looks identical to mine (-2), unless I'm making some mistake. Could you check? I'd like to get this applied in the next few days, before 2.7b1.

    Thanks!

    @meadori
    Member

    meadori commented Mar 30, 2010

    Hi Eric,

    (-2) and (-3) are different. The changes that I made, however, are pretty minor. Also, they are all in 'test_builtin.py'.

    @ericvsmith
    Member Author

    Committed in trunk in r79596. I'll leave this open until I port to py3k, check the old tests for this usage, and create the issue to make it a DeprecationWarning.

    @florentx
    Mannequin

    florentx mannequin commented Sep 13, 2010

    This should be merged before 3.2 beta.

    @florentx
    Mannequin

    florentx mannequin commented Sep 13, 2010

    The PendingDeprecationWarnings are now checked in the test suite, with r84772 (for 2.7).

    @ericvsmith
    Member Author

    Manually merged to py3k in r84790. I'll leave this open until I create the 3.3 issue to change it to a DeprecationWarning.

    @ericvsmith
    Member Author

    See bpo-9856 for changing this to a DeprecationWarning in 3.3.

    @python-dev
    Mannequin

    python-dev mannequin commented Feb 11, 2014

    New changeset f56b98143792 by R David Murray in branch 'default':
    whatsnew: object.__format__ raises TypeError on non-empty string.
    http://hg.python.org/cpython/rev/f56b98143792

    @hct
    Mannequin

    hct mannequin commented Mar 18, 2014

    I just found out about this change in the latest official stable release, and it's breaking my code all over the place. Something like "{:s}".format( self.pc ) used to work in 3.3.4 and prior releases; it now raises an exception rather than returning the string 'None' when self.pc was never updated from None (it is initialized to None during object init). This means I have to go through manually and change every single line that expects smooth formatting so that it checks whether the variable is still a 'NoneType'.

    Should we just create a format for None, alias string format to repr/str on classes without a format implementation, or put more thought into this?

    @ericvsmith
    Member Author

    I think the best we could do is have None.__format__ be:

    def __format__(self, fmt):
       return str(self).__format__(fmt)

    Or its logical equivalent.

    But this seems more like papering over a bug, instead of actually fixing a problem. My suggestion is to use:
    "{!s}".format(None)
    That is: if you want to format a string, then explicitly force the argument to be a string.

    I don't think None should be special and be auto-converted to a string.

    @hct
    Mannequin

    hct mannequin commented Mar 19, 2014

    I use lots of complicated formats such as the following:
    "{:{:s}{:d}s}".format( self.pcs,self.format_align, self.max_length )

    it looks like the way to do it from now on will be
    "{!s:{:s}{:d}}".format( self.pcs,self.format_align, self.max_length )

    @ericvsmith
    Member Author

    Or:

    "{:{:s}{:d}s}".format(str(self.pcs), self.format_align, self.max_length)

    You're trying to apply the string format specifier (the stuff after the first colon through the final "s", as expanded) to an object that's not always a string: sometimes it's None. So you need to use one of the two supported ways to convert it to a string. Either str() or !s.

    str.format() is very much dependent on the types of its arguments: the format specifier needs to be understood by the object being formatted. Similarly, you couldn't pass in a datetime and expect that to work, either.
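Both spellings from this exchange can be checked side by side. The sketch below uses module-level stand-ins for `self.pcs`, `self.format_align`, and `self.max_length`; the values are assumptions chosen for the example.

```python
pcs = None            # stands in for self.pcs, which may still be None
format_align = "<"    # stands in for self.format_align
max_length = 10       # stands in for self.max_length

# !s converts pcs to str before the nested spec ('<10') is applied.
with_conversion = "{!s:{:s}{:d}}".format(pcs, format_align, max_length)

# Calling str() up front works too; here the spec expands to '<10s'.
with_str_call = "{:{:s}{:d}s}".format(str(pcs), format_align, max_length)

print(repr(with_conversion), repr(with_str_call))
```

The nested replacement fields are numbered after the outer one, so the positional arguments line up as written in both templates.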

    @hct
    Mannequin

    hct mannequin commented Mar 19, 2014

    Unlike NoneType, datetime doesn't throw an exception. Is returning the format specifier the intended behaviour of this fix?

    >>> import datetime
    >>> a=datetime.datetime(1999,7,7)
    >>> str(a)
    '1999-07-07 00:00:00'
    >>> "{:s}".format(a)
    's'
    >>> "{:7s}".format(a)
    '7s'
    >>> "{!s}".format(a)
    '1999-07-07 00:00:00'
    >>>

    @bitdancer
    Member

    Yes. It is not returning the format specifier, it is filling in the strftime template "s" from the datetime...which equals "s", since it consists of just that constant string.

    Try {:%Y-%m-%d}, for example.

    @bitdancer
    Member

    Which, by the way, has been the behavior all along, it is not something affected by this fix, because datetime *does* have a __format__ method.
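A small sketch of the three cases for datetime, reusing the date from the session above (the variable names are only illustrative):

```python
import datetime

a = datetime.datetime(1999, 7, 7)

# The spec is treated as a strftime template; "s" has no directives in it.
literal = "{:s}".format(a)

# A real strftime template does what you would expect.
iso_date = "{:%Y-%m-%d}".format(a)

# To use str's alignment and width codes, convert with !s first.
as_text = "{!s:>22}".format(a)

print(repr(literal), repr(iso_date), repr(as_text))
```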

    @hct
    Mannequin

    hct mannequin commented Mar 20, 2014

    None does have __format__, but it raises an exception:

    >>> dir(None)
    ['__bool__', '__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']
    
    >>> None.__format__
    <built-in method __format__ of NoneType object at 0x50BB2760>

    @BreamoreBoy
    Mannequin

    BreamoreBoy mannequin commented Mar 20, 2014

    That's not an exception, you've not actually called the function.

    >>> None.__format__('')
    'None'

    @ericvsmith
    Member Author

    David is correct.

    It's often easiest to think about the builtin format() instead of str.format(). Notice below that the format specifier has to make sense for the object being formatted:

    >>> import datetime
    >>> now = datetime.datetime.now()

    >>> format('somestring', '.12s')
    'somestring'
    
    # "works", but not what you want because it calls now.strftime('.12s'):
    >>> format(now, '.12s')
    '.12s'
    
    # better:
    >>> format(now, '%Y-%m-%d')  # better
    '2014-03-19'
    
    # int doesn't know what '.12s' format spec means:
    >>> format(3, '.12s')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: Unknown format code 's' for object of type 'int'
    
    # NoneType doesn't define its own __format__, so object.__format__ rejects it:
    >>> format(None, '.12s')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: non-empty format string passed to object.__format__
    
    # just like a random class that doesn't define its own __format__:
    >>> class F: pass
    ... 
    >>> format(F(), '.12s')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: non-empty format string passed to object.__format__

    Tangentially related:

    The best you can do here, given your use case, is to argue that None needs an __format__ that understands str's format specifiers, because you like to mix str and None. But maybe someone else likes to mix int and None. Maybe None should understand int's format specifiers, and not str's:

    >>> format(42000, ',d')
    '42,000'
    >>> format('42000', ',d')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: Unknown format code 'd' for object of type 'str'

    Why would "format(None, '.12s')" make any more sense than "format(None, ',d')"? Since we can't guess, we chose an error.
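Since only the caller knows which family of specifiers it means, one option is to make the None case explicit at the call site. `format_optional` below is a hypothetical helper written for this illustration; it is not part of the standard library, and the "-" placeholder is an arbitrary choice.

```python
def format_optional(value, spec, missing="-"):
    """Hypothetical helper: format value with spec, or return `missing` for None."""
    if value is None:
        return missing
    return format(value, spec)


count = format_optional(42000, ",d")      # int spec, value present
no_count = format_optional(None, ",d")    # int spec, value missing
name = format_optional("abc", ">5")       # str spec, value present

print(count, no_count, repr(name))
```

The helper never guesses what a spec means for None; it just skips formatting when there is nothing to format.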

    @bitdancer
    Member

    NoneType is a subclass of object.

    >>> class Foo(object):
    ...    pass
    ... 
    >>> f = Foo()
    >>> f.__format__
    <built-in method __format__ of Foo object at 0xb71543b4>

    ie: the exception is being raised by object.__format__, as provided for by this issue.

    @ericvsmith
    Member Author

    BreamoreBoy:

    This is basically the definition of object.__format__:

    def __format__(self, specifier):
      if len(specifier) == 0:
        return str(self)
      raise TypeError('non-empty format string passed to object.__format__')

    Which is why it works for an empty specifier.
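The definition above can be exercised next to the builtin. This is a sketch that assumes Python 3.4 or later, where the error stage has landed; `object_format_sketch` and `raises_type_error` are hypothetical names for the example, and the exact error message of the real implementation may differ between versions.

```python
def object_format_sketch(obj, specifier):
    """Pure-Python rendering of the definition quoted above (illustrative only)."""
    if len(specifier) == 0:
        return str(obj)
    raise TypeError("non-empty format string passed to object.__format__")


def raises_type_error(func, *args):
    """Return True if calling func(*args) raises TypeError."""
    try:
        func(*args)
    except TypeError:
        return True
    return False


class F:
    pass


obj = F()

# Empty specifier: the sketch and the builtin agree.
empty_matches = object_format_sketch(obj, "") == format(obj, "")

# Non-empty specifier: both refuse.
sketch_rejects = raises_type_error(object_format_sketch, obj, ".12s")
builtin_rejects = raises_type_error(format, obj, ".12s")

print(empty_matches, sketch_rejects, builtin_rejects)
```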

    As a reminder, the point of raising this type error is described in the first message posted in this bug. This caused us an actual problem when we implemented complex.__format__, and I don't see object.__format__ changing.

    Implementing NoneType.__format__ and having it understand some string specifiers would be possible, but I'm against it, for reasons I hope I've made clear.

    As to why None.__format__ appears to be implemented, it's the same as this:

    >>> class Foo: pass
    ... 
    >>> Foo().__format__
    <built-in method __format__ of Foo object at 0xb74e6a4c>

    That's really object.__format__, bound to a Foo instance.
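One way to confirm this from Python, assuming CPython's usual attribute lookup: compare what the classes resolve `__format__` to, and check whether NoneType defines the name itself.

```python
class Foo:
    pass


# Neither class defines __format__, so lookup falls through to object.
foo_inherits = Foo.__format__ is object.__format__
none_inherits = type(None).__format__ is object.__format__

# dir(None) lists __format__ only because it is inherited from object.
defined_on_none_type = "__format__" in vars(type(None))

print(foo_inherits, none_inherits, defined_on_none_type)
```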

    @hct
    Mannequin

    hct mannequin commented Mar 20, 2014

    I think I was confused because I forgot that I was using str.format, where {} takes the format spec of the argument's type. Confusion cleared.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022