Title: format(value) and value.__format__() behave differently with unicode format
Type: behavior Stage: patch review
Components: Documentation Versions: Python 2.7
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: Arfrever, chris.jerdonek, docs@python, eric.smith, ezio.melotti
Priority: normal Keywords: patch

Created on 2012-09-16 19:59 by chris.jerdonek, last changed 2012-09-23 11:15 by ezio.melotti.

File name                      Uploaded                          Description
issue-15952-1-branch-27.patch  chris.jerdonek, 2012-09-18 19:39
Messages (6)
msg170575 - Author: Chris Jerdonek (chris.jerdonek) (Python committer) Date: 2012-09-16 19:59
format(value) and value.__format__() behave differently even though the documentation says otherwise:

"Note: format(value, format_spec) merely calls value.__format__(format_spec)."

(from )

The difference happens when the format string is unicode.  For example:

>>> format(10, u'n')
u'10'
>>> (10).__format__(u'n')  # parentheses needed to prevent SyntaxError
'10'

So either the documentation should be changed, or the behavior should be changed to match.

Related to this: neither the "Format Specification Mini-Language" documentation nor the string.Formatter docs seem to say anything about the effect that a unicode format string should have on the return value (in particular, whether it should cause the return value to be unicode or not).

See also issue 15276 (int formatting), issue 15951 (empty format string), and issue 7300 (unicode arguments).
msg170587 - Author: Chris Jerdonek (chris.jerdonek) (Python committer) Date: 2012-09-17 06:26
See this code comment:

/* don't define FORMAT_LONG, FORMAT_FLOAT, and FORMAT_COMPLEX, since
   we can live with only the string versions of those.  The builtin
   format() will convert them to unicode. */


In other words, it was deliberate not to make value.__format__(format_spec) return unicode when format_spec is unicode.  So the docs should be adjusted to say that they are not always the same.
msg170603 - Author: Eric V. Smith (eric.smith) (Python committer) Date: 2012-09-17 12:46
I believe the conversion is happening in Objects/abstract.c in PyObject_Format, around line 864, near this comment:

    /* Convert to unicode, if needed.  Required if spec is unicode
       and result is str */

I think changing the docs will result in more confusion than clarity, but if you can come up with some good wording, I'd be okay with it. I think changing the code will likely break things with little or no benefit.
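
For illustration, the conversion step described above can be modelled roughly as follows. This is only a sketch that runs on Python 3, not the C code in PyObject_Format: `bytes` stands in for the Python 2 8-bit str, Python 3 `str` stands in for unicode, and the names `model_format` and `EightBit` are hypothetical. The ASCII decoding is also an assumption made for the sketch.

```python
def model_format(value, format_spec):
    """Rough model of Python 2.7 format(): call __format__, then convert."""
    # Step 1: delegate to the object's own __format__ method.
    result = value.__format__(format_spec)
    # Step 2 (the part the docs note leaves out): if the spec is "unicode"
    # (str in this model) and the result is an 8-bit string (bytes in this
    # model), convert the result to "unicode".
    if isinstance(format_spec, str) and isinstance(result, bytes):
        result = result.decode("ascii")  # assumption: ASCII-only result
    return result


class EightBit(object):
    """Hypothetical type whose __format__ always returns an 8-bit string."""

    def __format__(self, format_spec):
        return b"10"


direct = EightBit().__format__("n")        # b"10": no conversion happens
via_model = model_format(EightBit(), "n")  # "10": converted by step 2
```

Under this model the direct method call and the format()-style call return different types for the same "unicode" spec, which is the difference reported in this issue.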
msg170669 - Author: Chris Jerdonek (chris.jerdonek) (Python committer) Date: 2012-09-18 19:39
Here is a proposed patch.

One note on the patch.  I feel the second sentence of the note is worth adding because value.__format__() departs from what PEP 3101 says:

"Note for Python 2.x: The 'format_spec' argument will be either
a string object or a unicode object, depending on the type of the
original format string.  The __format__ method should test the type
of the specifiers parameter to determine whether to return a string or
unicode object.  It is the responsibility of the __format__ method
to return an object of the proper type."

The extra sentence will help head off, and respond to, issues about value.__format__() that are similar to issue 15951.
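
As a sketch of what the PEP note asks of a __format__ method, the following tests the type of the specifier and returns an object of the matching type. It is written for Python 3 as a model only: `bytes` plays the role of the Python 2 8-bit str and `str` the role of unicode, and the `Celsius` class is a hypothetical example, not anything from the patch or the standard library.

```python
class Celsius(object):
    """Hypothetical type whose __format__ matches the type of the spec."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, format_spec):
        text = "%d C" % self.degrees
        # Test the type of the specifier, as the PEP 3101 note suggests.
        if isinstance(format_spec, bytes):
            # 8-bit spec in this model: return an 8-bit result.
            return text.encode("ascii")
        # "unicode" spec in this model: return a "unicode" result.
        return text


as_text = Celsius(21).__format__("")    # str result for a str spec
as_8bit = Celsius(21).__format__(b"")   # bytes result for a bytes spec
```

The built-in numeric types in 2.7 do not do this, which is why format() has to convert the result itself.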
msg170671 - Author: Chris Jerdonek (chris.jerdonek) (Python committer) Date: 2012-09-18 19:44
To clarify, one of the sentences above should have read, "I feel the second sentence of the note *in the patch* was worth adding..." (not the second sentence of the PEP note I quoted).
msg171026 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2012-09-23 11:15
``format(value, format_spec)`` merely calls
-      ``value.__format__(format_spec)``.
+      ``value.__format__(format_spec)`` and, if *format_spec* is Unicode,
+      converts the value to Unicode if it is not already Unicode.

This is correct, but should be rephrased (and "value" should be "return value").

+      The method ``value.__format__(format_spec)`` may return 8-bit strings
+      for some built-in types when *format_spec* is Unicode.

This is not limited to built-in types.  __format__() might return either str or unicode, and format() returns the same -- except for the aforementioned case.

This is a summary of the possible cases.

__format__ can return unicode or str:

  >>> class Uni(object):
  ...   def __format__(*args): return u'uni'
  >>> class Str(object):
  ...   def __format__(*args): return 'str'

format() and __format__ return the same value, except when the format_spec is unicode and __format__ returns str:

  >>> format(Uni(),  'd'),  Uni().__format__( 'd')  # same
  (u'uni', u'uni')
  >>> format(Uni(), u'd'),  Uni().__format__(u'd')  # same
  (u'uni', u'uni')
  >>> format(Str(),  'd'),  Str().__format__( 'd')  # same
  ('str', 'str')
  >>> format(Str(), u'd'),  Str().__format__(u'd')  # different
  (u'str', 'str')

It is also not true that the type of the return value is the same as that of the format_spec, because in the first case the returned type is unicode even though the format_spec is str.  Therefore this part of the patch should be changed:

+   Per :pep:`3101`, the function returns a Unicode object if *format_spec* is
+   Unicode.  Otherwise, it returns an 8-bit string.

The behavior might be against PEP 3101 (see quotation in msg170669), even though the wording of the PEP is somewhat lenient IMHO ("proper type" doesn't necessarily mean "same type").
Date                 User            Action  Args
2012-09-23 11:15:10  ezio.melotti    set     messages: + msg171026
2012-09-22 18:32:04  chris.jerdonek  link    issue15276 dependencies
2012-09-22 14:06:16  chris.jerdonek  set     nosy: + ezio.melotti
2012-09-18 19:44:29  chris.jerdonek  set     messages: + msg170671
2012-09-18 19:39:25  chris.jerdonek  set     files: + issue-15952-1-branch-27.patch
                                             keywords: + patch
                                             messages: + msg170669
                                             stage: patch review
2012-09-17 12:46:45  eric.smith      set     nosy: + eric.smith
                                             messages: + msg170603
2012-09-17 06:26:10  chris.jerdonek  set     messages: + msg170587
2012-09-16 20:01:37  Arfrever        set     nosy: + Arfrever
2012-09-16 19:59:51  chris.jerdonek  create