Issue 9196: Improve docs for string interpolation "%s" re Unicode strings

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/53442

classification

Title:	Improve docs for string interpolation "%s" re Unicode strings
Type:		Stage:	resolved
Components:	Documentation	Versions:	Python 2.7

process

Status:	closed	Resolution:	out of date
Dependencies:		Superseder:
Assigned To:	docs@python	Nosy List:	Arfrever, cmcqueen1975, docs@python, eric.araujo, eric.smith, ezio.melotti, serhiy.storchaka
Priority:	normal	Keywords:

Created on 2010-07-08 07:07 by cmcqueen1975, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description
class_str_unicode_methods.py	cmcqueen1975, 2011-01-10 05:26

Messages (8)
msg109516	Author: Craig McQueen (cmcqueen1975)	Date: 2010-07-08 07:07
I have just been trying to figure out how string interpolation works for "%s", when Unicode strings are involved. It seems it's a bit complicated, but the Python documentation doesn't really describe it. It just says %s "converts any Python object using str()".

Here is what I have found (I think), and it could be worth improving the documentation of this somehow.

Example 1:
    "%s" % test_object
From what I can tell, in this case:
1. test_object.__str__() is called.
2. If test_object.__str__() returns a string object, then that is substituted.
3. If test_object.__str__() returns a Unicode object (for some reason), then test_object.__unicode__() is called, then _that_ is substituted instead. The output string is turned into Unicode. This behaviour is surprising.

[Note that the call to test_object.__str__() is not the same as str(test_object), because the former can return a Unicode object without causing an error, while the latter, if it gets a Unicode object, will then try to encode('ascii') to a string, possibly generating a UnicodeEncodeError exception.]

Example 2:
    u"%s" % test_object
In this case:
1. test_object.__unicode__() is called, if it exists, and the result is substituted. The output string is Unicode.
2. If test_object.__unicode__() doesn't exist, then test_object.__str__() is called instead, converted to Unicode, and substituted. The output string is Unicode.

Example 3:
    "%s %s" % (u'unicode', test_object)
In this case:
1. The first substitution causes the output string to be Unicode.
2. It seems that (1) causes the second substitution to follow the same rules as Example 2. This is a little surprising.
msg109517	Author: Craig McQueen (cmcqueen1975)	Date: 2010-07-08 07:15
Another thing I discovered, for Example 1:
4. If test_object.__str__() returns a Unicode object (for some reason), and test_object.__unicode__() does not exist, then the Unicode value from the __str__() call is used as-is (no conversion to string, no encoding errors). This is also a little surprising.

[In this situation unicode(test_object) also returns the Unicode object returned by __str__() as-is, so I guess there's some consistency there.]
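The cases reported in the two messages above can be exercised with a small script. This is only a sketch: it is not the attached class_str_unicode_methods.py, and the class names Both and StrOnly and the returned strings are hypothetical. It is written so that it runs on both Python 2.7 and Python 3, which makes it possible to compare the two.

```python
from __future__ import print_function

import sys


class Both(object):
    """Hypothetical test class that defines both conversion hooks."""

    def __str__(self):
        # Returning a Unicode object from __str__ is the unusual case
        # discussed in the report.
        return u"from __str__"

    def __unicode__(self):
        # Only Python 2 has a __unicode__ protocol; Python 3 ignores this.
        return u"from __unicode__"


class StrOnly(object):
    """Hypothetical test class with no __unicode__ method (item 4)."""

    def __str__(self):
        return u"from __str__"


plain = "%s" % Both()                # Example 1
uni = u"%s" % Both()                 # Example 2
mixed = "%s %s" % (u"text", Both())  # Example 3
fallback = "%s" % StrOnly()          # Example 1, item 4

print("Python", sys.version_info[0])
for label, value in [("plain", plain), ("uni", uni),
                     ("mixed", mixed), ("fallback", fallback)]:
    # Show both the resulting type and the substituted text.
    print(label, type(value).__name__, repr(value))
```

If the report is accurate, Python 2.7 should print a unicode result in all four cases, with the text coming from __unicode__ wherever Both is used. On Python 3 every result is a str built from __str__, because __unicode__ is never called.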
msg124662	Author: Éric Araujo (eric.araujo) *	Date: 2010-12-26 02:52
I’m not sure how much effort should be put into a patch here, considering that the horrible bytes/text confusion and implicit conversion should stop in Python 3, and %-formatting is mildly deprecated. Ezio, what do you think? Craig, could you attach your test_object class and test code? I wonder if the mixed behavior is still present in 3.x.
msg124664	Author: Craig McQueen (cmcqueen1975)	Date: 2010-12-26 10:49
I should be able to attach my test code. But it is at my work, and I'm on holidays for 2 more weeks. Sorry 'bout that! I do assume that Python 3 greatly simplifies this.
msg125880	Author: Craig McQueen (cmcqueen1975)	Date: 2011-01-10 05:26
I'm attaching a file that I used (in Python 2.x). It's a little rough--I manually commented and uncommented various lines to see what would change under various circumstances. But at least you should be able to see what I was doing.
msg126688	Author: Ezio Melotti (ezio.melotti) *	Date: 2011-01-21 03:47
Python 3 checks the return types of __bytes__ and __str__, raising an error if they do not return bytes and str respectively:

>>> str(C())
TypeError: __str__ returned non-string (type bytes)
>>> bytes(C())
TypeError: __bytes__ returned non-bytes (type str)

The Python 2 doc for unicode() says[0]:
"""
For objects which provide a __unicode__() method, it will call this method without arguments to create a Unicode string. For all other objects, the 8-bit string version or representation is requested and then converted to a Unicode string using the codec for the default encoding in 'strict' mode.
"""

The doc for .__unicode__() says[1]:
"""
Called to implement unicode() built-in; should return a Unicode object. When this method is not defined, string conversion is attempted, and the result of string conversion is converted to Unicode using the system default encoding.
"""
This is consistent with the unicode() doc (but it doesn't mention that 'strict' is used). It also says that the method should return unicode, but it can also return a str that gets coerced by unicode().

The doc for .__str__() says[2]:
"""
Called by the str() built-in function and by the print statement to compute the “informal” string representation of an object. [...] The return value must be a string object.
"""
This is wrong because the return value can be unicode too (this has been changed at some point, it used to be true on older versions).

That said, some of the behaviors described by Craig (e.g. __str__ that returns unicode) are not documented and documenting them might save some confusion. However these "weird" behaviors are most likely errors, and the fact that there are no exceptions is just because Python 2 is not strict with str/unicode.

I think a better way to solve the problem is to document clearly how these methods should be used (i.e. if __unicode__ should be preferred over __str__, if it's necessary to implement both, what they should return, etc.).

[0]: http://docs.python.org/library/functions.html#unicode
[1]: http://docs.python.org/reference/datamodel.html#object.__unicode__
[2]: http://docs.python.org/reference/datamodel.html#object.__str__
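The Python 3 checks mentioned at the start of this message can be reproduced with a short script. The message does not show how C was defined, so the class below is an assumption: a hypothetical C whose hooks deliberately return the wrong types. The helper raises_type_error is also only illustrative. This sketch targets Python 3 only.

```python
class C:
    """Hypothetical class whose hooks return the wrong types on purpose."""

    def __str__(self):
        return b"bytes from __str__"    # wrong: str() requires a str

    def __bytes__(self):
        return "text from __bytes__"    # wrong: bytes() requires bytes


def raises_type_error(func):
    """Return True if calling func() raises TypeError, else False."""
    try:
        func()
    except TypeError:
        return True
    return False


str_rejected = raises_type_error(lambda: str(C()))
bytes_rejected = raises_type_error(lambda: bytes(C()))
# "%s" goes through str() in Python 3, so it is subject to the same check.
percent_rejected = raises_type_error(lambda: "%s" % C())

print(str_rejected, bytes_rejected, percent_rejected)
```

Because "%s" formatting relies on str() in Python 3, a __str__ that returns the wrong type fails loudly there instead of silently changing the type of the result.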
msg148563	Author: Éric Araujo (eric.araujo) *	Date: 2011-11-29 13:41
More info on this thread: http://mail.python.org/pipermail/python-dev/2006-December/070237.html
msg370437	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2020-05-31 13:09
Python 2.7 is no longer supported.

History
Date	User	Action	Args
2022-04-11 14:57:03	admin	set	github: 53442
2020-05-31 13:09:11	serhiy.storchaka	set	status: open -> closed nosy: + serhiy.storchaka messages: + msg370437 resolution: out of date stage: needs patch -> resolved
2014-06-02 18:19:11	ezio.melotti	set	nosy: + eric.smith
2011-11-29 13:41:13	eric.araujo	set	messages: + msg148563
2011-04-22 00:21:16	Arfrever	set	nosy: + Arfrever
2011-01-21 03:47:30	ezio.melotti	set	nosy: ezio.melotti, eric.araujo, cmcqueen1975, docs@python messages: + msg126688
2011-01-10 05:26:49	cmcqueen1975	set	files: + class_str_unicode_methods.py nosy: ezio.melotti, eric.araujo, cmcqueen1975, docs@python messages: + msg125880
2010-12-26 10:49:14	cmcqueen1975	set	nosy: ezio.melotti, eric.araujo, cmcqueen1975, docs@python messages: + msg124664
2010-12-26 02:52:14	eric.araujo	set	nosy: + eric.araujo messages: + msg124662
2010-07-08 07:15:45	cmcqueen1975	set	messages: + msg109517
2010-07-08 07:09:09	ezio.melotti	set	nosy: + ezio.melotti stage: needs patch
2010-07-08 07:07:08	cmcqueen1975	create