Message 137601 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	Ben.Wolfson
Recipients	Ben.Wolfson, eric.araujo, eric.smith, mark.dickinson, petri.lehtinen, r.david.murray
Date	2011-06-03.22:17:15
Message-id	<1307139436.39.0.472744217827.issue12014@psf.upfronthosting.co.za>

Content
str.format doesn't intermingle character data and markup. PEP 3101 is quite clear about the terms in this case, at least: the *argument* to str.format consists of character data (passed through unchanged) and markup (processed). That's what it means to say that "Character data is data which is transferred unchanged from the format string to the output string". In "My name is {0}", "My name is " is transferred unchanged from the format string to the output string when the string is formatted. We're talking about how the *markup* is defined.
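For anyone who wants to see that split concretely, the standard library's string.Formatter.parse() exposes how a format string is tokenized: it yields (literal_text, field_name, format_spec, conversion) tuples, where literal_text is the character data and field_name is the start of the markup. A minimal sketch, assuming a Python 3 interpreter; the helper name split_format_string is hypothetical, just for illustration:

```python
import string

def split_format_string(fmt):
    """Hypothetical helper: return (character data, field name) pairs.

    literal is copied unchanged to the output; field is the markup
    that str.format will look up and format.
    """
    return [(literal, field)
            for literal, field, _spec, _conv in string.Formatter().parse(fmt)]

print(split_format_string("My name is {0}"))
# → [('My name is ', '0')]
```

The point is only that "My name is " comes back as literal text and "0" as the field, matching the PEP's description of character data versus markup.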

"""
The current implementation of str.format() finds matched pairs of braces and call what's inside "markup", then parse that markup.
"""

This is false, as I demonstrated.

>>> d = {"{0}": "spam"}
>>> # a matched pair of braces. What's inside is considered markup.
... 
>>> "{0}".format(d)
"{'{0}': 'spam'}"
>>> # a matched pair of braces. Inside is a matched pair of braces, and what's inside of that is not considered markup.
... 
>>> "{0[{0}]}".format(d)
'spam'
>>> 
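The same public machinery shows what happens to the inner braces in "{0[{0}]}". A sketch, again assuming Python 3 and only the documented string.Formatter methods parse() and get_field(): parse() reports the whole 0[{0}] as a single field name, and get_field() then treats {0} as a plain string key rather than as a nested replacement field.

```python
import string

d = {"{0}": "spam"}
fmt = string.Formatter()

# The inner "{0}" is not tokenized as separate markup: it stays inside
# the field name of the single replacement field.
parts = list(fmt.parse("{0[{0}]}"))
print(parts)
# → [('', '0[{0}]', '', None)]

# get_field() resolves "0[{0}]" as args[0]["{0}"], an ordinary dict lookup,
# and also returns the first part of the field name (the positional index 0).
value, first = fmt.get_field("0[{0}]", (d,), {})
print(value, first)
# → spam 0
```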

"""
It's also true that other interpretations of the PEP are possible. I'm just not sure the benefit to be gained justifies changing all of the extant str.format() implementations, in addition to explaining the different behavior.
"""

Well, the beauty of it is that you wouldn't have to explain the different behavior: the patch makes the explanation already in the documentation correct. It is currently not correct. That's how I found out about the current state of affairs: I read the documentation's explanation and believed it, and only after digging into the code did I understand the actual behavior.

It is also not a difficult change to make, and it would be backwards-compatible (in any case, I rather doubt anyone was relying on "{0[:]}".format(whatever) raising an exception [1]). It relaxes a restriction that is not well motivated by the text of the PEP, is not consistently applied in the implementation (see above), is confusing, and limits the usefulness of the format method. It is true that I don't know where else, beyond the implementation in string_format.h, modifications would need to be made, but I'd be willing to undertake the task.

[1] Given that the present implementation does raise an exception there, it is already noncompliant with the PEP, regardless of what one makes of curly braces.
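Whether "{0[:]}" raises depends on how the running interpreter tokenizes the field name, so the simplest check is to try it. A small sketch, assuming Python 3; the probe helper and the sample dict (whose only key is the string ":") are hypothetical, and no particular outcome is shown because the behavior may differ between Python versions:

```python
def probe(fmt, *args):
    """Hypothetical helper: format fmt with args, reporting any error by name."""
    try:
        return "ok: " + fmt.format(*args)
    except (ValueError, KeyError) as exc:
        return f"{type(exc).__name__}: {exc}"

# Either the ":" key is looked up, or the ':' is taken as the start of a
# format spec and the field name "0[" is rejected.
print(probe("{0[:]}", {":": "colon key"}))
```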

History
Date	User	Action	Args
2011-06-03 22:17:16	Ben.Wolfson	set	recipients: + Ben.Wolfson, mark.dickinson, eric.smith, eric.araujo, r.david.murray, petri.lehtinen
2011-06-03 22:17:16	Ben.Wolfson	set	messageid: <1307139436.39.0.472744217827.issue12014@psf.upfronthosting.co.za>
2011-06-03 22:17:15	Ben.Wolfson	link	issue12014 messages
2011-06-03 22:17:15	Ben.Wolfson	create