
classification
Title: format method: c presentation type broken in 2.7
Type: behavior
Stage: resolved
Components: Interpreter Core
Versions: Python 2.7

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: eric.smith
Nosy List: BreamoreBoy, benjamin.peterson, doerwalter, eric.smith, ezio.melotti, francismb, jwilk, python-dev, serhiy.storchaka, terry.reedy, vstinner
Priority: high
Keywords: patch

Created on 2009-11-05 16:22 by doerwalter, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded
issue7267.patch francismb, 2013-03-23 21:28
int_format_c.patch vstinner, 2013-07-02 00:34
int_format_c_warn.patch serhiy.storchaka, 2015-05-13 09:02
Messages (23)
msg94935 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2009-11-05 16:22
The c presentation type in the new format method from PEP 3101 seems to
be broken:

Python 2.6.4 (r264:75706, Oct 27 2009, 15:18:04) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> u'{0:c}'.format(256)
u'\x00'

The PEP states:

'c' - Character. Converts the integer to the corresponding Unicode
character before printing.

So I would have expected this to return u'\u0100' instead of u'\x00'.
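
For comparison, here is a minimal sketch of the behaviour the PEP wording describes, as it works in Python 3, where str is a Unicode type. It illustrates the expected mapping only and does not reproduce the 2.6 result reported above.

```python
# Python 3: the 'c' presentation type converts an integer to the
# character with that code point, matching the PEP 3101 wording.
result = '{0:c}'.format(256)

assert result == '\u0100'   # LATIN CAPITAL LETTER A WITH MACRON
assert result == chr(256)   # same mapping as chr()
```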
msg94936 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2009-11-05 16:30
I'll look at it.
msg94969 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2009-11-06 14:09
This is a bug in the way ints and longs are formatted. They always do
the formatting as str, then convert to unicode. This works everywhere
except with the 'c' presentation type. I'm still trying to decide how
best to handle this.
msg94972 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2009-11-06 14:52
I'd say that a value >= 128 should generate a Unicode string (as the PEP
explicitly states that the value is a Unicode code point and not a byte
value).

However str.format() doesn't seem to support mixing str and unicode anyway:

>>> '{0}'.format(u'\u3042')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u3042' in
position 0: ordinal not in range(128)

so str.format() might raise an OverflowError for values >= 128 (or >= 256?)
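
The UnicodeEncodeError above is consistent with the formatted unicode result being converted back to a byte string through the ASCII codec. The following is a rough Python 3 model of that implicit conversion, assuming it behaves like an explicit .encode('ascii'); the helper name coerce_to_py2_str is hypothetical.

```python
def coerce_to_py2_str(text):
    """Model Python 2's implicit unicode -> str conversion.

    Assumption: the conversion is equivalent to encoding with ASCII.
    """
    return text.encode('ascii')


# Pure ASCII text survives the conversion.
assert coerce_to_py2_str('abc') == b'abc'

# A non-ASCII character cannot be represented and is rejected.
try:
    coerce_to_py2_str('\u3042')
except UnicodeEncodeError as exc:
    error = exc
else:
    error = None

assert isinstance(error, UnicodeEncodeError)
```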
msg95113 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2009-11-10 13:20
> so str.format() might raise an OverflowError for values >= 128 (or >= 256?)

Maybe, but the issue you reported is in unicode.format() (not
str.format()), and I think that should be fixed. I'm trying to think of
how best to address it.

As for the second issue you raise (which I think is that str.format()
can't take a unicode argument), would you mind opening a separate issue
for this and assigning it to me? Thanks.
msg95115 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2009-11-10 13:58
Done: issue 7300.
msg98107 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-01-21 11:38
See also issue #7649.
msg98173 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-01-23 00:46
('%c' % 255) == chr(255) == '\xff'

'%c' % 256 raises an "OverflowError: unsigned byte integer is greater than maximum" and chr(256) raises a "ValueError: chr() arg not in range(256)". I prefer the second error ;-)

str.format() should follow the same behaviour.

str is a byte string: it can be used to create a network packet or encode data into a byte stream. '%c' is useful for that, and str.format() should keep this nice feature.
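
In Python 3 the byte-oriented use case described here is covered by %-formatting on bytes objects (available since Python 3.5), which likewise rejects values that do not fit in one byte. A small sketch follows; the two-byte header layout is invented purely for illustration.

```python
# Build a tiny binary header with %c on bytes (Python 3.5+).
# The "type, length" layout is a made-up example, not a real protocol.
header = b'%c%c' % (0x01, 255)
assert header == b'\x01\xff'

# Values that do not fit in one byte are rejected rather than truncated.
try:
    b'%c' % 256
except OverflowError:
    rejected = True
else:
    rejected = False

assert rejected
```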
msg100772 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-03-10 00:25
The u'{0:c}'.format(256) formatter is implemented in Objects/stringlib/formatter.h, and this C template is instantiated in... Python/formatter_string.c (and not Python/formatter_unicode.c). Excerpt from the formatter_unicode.c comment:

/* don't define FORMAT_LONG, FORMAT_FLOAT, and FORMAT_COMPLEX, since
   we can live with only the string versions of those.  The builtin
   format() will convert them to unicode. */

format_int_or_long_internal() is instantiated (only once) with STRINGLIB_CHAR=char, so "numeric_char = (STRINGLIB_CHAR)x;" becomes "numeric_char = (char)x;" even though x is a long in [0; 0x10ffff] (or [0; 0xffff], depending on the Python unicode build option).

I think that the 'c' format type should have its own function, because format_int_or_long_internal() gets locale info, computes the number of digits, and does other things unrelated to just creating one character from its code (chr(code) / unichr(code)). But that's just a remark; it doesn't fix this issue.

To fix this issue, I think that FORMAT_LONG and the related templates should be instantiated twice (str & unicode).
msg185089 - (view) Author: Francis MB (francismb) * Date: 2013-03-23 20:52
In 2.7.3:

>>> u'{0:c}'.format(127)
u'\x7f'

>>> u'{0:c}'.format(128)

Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    u'{0:c}'.format(128)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)

>>> u'{0:c}'.format(255)

Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    u'{0:c}'.format(255)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)

>>> u'{0:c}'.format(256)
u'\x00'

>>> u'{0:c}'.format(257)
u'\x01'
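
These results fit the mechanism described earlier in the thread: the integer is narrowed to a single C char, and the resulting byte string is then decoded to unicode, apparently with the ASCII codec. Below is a Python 3 model of that pipeline under those assumptions. The function name is hypothetical, and `& 0xFF` stands in for the C cast on a platform with 8-bit chars.

```python
def buggy_unicode_format_c(value):
    """Model of what u'{0:c}'.format(value) produced on Python 2.7.3."""
    # Step 1: "(char)x" keeps only the low 8 bits of the integer.
    one_byte = bytes([value & 0xFF])
    # Step 2: the byte string is converted to unicode (assumed: ASCII).
    return one_byte.decode('ascii')


outcomes = {}
for value in (127, 128, 255, 256, 257):
    try:
        outcomes[value] = buggy_unicode_format_c(value)
    except UnicodeDecodeError:
        outcomes[value] = 'UnicodeDecodeError'

# Same pattern as the interpreter session above.
assert outcomes == {
    127: '\x7f',
    128: 'UnicodeDecodeError',
    255: 'UnicodeDecodeError',
    256: '\x00',
    257: '\x01',
}
```

Under this model, 256 and 257 wrap around to 0 and 1 before decoding, which is why they succeed with the wrong character while 128 and 255 fail outright.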
msg185092 - (view) Author: Francis MB (francismb) * Date: 2013-03-23 21:28
Adding a test that triggers the issue; let me know if it is enough.
msg192169 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-07-02 00:34
u'{0:c}'.format(256) calls (256).__format__('c'), which returns a str (bytes) object, so we must reject values outside range(0, 256). The real fix for this issue is to upgrade to Python 3.

The attached patch works around the initial issue (u'{0:c}'.format(256)) by raising OverflowError in int.__format__('c') if the value is not in range(0, 256).
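
The idea of the workaround can be sketched in Python 3 as a plain range check in front of the byte conversion. This is not the patch itself (which is C code in the formatter): the function name format_c_as_byte and the error text are hypothetical.

```python
def format_c_as_byte(value):
    """Return the single byte for value, or raise if it does not fit.

    Hypothetical stand-in for the patched int.__format__('c') on 2.7;
    the error text is illustrative, not the one used by the patch.
    """
    if not 0 <= value < 256:
        raise OverflowError('value %d not in range(256)' % value)
    return bytes([value])


assert format_c_as_byte(255) == b'\xff'

# 256 is now rejected instead of silently becoming b'\x00'.
try:
    format_c_as_byte(256)
except OverflowError:
    overflowed = True
else:
    overflowed = False

assert overflowed
```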
msg217674 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-05-01 01:18
If the purpose of backporting .format was/is to help people writing forward-looking code, or now, to write 2&3 code, then it should work like .format in 3.x, at least when the format string is unicode.
msg242726 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2015-05-07 19:07
What harm, if any, could be done by applying the patch with Victor's workaround?
msg242753 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-08 10:33
Maybe just emit a warning in -3 mode?
msg243059 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-13 09:02
Here is a modification of Victor's patch that just emits a Py3k warning.

Both ways, with OverflowError and Py3k DeprecationWarning, are good to me. What would you say about this Benjamin?
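
The warning-based alternative can be sketched in Python 3 with the warnings module. This assumes the variant keeps the old truncating result and only adds a DeprecationWarning; the patch contents are not shown in this thread, so the function name and the message text are hypothetical.

```python
import warnings


def format_c_with_warning(value):
    """Keep the old truncating result, but warn when value exceeds a byte.

    Assumption: the warning variant leaves the returned value unchanged.
    """
    if not 0 <= value < 256:
        warnings.warn("format(int, 'c') with a value outside range(256)",
                      DeprecationWarning, stacklevel=2)
    return bytes([value & 0xFF])


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    result = format_c_with_warning(256)

assert result == b'\x00'
assert len(caught) == 1
assert issubclass(caught[0].category, DeprecationWarning)
```

The trade-off discussed in the following messages is visible here: the wrong byte is still returned, and the caller only learns about it if warnings are enabled.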
msg254373 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-09 08:59
Ping.
msg254376 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-11-09 10:14
> Both ways, with OverflowError and Py3k DeprecationWarning, are good to me. What would you say about this Benjamin?

I prefer an OverflowError. I don't like having to enable a flag to fix a bug :-(

According to the issue title, it's really a bug: "format method: c presentation type *broken* in 2.7".

Note: the unit test could check the error message. Currently the error message is irrelevant (it mentions unicode, whereas bytes (the str type) are used).

>>> format(-1, "c")
OverflowError: %c arg not in range(0x110000) (wide Python build)
msg254378 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-09 10:51
Then feel free to commit your patch please. It LGTM.
msg254379 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-11-09 11:22
New changeset 2f2c52c9ff38 by Victor Stinner in branch '2.7':
Issue #7267: format(int, 'c') now raises OverflowError when the argument is not
https://hg.python.org/cpython/rev/2f2c52c9ff38
msg254380 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-11-09 11:23
> Then feel free to commit your patch please. It LGTM.

Thanks for the review ;-)

@Walter: Sorry for the late fix (6 years later!).
msg254383 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2015-11-09 12:38
Don't worry, I've switched to using Python 3 in 2012, where this isn't a problem. ;)
msg254391 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-11-09 15:29
Walter Dörwald added the comment:
> Don't worry, I've switched to using Python 3 in 2012, where this isn't a problem. ;)

Wow, cool! We still have 1 or 2 customers stuck with Python 2, haha.
History
Date User Action Args
2022-04-11 14:56:54 admin set github: 51516
2015-11-09 19:16:53 berker.peksag set stage: commit review -> resolved
2015-11-09 15:29:28 vstinner set messages: + msg254391
2015-11-09 12:38:45 doerwalter set messages: + msg254383
2015-11-09 11:23:06 vstinner set status: open -> closed; resolution: fixed; messages: + msg254380
2015-11-09 11:22:22 python-dev set nosy: + python-dev; messages: + msg254379
2015-11-09 10:51:06 serhiy.storchaka set messages: + msg254378; stage: patch review -> commit review
2015-11-09 10:14:57 vstinner set messages: + msg254376
2015-11-09 08:59:00 serhiy.storchaka set messages: + msg254373
2015-06-10 18:54:25 jwilk set nosy: + jwilk
2015-05-19 09:19:01 serhiy.storchaka set nosy: + benjamin.peterson
2015-05-13 09:02:11 serhiy.storchaka set files: + int_format_c_warn.patch; messages: + msg243059
2015-05-08 10:33:20 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg242753
2015-05-07 19:07:48 BreamoreBoy set nosy: + BreamoreBoy; messages: + msg242726
2014-05-01 01:35:46 terry.reedy set title: format method: c presentation type broken -> format method: c presentation type broken in 2.7
2014-05-01 01:18:37 terry.reedy set nosy: + terry.reedy; messages: + msg217674; stage: needs patch -> patch review
2013-07-02 00:34:30 vstinner set files: + int_format_c.patch; messages: + msg192169
2013-06-23 14:57:49 terry.reedy set stage: test needed -> needs patch
2013-03-23 21:28:00 francismb set files: + issue7267.patch; keywords: + patch; messages: + msg185092
2013-03-23 20:52:57 francismb set nosy: + francismb; messages: + msg185089
2011-11-19 14:03:05 ezio.melotti set versions: - Python 2.6
2010-03-10 00:25:42 vstinner set messages: + msg100772
2010-02-24 18:25:05 eric.smith set priority: normal -> high
2010-02-24 18:04:15 eric.smith set priority: normal
2010-01-23 00:46:34 vstinner set messages: + msg98173
2010-01-21 11:38:47 vstinner set nosy: + vstinner; messages: + msg98107
2010-01-14 00:11:48 ezio.melotti set nosy: + ezio.melotti; stage: test needed
2009-11-10 13:58:23 doerwalter set messages: + msg95115
2009-11-10 13:20:17 eric.smith set messages: + msg95113
2009-11-06 14:52:30 doerwalter set messages: + msg94972
2009-11-06 14:09:20 eric.smith set messages: + msg94969; versions: + Python 2.7
2009-11-05 16:30:22 eric.smith set assignee: eric.smith; messages: + msg94936; nosy: + eric.smith
2009-11-05 16:22:47 doerwalter create