msg94935 - (view) |
Author: Walter Dörwald (doerwalter) * |
Date: 2009-11-05 16:22 |
The c presentation type in the new format method from PEP 3101 seems to
be broken:
Python 2.6.4 (r264:75706, Oct 27 2009, 15:18:04)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> u'{0:c}'.format(256)
u'\x00'
The PEP states:
'c' - Character. Converts the integer to the corresponding Unicode
character before printing.
So I would have expected this to return u'\u0100' instead of u'\x00'.
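For reference, the behaviour the PEP describes can be checked with a short sketch. It assumes Python 3, where str is always a Unicode string and the u prefix is not needed; it does not reproduce the 2.6 result above.

```python
# Assumes Python 3, where str is a Unicode string.
result = '{0:c}'.format(256)

# 256 is code point U+0100, so 'c' should yield that single character.
assert result == '\u0100'
assert result == chr(256)
print(repr(result))
```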
|
msg94936 - (view) |
Author: Eric V. Smith (eric.smith) * |
Date: 2009-11-05 16:30 |
I'll look at it.
|
msg94969 - (view) |
Author: Eric V. Smith (eric.smith) * |
Date: 2009-11-06 14:09 |
This is a bug in the way ints and longs are formatted. They always do
the formatting as str, then convert to unicode. This works everywhere
except with the 'c' presentation type. I'm still trying to decide how
best to handle this.
|
msg94972 - (view) |
Author: Walter Dörwald (doerwalter) * |
Date: 2009-11-06 14:52 |
I'd say that a value >= 128 should generate a Unicode string (as the PEP
explicitly states that the value is a Unicode code point and not a byte
value).
However str.format() doesn't seem to support mixing str and unicode anyway:
>>> '{0}'.format(u'\u3042')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u3042' in
position 0: ordinal not in range(128)
so str.format() might raise an OverflowError for values >= 128 (or >= 256?)
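The UnicodeEncodeError above looks like the result of an implicit ASCII encoding of the unicode argument. A minimal sketch of that step, assuming Python 3 and making the encoding explicit (Python 3 never does this conversion implicitly):

```python
# Assumes Python 3; the explicit .encode('ascii') stands in for the
# implicit unicode -> str conversion that Python 2 appears to attempt.
try:
    '\u3042'.encode('ascii')
except UnicodeEncodeError as exc:
    encode_error = exc
else:
    encode_error = None

print(encode_error)
```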
|
msg95113 - (view) |
Author: Eric V. Smith (eric.smith) * |
Date: 2009-11-10 13:20 |
> so str.format() might raise an OverflowError for values >= 128 (or >= 256?)
Maybe, but the issue you reported is in unicode.format() (not
str.format()), and I think that should be fixed. I'm trying to think of
how best to address it.
As for the second issue you raise (which I think is that str.format()
can't take a unicode argument), would you mind opening a separate issue
for this and assigning it to me? Thanks.
|
msg95115 - (view) |
Author: Walter Dörwald (doerwalter) * |
Date: 2009-11-10 13:58 |
Done: issue 7300.
|
msg98107 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2010-01-21 11:38 |
See also issue #7649.
|
msg98173 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2010-01-23 00:46 |
('%c' % 255) == chr(255) == '\xff'
'%c' % 256 raises an "OverflowError: unsigned byte integer is greater than maximum" and chr(256) raises a "ValueError: chr() arg not in range(256)". I prefer the second error ;-)
str.format() should follow the same behaviour.
str is a byte string: it can be used to create a network packet or encode data into a byte stream. '%c' is useful for that, and str.format() should keep this nice feature.
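As a point of comparison, the closest Python 3 analogue of this byte-oriented use of '%c' is %-formatting on bytes. The sketch below assumes Python 3.5 or later (bytes %-formatting was added by PEP 461) and is not the 2.x code path discussed here.

```python
# Assumes Python 3.5+, where bytes objects support %-formatting.
packed = b'%c' % 255
assert packed == b'\xff'

# A value that does not fit in one byte is rejected rather than truncated.
try:
    b'%c' % 256
except OverflowError as exc:
    overflow_error = exc
else:
    overflow_error = None

# bytes([...]) is the analogue of Python 2's chr() and raises ValueError.
try:
    bytes([256])
except ValueError as exc:
    value_error = exc
else:
    value_error = None

print(packed, overflow_error, value_error)
```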
|
msg100772 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2010-03-10 00:25 |
The u'{0:c}'.format(256) formatter is implemented in Objects/stringlib/formatter.h, and this C template is instantiated in... Python/formatter_string.c (and not Python/formatter_unicode.c). Extract from the formatter_unicode.c comment:
/* don't define FORMAT_LONG, FORMAT_FLOAT, and FORMAT_COMPLEX, since
we can live with only the string versions of those. The builtin
format() will convert them to unicode. */
format_int_or_long_internal() is instantiated (only once) with STRINGLIB_CHAR=char, so "numeric_char = (STRINGLIB_CHAR)x;" becomes "numeric_char = (char)x;", whereas x is a long in [0; 0x10ffff] (or [0; 0xffff], depending on the Python unicode build option).
I think that the 'c' format type should have its own function, because
format_int_or_long_internal() gets locale info, computes the number of digits, and does other things unrelated to just creating one character from its code (chr(code) / unichr(code)). But that's just a remark; it doesn't fix this issue.
To fix this issue, I think that the FORMAT_LONG and related templates should be instantiated twice (str & unicode).
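The effect of the (char)x cast can be modelled in a few lines. This is only a sketch: truncate_to_char is a hypothetical helper, and it assumes an 8-bit char, so the stored byte is the low 8 bits of x regardless of whether char is signed.

```python
def truncate_to_char(x):
    """Model the C cast (char)x by keeping only the low 8 bits of x.

    Hypothetical helper for illustration; assumes an 8-bit char.
    """
    return x & 0xFF


# Code points above 255 silently lose their high bits.
for code in (255, 256, 257, 0x3042):
    print(hex(code), '->', hex(truncate_to_char(code)))
```

Under this model 256 maps to 0, which matches the u'\x00' from the original report.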
|
msg185089 - (view) |
Author: Francis MB (francismb) * |
Date: 2013-03-23 20:52 |
In 2.7.3:
>>> u'{0:c}'.format(127)
u'\x7f'
>>> u'{0:c}'.format(128)
Traceback (most recent call last):
File "<pyshell#6>", line 1, in <module>
u'{0:c}'.format(128)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)
>>> u'{0:c}'.format(255)
Traceback (most recent call last):
File "<pyshell#7>", line 1, in <module>
u'{0:c}'.format(255)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)
>>> u'{0:c}'.format(256)
u'\x00'
>>> u'{0:c}'.format(257)
u'\x01'
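These results are consistent with a two-step model: the integer is first truncated to one byte and formatted as a byte string, and that byte string is then decoded as ASCII to produce unicode. A rough simulation, assuming Python 3 (py27_unicode_format_c is a hypothetical name, not a CPython function):

```python
def py27_unicode_format_c(x):
    """Simulate the observed 2.7.3 behaviour of u'{0:c}'.format(x).

    Hypothetical model: truncate to one byte, then decode as ASCII.
    """
    as_byte_string = bytes([x & 0xFF])      # step 1: formatted as str (bytes)
    return as_byte_string.decode('ascii')   # step 2: converted to unicode


for value in (127, 128, 255, 256, 257):
    try:
        print(value, repr(py27_unicode_format_c(value)))
    except UnicodeDecodeError as exc:
        print(value, 'UnicodeDecodeError:', exc)
```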
|
msg185092 - (view) |
Author: Francis MB (francismb) * |
Date: 2013-03-23 21:28 |
Adding a test that triggers the issue; let me know if it is enough.
|
msg192169 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-07-02 00:34 |
u'{0:c}'.format(256) calls 256.__format__('c'), which returns a str (bytes) object, so we must reject values outside range(0, 256). The real fix for this issue is to upgrade to Python 3.
The attached patch works around the initial issue (u'{0:c}'.format(256)) by raising OverflowError in int.__format__('c') if the value is not in range(0, 256).
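The range check can be sketched in Python as follows. This is an illustration of the idea only, not the C patch itself: format_c_checked is a hypothetical name, the error text is an assumption, and the sketch assumes Python 3, with bytes standing in for the 2.x str type.

```python
def format_c_checked(value):
    """Return the single byte for value, rejecting anything outside range(0, 256).

    Hypothetical model of the workaround; the error message is an assumption.
    """
    if not 0 <= value < 256:
        raise OverflowError("%c arg not in range(0x100)")
    return bytes([value])


print(format_c_checked(65))    # a value inside the accepted range
try:
    format_c_checked(256)
except OverflowError as exc:
    print('rejected:', exc)
```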
|
msg217674 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2014-05-01 01:18 |
If the purpose of backporting .format was/is to help people writing forward-looking code, or now, to write 2&3 code, then it should work like .format in 3.x, at least when the format string is unicode.
|
msg242726 - (view) |
Author: Mark Lawrence (BreamoreBoy) * |
Date: 2015-05-07 19:07 |
What harm, if any, can be done by applying the patch with Victor's workaround?
|
msg242753 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2015-05-08 10:33 |
Maybe just emit a warning in -3 mode?
|
msg243059 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2015-05-13 09:02 |
Here is a modification of Victor's patch that just emits a Py3k warning.
Both ways, with OverflowError and Py3k DeprecationWarning, are good to me. What would you say about this Benjamin?
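The warning-based alternative differs from the OverflowError variant only in what happens to out-of-range values: the old truncating result is kept and a DeprecationWarning is emitted. A rough sketch, assuming Python 3 and the standard warnings module (format_c_warn, its py3k_warnings parameter and the warning text are all hypothetical; the real patch is C code tied to the -3 flag):

```python
import warnings


def format_c_warn(value, py3k_warnings=True):
    """Keep the truncating behaviour, but warn about out-of-range values.

    Hypothetical model; py3k_warnings stands in for running Python 2 with -3.
    """
    if py3k_warnings and not 0 <= value < 256:
        warnings.warn("'c' format with a value outside range(256)",
                      DeprecationWarning, stacklevel=2)
    return bytes([value & 0xFF])


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    truncated = format_c_warn(256)

print(truncated, [str(w.message) for w in caught])
```

Without the flag the function stays silent, which is the drawback of this variant: the wrong result is still returned unless the warning is enabled.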
|
msg254373 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2015-11-09 08:59 |
Ping.
|
msg254376 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2015-11-09 10:14 |
> Both ways, with OverflowError and Py3k DeprecationWarning, are good to me. What would you say about this Benjamin?
I prefer an OverflowError. I don't like having to enable a flag to fix a bug :-(
According to the issue title, it's really a bug: "format method: c presentation type *broken* in 2.7".
Note: the unit test could check the error message; currently the error message is irrelevant (it mentions unicode, whereas bytes (str type) are used).
>>> format(-1, "c")
OverflowError: %c arg not in range(0x110000) (wide Python build)
|
msg254378 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2015-11-09 10:51 |
Then feel free to commit your patch please. It LGTM.
|
msg254379 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2015-11-09 11:22 |
New changeset 2f2c52c9ff38 by Victor Stinner in branch '2.7':
Issue #7267: format(int, 'c') now raises OverflowError when the argument is not
https://hg.python.org/cpython/rev/2f2c52c9ff38
|
msg254380 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2015-11-09 11:23 |
> Then feel free to commit your patch please. It LGTM.
Thanks for the review ;-)
@Walter: Sorry for the late fix (6 years later!).
|
msg254383 - (view) |
Author: Walter Dörwald (doerwalter) * |
Date: 2015-11-09 12:38 |
Don't worry, I've switched to using Python 3 in 2012, where this isn't a problem. ;)
|
msg254391 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2015-11-09 15:29 |
Walter Dörwald added the comment:
> Don't worry, I've switched to using Python 3 in 2012, where this isn't a problem. ;)
Wow, cool! We still have 1 or 2 customers stuck with Python 2, haha.
|
|
Date | User | Action | Args |
2022-04-11 14:56:54 | admin | set | github: 51516 |
2015-11-09 19:16:53 | berker.peksag | set | stage: commit review -> resolved |
2015-11-09 15:29:28 | vstinner | set | messages: + msg254391 |
2015-11-09 12:38:45 | doerwalter | set | messages: + msg254383 |
2015-11-09 11:23:06 | vstinner | set | status: open -> closed; resolution: fixed; messages: + msg254380 |
2015-11-09 11:22:22 | python-dev | set | nosy: + python-dev; messages: + msg254379 |
2015-11-09 10:51:06 | serhiy.storchaka | set | messages: + msg254378; stage: patch review -> commit review |
2015-11-09 10:14:57 | vstinner | set | messages: + msg254376 |
2015-11-09 08:59:00 | serhiy.storchaka | set | messages: + msg254373 |
2015-06-10 18:54:25 | jwilk | set | nosy: + jwilk |
2015-05-19 09:19:01 | serhiy.storchaka | set | nosy: + benjamin.peterson |
2015-05-13 09:02:11 | serhiy.storchaka | set | files: + int_format_c_warn.patch; messages: + msg243059 |
2015-05-08 10:33:20 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg242753 |
2015-05-07 19:07:48 | BreamoreBoy | set | nosy: + BreamoreBoy; messages: + msg242726 |
2014-05-01 01:35:46 | terry.reedy | set | title: format method: c presentation type broken -> format method: c presentation type broken in 2.7 |
2014-05-01 01:18:37 | terry.reedy | set | nosy: + terry.reedy; messages: + msg217674; stage: needs patch -> patch review |
2013-07-02 00:34:30 | vstinner | set | files: + int_format_c.patch; messages: + msg192169 |
2013-06-23 14:57:49 | terry.reedy | set | stage: test needed -> needs patch |
2013-03-23 21:28:00 | francismb | set | files: + issue7267.patch; keywords: + patch; messages: + msg185092 |
2013-03-23 20:52:57 | francismb | set | nosy: + francismb; messages: + msg185089 |
2011-11-19 14:03:05 | ezio.melotti | set | versions: - Python 2.6 |
2010-03-10 00:25:42 | vstinner | set | messages: + msg100772 |
2010-02-24 18:25:05 | eric.smith | set | priority: normal -> high |
2010-02-24 18:04:15 | eric.smith | set | priority: normal |
2010-01-23 00:46:34 | vstinner | set | messages: + msg98173 |
2010-01-21 11:38:47 | vstinner | set | nosy: + vstinner; messages: + msg98107 |
2010-01-14 00:11:48 | ezio.melotti | set | nosy: + ezio.melotti; stage: test needed |
2009-11-10 13:58:23 | doerwalter | set | messages: + msg95115 |
2009-11-10 13:20:17 | eric.smith | set | messages: + msg95113 |
2009-11-06 14:52:30 | doerwalter | set | messages: + msg94972 |
2009-11-06 14:09:20 | eric.smith | set | messages: + msg94969; versions: + Python 2.7 |
2009-11-05 16:30:22 | eric.smith | set | assignee: eric.smith; messages: + msg94936; nosy: + eric.smith |
2009-11-05 16:22:47 | doerwalter | create | |