%c, %o, %x, %X accept non-integer values instead of raising an exception #64194

ethanfurman · 2013-12-16T10:29:04Z

BPO	19995
Nosy	@gvanrossum, @warsaw, @rhettinger, @terryjreedy, @mdickinson, @pitrou, @vstinner, @larryhastings, @ericvsmith, @ezio-melotti, @bitdancer, @skrah, @ethanfurman, @serhiy-storchaka, @MojoVampire
Files	issue19995.stoneleaf.01.patch issue19995.stoneleaf.02.patch issue19995.stoneleaf.03.patch issue19995.stoneleaf.04.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/ethanfurman'
closed_at = <Date 2014-03-21.20:48:10.612>
created_at = <Date 2013-12-16.10:29:04.037>
labels = ['type-bug']
title = '%c, %o, %x, %X accept non-integer values instead of raising an exception'
updated_at = <Date 2015-06-22.19:46:27.672>
user = 'https://github.com/ethanfurman'

bugs.python.org fields:

activity = <Date 2015-06-22.19:46:27.672>
actor = 'barry'
assignee = 'ethan.furman'
closed = True
closed_date = <Date 2014-03-21.20:48:10.612>
closer = 'ethan.furman'
components = []
creation = <Date 2013-12-16.10:29:04.037>
creator = 'ethan.furman'
dependencies = []
files = ['33284', '33286', '33289', '33414']
hgrepos = []
issue_num = 19995
keywords = ['patch']
message_count = 66.0
messages = ['206290', '206293', '206303', '206321', '206322', '206323', '206325', '206326', '206331', '206332', '206333', '206336', '206338', '206339', '206341', '206363', '206364', '206367', '206368', '206369', '206371', '206378', '206379', '206381', '206382', '206383', '206409', '206465', '206467', '206715', '206858', '207049', '207069', '207098', '207099', '207100', '207101', '207114', '207115', '207117', '207303', '207378', '207379', '207380', '207382', '207383', '207384', '207385', '207386', '207387', '207389', '207393', '207835', '207895', '207929', '207940', '207966', '208099', '210417', '214104', '214149', '214155', '214365', '214411', '245643', '245644']
nosy_count = 17.0
nosy_names = ['gvanrossum', 'barry', 'rhettinger', 'terry.reedy', 'mark.dickinson', 'pitrou', 'vstinner', 'larry', 'eric.smith', 'ezio.melotti', 'Arfrever', 'r.david.murray', 'skrah', 'ethan.furman', 'python-dev', 'serhiy.storchaka', 'josh.r']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue19995'
versions = ['Python 3.4', 'Python 3.5']

ethanfurman · 2013-12-16T10:29:04Z

Using Enum to illustrate:

--> class Grade(enum.Enum):
...   A = 4
...   B = 3
...   C = 2
...   D = 1
...   F = 0
...   def __index__(self):
...     return self._value_

--> ['miserable'][Grade.F]
'miserable'

    --> '%x' % Grade.F
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: %x format: a number is required, not Grade

--> hex(Grade.F)
'0x0'

I suggest that hex() and oct() have the same check that %x and %o do so that non-numbers are not representable as hex and octal.

While we're at it, we should do the same for bin(). Are there any others?

I'll create a patch once we have a decision on which way to solve this issue.

vstinner · 2013-12-16T10:44:42Z

Calls:

hex()/oct() => PyNumber_ToBase() => PyNumber_Index().
PyUnicode_Format() => mainformatlong() => PyNumber_Long()

I never understood the difference between "long" (int method) and "index" (index method). Is the difference on the behaviour of floating point numbers?

ericvsmith · 2013-12-16T13:11:39Z

It seems to me that by giving it an __index__ method, you're saying it can be used as an integer. It's not surprising to me that hex(), oct(), and bin() would work with a Grade.F object. If anything, I'd say that more places should use __index__ than currently do.

ethanfurman · 2013-12-16T15:42:10Z

Victor Stinner commented:
-------------------------

I never understood the difference between "long" (int method)
and "index" (index method). Is the difference on the behaviour
of floating point numbers?

__index__ was originally added so that non-int integers, such as NumPy's int16, int32, etc, integer types could be used as indices and slices.

Now it means "if your type can produce a lossless integer, use __index__", which is why float and similar types don't define it.

The current meaning is unfortunate in that it is possible to want a type that can be used as an index or slice but that is still not a number, and in fact won't be used as a number in any scenario _except_ bin(), hex(), and oct().

It seems to me that by having those three functions check that the argument is a number, and bailing if it is not, is a decent way to ensure consistency.

One question I do have, since I don't have NumPy installed, is what happens with:

--> "NumPy's int's work here? %x" % uint16(7)

vstinner · 2013-12-16T15:46:59Z

$ python
Python 2.7.5 (default, Nov 12 2013, 16:18:42) 
>>> import numpy
>>> hex(numpy.uint16(257))
'0x101'
>>> "%x" % numpy.uint16(257)
'101'
>>> x=numpy.uint16(257)
>>> x.__int__()
257
>>> x.__index__()
257

skrah · 2013-12-16T15:49:00Z

Ethan Furman <report@bugs.python.org> wrote:

The current meaning is unfortunate in that it is possible to want a type that
can be used as an index or slice but that is still not a number, and in fact
won't be used as a number in any scenario _except_ bin(), hex(), and oct().

memoryview, struct and probably also array.array accept __index__.

ethanfurman · 2013-12-16T16:01:00Z

Did I mention __index__ is an unfortunate name for the current trend for this method?

Stefan Krah commented:
----------------------

memoryview, struct and probably also array.array accept __index__.

When you say "accept __index__" do you mean for use as indices, or for use as values in the data structure itself?

skrah · 2013-12-16T16:17:58Z

Did I mention __index__ is an unfortunate name for the current trend for this method?

Yes, but it's probably too late to change that now. Also, a fully precise
name would be something like:

__to_int_exact_iff_object_has_integer_nature__ :)

When you say "accept __index__" do you mean for use as indices, or for use as
values in the data structure itself?

The latter, see Lib/test/test_buffer.py:2489 .

ethanfurman · 2013-12-16T16:52:23Z

Hmmm...

Well, much as I hate to say it, it's sounding like the correct solution here is to have %o and %x work when __index__ is available, instead of the other way around. :(

.format is not an issue because one must specify one's own if inheriting from object.

So the complete list of spcecifiers then is d, i, o, u, U, and c [1], and they should work if __index__ works.

Are we in agreement?

[1] http://docs.python.org/dev/library/stdtypes.html#printf-style-string-formatting

vstinner · 2013-12-16T17:11:41Z

Are we in agreement?

Start maybe on writing unit tests :-)

IMO all int-like objects should behave the same. I don't see any good
reason why hex(value) would succeed whereas "%x" % value fails. Both
should succeed (or both should fail).

serhiy-storchaka · 2013-12-16T17:16:13Z

> Did I mention __index__ is an unfortunate name for the current trend for
> this method?
Yes, but it's probably too late to change that now. Also, a fully precise
name would be something like:

__to_int_exact_iff_object_has_integer_nature__ :)

Perhaps in future (may be in 4.0) __index__ should be renamed to __int__ and
__int__ to __trunc__.

ericvsmith · 2013-12-16T17:58:05Z

Yes, I think adding __index__ to d, i, o, u, U, and c is the correct thing to do here.

gvanrossum · 2013-12-16T18:18:04Z

Not so fast. Currently, even in Python 3, '%d' % 3.14 returns '3'. "Fixing" this will likely break a huge amount of code.

gvanrossum · 2013-12-16T18:22:21Z

Also (the tracker email interface swallowed this):

it is possible to want a type that can be used as an index or slice but that is still not a number

I'm sorry, but this requirement is absurd. An index *is* a number. You
have to make up your mind. (I know, in the context of the example that
started this, this is funny, but I still stand by it.)

---

Finally, the correct name should perhaps have been __integer__ but I don't see enough reason to change it now.

ericvsmith · 2013-12-16T18:34:17Z

If you were going to make this change, I'd think you'd have to look for __index__ and then __int__. But I'll admit I haven't thought through all of the ramifications. It would be interesting to see what tests would break.

ethanfurman · 2013-12-16T20:20:57Z

Eric V. Smith commented:
------------------------

If you were going to make this change, I'd think you'd have to look
for __index__ and then __int__.

Does the order matter? Are there any types (and should there be) that would have both and return different answers for each?

If not, pick an order, try one and, if that one fails, try the other.

pitrou · 2013-12-16T20:24:41Z

I'm with Guido: it doesn't really make sense to allow __index__ but not __int__ on a type. So trying __index__ in str.format() sounds like a distraction.

ethanfurman · 2013-12-16T20:48:46Z

Antoine Pitrou opined:
----------------------

I'm with Guido: it doesn't really make sense to allow __index__ but not __int__ on
a type. So trying __index__ in str.format() sounds like a distraction.

  --> hex(3.14)  # calls __index__
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  TypeError: 'float' object cannot be interpreted as an integer

--> '%x' % 3.14 # calls __int__
'3'

One of those behaviours is wrong. Which?

pitrou · 2013-12-16T20:50:19Z

--> '%x' % 3.14 # calls __int__
'3'

One of those behaviours is wrong. Which?

For '%x' and '%o', it probably doesn't make sense to convert the float
to an int.
(however, it does make sense for other formatters, such as '%d')

ethanfurman · 2013-12-16T21:14:35Z

Ethan Furman previously stated:
-------------------------------

So the complete list of spcecifiers then is d, i, o, u, U, and c [1], and they
should work if __index__ works.

Okay, so 'd' then should be considered a conversion operation, whilst the others should only work if the object is actually an integer type (which is implied by specifying __index__).

In other words

if %d or %u is specified, try __int__, then __index__
(according to the docs, u is obsolete and identical to d)
if %i, %o, %x, %X, or %c is specified, try only __index__

Agreed?

pitrou · 2013-12-16T21:26:27Z

In other words

if %d or %u is specified, try __int__, then __index__
(according to the docs, u is obsolete and identical to d)

Again, I don't think trying __index__ is useful.

if %i, %o, %x, %X, or %c is specified, try only __index__

I think everything yielding a decimal output should work with floats
(i.e. %i too).

ethanfurman · 2013-12-16T22:11:07Z

Antoine, if I understand you correctly, you are saying that any type that defines __index__ is an integer, and should therefore also define __int__, in which case Python can just use __int__ and not worry about __index__?

Here's the problem with that:

--> '%x' % 3.14
'3'

While I am beginning to agree that an integer type needs to implement both __int__ and __index__, it still remains true that Python needs to call __index__ if what it needs is already a real, true int, and not just something that can be truncated or otherwise converted into an int -- such as float.

pitrou · 2013-12-16T22:15:20Z

Antoine, if I understand you correctly, you are saying that any type
that defines __index__ is an integer, and should therefore also define
__int__, in which case Python can just use __int__ and not worry about
__index__?

... is an integer-like, yes.

While I am beginning to agree that an integer type needs to implement
both __int__ and __index__, it still remains true that Python needs to
call __index__ if what it needs is already a real, true int, and not
just something that can be truncated or otherwise converted into an
int -- such as float.

Of course.

ethanfurman · 2013-12-16T22:23:42Z

Antoine,

Does that mean you are reducing your previous statement of

So trying __index__ in str.format() sounds like a distraction.

to "using __index__ for %d, %i, and %u is not correct, but is correct for %c, %o, %x, and %X" ?

pitrou · 2013-12-16T22:24:33Z

Antoine,

Does that mean you are reducing your previous statement of

> So trying __index__ in str.format() sounds like a distraction.

to "using __index__ for %d, %i, and %u is not correct, but is correct
for %c, %o, %x, and %X" ?

Ah, yes, sorry for the confusion :)

serhiy-storchaka · 2013-12-16T22:28:05Z

In addition, PyLong_AsLong() calls __int__, while PyLong_AsUnsignedLong() doesn't call __int__.

ethanfurman · 2013-12-17T07:09:59Z

Thank you, Victor and Serhiy, for your pointers into the code.

I'm hoping we have general agreement about %c, %o, %x, and %X and having them use __index__ only (using __int__ would open the door to float conversions).

I still have a question about %i, though. The docs say %u is exactly the same as %d and is therefore deprecated. The docs do not say thay %i is the same as %d, but the descriptions are the same.

Are %i and %d the same, or is there some difference?

gvanrossum · 2013-12-17T15:42:47Z

AFAIK %i and %d are the same.

ericvsmith · 2013-12-17T15:47:33Z

Yes, looking through Objects/unicodeobject.c, 'u', 'i', and 'd' are treated the same everywhere.

ethanfurman · 2013-12-30T21:57:05Z

I can live with the deprecation route. I'll create a patch today or tomorrow for that.

larryhastings · 2013-12-31T01:13:51Z

I wouldn't call this a new feature--it's definitely a bug fix. So the "feature freeze" rule does not automatically apply. I definitely wouldn't permit this once we reach release candidates, but we aren't there yet.

I get the impression that it will break code, and it does seem kind of late. So I'm not saying yes or no yet. Let me think about it some more.

larryhastings · 2014-01-04T18:44:54Z

I'm willing to risk it in 3.4. Can you check it in in the next twelve hours?

(Sorry for the short notice, it slipped my mind until just now.)

python-dev · 2014-01-05T14:50:39Z

New changeset 2f81f0e331f6 by Ethan Furman in branch 'default':
bpo-19995: %o, %x, %X now only accept ints
http://hg.python.org/cpython/rev/2f81f0e331f6

ethanfurman · 2014-01-05T14:54:59Z

I was travelling yesterday and wasn't sure about the time stamp. Looks like I missed the 12-hour cutoff. Let me know if I should backout the commit.

pitrou · 2014-01-05T15:13:03Z

Ethan, I thought we were going the deprecation route?
This *will* break code.

pitrou · 2014-01-05T15:13:57Z

(also, there are typos: "shuld", "psuedo")

larryhastings · 2014-01-05T15:20:54Z

Antoine: I made the call to bite the bullet and fix it. If that's a terrible idea we can change it before RC1. But from my (admittedly dim) understanding of the issue, we're going to have to fix this sooner or later, and sooner is probably better.

If by permitting this I've made a terrible mistake, could you do me a favor and summarize the issue and the alternatives so I can understand it?

larryhastings · 2014-01-05T15:37:03Z

And, yes, Ethan's checkin missed the cutoff for beta 2.

ethanfurman · 2014-01-05T15:38:03Z

I think the code-breakage issue is that although this is a bug now, it did not use to be a bug. (Or maybe it was always a bug, but nobody noticed -- I don't know when hex() and oct() were introduced.)

Basically, having %o and %x work with floats was the intended behavior, as there were tests to make sure it worked.

But hex() and oct() were designed to only work with integer types, resulting in two ways to get an octal or hexadecimal string, one of which worked with non-ints, and one which didn't.

tarfile.py, for example, was using %o to convert a stat member (a timestamp?) from int/float to octal.

It is agreed that working with non-ints is no longer the intended behavior, but there is undoubtedly code that uses it -- especially in cases where the return value could be either an int or a float.

bitdancer · 2014-01-05T17:24:36Z

Given that even the stdlib used it, there is no question in my mind that a deprecation period is required. We don't just arbitrarily break peoples code on purpose, we give them at least a release in which to fix it first.

pitrou · 2014-01-05T17:30:02Z

Keep in mind that with new-style division, it's easy to get a float even if all your inputs are ints.

ethanfurman · 2014-01-05T18:15:13Z

I'll switch it to a deprecation warning instead sometime this week.

The changes to datamodel.rst and tarfile.py should stay, I think.

test_format and test_unicode will both verify current* behavior and check for the deprecation warning, and unicodeobject.c will generate the warning but keep current behavior.

Once 3.5 is tagged I'll put the updates to these three files there.

By 'current' I mean accepting non-ints.

terryjreedy · 2014-01-05T19:13:08Z

Merely from reading the docs, including the hex() and {}.format sections, I consider the current
'%x' % 1.5 == '1'
a bug in that I would expect either a Type or Value Error
or a hex fraction(!) ('1,8') matching the core of the output of float.hex(1.5). But given the history, I agree that a deprecation notice is needed first.

In 3.3 section 6.1.3.1. Format Specification Mini-Language, integer and float conversions are in separate tables, so no confusion is possible. In 3.3 section 4.7.2. printf-style String Formatting there is just one table with all conversion chars. I think the table would be clearer if it were somehow split into three sections: perhaps 3 otherwise blank rows containing 'integer', 'float', and 'other' in the Conversion column. (I don't know if this works for .rst tables.)

I think I would change the table for 3.4 with a note 8 that 'o', 'x', and 'X' currently accept floats by calling int() on the value, but that this implicit behavior is deprecated, so one should be explicit. The DeprecationWarning should also give the same remedy.

ethanfurman · 2014-01-10T04:40:47Z

Could somebody explain this?
===============================================================================

ethan@media:~/source/python/issue19995_depr$ ./python -W error
Python 3.4.0b1 (default:2f81f0e331f6+, Jan  9 2014, 20:30:18) 
[GCC 4.7.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> '%x' % 3.14
oxX
no __index__
depracation is fatal
oxX
no __index__
depracation is fatal
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
DeprecationWarning: automatic int conversions have been deprecated
>>> 
===============================================================================
ethan@media:~/source/python/issue19995_depr$ ./python -W message
Invalid -W option ignored: invalid action: 'message'
Python 3.4.0b1 (default:2f81f0e331f6+, Jan  9 2014, 20:30:18) 
[GCC 4.7.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> '%x' % 3.14
oxX
no __index__
depracation not fatal, attempting __int__ conversion
conversion with __int__ successful
'3'
>>> 
===============================================================================
ethan@media:~/source/python/issue19995_depr$ ./python -W once
Python 3.4.0b1 (default:2f81f0e331f6+, Jan  9 2014, 20:30:18) 
[GCC 4.7.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> '%x' % 3.14
oxX
no __index__
sys:1: DeprecationWarning: automatic int conversions have been deprecated
depracation not fatal, attempting __int__ conversion
conversion with __int__ successful
'3'
>>>

===============================================================================

As you can see, with the -W error option it seems to go through the routine twice; does that mean the the '1' in 'line 1' is being specified as a float?

ethanfurman · 2014-01-11T04:18:29Z

For anyone paying really close attention, I've already switched the assertEquals to assertEqual. ;)

python-dev · 2014-01-12T07:21:03Z

New changeset 22e55bd5583c by Ethan Furman in branch 'default':
bpo-19995: issue deprecation warning for non-integer values to %c, %o, %x, %X
http://hg.python.org/cpython/rev/22e55bd5583c

vstinner · 2014-01-12T09:48:36Z

Typo in the C code: depracation :-)

python-dev · 2014-01-12T16:42:30Z

New changeset cc8b21988efb by Ethan Furman in branch 'default':
bpo-19995: fixed typo; switched from test.support.check_warnings to assertWarns
http://hg.python.org/cpython/rev/cc8b21988efb

ethanfurman · 2014-01-14T15:35:04Z

Depracation warning is in place for 3.4.

When 3.5 is tagged I'll switch it an error.

python-dev · 2014-02-06T20:57:15Z

New changeset 775fb736b4b8 by Serhiy Storchaka in branch 'default':
Catch deprecation warnings emitted when non-integers are formatted with %c, %o
http://hg.python.org/cpython/rev/775fb736b4b8

python-dev · 2014-03-19T15:39:08Z

New changeset 9120196b3114 by Ethan Furman in branch 'default':
bpo-19995: passing a non-int to %o, %c, %x, or %X now raises an exception
http://hg.python.org/cpython/rev/9120196b3114

skrah · 2014-03-19T23:38:34Z

It looks like there's typo: Psuedo => Pseudo. (Unless the first one
is an alternate spelling I'm not aware of.)

ethanfurman · 2014-03-19T23:55:24Z

No, apparently I am incapable of spelling pseudo correctly. I'll fix that along with the better error message:

%x format: an integer is required, not float

(variable, obviously ;)

python-dev · 2014-03-21T13:39:07Z

New changeset e266525c9294 by Ethan Furman in branch 'default':
bpo-19995: more informative error message; spelling corrections; use operator.mod instead of __mod__
http://hg.python.org/cpython/rev/e266525c9294

ethanfurman · 2014-03-21T20:48:11Z

Final status:

3.4 -> DeprecationWarning

3.5 -> TypeError

warsaw · 2015-06-22T19:42:54Z

Found my first 3.5 breakage which I think is due to this.

>> from uuid import uuid4
>> '%.32x' % uuid4()

warsaw · 2015-06-22T19:46:28Z

Fix:

>> '%.32x' % uuid4().int

ethanfurman self-assigned this Dec 16, 2013

ethanfurman closed this as completed Mar 21, 2014

ezio-melotti transferred this issue from another repository Apr 10, 2022

%c, %o, %x, %X accept non-integer values instead of raising an exception #64194

%c, %o, %x, %X accept non-integer values instead of raising an exception #64194

Comments

ethanfurman commented Dec 16, 2013

ethanfurman commented Dec 16, 2013

vstinner commented Dec 16, 2013

ericvsmith commented Dec 16, 2013

ethanfurman commented Dec 16, 2013

vstinner commented Dec 16, 2013

skrah mannequin commented Dec 16, 2013

ethanfurman commented Dec 16, 2013

skrah mannequin commented Dec 16, 2013

ethanfurman commented Dec 16, 2013

vstinner commented Dec 16, 2013

serhiy-storchaka commented Dec 16, 2013

ericvsmith commented Dec 16, 2013

gvanrossum commented Dec 16, 2013

gvanrossum commented Dec 16, 2013

ericvsmith commented Dec 16, 2013

ethanfurman commented Dec 16, 2013

pitrou commented Dec 16, 2013

ethanfurman commented Dec 16, 2013

pitrou commented Dec 16, 2013

ethanfurman commented Dec 16, 2013

pitrou commented Dec 16, 2013

ethanfurman commented Dec 16, 2013

pitrou commented Dec 16, 2013

ethanfurman commented Dec 16, 2013

pitrou commented Dec 16, 2013

serhiy-storchaka commented Dec 16, 2013

ethanfurman commented Dec 17, 2013

gvanrossum commented Dec 17, 2013

ericvsmith commented Dec 17, 2013

ethanfurman commented Dec 30, 2013

larryhastings commented Dec 31, 2013

larryhastings commented Jan 4, 2014

python-dev mannequin commented Jan 5, 2014

ethanfurman commented Jan 5, 2014

pitrou commented Jan 5, 2014

pitrou commented Jan 5, 2014

larryhastings commented Jan 5, 2014

larryhastings commented Jan 5, 2014

ethanfurman commented Jan 5, 2014

bitdancer commented Jan 5, 2014

pitrou commented Jan 5, 2014

ethanfurman commented Jan 5, 2014

terryjreedy commented Jan 5, 2014

ethanfurman commented Jan 10, 2014

ethanfurman commented Jan 11, 2014

python-dev mannequin commented Jan 12, 2014

vstinner commented Jan 12, 2014

python-dev mannequin commented Jan 12, 2014

ethanfurman commented Jan 14, 2014

python-dev mannequin commented Feb 6, 2014

python-dev mannequin commented Mar 19, 2014

skrah mannequin commented Mar 19, 2014

ethanfurman commented Mar 19, 2014

python-dev mannequin commented Mar 21, 2014

ethanfurman commented Mar 21, 2014

warsaw commented Jun 22, 2015

warsaw commented Jun 22, 2015