-
-
Notifications
You must be signed in to change notification settings - Fork 29.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
%c, %o, %x, %X accept non-integer values instead of raising an exception #64194
Comments
Using Enum to illustrate:
--> '%x' % Grade.F
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: %x format: a number is required, not Grade
I suggest that hex() and oct() have the same check that %x and %o do so that non-numbers are not representable as hex and octal. While we're at it, we should do the same for bin(). Are there any others? I'll create a patch once we have a decision on which way to solve this issue. |
Calls:
I never understood the difference between "long" (int method) and "index" (index method). Is the difference on the behaviour of floating point numbers? |
It seems to me that by giving it an __index__ method, you're saying it can be used as an integer. It's not surprising to me that hex(), oct(), and bin() would work with a Grade.F object. If anything, I'd say that more places should use __index__ than currently do. |
Victor Stinner commented:
__index__ was originally added so that non-int integers, such as NumPy's int16, int32, etc, integer types could be used as indices and slices. Now it means "if your type can produce a lossless integer, use __index__", which is why float and similar types don't define it. The current meaning is unfortunate in that it is possible to want a type that can be used as an index or slice but that is still not a number, and in fact won't be used as a number in any scenario _except_ bin(), hex(), and oct(). It seems to me that by having those three functions check that the argument is a number, and bailing if it is not, is a decent way to ensure consistency. One question I do have, since I don't have NumPy installed, is what happens with: --> "NumPy's int's work here? %x" % uint16(7) |
$ python
Python 2.7.5 (default, Nov 12 2013, 16:18:42)
>>> import numpy
>>> hex(numpy.uint16(257))
'0x101'
>>> "%x" % numpy.uint16(257)
'101'
>>> x=numpy.uint16(257)
>>> x.__int__()
257
>>> x.__index__()
257 |
Ethan Furman <report@bugs.python.org> wrote:
memoryview, struct and probably also array.array accept __index__. |
Did I mention __index__ is an unfortunate name for the current trend for this method? Stefan Krah commented:
When you say "accept __index__" do you mean for use as indices, or for use as values in the data structure itself? |
Yes, but it's probably too late to change that now. Also, a fully precise __to_int_exact_iff_object_has_integer_nature__ :)
The latter, see Lib/test/test_buffer.py:2489 . |
Hmmm... Well, much as I hate to say it, it's sounding like the correct solution here is to have %o and %x work when __index__ is available, instead of the other way around. :( .format is not an issue because one must specify one's own if inheriting from object. So the complete list of spcecifiers then is d, i, o, u, U, and c [1], and they should work if __index__ works. Are we in agreement? [1] http://docs.python.org/dev/library/stdtypes.html#printf-style-string-formatting |
Start maybe on writing unit tests :-) IMO all int-like objects should behave the same. I don't see any good |
Perhaps in future (may be in 4.0) __index__ should be renamed to __int__ and |
Yes, I think adding __index__ to d, i, o, u, U, and c is the correct thing to do here. |
Not so fast. Currently, even in Python 3, '%d' % 3.14 returns '3'. "Fixing" this will likely break a huge amount of code. |
Also (the tracker email interface swallowed this):
I'm sorry, but this requirement is absurd. An index *is* a number. You --- Finally, the correct name should perhaps have been __integer__ but I don't see enough reason to change it now. |
If you were going to make this change, I'd think you'd have to look for __index__ and then __int__. But I'll admit I haven't thought through all of the ramifications. It would be interesting to see what tests would break. |
Eric V. Smith commented:
Does the order matter? Are there any types (and should there be) that would have both and return different answers for each? If not, pick an order, try one and, if that one fails, try the other. |
I'm with Guido: it doesn't really make sense to allow __index__ but not __int__ on a type. So trying __index__ in str.format() sounds like a distraction. |
Antoine Pitrou opined:
--> hex(3.14) # calls __index__
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'float' object cannot be interpreted as an integer --> '%x' % 3.14 # calls __int__ One of those behaviours is wrong. Which? |
For '%x' and '%o', it probably doesn't make sense to convert the float |
Ethan Furman previously stated:
Okay, so 'd' then should be considered a conversion operation, whilst the others should only work if the object is actually an integer type (which is implied by specifying __index__). In other words
Agreed? |
Again, I don't think trying __index__ is useful.
I think everything yielding a decimal output should work with floats |
Antoine, if I understand you correctly, you are saying that any type that defines __index__ is an integer, and should therefore also define __int__, in which case Python can just use __int__ and not worry about __index__? Here's the problem with that: --> '%x' % 3.14 While I am beginning to agree that an integer type needs to implement both __int__ and __index__, it still remains true that Python needs to call __index__ if what it needs is already a real, true int, and not just something that can be truncated or otherwise converted into an int -- such as float. |
... is an integer-like, yes.
Of course. |
Antoine, Does that mean you are reducing your previous statement of
to "using __index__ for %d, %i, and %u is not correct, but is correct for %c, %o, %x, and %X" ? |
Ah, yes, sorry for the confusion :) |
In addition, PyLong_AsLong() calls __int__, while PyLong_AsUnsignedLong() doesn't call __int__. |
Thank you, Victor and Serhiy, for your pointers into the code. I'm hoping we have general agreement about %c, %o, %x, and %X and having them use __index__ only (using __int__ would open the door to float conversions). I still have a question about %i, though. The docs say %u is exactly the same as %d and is therefore deprecated. The docs do not say thay %i is the same as %d, but the descriptions are the same. Are %i and %d the same, or is there some difference? |
AFAIK %i and %d are the same. |
Yes, looking through Objects/unicodeobject.c, 'u', 'i', and 'd' are treated the same everywhere. |
I can live with the deprecation route. I'll create a patch today or tomorrow for that. |
I wouldn't call this a new feature--it's definitely a bug fix. So the "feature freeze" rule does not automatically apply. I definitely wouldn't permit this once we reach release candidates, but we aren't there yet. I get the impression that it will break code, and it does seem kind of late. So I'm not saying yes or no yet. Let me think about it some more. |
I'm willing to risk it in 3.4. Can you check it in in the next twelve hours? (Sorry for the short notice, it slipped my mind until just now.) |
New changeset 2f81f0e331f6 by Ethan Furman in branch 'default': |
I was travelling yesterday and wasn't sure about the time stamp. Looks like I missed the 12-hour cutoff. Let me know if I should backout the commit. |
Ethan, I thought we were going the deprecation route? |
(also, there are typos: "shuld", "psuedo") |
Antoine: I made the call to bite the bullet and fix it. If that's a terrible idea we can change it before RC1. But from my (admittedly dim) understanding of the issue, we're going to have to fix this sooner or later, and sooner is probably better. If by permitting this I've made a terrible mistake, could you do me a favor and summarize the issue and the alternatives so I can understand it? |
And, yes, Ethan's checkin missed the cutoff for beta 2. |
I think the code-breakage issue is that although this is a bug now, it did not use to be a bug. (Or maybe it was always a bug, but nobody noticed -- I don't know when hex() and oct() were introduced.) Basically, having %o and %x work with floats was the intended behavior, as there were tests to make sure it worked. But hex() and oct() were designed to only work with integer types, resulting in two ways to get an octal or hexadecimal string, one of which worked with non-ints, and one which didn't. tarfile.py, for example, was using %o to convert a stat member (a timestamp?) from int/float to octal. It is agreed that working with non-ints is no longer the intended behavior, but there is undoubtedly code that uses it -- especially in cases where the return value could be either an int or a float. |
Given that even the stdlib used it, there is no question in my mind that a deprecation period is required. We don't just arbitrarily break peoples code on purpose, we give them at least a release in which to fix it first. |
Keep in mind that with new-style division, it's easy to get a float even if all your inputs are ints. |
I'll switch it to a deprecation warning instead sometime this week. The changes to datamodel.rst and tarfile.py should stay, I think. test_format and test_unicode will both verify current* behavior and check for the deprecation warning, and unicodeobject.c will generate the warning but keep current behavior. Once 3.5 is tagged I'll put the updates to these three files there.
|
Merely from reading the docs, including the hex() and {}.format sections, I consider the current In 3.3 section 6.1.3.1. Format Specification Mini-Language, integer and float conversions are in separate tables, so no confusion is possible. In 3.3 section 4.7.2. printf-style String Formatting there is just one table with all conversion chars. I think the table would be clearer if it were somehow split into three sections: perhaps 3 otherwise blank rows containing 'integer', 'float', and 'other' in the Conversion column. (I don't know if this works for .rst tables.) I think I would change the table for 3.4 with a note 8 that 'o', 'x', and 'X' currently accept floats by calling int() on the value, but that this implicit behavior is deprecated, so one should be explicit. The DeprecationWarning should also give the same remedy. |
Could somebody explain this? ethan@media:~/source/python/issue19995_depr$ ./python -W error
Python 3.4.0b1 (default:2f81f0e331f6+, Jan 9 2014, 20:30:18)
[GCC 4.7.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> '%x' % 3.14
oxX
no __index__
depracation is fatal
oxX
no __index__
depracation is fatal
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
DeprecationWarning: automatic int conversions have been deprecated
>>>
===============================================================================
ethan@media:~/source/python/issue19995_depr$ ./python -W message
Invalid -W option ignored: invalid action: 'message'
Python 3.4.0b1 (default:2f81f0e331f6+, Jan 9 2014, 20:30:18)
[GCC 4.7.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> '%x' % 3.14
oxX
no __index__
depracation not fatal, attempting __int__ conversion
conversion with __int__ successful
'3'
>>>
===============================================================================
ethan@media:~/source/python/issue19995_depr$ ./python -W once
Python 3.4.0b1 (default:2f81f0e331f6+, Jan 9 2014, 20:30:18)
[GCC 4.7.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> '%x' % 3.14
oxX
no __index__
sys:1: DeprecationWarning: automatic int conversions have been deprecated
depracation not fatal, attempting __int__ conversion
conversion with __int__ successful
'3'
>>> =============================================================================== As you can see, with the -W error option it seems to go through the routine twice; does that mean the the '1' in 'line 1' is being specified as a float? |
For anyone paying really close attention, I've already switched the assertEquals to assertEqual. ;) |
New changeset 22e55bd5583c by Ethan Furman in branch 'default': |
Typo in the C code: depracation :-) |
New changeset cc8b21988efb by Ethan Furman in branch 'default': |
Depracation warning is in place for 3.4. When 3.5 is tagged I'll switch it an error. |
New changeset 775fb736b4b8 by Serhiy Storchaka in branch 'default': |
New changeset 9120196b3114 by Ethan Furman in branch 'default': |
It looks like there's typo: Psuedo => Pseudo. (Unless the first one |
No, apparently I am incapable of spelling pseudo correctly. I'll fix that along with the better error message: %x format: an integer is required, not float (variable, obviously ;) |
New changeset e266525c9294 by Ethan Furman in branch 'default': |
Final status: 3.4 -> DeprecationWarning 3.5 -> TypeError |
Found my first 3.5 breakage which I think is due to this.
|
Fix:
|
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: