Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

%c, %o, %x, %X accept non-integer values instead of raising an exception #64194

Closed
ethanfurman opened this issue Dec 16, 2013 · 66 comments
Closed
Assignees
Labels
type-bug An unexpected behavior, bug, or error

Comments

@ethanfurman
Copy link
Member

BPO 19995
Nosy @gvanrossum, @warsaw, @rhettinger, @terryjreedy, @mdickinson, @pitrou, @vstinner, @larryhastings, @ericvsmith, @ezio-melotti, @bitdancer, @skrah, @ethanfurman, @serhiy-storchaka, @MojoVampire
Files
  • issue19995.stoneleaf.01.patch
  • issue19995.stoneleaf.02.patch
  • issue19995.stoneleaf.03.patch
  • issue19995.stoneleaf.04.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ethanfurman'
    closed_at = <Date 2014-03-21.20:48:10.612>
    created_at = <Date 2013-12-16.10:29:04.037>
    labels = ['type-bug']
    title = '%c, %o, %x, %X accept non-integer values instead of raising an exception'
    updated_at = <Date 2015-06-22.19:46:27.672>
    user = 'https://github.com/ethanfurman'

    bugs.python.org fields:

    activity = <Date 2015-06-22.19:46:27.672>
    actor = 'barry'
    assignee = 'ethan.furman'
    closed = True
    closed_date = <Date 2014-03-21.20:48:10.612>
    closer = 'ethan.furman'
    components = []
    creation = <Date 2013-12-16.10:29:04.037>
    creator = 'ethan.furman'
    dependencies = []
    files = ['33284', '33286', '33289', '33414']
    hgrepos = []
    issue_num = 19995
    keywords = ['patch']
    message_count = 66.0
    messages = ['206290', '206293', '206303', '206321', '206322', '206323', '206325', '206326', '206331', '206332', '206333', '206336', '206338', '206339', '206341', '206363', '206364', '206367', '206368', '206369', '206371', '206378', '206379', '206381', '206382', '206383', '206409', '206465', '206467', '206715', '206858', '207049', '207069', '207098', '207099', '207100', '207101', '207114', '207115', '207117', '207303', '207378', '207379', '207380', '207382', '207383', '207384', '207385', '207386', '207387', '207389', '207393', '207835', '207895', '207929', '207940', '207966', '208099', '210417', '214104', '214149', '214155', '214365', '214411', '245643', '245644']
    nosy_count = 17.0
    nosy_names = ['gvanrossum', 'barry', 'rhettinger', 'terry.reedy', 'mark.dickinson', 'pitrou', 'vstinner', 'larry', 'eric.smith', 'ezio.melotti', 'Arfrever', 'r.david.murray', 'skrah', 'ethan.furman', 'python-dev', 'serhiy.storchaka', 'josh.r']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue19995'
    versions = ['Python 3.4', 'Python 3.5']

    @ethanfurman
    Copy link
    Member Author

    Using Enum to illustrate:

    --> class Grade(enum.Enum):
    ...   A = 4
    ...   B = 3
    ...   C = 2
    ...   D = 1
    ...   F = 0
    ...   def __index__(self):
    ...     return self._value_
    
    --> ['miserable'][Grade.F]
    'miserable'
    
        --> '%x' % Grade.F
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        TypeError: %x format: a number is required, not Grade
    --> hex(Grade.F)
    '0x0'
    

    I suggest that hex() and oct() have the same check that %x and %o do so that non-numbers are not representable as hex and octal.

    While we're at it, we should do the same for bin(). Are there any others?

    I'll create a patch once we have a decision on which way to solve this issue.

    @ethanfurman ethanfurman self-assigned this Dec 16, 2013
    @vstinner
    Copy link
    Member

    Calls:

    • hex()/oct() => PyNumber_ToBase() => PyNumber_Index().
    • PyUnicode_Format() => mainformatlong() => PyNumber_Long()

    I never understood the difference between "long" (int method) and "index" (index method). Is the difference on the behaviour of floating point numbers?

    @ericvsmith
    Copy link
    Member

    It seems to me that by giving it an __index__ method, you're saying it can be used as an integer. It's not surprising to me that hex(), oct(), and bin() would work with a Grade.F object. If anything, I'd say that more places should use __index__ than currently do.

    @ethanfurman
    Copy link
    Member Author

    Victor Stinner commented:
    -------------------------

    I never understood the difference between "long" (int method)
    and "index" (index method). Is the difference on the behaviour
    of floating point numbers?

    __index__ was originally added so that non-int integers, such as NumPy's int16, int32, etc, integer types could be used as indices and slices.

    Now it means "if your type can produce a lossless integer, use __index__", which is why float and similar types don't define it.

    The current meaning is unfortunate in that it is possible to want a type that can be used as an index or slice but that is still not a number, and in fact won't be used as a number in any scenario _except_ bin(), hex(), and oct().

    It seems to me that by having those three functions check that the argument is a number, and bailing if it is not, is a decent way to ensure consistency.

    One question I do have, since I don't have NumPy installed, is what happens with:

    --> "NumPy's int's work here? %x" % uint16(7)

    @vstinner
    Copy link
    Member

    $ python
    Python 2.7.5 (default, Nov 12 2013, 16:18:42) 
    >>> import numpy
    >>> hex(numpy.uint16(257))
    '0x101'
    >>> "%x" % numpy.uint16(257)
    '101'
    >>> x=numpy.uint16(257)
    >>> x.__int__()
    257
    >>> x.__index__()
    257

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Dec 16, 2013

    Ethan Furman <report@bugs.python.org> wrote:

    The current meaning is unfortunate in that it is possible to want a type that
    can be used as an index or slice but that is still not a number, and in fact
    won't be used as a number in any scenario _except_ bin(), hex(), and oct().

    memoryview, struct and probably also array.array accept __index__.

    @ethanfurman
    Copy link
    Member Author

    Did I mention __index__ is an unfortunate name for the current trend for this method?

    Stefan Krah commented:
    ----------------------

    memoryview, struct and probably also array.array accept __index__.

    When you say "accept __index__" do you mean for use as indices, or for use as values in the data structure itself?

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Dec 16, 2013

    Did I mention __index__ is an unfortunate name for the current trend for this method?

    Yes, but it's probably too late to change that now. Also, a fully precise
    name would be something like:

    __to_int_exact_iff_object_has_integer_nature__ :)

    When you say "accept __index__" do you mean for use as indices, or for use as
    values in the data structure itself?

    The latter, see Lib/test/test_buffer.py:2489 .

    @ethanfurman
    Copy link
    Member Author

    Hmmm...

    Well, much as I hate to say it, it's sounding like the correct solution here is to have %o and %x work when __index__ is available, instead of the other way around. :(

    .format is not an issue because one must specify one's own if inheriting from object.

    So the complete list of spcecifiers then is d, i, o, u, U, and c [1], and they should work if __index__ works.

    Are we in agreement?

    [1] http://docs.python.org/dev/library/stdtypes.html#printf-style-string-formatting

    @vstinner
    Copy link
    Member

    Are we in agreement?

    Start maybe on writing unit tests :-)

    IMO all int-like objects should behave the same. I don't see any good
    reason why hex(value) would succeed whereas "%x" % value fails. Both
    should succeed (or both should fail).

    @serhiy-storchaka
    Copy link
    Member

    > Did I mention __index__ is an unfortunate name for the current trend for
    > this method?
    Yes, but it's probably too late to change that now. Also, a fully precise
    name would be something like:

    __to_int_exact_iff_object_has_integer_nature__ :)

    Perhaps in future (may be in 4.0) __index__ should be renamed to __int__ and
    __int__ to __trunc__.

    @ericvsmith
    Copy link
    Member

    Yes, I think adding __index__ to d, i, o, u, U, and c is the correct thing to do here.

    @gvanrossum
    Copy link
    Member

    Not so fast. Currently, even in Python 3, '%d' % 3.14 returns '3'. "Fixing" this will likely break a huge amount of code.

    @gvanrossum
    Copy link
    Member

    Also (the tracker email interface swallowed this):

    it is possible to want a type that can be used as an index or slice but that is still not a number

    I'm sorry, but this requirement is absurd. An index *is* a number. You
    have to make up your mind. (I know, in the context of the example that
    started this, this is funny, but I still stand by it.)

    ---

    Finally, the correct name should perhaps have been __integer__ but I don't see enough reason to change it now.

    @ericvsmith
    Copy link
    Member

    If you were going to make this change, I'd think you'd have to look for __index__ and then __int__. But I'll admit I haven't thought through all of the ramifications. It would be interesting to see what tests would break.

    @ethanfurman
    Copy link
    Member Author

    Eric V. Smith commented:
    ------------------------

    If you were going to make this change, I'd think you'd have to look
    for __index__ and then __int__.

    Does the order matter? Are there any types (and should there be) that would have both and return different answers for each?

    If not, pick an order, try one and, if that one fails, try the other.

    @pitrou
    Copy link
    Member

    pitrou commented Dec 16, 2013

    I'm with Guido: it doesn't really make sense to allow __index__ but not __int__ on a type. So trying __index__ in str.format() sounds like a distraction.

    @ethanfurman
    Copy link
    Member Author

    Antoine Pitrou opined:
    ----------------------

    I'm with Guido: it doesn't really make sense to allow __index__ but not __int__ on
    a type. So trying __index__ in str.format() sounds like a distraction.

      --> hex(3.14)  # calls __index__
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      TypeError: 'float' object cannot be interpreted as an integer

    --> '%x' % 3.14 # calls __int__
    '3'

    One of those behaviours is wrong. Which?

    @pitrou
    Copy link
    Member

    pitrou commented Dec 16, 2013

    --> '%x' % 3.14 # calls __int__
    '3'

    One of those behaviours is wrong. Which?

    For '%x' and '%o', it probably doesn't make sense to convert the float
    to an int.
    (however, it does make sense for other formatters, such as '%d')

    @ethanfurman
    Copy link
    Member Author

    Ethan Furman previously stated:
    -------------------------------

    So the complete list of spcecifiers then is d, i, o, u, U, and c [1], and they
    should work if __index__ works.

    Okay, so 'd' then should be considered a conversion operation, whilst the others should only work if the object is actually an integer type (which is implied by specifying __index__).

    In other words

    • if %d or %u is specified, try __int__, then __index__
      (according to the docs, u is obsolete and identical to d)

    • if %i, %o, %x, %X, or %c is specified, try only __index__

    Agreed?

    @pitrou
    Copy link
    Member

    pitrou commented Dec 16, 2013

    In other words

    • if %d or %u is specified, try __int__, then __index__
      (according to the docs, u is obsolete and identical to d)

    Again, I don't think trying __index__ is useful.

    • if %i, %o, %x, %X, or %c is specified, try only __index__

    I think everything yielding a decimal output should work with floats
    (i.e. %i too).

    @ethanfurman
    Copy link
    Member Author

    Antoine, if I understand you correctly, you are saying that any type that defines __index__ is an integer, and should therefore also define __int__, in which case Python can just use __int__ and not worry about __index__?

    Here's the problem with that:

    --> '%x' % 3.14
    '3'

    While I am beginning to agree that an integer type needs to implement both __int__ and __index__, it still remains true that Python needs to call __index__ if what it needs is already a real, true int, and not just something that can be truncated or otherwise converted into an int -- such as float.

    @pitrou
    Copy link
    Member

    pitrou commented Dec 16, 2013

    Antoine, if I understand you correctly, you are saying that any type
    that defines __index__ is an integer, and should therefore also define
    __int__, in which case Python can just use __int__ and not worry about
    __index__?

    ... is an integer-like, yes.

    While I am beginning to agree that an integer type needs to implement
    both __int__ and __index__, it still remains true that Python needs to
    call __index__ if what it needs is already a real, true int, and not
    just something that can be truncated or otherwise converted into an
    int -- such as float.

    Of course.

    @ethanfurman
    Copy link
    Member Author

    Antoine,

    Does that mean you are reducing your previous statement of

    So trying __index__ in str.format() sounds like a distraction.

    to "using __index__ for %d, %i, and %u is not correct, but is correct for %c, %o, %x, and %X" ?

    @pitrou
    Copy link
    Member

    pitrou commented Dec 16, 2013

    Antoine,

    Does that mean you are reducing your previous statement of

    > So trying __index__ in str.format() sounds like a distraction.

    to "using __index__ for %d, %i, and %u is not correct, but is correct
    for %c, %o, %x, and %X" ?

    Ah, yes, sorry for the confusion :)

    @serhiy-storchaka
    Copy link
    Member

    In addition, PyLong_AsLong() calls __int__, while PyLong_AsUnsignedLong() doesn't call __int__.

    @ethanfurman
    Copy link
    Member Author

    Thank you, Victor and Serhiy, for your pointers into the code.

    I'm hoping we have general agreement about %c, %o, %x, and %X and having them use __index__ only (using __int__ would open the door to float conversions).

    I still have a question about %i, though. The docs say %u is exactly the same as %d and is therefore deprecated. The docs do not say thay %i is the same as %d, but the descriptions are the same.

    Are %i and %d the same, or is there some difference?

    @gvanrossum
    Copy link
    Member

    AFAIK %i and %d are the same.

    @ericvsmith
    Copy link
    Member

    Yes, looking through Objects/unicodeobject.c, 'u', 'i', and 'd' are treated the same everywhere.

    @ethanfurman
    Copy link
    Member Author

    I can live with the deprecation route. I'll create a patch today or tomorrow for that.

    @larryhastings
    Copy link
    Contributor

    I wouldn't call this a new feature--it's definitely a bug fix. So the "feature freeze" rule does not automatically apply. I definitely wouldn't permit this once we reach release candidates, but we aren't there yet.

    I get the impression that it will break code, and it does seem kind of late. So I'm not saying yes or no yet. Let me think about it some more.

    @larryhastings
    Copy link
    Contributor

    I'm willing to risk it in 3.4. Can you check it in in the next twelve hours?

    (Sorry for the short notice, it slipped my mind until just now.)

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jan 5, 2014

    New changeset 2f81f0e331f6 by Ethan Furman in branch 'default':
    bpo-19995: %o, %x, %X now only accept ints
    http://hg.python.org/cpython/rev/2f81f0e331f6

    @ethanfurman
    Copy link
    Member Author

    I was travelling yesterday and wasn't sure about the time stamp. Looks like I missed the 12-hour cutoff. Let me know if I should backout the commit.

    @pitrou
    Copy link
    Member

    pitrou commented Jan 5, 2014

    Ethan, I thought we were going the deprecation route?
    This *will* break code.

    @pitrou
    Copy link
    Member

    pitrou commented Jan 5, 2014

    (also, there are typos: "shuld", "psuedo")

    @larryhastings
    Copy link
    Contributor

    Antoine: I made the call to bite the bullet and fix it. If that's a terrible idea we can change it before RC1. But from my (admittedly dim) understanding of the issue, we're going to have to fix this sooner or later, and sooner is probably better.

    If by permitting this I've made a terrible mistake, could you do me a favor and summarize the issue and the alternatives so I can understand it?

    @larryhastings
    Copy link
    Contributor

    And, yes, Ethan's checkin missed the cutoff for beta 2.

    @ethanfurman
    Copy link
    Member Author

    I think the code-breakage issue is that although this is a bug now, it did not use to be a bug. (Or maybe it was always a bug, but nobody noticed -- I don't know when hex() and oct() were introduced.)

    Basically, having %o and %x work with floats was the intended behavior, as there were tests to make sure it worked.

    But hex() and oct() were designed to only work with integer types, resulting in two ways to get an octal or hexadecimal string, one of which worked with non-ints, and one which didn't.

    tarfile.py, for example, was using %o to convert a stat member (a timestamp?) from int/float to octal.

    It is agreed that working with non-ints is no longer the intended behavior, but there is undoubtedly code that uses it -- especially in cases where the return value could be either an int or a float.

    @bitdancer
    Copy link
    Member

    Given that even the stdlib used it, there is no question in my mind that a deprecation period is required. We don't just arbitrarily break peoples code on purpose, we give them at least a release in which to fix it first.

    @pitrou
    Copy link
    Member

    pitrou commented Jan 5, 2014

    Keep in mind that with new-style division, it's easy to get a float even if all your inputs are ints.

    @ethanfurman
    Copy link
    Member Author

    I'll switch it to a deprecation warning instead sometime this week.

    The changes to datamodel.rst and tarfile.py should stay, I think.

    test_format and test_unicode will both verify current* behavior and check for the deprecation warning, and unicodeobject.c will generate the warning but keep current behavior.

    Once 3.5 is tagged I'll put the updates to these three files there.

    • By 'current' I mean accepting non-ints.

    @terryjreedy
    Copy link
    Member

    Merely from reading the docs, including the hex() and {}.format sections, I consider the current
    '%x' % 1.5 == '1'
    a bug in that I would expect either a Type or Value Error
    or a hex fraction(!) ('1,8') matching the core of the output of float.hex(1.5). But given the history, I agree that a deprecation notice is needed first.

    In 3.3 section 6.1.3.1. Format Specification Mini-Language, integer and float conversions are in separate tables, so no confusion is possible. In 3.3 section 4.7.2. printf-style String Formatting there is just one table with all conversion chars. I think the table would be clearer if it were somehow split into three sections: perhaps 3 otherwise blank rows containing 'integer', 'float', and 'other' in the Conversion column. (I don't know if this works for .rst tables.)

    I think I would change the table for 3.4 with a note 8 that 'o', 'x', and 'X' currently accept floats by calling int() on the value, but that this implicit behavior is deprecated, so one should be explicit. The DeprecationWarning should also give the same remedy.

    @ethanfurman
    Copy link
    Member Author

    Could somebody explain this?
    ===============================================================================

    ethan@media:~/source/python/issue19995_depr$ ./python -W error
    Python 3.4.0b1 (default:2f81f0e331f6+, Jan  9 2014, 20:30:18) 
    [GCC 4.7.3] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> '%x' % 3.14
    oxX
    no __index__
    depracation is fatal
    oxX
    no __index__
    depracation is fatal
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    DeprecationWarning: automatic int conversions have been deprecated
    >>> 
    ===============================================================================
    ethan@media:~/source/python/issue19995_depr$ ./python -W message
    Invalid -W option ignored: invalid action: 'message'
    Python 3.4.0b1 (default:2f81f0e331f6+, Jan  9 2014, 20:30:18) 
    [GCC 4.7.3] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> '%x' % 3.14
    oxX
    no __index__
    depracation not fatal, attempting __int__ conversion
    conversion with __int__ successful
    '3'
    >>> 
    ===============================================================================
    ethan@media:~/source/python/issue19995_depr$ ./python -W once
    Python 3.4.0b1 (default:2f81f0e331f6+, Jan  9 2014, 20:30:18) 
    [GCC 4.7.3] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> '%x' % 3.14
    oxX
    no __index__
    sys:1: DeprecationWarning: automatic int conversions have been deprecated
    depracation not fatal, attempting __int__ conversion
    conversion with __int__ successful
    '3'
    >>> 

    ===============================================================================

    As you can see, with the -W error option it seems to go through the routine twice; does that mean the the '1' in 'line 1' is being specified as a float?

    @ethanfurman
    Copy link
    Member Author

    For anyone paying really close attention, I've already switched the assertEquals to assertEqual. ;)

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jan 12, 2014

    New changeset 22e55bd5583c by Ethan Furman in branch 'default':
    bpo-19995: issue deprecation warning for non-integer values to %c, %o, %x, %X
    http://hg.python.org/cpython/rev/22e55bd5583c

    @vstinner
    Copy link
    Member

    Typo in the C code: depracation :-)

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jan 12, 2014

    New changeset cc8b21988efb by Ethan Furman in branch 'default':
    bpo-19995: fixed typo; switched from test.support.check_warnings to assertWarns
    http://hg.python.org/cpython/rev/cc8b21988efb

    @ethanfurman
    Copy link
    Member Author

    Depracation warning is in place for 3.4.

    When 3.5 is tagged I'll switch it an error.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Feb 6, 2014

    New changeset 775fb736b4b8 by Serhiy Storchaka in branch 'default':
    Catch deprecation warnings emitted when non-integers are formatted with %c, %o
    http://hg.python.org/cpython/rev/775fb736b4b8

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 19, 2014

    New changeset 9120196b3114 by Ethan Furman in branch 'default':
    bpo-19995: passing a non-int to %o, %c, %x, or %X now raises an exception
    http://hg.python.org/cpython/rev/9120196b3114

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Mar 19, 2014

    It looks like there's typo: Psuedo => Pseudo. (Unless the first one
    is an alternate spelling I'm not aware of.)

    @ethanfurman
    Copy link
    Member Author

    No, apparently I am incapable of spelling pseudo correctly. I'll fix that along with the better error message:

    %x format: an integer is required, not float

    (variable, obviously ;)

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 21, 2014

    New changeset e266525c9294 by Ethan Furman in branch 'default':
    bpo-19995: more informative error message; spelling corrections; use operator.mod instead of __mod__
    http://hg.python.org/cpython/rev/e266525c9294

    @ethanfurman
    Copy link
    Member Author

    Final status:

    3.4 -> DeprecationWarning

    3.5 -> TypeError

    @warsaw
    Copy link
    Member

    warsaw commented Jun 22, 2015

    Found my first 3.5 breakage which I think is due to this.

    >> from uuid import uuid4
    >> '%.32x' % uuid4()

    @warsaw
    Copy link
    Member

    warsaw commented Jun 22, 2015

    Fix:

    >> '%.32x' % uuid4().int

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    10 participants