classification
Title: Rounding in timedelta constructor is inconsistent with that in timedelta arithmetics
Type: enhancement Stage: resolved
Components: Extension Modules Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: belopolsky Nosy List: belopolsky, haypo, mark.dickinson, mcherm, nekobon, python-dev, rhettinger, stutzbach, tim.peters
Priority: normal Keywords: patch

Created on 2010-05-31 16:39 by belopolsky, last changed 2013-08-04 18:53 by belopolsky. This issue is now closed.

Files
File name Uploaded Description Edit
issue8860.patch mark.dickinson, 2010-05-31 20:24
issue8860.patch nekobon, 2013-03-18 21:29 Patch including unittest review
issue8860a.patch belopolsky, 2013-08-03 22:03 review
Messages (19)
msg106793 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-05-31 16:39
From issue1289118, msg106389:

"""
>>> from datetime import timedelta as d
>>> [d(microseconds=i + .5)//d.resolution for i in range(-10,10)]
[-10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Should this be considered a bug?

For comparison,

>>> [d.resolution*(i+0.5)//d.resolution for i in range(-10,10)]
[-10, -8, -8, -6, -6, -4, -4, -2, -2, 0, 0, 2, 2, 4, 4, 6, 6, 8, 8, 10]

and
>>> [round(i+0.5) for i in range(-10,10)]
[-10, -8, -8, -6, -6, -4, -4, -2, -2, 0, 0, 2, 2, 4, 4, 6, 6, 8, 8, 10]

I checked the documentation and while it says: "If any argument is a float and there are fractional microseconds, the fractional microseconds left over from all arguments are combined and their sum is rounded to the nearest microsecond." it does not specify how half-integers should be handled.

While it may not be a bug in strict sense, it looks like the code in question can be improved. 
"""
msg106802 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-05-31 19:09
Here is a shorter example of inconsistent behavior:

>>> 0.5 * timedelta(microseconds=1)
datetime.timedelta(0)
>>> timedelta(microseconds=0.5)
datetime.timedelta(0, 0, 1)
msg106806 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-31 19:42
I agree it would be nice to fix this.  We could either (1) alter delta_new so that the final round uses round-to-nearest;  this would give the desired behaviour in most cases, but there would still be a small possibility of rounding going in the wrong direction as a result of accumulated errors in 'leftover_us'.  Or (2) rewrite delta_new to do correct rounding in all cases, by using integer arithmetic where necessary.

(2) seems like overkill to me.

For (1), it looks like it would be enough just to replace the round_to_long function with:

static long
round_to_long(double x)
{
    return (long)round(x);
}

Actually, at this point, one might as well just inline round_to_long.

Note that round_to_long is also called from datetime_from_timestamp.
msg106808 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-31 19:49
Aargh!  No, I take that back.  round() also does round-half-away-from-zero, of course.
msg106809 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-31 20:24
Here's a first stab at a patch.  It still needs tests.
msg106824 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-01 03:38
Mark> (2) seems like overkill to me.

I agree, however it would be interesting to figure out when accumulated errors can produce an inaccurate result.  ISTM that leftover is the sum of up to 7 doubles each between 0 and 1 which is then rounded to the nearest integer.  I don't see how accumulated error can exceed 1 and affect the result.
msg106825 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-01 03:54
I wonder if it would be justified to expose something like

int
_PyLong_IsOdd(PyObject *self)
{
   PyLongObject *lo = (PyLongObject *)self;
   return Py_SIZE(lo) != 0 && ((lo->ob_digit[0] & 1) != 0);
}

in longobject.h?
msg106828 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-01 06:59
The accumulated error doesn't need to exceed 1;  it just needs to be enough to make (e.g.) leftover appear to be >= 0.5 when it's actually just less than 0.5.  It shouldn't be too hard to find examples where this happens, but I haven't thought about it.

> I wonder if it would be justified to expose something like
> int _PyLong_IsOdd(PyObject *self) {...}

Yes, that could work.  I'd be happier with this if there are other uses (even within longobject.c);  there might well be, but I haven't looked.  I also have a mild preference for _PyLong_IsEven over _PyLong_IsOdd.  :)
msg106831 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-01 08:31
An example where getting correct rounding is awkward: the following rounds up, with or without the patch:

>>> datetime.timedelta(seconds=0.6112295)
datetime.timedelta(0, 0, 611230)

But in this case, the result should actually be rounded down (with either rounding mode), since the exact value of float('0.6112295') is a little less than 0.6112295:

>>> print(Decimal(0.6112295))
0.61122949999999998116351207499974407255649566650390625

The problem is that leftover_us ends up being exactly 0.5 in this case.  Again, this could be worked around if we really care, but I'm not convinced it's worth it.
msg107094 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-04 20:14
The timedelta(seconds=0.6112295) example is handled correctly because 0.6112295 sec is not half way between two nearest microseconds:

>>> abs(0.6112295 - 0.6112290) == abs(0.6112295 - 0.6112300)
False

The fact that it displays as if it is does not make timedelta rounding wrong.  I am still not sure that it is possible to accumulate rounding error by adding seven doubles, each < 1 to affect the rounded result.

While proving that the rounding is always correct or finding a counter-example is an interesting puzzle, I think it has little practical value.

I will add unit tests and get this patch ready for for commit review, but setting the priority to "low".
msg107095 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-04 20:18
The half-way check should be done with decimal arihmetics, but the result is the same:

>>> x = Decimal(0.6112295)
>>> abs(x - Decimal('0.6112290')) == abs(x - Decimal('0.6112300'))
False
msg107097 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-04 20:21
> The timedelta(seconds=0.6112295) example is handled correctly

No, it's not!  It's being rounded *up* where it should be being rounded *down*.

> because 0.6112295 sec is not half way between two nearest microseconds

Exactly.  The actual value stored by the C double is a little closer to 0.611229 than to 0.611230.
msg108551 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-24 21:18
Similar problem affects fromtimestamp() constructors:

>>> datetime.fromtimestamp(0.0078125)-datetime.fromtimestamp(0)
datetime.timedelta(0, 0, 7813)
>>> datetime.utcfromtimestamp(0.0078125)-datetime.utcfromtimestamp(0)
datetime.timedelta(0, 0, 7813)

both rounded to odd, but

>>> 0.0078125*timedelta(seconds=1)
datetime.timedelta(0, 0, 7812)

is correctly rounded to even.
msg108601 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-06-25 14:41
>> The timedelta(seconds=0.6112295) example is handled correctly

> No, it's not!  It's being rounded *up* where it should be
> being rounded *down*.

Let me try to reformulate the issue.  When use is entering 0.6112295, she means 0.6112295, not 0x1.38f312b1b36bdp-1 or 0.61122949999999998116351207499974407255649566650390625 which are exact values of the underlying binary representation of 0.6112295.  Modern Python is able to round 0.6112295 to 6 decimal places correctly:

>>> round(0.6112295, 6)
0.611229

and timedelta should do the same, but it does not:

>>> timedelta(seconds=0.6112295)
datetime.timedelta(0, 0, 611230)

With respect to accumulation, I believe the invariant that we want to have should be

timedelta(x=a, y=b, ...) == timedelta(x=a) + timedelta(y=b)

while the alternative (using higher precision addition with rounding at the end) may be more accurate in some cases, the above looks least surprising.
msg184529 - (view) Author: Yu Tomita (nekobon) * Date: 2013-03-18 21:29
I updated the unittest for the patch by Mark. The test fails without the patch and passes with it.

Also, I updated the patch to work in Python 3.4 (_datetime.c instead of datetime.c).
msg194308 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2013-08-03 22:03
I cleaned up the patch a little:

1. Removed now unused static round_to_long() function.
2. Removed commented out code.

Mark,

Any reason not to apply this?  Do we need a NEWS entry for something like this?
msg194311 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2013-08-03 22:20
With the current patch we still have the following quirks:

>>> timedelta(seconds=0.6112295) == timedelta(seconds=1)*0.6112295
False
>>> timedelta(seconds=0.6112295) == timedelta(seconds=round(0.6112295, 6))
False

This is not a reason to hold the patch, though.
msg194325 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-08-04 07:02
Alexander:  applying this is fine by me.  As a user-visible change, yes, I think it should have a Misc/NEWS entry.  (It's too small a change to be worth mentioning in the 'whatsnew' documents though.)
msg194409 - (view) Author: Roundup Robot (python-dev) Date: 2013-08-04 18:52
New changeset f7c84ef35b00 by Alexander Belopolsky in branch 'default':
Fixes #8860: Round half-microseconds to even in the timedelta constructor.
http://hg.python.org/cpython/rev/f7c84ef35b00
History
Date User Action Args
2013-08-04 18:53:48belopolskysetstatus: open -> closed
resolution: fixed
stage: commit review -> resolved
2013-08-04 18:52:54python-devsetnosy: + python-dev
messages: + msg194409
2013-08-04 07:02:05mark.dickinsonsetmessages: + msg194325
2013-08-03 22:20:54belopolskysetmessages: + msg194311
2013-08-03 22:03:09belopolskysetfiles: + issue8860a.patch
priority: low -> normal
versions: + Python 3.4, - Python 3.2
messages: + msg194308

stage: test needed -> commit review
2013-03-18 21:29:38nekobonsetfiles: + issue8860.patch
nosy: + nekobon
messages: + msg184529

2010-06-25 14:41:27belopolskysetmessages: + msg108601
2010-06-24 21:18:05belopolskysetmessages: + msg108551
2010-06-04 20:21:59mark.dickinsonsetmessages: + msg107097
2010-06-04 20:18:27belopolskysetmessages: + msg107095
2010-06-04 20:14:12belopolskysetpriority: normal -> low
versions: + Python 3.2
nosy: tim.peters, mcherm, rhettinger, mark.dickinson, belopolsky, haypo, stutzbach
messages: + msg107094

components: + Extension Modules
2010-06-01 08:31:11mark.dickinsonsetmessages: + msg106831
2010-06-01 06:59:35mark.dickinsonsetmessages: + msg106828
2010-06-01 03:54:39belopolskysetmessages: + msg106825
2010-06-01 03:38:56belopolskysetmessages: + msg106824
2010-05-31 20:24:22mark.dickinsonsetfiles: + issue8860.patch
keywords: + patch
messages: + msg106809
2010-05-31 19:49:51mark.dickinsonsetmessages: + msg106808
2010-05-31 19:42:06mark.dickinsonsetmessages: + msg106806
2010-05-31 19:09:05belopolskysetmessages: + msg106802
2010-05-31 16:39:38belopolskycreate