datetime.utcfromtimestamp rounds results incorrectly #67705
Hi, I am porting a library from Python 2.7 to 3.4 and I noticed that the behaviour of datetime.utcfromtimestamp is not consistent between the two versions. For example, Python 2.7.5 returns a datetime with 274000 microseconds, while the same code on Python 3.4 returns a datetime with 273999 microseconds. |
This seems to have changed in 3.3 (versions up to 3.2 return 274000). |
Most likely this was a rounding fix (i.e. not a bug), but hopefully Alexander will know for sure. |
Let me dig up the history, but this does not look like correct rounding to me: >>> datetime.utcfromtimestamp(1424817268.274)
datetime.datetime(2015, 2, 24, 22, 34, 28, 273999)
>>> decimal.Decimal(1424817268.274)
Decimal('1424817268.2739999294281005859375') |
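The Decimal output above explains the discrepancy: the double nearest to 1424817268.274 is slightly below it, so truncating the scaled fractional part loses a microsecond, while round-half-even recovers it. A minimal sketch (variable names are mine, not from any patch):

```python
ts = 1424817268.274  # nearest double is ...2739999294..., a hair below .274
frac = ts - int(ts)

# Truncation, as 3.3/3.4 effectively did: drops to 273999.
us_trunc = int(frac * 1e6)

# Round-half-even, as the builtin round() does: recovers 274000.
us_round = round(frac * 1e6)

print(us_trunc, us_round)  # 273999 274000
```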
It looks like it was an intentional change. See bpo-14180 (changeset 75590:1e9cc1a03365). I am not sure what the motivation was. Note that this change made utcfromtimestamp(t) different from datetime(1970,1,1) + timedelta(seconds=t). |
Victor's motivation for the change was given in msg154811. I recall the earlier discussions of rounding in the datetime module and Mark's explanation that rounding up is fine as long as ordering is preserved, i.e. for x < y, round(x) <= round(y). There are cases when producing times in the future is problematic (for example, UNIX make really dislikes file timestamps in the future), but if this was the main motivation, rounding towards -infinity would be more appropriate. In any case, as long as we have the following in the datetime module documentation, I think this behavior is a bug:

    datetime(1970, 1, 1) + timedelta(seconds=timestamp)

>>> timestamp = 1424817268.274
>>> datetime.utcfromtimestamp(timestamp) == datetime(1970, 1, 1) + timedelta(seconds=timestamp)
False |
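The documented equivalence can be written out directly; this is a sketch of the identity the docs promise, and utcfromtimestamp_doc is a hypothetical name of mine:

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def utcfromtimestamp_doc(t):
    # The documented POSIX equivalence; timedelta rounds the
    # fractional microseconds half-to-even.
    return EPOCH + timedelta(seconds=t)

print(utcfromtimestamp_doc(1424817268.274))
# datetime(2015, 2, 24, 22, 34, 28, 274000)
```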
I started a large change set to support nanoseconds in the C "pytime" API: see bpo-22117. While working on this change, I noticed that the rounding mode of datetime is currently wrong. Extract of a private patch: typedef enum { ... }. I changed Modules/_datetimemodule.c to use _PyTime_ROUND_FLOOR instead of _PyTime_ROUND_DOWN. |
What do you mean by "currently"? What versions of python have it wrong? |
Victor, would you consider going back to round-to-nearest? Mark and I put in a lot of effort to get the rounding in the datetime module right. (See, for example, bpo-8860.) Sub-microsecond time sources are still rare, and users who work with them should avoid FP timestamps in any case. On the other hand, double precision timestamps are adequate for microsecond resolution now and for the next few decades. Timestamps like the OP's (sec=1424817268, us=274000) should not change when converted to double and back. IMO, the following behavior is a bug. >>> dt = datetime(2015, 2, 24, 22, 34, 28, 274000)
>>> datetime.utcfromtimestamp(dt.timestamp())
datetime.datetime(2015, 2, 25, 3, 34, 28, 273999) |
I don't understand "nearest". I prefer to use the names of the decimal rounding modes: in my local patch, I'm using ROUND_FLOOR, described in _decimal as "Round towards -Infinity."
I'm unable right now to say which rounding mode should be used in the decimal module. But it's important to use the same rounding mode for all similar operations. For example, time.time() and datetime.datetime.now() should have the same rounding method (a bad example, since time.time() returns a float and doesn't round the result). For example, in my local patch, I'm using ROUND_FLOOR for:
Note: the Python implementation of datetime uses time.localtime() and time.gmtime() for fromtimestamp(), so these functions should also have the same rounding method. |
I searched for "ROUND" in Modules/_datetimemodule.c: in the Python development branch (default), I found _PyTime_ROUND_DOWN (round towards zero). Since a bug was reported, I understand that this is not the right rounding method? |
Sorry for using loose terms. I was hoping that in the context of "going back" it would be clear. I believe the correct mode is "ROUND_HALF_EVEN". This is the mode used by the builtin round() function: >>> round(0.5)
0
>>> round(1.5)
2 |
These should use ROUND_HALF_EVEN
These should not involve floating point arithmetic, but when converting from nanoseconds to microseconds, you should round to the nearest 1000 ns, with 500 ns ties resolved to an even number of microseconds.
This takes nanoseconds as an optional argument. Passing floats in times should probably be deprecated. In any case, here you would be rounding floats to nanoseconds and what you do with 0.5 nanoseconds is less important because in most cases they are not even representable as floats.
Is this a new method? I don't see it in 3.5.0a1.
This should be fixed:

>>> time.gmtime(1.999999999).tm_sec
1

is really bad, and

>>> time.gmtime(-1.999999999)[:6]
(1969, 12, 31, 23, 59, 59)

is probably even worse.
Same story as in time.gmtime. |
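The nanoseconds-to-microseconds rule described above (round to the nearest 1000 ns, ties to even) can be sketched in pure integer arithmetic; ns_to_us_half_even is a hypothetical helper name, not code from any patch:

```python
def ns_to_us_half_even(ns):
    """Round integer nanoseconds to microseconds, ties to even."""
    q, r = divmod(ns, 1000)
    # Round up when the remainder is past the halfway point, or when it
    # is exactly halfway and the quotient is odd (ties go to even).
    if r > 500 or (r == 500 and q % 2 == 1):
        q += 1
    return q

print(ns_to_us_half_even(1500))  # 2 (tie, rounds to even)
print(ns_to_us_half_even(2500))  # 2 (tie, rounds to even)
print(ns_to_us_half_even(1499))  # 1
```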
We are seeing this behavior influencing other libraries in Python 3.4. This should never fail if timestamp and fromtimestamp are implemented correctly:

from datetime import datetime
t = datetime.utcnow()
t2 = datetime.utcfromtimestamp(t.timestamp())
assert t == t2, 'Moving from timestamp and back should always work' |
Because this seems to be a regression, I'm marking this as a release blocker. The RM can decide it isn't, of course. |
Yes, by all means, fix for 3.4, 3.5, and 3.6. If possible I'd appreciate you getting the fix checked in to 3.5 within the next 48 hours, as I'm tagging the next beta release of 3.5 around then, and it'd be nice if this fix went out in that release. |
I'm concerned by this example:

>>> dt = datetime(2015, 2, 24, 22, 34, 28, 274000)
>>> dt - datetime.fromtimestamp(dt.timestamp())
datetime.timedelta(0, 0, 1)

I don't know yet if it should be fixed or not. If we modify .fromtimestamp(), should we use the same rounding method in the datetime constructor? And in datetime.now()/.utcnow()? I would prefer to keep ROUND_DOWN for .now() and .utcnow() to avoid timestamps in the future. I care less about the other methods. What do you think of this plan?

---

Hum, I don't remember the whole story line of rounding timestamps in Python. Some raw data: Include/pytime.h of Python 3.5+ has one rounding enum (typedef enum { ... }), Include/pytime.h of Python 3.4 had a different one, and Include/pytime.h of Python 3.3 and older didn't have rounding at all. C files using pytime.h rounding in Python 3.4 (grep -l _PyTime_ROUND */*.c): Modules/_datetimemodule.c. It is used by three more C files in Python 3.5, such as Modules/socketmodule.c. NEAREST was never implemented in pytime.h.

If I recall correctly, there were inconsistencies between the Python and the C implementations of the datetime module. At least in Python 3.5, both implementations should be consistent (even if some people would prefer a different rounding method). The private pytime API was rewritten in Python 3.5 to get nanosecond resolution. This API is only used by the datetime module to get the current time.

My rationale for ROUND_DOWN was to follow how UNIX rounds timestamps. As Alexander wrote, UNIX doesn't like timestamps in the future, so rounding towards minus infinity avoids such issues. Rounding issues become more common on file timestamps with filesystems supporting microsecond or even nanosecond resolution. |
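The difference between _PyTime_ROUND_DOWN (towards zero) and _PyTime_ROUND_FLOOR (towards minus infinity) only shows up for negative timestamps, i.e. times before the epoch. The enum names are from the thread; the int()/math.floor() analogy below is my own sketch:

```python
import math

t = -1.999999999  # a pre-epoch timestamp

print(int(t))         # -1: truncation towards zero (ROUND_DOWN style)
print(math.floor(t))  # -2: towards -infinity (ROUND_FLOOR style)
```

This is exactly why time.gmtime(-1.999999999) reporting second 59 of 1969-12-31 23:59 is suspect: truncation moves pre-epoch times forward instead of backward.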
Victor> I don't know yet if it should be fixed or not. It is my understanding that datetime -> timestamp -> datetime round-tripping was exact in 3.3 for datetimes not too far in the future (as of 2015), but now it breaks for datetime(2015, 2, 24, 22, 34, 28, 274000).
I don't think this is a serious consideration. The problematic scenario would be obtaining a high-resolution timestamp (from, say, time.time()), converting it to datetime and passing it back to the OS as a possibly 0.5µs ... |
I'm not going to hold up beta 3 while you guys argue about how to round up or down the number of angels that can dance on the head of a pin. |
On Friday, July 3, 2015, Alexander Belopolsky <report@bugs.python.org> wrote:
In many cases the resolution is 1 second, for example on a filesystem with a resolution of 1 second. With a resolution of 1 second, timestamps in the future are likely (50%). Sorry, I don't remember all the details of timestamp rounding and all the issues that ... |
My rationale is more general than datetime. But problems arise when ... |
I'll let others fight this battle. In my view, introducing a floating point timestamp method for datetime objects was a mistake. See issue bpo-2736. Specifically, I would like to invite Velko Ivanov to rethink his rant at msg124197. Anyone who followed his advice and started using the timestamp method to JSON-serialize datetimes around 3.3 has undoubtedly been bitten by the present bug (but may not know it yet). For those who need robust code, I will continue recommending the (dt - EPOCH)/timedelta(seconds=1) expression over the timestamp method, and, for JSON serialization, (dt - EPOCH) // datetime.resolution to convert to integers and EPOCH + n * datetime.resolution to convert back. |
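Alexander's recommended expressions can be spot-checked; EPOCH here is the naive 1970-01-01 constant his expressions assume:

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
dt = datetime(2015, 2, 24, 22, 34, 28, 274000)

# Float timestamp without calling dt.timestamp():
ts = (dt - EPOCH) / timedelta(seconds=1)

# Lossless integer serialization in microsecond ticks
# (datetime.resolution is timedelta(microseconds=1)):
n = (dt - EPOCH) // datetime.resolution
back = EPOCH + n * datetime.resolution
assert back == dt  # the integer round-trip is exact
print(ts, n)
```

The integer path round-trips exactly because timedelta floor division and multiplication stay in integer microseconds, never passing through a C double.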
It is really bad that roundtripping current microsecond datetimes doesn't work. About half of all microsecond-resolution datetimes fail to roundtrip correctly now. While the limited precision of a C double guarantees roundtripping of microsecond datetimes "far enough" in the future will necessarily fail, that point is about 200 years from now. Rather than argue endlessly about rounding, it's possible instead to make the tiniest possible change to the timestamp _produced_ at the start. Here's code explaining it:

import math  # for frexp/ldexp; d is a datetime instance

ts = d.timestamp()
# Will microseconds roundtrip correctly?  For times far
# enough in the future, there aren't enough bits in a C
# double for that to always work.  But for years through
# about 2241, there are enough bits.  How does it fail
# before then?  Very few microsecond datetimes are exactly
# representable as a binary float.  About half the time, the
# closest representable binary float is a tiny bit less than
# the decimal value, and that causes truncating 1e6 times
# the fraction to be 1 less than the original microsecond
# value.
if int((ts - int(ts)) * 1e6) != d.microsecond:
    # Roundtripping fails.  Add 1 ulp to the timestamp (the
    # tiniest possible change) and see whether that repairs
    # it.  It's enough of a change until doubles just plain
    # run out of enough bits.
    mant, exp = math.frexp(ts)
    ulp = math.ldexp(0.5, exp - 52)
    ts2 = ts + ulp
    if int((ts2 - int(ts2)) * 1e6) == d.microsecond:
        ts = ts2
    else:
        # The date is so late in time that a C double's 53
        # bits of precision aren't sufficient to represent
        # microseconds faithfully.  Leave the original
        # timestamp alone.
        pass
# Now ts exactly reproduces the original datetime,
# if that's at all possible.

This assumes timestamps are >= 0, and that C doubles have 53 bits of precision. Note that because a change of 1 ulp is the smallest possible change for a C double, this cannot make closest-possible unequal datetimes produce out-of-order after-adjustment timestamps. And, yes, this sucks ;-) But it's far better than having half of timestamps fail to convert back for the next two centuries. Alas, it does nothing to get the intended datetime from a microsecond-resolution timestamp produced _outside_ of Python. That requires rounding timestamps on input, which would be a better approach. Whatever theoretical problems may exist with rounding, the change to use truncation here is causing real problems now. Practicality beats purity. |
I wish we could use the same algorithm in datetime.utcfromtimestamp as we use in float to string conversion. This may allow the following chain of conversions to round trip in most cases: float literal -> float -> datetime -> seconds.microseconds string |
New changeset d755f75019c8 by Victor Stinner in branch 'default': |
That is true, but what does matter is who is producing the incoming floats. Consider an extreme case of a timer that ticks twice a microsecond and gives you results in a timespec-struct-style (sec, nsec) pair. If you convert those to timedeltas with timedelta(0, sec, nsec/1000), you will see half of the nsec/1000 floats ending in .5 and half ending in .0. If you have a large sample of (sec, nsec) measurements, the mean of the corresponding timedeltas will be 0.25µs larger with half-up rounding than the actual mean of your samples.

Is round-half-to-even a panacea? Of course not. In an extreme case, if all your measurements are 1.5µs, it will not help at all; but in a more realistic sample you are likely to have an approximately equal number of even and odd µs, and the Dutch rounding will affect the mean much less than round-half-up.

As you said, for most uses none of this matters, but we are not discussing a change to timedelta(0, s) rounding here. All I want from this issue is to keep the promise that we make in the docs:

On the POSIX compliant platforms, utcfromtimestamp(timestamp) is equivalent to the following expression: datetime(1970, 1, 1) + timedelta(seconds=timestamp)

https://docs.python.org/3/library/datetime.html#datetime.datetime.utcfromtimestamp

Tim, while we have this entertaining theoretical dispute, I am afraid we are leaving Victor confused about what he has to do in his patch. I think we agree that fromtimestamp() should use Dutch rounding; we only disagree why. If this is the case, please close this thread with clear advice to use round-half-to-even, and we can continue discussing the rationale privately or in a different forum. |
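Alexander's bias argument can be checked numerically. The sketch below simulates the twice-per-microsecond timer with decimal arithmetic; the helper names are mine:

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

# A timer ticking twice per microsecond yields values ending in .0 or .5 µs.
samples = [Decimal(n) / 1000 for n in range(0, 1_000_000, 500)]

def bias(rounding):
    # Mean rounding error when quantizing to whole microseconds.
    errs = [v.quantize(Decimal(1), rounding=rounding) - v for v in samples]
    return sum(errs) / len(errs)

print(bias(ROUND_HALF_UP))    # +0.25 µs: every .5 tie rounds up
print(bias(ROUND_HALF_EVEN))  # 0: ties split evenly up and down
```

Half of the samples are exact .5 ties; half-up pushes every one of them upward (mean error +0.25µs), while half-even alternates direction so the errors cancel.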
Victor, sorry if I muddied the waters here: Alex and I do agree nearest/even must be used. It's documented for timedelta already, and the seconds-from-the-epoch invariant Alex showed is at least as important to preserve as round-tripping. Alex, agreed about the overall mean in the example, but I expect that continuing to exchange contrived examples isn't really a major life goal for either of us ;-) Thanks for playing along. |
Ok, so Python 3.6 should be updated to use ROUND_HALF_EVEN. Right now, Python 3.6 uses ROUND_HALF_UP. |
You may find it easier to start with a patch for 3.5, which is the only time-sensitive task. I'll be happy to review your code in whatever form you find easiest to submit, but I believe in hg it is easier to forward-port than to backport. |
So is this considered broken enough that I need to accept a fix for 3.5.0? And has a consensus been reached about exactly what that fix would be? |
Universal consensus on ROUND_HALF_EVEN, yes. I would like to see it repaired for 3.5.0, but that's just me saying so without offering to do a single damned thing to make it happen ;-) |
Well, I'm already holding up rc3 on one other issue, might as well fix this too. Can somebody make me a pull request? |
I suspect we're not fixing this in 3.4, so I'm removing 3.4 from the version list. |
Okay, this is literally the only thing rc3 is waiting on now. |
I understand that I have to implement a new rounding mode. The code will be new, so I will change Python 3.6 first, then 3.4 and 3.5.1.
|
Is it appropriate to make this change as a "bug fix", in already-released versions of Python? Would you be happy or sad if you updated your Python from 3.x.y to 3.x.y+1 and the rounding method used when converting floats to datetime stamps changed? |
Well, for now I assume it really truly genuinely isn't going in 3.5.0. I suppose we can debate about 3.4.x and 3.5.1 later, once we have a fix everybody is happy with. |
Larry> Well, for now I assume it really truly genuinely isn't going in 3.5.0. This is an unfortunate outcome. Larry> I suppose we can debate about 3.4.x and 3.5.1 later It is even more unfortunate that the question of whether this regression is a bug or not is up for debate again. Victor> The code will be new, I'm not confident enough to push it immediately into Python 3.5. I don't understand why any new code is needed to fix a regression. All you need to do is to revert a few chunks of 1e9cc1a03365 where the regression was introduced. |
If the guy doing the work says "don't merge it in 3.5.0", and we're at the final release candidate before 3.5.0 final ships, and we don't even have a patch that everyone likes yet... it does seem like it's not going to happen for 3.5.0. Unfortunate perhaps but that's the situation we're in. |
Hum, please don't revert this change. I spent many days writing pytime.c/.h. My patches add more unit tests to datetime, to test the exact rounding mode.
The rounding mode of microseconds only affects very few people. The rounding mode changed in 3.3, again in 3.4, and again in 3.5; this issue is the first bug report. So I think it's fine to modify 3.4.x and 3.5.x. |
Victor> please don't revert this change. I did not suggest reverting the entire commit. The change that affects fromtimestamp() is just
+ us = int(frac * 1e6) in datetime.py. It is probably more involved in _datetimemodule.c, but it cannot be that bad. You can still keep pytime.c/.h. |
FWIW, count me as +1 on roundTiesToEven, mostly just for consistency. It's easier to remember that pretty much everything in Python 3 does round-ties-to-even (round function, string formatting, float literal evaluations, int-to-float conversion, Fraction-to-float conversion, ...) than to have to remember that there's a handful of corner cases where we do roundTiesToAway instead. |
New changeset d0bb896f9b14 by Victor Stinner in branch 'default': |
New changeset 171d5590ebc3 by Victor Stinner in branch 'default': |
I wish people wouldn't remove old patches. There's no harm in leaving them, and it may be a useful historical record. |
I modified Python 3.6 to use ROUND_HALF_EVEN rounding mode in datetime.timedelta, datetime.datetime.fromtimestamp(), datetime.datetime.utcfromtimestamp(). round_half_even_py34.patch: Backport this change to Python 3.4. I tried to write a minimal patch only changing datetime, not pytime. |
It became hard to read this issue; it has a long history. My old patches for Python 3.4 were for ROUND_HALF_UP, but it was then decided to use ROUND_HALF_EVEN. Technically, patches are not "removed" but "unlinked" from the issue: you can get them from the history at the bottom of this page. |
Given Victor's reluctance to get this in to 3.5.0, this can't even be marked as a "deferred blocker" anymore. Demoting to normal priority. |
Alexander: can you please review the attached round_half_even_py34.patch? It's the minimal patch to fix datetime/timedelta rounding for Python 3.4 (and 3.5). |
New changeset ee1cf1b188d2 by Victor Stinner in branch '3.4': |
Ok, I fixed Python 3.4 and 3.5 too: fromtimestamp() and utcfromtimestamp() now also use the ROUND_HALF_EVEN rounding mode, like the timedelta constructor. I explained in Misc/NEWS that the change was made to respect the property: (datetime(1970,1,1) + timedelta(seconds=t)) == datetime.utcfromtimestamp(t) Thanks Alexander Belopolsky and Tim Peters for your explanations on rounding. It's not a simple problem :-) Thanks Tommaso Barbugli for the bug report: it's now fixed on all maintained Python 3 versions: 3.4, 3.5 and 3.6. The fix will be part of Python 3.5.1. |
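On a current CPython (3.5.1 or later), the restored behaviour can be spot-checked. This sketch uses fromtimestamp with an explicit UTC timezone rather than utcfromtimestamp, which was later deprecated, but the rounding path exercised is the same:

```python
from datetime import datetime, timedelta, timezone

# timedelta resolves half-microsecond ties to the even microsecond:
assert timedelta(microseconds=0.5) == timedelta(0)
assert timedelta(microseconds=1.5) == timedelta(microseconds=2)

# The OP's timestamp now converts as the docs promise:
dt = datetime.fromtimestamp(1424817268.274, tz=timezone.utc)
assert dt.microsecond == 274000
print(dt)
```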