Author tim.peters
Recipients larry, mark.dickinson, r.david.murray, tbarbugli, tim.peters, trcarden, vivanov, vstinner
Date 2015-08-21.02:14:47
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1440123288.56.0.707573606269.issue23517@psf.upfronthosting.co.za>
In-reply-to
Content
It is really bad that roundtripping current microsecond datetimes doesn't work.  About half of all microsecond-resolution datetimes fail to roundtrip correctly now.  While the limited precision of a C double means that roundtripping microsecond datetimes "far enough" in the future must eventually fail, that point is about 200 years from now.
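A minimal demonstration of the failure mode, using plain floats rather than datetime objects so the result doesn't depend on which conversion the datetime module itself uses (the epoch-second base below is an arbitrary 2015-era value, chosen for illustration):

```python
base = 1440123288  # arbitrary 2015-era epoch second (illustrative)

bad = []
for us in range(1000):
    ts = base + us / 1e6              # the timestamp as a C double
    back = int((ts - int(ts)) * 1e6)  # truncating conversion back
    if back != us:
        bad.append(us)

print(f"{len(bad)} of 1000 microsecond values fail to roundtrip")
```

Every failing value comes back exactly 1 too small, because the nearest representable double fell a hair below the true decimal value and truncation then lost a full microsecond.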

Rather than argue endlessly about rounding, it's possible instead to make the tiniest possible change to the timestamp _produced_ at the start.  Here's code explaining it:

    import math

    # 'd' is the datetime being converted; 'ts' is its POSIX timestamp.
    ts = d.timestamp()
    # Will microseconds roundtrip correctly?  For times far
    # enough in the future, there aren't enough bits in a C
    # double for that to always work.  But for years through
    # about 2241, there are enough bits.  How does it fail
    # before then?  Very few microsecond datetimes are exactly
    # representable as a binary float.  About half the time, the
    # closest representable binary float is a tiny bit less than
    # the decimal value, and that causes truncating 1e6 times
    # the fraction to be 1 less than the original microsecond
    # value.
    if int((ts - int(ts)) * 1e6) != d.microsecond:
        # Roundtripping fails.  Add 1 ulp to the timestamp (the
        # tiniest possible change) and see whether that repairs
        # it.  It's enough of a change until doubles just plain
        # run out of enough bits.
        mant, exp = math.frexp(ts)
        ulp = math.ldexp(0.5, exp - 52)
        ts2 = ts + ulp
        if int((ts2 - int(ts2)) * 1e6) == d.microsecond:
            ts = ts2
        else:
            # The date is so late in time that a C double's 53
            # bits of precision aren't sufficient to represent
            # microseconds faithfully.  Leave the original
            # timestamp alone.
            pass
    # Now ts exactly reproduces the original datetime,
    # if that's at all possible.

This assumes timestamps are >= 0, and that C doubles have 53 bits of precision.  Note that because a change of 1 ulp is the smallest possible change for a C double, this cannot make closest-possible unequal datetimes produce out-of-order after-adjustment timestamps.
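As a sanity check (not part of the original message), the frexp/ldexp dance above does compute exactly 1 ulp of a positive normal double; it agrees with math.ulp(), which Python later grew in 3.9:

```python
import math

def ulp_via_frexp(ts):
    """1 ulp of a positive normal double, computed as in the snippet above."""
    mant, exp = math.frexp(ts)      # ts == mant * 2**exp, mant in [0.5, 1)
    return math.ldexp(0.5, exp - 52)  # 2**(exp - 53): spacing of doubles near ts

# Agrees with math.ulp (Python 3.9+) across a spread of positive values,
# including exact powers of two and a 2015-era timestamp.
for ts in (1.0, 1.5, 2.0, 1440123288.000123, 2.5e9, 7.0e9):
    assert ulp_via_frexp(ts) == math.ulp(ts)
```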

And, yes, this sucks ;-)  But it's far better than having half of timestamps fail to convert back for the next two centuries.  Alas, it does nothing to get the intended datetime from a microsecond-resolution timestamp produced _outside_ of Python.  That requires rounding timestamps on input - which would be a better approach.
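For the record, "rounding timestamps on input" can be sketched like so (a simplified round-half-even conversion of the fractional second, not the patch actually applied to this issue): rounding the scaled fraction instead of truncating it recovers every microsecond value in the 2015-era range.

```python
base = 1440123288  # arbitrary 2015-era epoch second (illustrative)

def us_truncating(ts):
    """The conversion complained about above: truncate toward zero."""
    return int((ts - int(ts)) * 1e6)

def us_rounding(ts):
    """Round-half-even instead: what rounding on input means here."""
    return round((ts - int(ts)) * 1e6)

trunc_ok = sum(us_truncating(base + us / 1e6) == us for us in range(1000))
round_ok = sum(us_rounding(base + us / 1e6) == us for us in range(1000))
print(trunc_ok, round_ok)
```

Near 2015-era timestamps a double's representation error in the fractional second is under 0.12 microseconds, comfortably below the 0.5 threshold that rounding needs, so us_rounding gets all 1000 values right while us_truncating misses roughly half.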

Whatever theoretical problems may exist with rounding, the change to use truncation here is causing real problems now.  Practicality beats purity.