Message 132697 - Python tracker


Author	belopolsky
Recipients	Neil Muller, amaury.forgeotdarc, andersjm, belopolsky, catlee, davidfraser, erik.stephens, guettli, hodgestar, jribbens, mark.dickinson, ping, pitrou, r.david.murray, steve.roberts, tim.peters, tomster, vivanov, vstinner, werneck
Date	2011-03-31.20:18:29
Message-id	<AANLkTik=N1MwUod_NmH0ukNiDyYXSODw8CLRTNxxy6u2@mail.gmail.com>
In-reply-to	<1301597573.29.0.36281636792.issue2736@psf.upfronthosting.co.za>

Content

On Thu, Mar 31, 2011 at 2:52 PM, Ka-Ping Yee <report@bugs.python.org> wrote:
..
> I am extremely disappointed by what has happened here.
>

What exactly are you disappointed about?  As far as I can tell, the
feature request has not been rejected; it is just that no one has come
up with a satisfactory solution.  The issue is open and patches are
welcome.

> We are talking about a very simple method that everybody needs, and that has been
> reimplemented over and over again.  I have been frustrated countless times by the lack of
> a utctotimestamp() method.

This is not what this issue has been about so far.  It was about
converting local time to a timestamp.  In py3k, utctotimestamp() is
easy:

from datetime import datetime

EPOCH = datetime(1970, 1, 1)  # naive datetime, interpreted as UTC
def utctotimestamp(dt):
    return (dt - EPOCH).total_seconds()

>  I have watched beginners and experienced programmers alike suffer over and over
> again for the lack of this method, and spend hours trying to figure out why Python
> doesn't have it and how it should be spelled in Python.
>

These "beginners and experienced programmers" may want to reconsider
using floating-point numbers to store high-precision timestamps.  I
know that some OSes made the same unfortunate choice in system APIs,
but that does not make the choice any better.  I can make a long list
of reasons why this is a bad choice, but I'll just mention that the
precision of your timestamp depends on its magnitude, so it varies
from year to year, and a program that works fine today may
mysteriously fail in 5 years when nobody who can fix it is around
anymore.
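
To make the precision point concrete, here is a minimal sketch (not part
of the original message) that prints the spacing between adjacent
double-precision values at a few Unix timestamps.  It assumes Python 3.9
or later for math.ulp(); the helper name float_resolution and the sample
years are illustrative choices only.

```python
import math
from datetime import datetime, timezone

def float_resolution(dt):
    """Gap in seconds between dt's float timestamp and the next larger float."""
    return math.ulp(dt.timestamp())

# Sample years are arbitrary; they only show how the gap grows over time.
for year in (1971, 2011, 2040, 2300):
    dt = datetime(year, 1, 1, tzinfo=timezone.utc)
    print(year, float_resolution(dt))
```

The gap doubles each time the timestamp crosses a power of two (for
example at 2**31 seconds, in January 2038), so resolution is lost in
steps rather than smoothly.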

> The discussion here has been stuck on assumptions that the method must meet
> all of the following ideals:
>
>  1. It must produce a value that is easy to compute with
>  2. It must have perfect precision in representing microseconds, forever
>  3. It must make an exact round-trip for any possible input
>  4. It must let users use whatever epoch they want
>

No, it was actually stuck because of the inability to reliably obtain
the system UTC offset for historical times.  This is a solvable
problem, but the patches proposed so far did not solve it correctly.
On top of this, datetime.fromtimestamp() is not invertible in the
presence of DST shifts, so datetime.totimestamp() would be ambiguous
for some datetime values.
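
The ambiguity can be sketched as follows.  This is an illustration, not
code from the issue: it simulates the end of US Eastern daylight time on
2010-11-07 with two fixed-offset timezone objects (EDT and EST are names
introduced here) instead of reading the system time zone database.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets standing in for the two sides of the DST transition.
EDT = timezone(timedelta(hours=-4), "EDT")
EST = timezone(timedelta(hours=-5), "EST")

# Two distinct instants, one hour apart, on either side of the shift.
before = datetime(2010, 11, 7, 5, 30, tzinfo=timezone.utc)
after = before + timedelta(hours=1)

# Drop the offset to get the naive wall-clock reading a user would see.
wall_before = before.astimezone(EDT).replace(tzinfo=None)
wall_after = after.astimezone(EST).replace(tzinfo=None)

print(wall_before, wall_after)                 # same wall-clock time
print(after.timestamp() - before.timestamp())  # yet the instants differ
```

Both instants read 01:30 on the wall clock, so a naive local datetime of
01:30 on that day cannot be mapped back to a single timestamp without
extra information.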

> These ideals cannot all be met simultaneously and perfectly.  The correct thing to
> do as an engineer is to choose a practical compromise and document the decision.
>
> The compromise that almost everyone chooses (because it is useful, convenient, has
> microsecond precision at least until the year 2100, and millisecond precision is frequently
> sufficient) is to use a floating-point number with an epoch of 1970-01-01.  Floating-point
> seconds can be easily subtracted, added, serialized, and deserialized, and are a primitive
> data type in nearly every language and database.

Those who need to do arithmetic on time values more often deal with
durations than with points in time.  An arbitrary epoch close to the
current time is often more appropriate for time-series analytics than
the Unix epoch.

>  They are unmatched in ease of use.

Compared to what?  I find integers much more suitable for representing
points in time than floats.  Yes, in some languages you have to deal
with 32-bit int overflow issues if you want to be able to deal with
durations of over 100 years expressed in microseconds, but these days
64-bit integers are almost universally available.
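
As a rough sketch of the integer alternative (the function names
to_microseconds and from_microseconds are hypothetical, and naive
datetimes are assumed to be in UTC), a point in time can be stored as a
whole number of microseconds since the epoch:

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # naive datetime, interpreted as UTC
MICROSECOND = timedelta(microseconds=1)

def to_microseconds(dt):
    """Whole microseconds between EPOCH and a naive UTC datetime, as an int."""
    return (dt - EPOCH) // MICROSECOND

def from_microseconds(us):
    """Inverse of to_microseconds()."""
    return EPOCH + timedelta(microseconds=us)

dt = datetime(2011, 3, 31, 20, 18, 29, 123457)
us = to_microseconds(dt)
print(us, from_microseconds(us) == dt)
```

Because timedelta // timedelta yields an int, no floating-point rounding
is involved and the round trip is exact for every representable
datetime.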

>  So everyone wastes time searching for the answer and figuring out how to write:
>
>    import calendar
>    calendar.timegm(dt.utctimetuple()) + dt.microsecond * 1e-6
>

And this is the wrong answer.  Someone else using
(dt - EPOCH).total_seconds() may get a slightly different result.
Some may argue that, given that it is not obvious which expression to
use, we need to provide a function.  However, we already provide
timedelta.total_seconds(), which hides the floating-point details.
In my opinion, even adding total_seconds() was a mistake:
x / timedelta(seconds=1) is just as short as x.total_seconds() and
more explicit.

I think the best we can do is to expand datetime.utcfromtimestamp()
documentation to explain that it is equivalent to

def utcfromtimestamp(s):
    return EPOCH + timedelta(seconds=s)

and either leave it as an exercise to the reader to solve
utcfromtimestamp(s) = dt for s or spell out

def utctotimestamp(dt):
    return (dt - EPOCH) / timedelta(seconds=1)
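
For readers who want to try the pair of definitions together, here is a
self-contained sketch.  It assumes Python 3.2 or later (where dividing
one timedelta by another is supported) and naive datetimes that
represent UTC; the sample value is the date of this message.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # naive datetime, interpreted as UTC

def utcfromtimestamp(s):
    """Naive UTC datetime for s seconds since EPOCH."""
    return EPOCH + timedelta(seconds=s)

def utctotimestamp(dt):
    """Seconds since EPOCH for a naive UTC datetime, as a float."""
    return (dt - EPOCH) / timedelta(seconds=1)

dt = datetime(2011, 3, 31, 20, 18, 29)
s = utctotimestamp(dt)
print(s, utcfromtimestamp(s) == dt)
```

Neither function consults the local time zone, which is why this UTC
case avoids the DST ambiguity discussed earlier.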

History
Date	User	Action	Args
2011-03-31 20:18:30	belopolsky	set	recipients: + belopolsky, tim.peters, ping, jribbens, guettli, amaury.forgeotdarc, mark.dickinson, davidfraser, pitrou, andersjm, catlee, vstinner, tomster, werneck, hodgestar, Neil Muller, erik.stephens, steve.roberts, r.david.murray, vivanov
2011-03-31 20:18:29	belopolsky	link	issue2736 messages
2011-03-31 20:18:29	belopolsky	create