classification
Title: datetime needs an "epoch" method
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.0, Python 2.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Neil Muller, andersjm, belopolsky, davidfraser, haypo, hodgestar, tebeka, werneck (8)
Priority: Keywords: patch

Created on 2008-05-01 21:03 by tebeka, last changed 2009-01-13 23:42 by haypo.

Files
File name Uploaded Description
add-datetime-totimestamp-method.diff hodgestar, 2008-05-10 14:55 Implementation of datetime.datetime.totimestamp and tests.
add-datetime-totimestamp-method-docs.diff hodgestar, 2008-05-10 15:54
datetime_totimestamp-3.patch haypo, 2008-12-12 01:15
Messages (26)
msg66045 - (view) Author: Miki Tebeka (tebeka) Date: 2008-05-01 21:03
If you try to convert datetime objects to seconds since epoch and back
it will not work since the microseconds get lost:

>>> from datetime import datetime
>>> from time import mktime
>>> dt = datetime(2008, 5, 1, 13, 35, 41, 567777)
>>> seconds = mktime(dt.timetuple())
>>> datetime.fromtimestamp(seconds) == dt
False

Current fix is to do
>>> seconds += (dt.microsecond / 1000000.0)
>>> datetime.fromtimestamp(seconds) == dt
True
msg66140 - (view) Author: Pedro Werneck (werneck) Date: 2008-05-03 02:18
That's expected as mktime is just a thin wrapper over libc mktime() and
it does not expect microseconds. Changing time.mktime doesn't seem to be an
option, so the best alternative is to implement a method in datetime
type. Is there a real demand for C code implementing this to justify it?
msg66532 - (view) Author: Simon Cross (hodgestar) Date: 2008-05-10 14:55
Attached a patch which adds a .totimestamp(...) method to
datetime.datetime and tests for it.

The intention is that the dt.totimestamp(...) method is equivalent to:
mktime(dt.timetuple()) + (dt.microsecond / 1000000.0)
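As a minimal sketch of that equivalence (the helper name `totimestamp` below is only illustrative and is not the code from the attached patch), the expression can be wrapped in a function and checked against `datetime.fromtimestamp()`. It assumes a naive datetime in local time, away from any DST transition:

```python
from datetime import datetime
from time import mktime


def totimestamp(dt):
    """Seconds since the epoch as a float, keeping the microseconds.

    Illustrative helper only; assumes a naive datetime in local time.
    """
    # mktime() only sees whole seconds, so add the microseconds back.
    return mktime(dt.timetuple()) + dt.microsecond / 1000000.0


dt = datetime(2008, 5, 1, 13, 35, 41, 567777)

# Without the correction the microseconds are dropped.
truncated = datetime.fromtimestamp(mktime(dt.timetuple()))
# With the correction the original value comes back.
restored = datetime.fromtimestamp(totimestamp(dt))

print(truncated == dt)
print(restored == dt)
```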
msg66539 - (view) Author: Simon Cross (hodgestar) Date: 2008-05-10 15:54
Patch adding documentation for datetime.totimestamp(...).
msg66601 - (view) Author: Miki Tebeka (tebeka) Date: 2008-05-11 06:12
I think the name is not good, should be "toepoch" or something like that.
msg66610 - (view) Author: Neil Muller (Neil Muller) Date: 2008-05-11 08:25
datetime has fromtimestamp already, so using totimestamp keeps naming
consistency (see toordinal and fromordinal).
msg75723 - (view) Author: STINNER Victor (haypo) Date: 2008-11-11 03:12
See also issue1673409
msg75899 - (view) Author: STINNER Victor (haypo) Date: 2008-11-15 00:33
I like the method, but I have some comments about the new method:
 - datetime_totimestamp() is not well indented
 - "PyObject *time" should be declared before the first 
instruction
 - why not use "if (time == NULL) return NULL;" directly instead of 
wrapping the non-NULL case in a block?
 - there are reference leaks: timetuple, timestamp and 
PyFloat_FromDouble()

I wrote a similar patch before reading 
add-datetime-totimestamp-method.diff which does exactly the same... I 
attach my patch but both should be merged.
msg75900 - (view) Author: STINNER Victor (haypo) Date: 2008-11-15 00:41
Here is a merged patch of the three patches. Except the C 
implementation of datetime_totimestamp() (written by me), all code is 
written by hodgestar.
msg75902 - (view) Author: Alexander Belopolsky (belopolsky) Date: 2008-11-15 01:15
I would like to voice my opposition to the totimestamp method.

Representing time as a float is a really bad idea (originated at 
Microsoft as I have heard).  In addition to the usual numeric problems 
when dealing with the floating point, the resolution of the floating 
point timestamp varies from year to year making it impossible to 
represent high resolution historical data.

In my opinion both time.time() returning float and 
datetime.fromtimestamp() taking a float are both design mistakes and 
adding totimestamp that produces a float will further promote a bad 
practice.

I would not mind integer based to/from timestamp methods taking and 
producing seconds or even (second, microsecond) tuples, but I don't 
think changing fromtimestamp behavior is an option.
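To put rough numbers on the varying resolution, here is a small sketch that computes the gap between adjacent double-precision values at a few timestamp magnitudes. The helper name `float_spacing` and the sample timestamps are chosen only for illustration:

```python
import math


def float_spacing(x):
    """Gap between x and the next larger double, for a positive normal x."""
    # frexp() returns (m, e) with x == m * 2**e and 0.5 <= m < 1.
    # A double carries 53 significant bits, so the gap is 2**(e - 53).
    _, exponent = math.frexp(x)
    return math.ldexp(1.0, exponent - 53)


# Timestamps in seconds since 1970, picked only as examples.
samples = {
    "one day after the epoch": 86400.0,
    "late 2008": 1227550070.0,
    "end of year 9999": 253402300799.0,
}

for label, ts in samples.items():
    print(f"{label}: adjacent floats are {float_spacing(ts):.3g} s apart")
```

Near the end of the datetime range the gap is larger than a microsecond, so a float cannot distinguish every datetime value there.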
msg75903 - (view) Author: STINNER Victor (haypo) Date: 2008-11-15 01:37
On Saturday 15 November 2008 02:15:30, Alexander Belopolsky wrote:
> I don't think changing fromtimestamp behavior is an option.

It's too late to break the API (Python3 is in RC stage ;-)), but we can create 
new methods like:
   datetime.fromepoch(seconds, microseconds=0)    # (int/long, int)
   datetime.toepoch() -> (seconds, microseconds)  # (int/long, int)
msg75904 - (view) Author: Alexander Belopolsky (belopolsky) Date: 2008-11-15 03:17
On Fri, Nov 14, 2008 at 8:37 PM, STINNER Victor <report@bugs.python.org> wrote:
> .. but we can create new methods like:
>   datetime.fromepoch(seconds, microseconds=0)    # (int/long, int)

While 1970 is the most popular epoch, I've seen 1900, 2000 and even
2035 (!) being used as well.  Similarly, nanoseconds are used in high
resolution time sources at least as often as microseconds.  This makes
fromepoch() ambiguous and it is really unnecessary because it can be
written as epoch + timedelta(0, seconds, microseconds).

>   datetime.toepoch() -> (seconds, microseconds)  # (int/long, int)

I would much rather have divmod implemented as you suggested in
issue2706 .  Then toepoch is simply

def toepoch(d):
    x, y = divmod(d, timedelta(0, 1))
    return x, y.microseconds
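
A runnable sketch of this idea, under stated assumptions: `divmod()` on two timedelta objects is available in Python 3.2 and later, the value passed to `divmod()` has to be the difference from an epoch rather than the datetime itself, and the 1970 epoch with naive datetimes treated as UTC is a choice made for this example only.

```python
from datetime import datetime, timedelta

# Assumption for this sketch: naive datetimes are interpreted as UTC.
EPOCH = datetime(1970, 1, 1)


def toepoch(dt):
    """Return (seconds, microseconds) since EPOCH as two integers."""
    seconds, remainder = divmod(dt - EPOCH, timedelta(seconds=1))
    return seconds, remainder.microseconds


def fromepoch(seconds, microseconds=0):
    """Inverse of toepoch(), written as plain timedelta arithmetic."""
    return EPOCH + timedelta(seconds=seconds, microseconds=microseconds)


dt = datetime(2008, 5, 1, 13, 35, 41, 567777)
print(toepoch(dt))
print(fromepoch(*toepoch(dt)) == dt)

# Floor division keeps the microseconds part non-negative before the epoch.
print(toepoch(datetime(1969, 12, 31, 23, 59, 59, 500000)))
```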
msg75912 - (view) Author: STINNER Victor (haypo) Date: 2008-11-15 12:19
On Saturday 15 November 2008 04:17:50, Alexander Belopolsky wrote:
> it is really unnecessary because it can be
> written as epoch + timedelta(0, seconds, microseconds).

I tried yesterday and it doesn't work!

datetime.datetime(1970, 1, 1, 1, 0)
>>> t1 = epoch + timedelta(seconds=-1660000000)
>>> t2 = datetime.fromtimestamp(-1660000000)
>>> t2
datetime.datetime(1917, 5, 26, 1, 53, 20)
>>> t1 - t2
datetime.timedelta(0)
>>> t2 = datetime.fromtimestamp(-1670000000)
>>> t2
datetime.datetime(1917, 1, 30, 7, 6, 40)
>>> t1 = epoch + timedelta(seconds=-1670000000)
>>> t1 - t2
datetime.timedelta(0, 3600)

We lost an hour during the 1st World War :-)

Whereas my implementation using mktime() works:

-1670000000.0
msg76003 - (view) Author: Anders J. Munch (andersjm) Date: 2008-11-18 09:05
Any thoughts to time zone/DST handling for naive datetime objects? E.g.
suppose the datetime object was created by .utcnow or .utcfromtimestamp.

For aware datetime objects, I think the time.mktime(dt.timetuple())
approach doesn't work; the tz info is lost in the conversion to time tuple.
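
A small sketch of the concern for aware objects: `timetuple()` only carries wall-clock fields, whereas subtracting an aware epoch takes the UTC offset into account. The helper name `aware_to_epoch_seconds` and the fixed +01:00 offset are assumptions made for this example.

```python
import calendar
from datetime import datetime, timedelta, timezone

EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def aware_to_epoch_seconds(dt):
    """Whole seconds since 1970-01-01 UTC for an aware datetime."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("expected an aware datetime")
    return (dt - EPOCH_UTC) // timedelta(seconds=1)


# The same instant expressed with two different fixed offsets.
plus_one = datetime(2008, 11, 18, 10, 5, tzinfo=timezone(timedelta(hours=1)))
utc = datetime(2008, 11, 18, 9, 5, tzinfo=timezone.utc)

# timetuple() reports wall-clock fields only, so the offset is gone.
print(plus_one.timetuple()[:6], utc.timetuple()[:6])

# Subtracting an aware epoch honours the offset: both values agree.
print(aware_to_epoch_seconds(plus_one), aware_to_epoch_seconds(utc))
```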
msg76324 - (view) Author: David Fraser (davidfraser) Date: 2008-11-24 14:04
----- "Alexander Belopolsky" <report@bugs.python.org> wrote:
> Alexander Belopolsky <belopolsky@users.sourceforge.net> added the
> comment:
> 
> I would like to voice my opposition the totimestamp method.
> 
> Representing time as a float is a really bad idea (originated at 
> Microsoft as I have heard).  In addition to the usual numeric problems
> when dealing with the floating point, the resolution of the floating 
> point timestamp varies from year to year making it impossible to 
> represent high resolution historical data.
> 
> In my opinion both time.time() returning float and 
> datetime.fromtimestamp() taking a float are both design mistakes and 
> adding totimestamp that produces a float will further promote a bad 
> practice.

The point for me is that having to interact with Microsoft systems that require times means that the conversions have to be done. Is it better to have everybody re-implement this, with their own bugs, or to have a standard implementation? I think it's clearly better to have it as a method on the object. Of course, we should put docs in describing the pitfalls of this approach...
msg76327 - (view) Author: Alexander Belopolsky (belopolsky) Date: 2008-11-24 15:17
On Mon, Nov 24, 2008 at 9:04 AM, David Fraser <report@bugs.python.org> wrote:
...
> The point for me is that having to interact with Microsoft systems that require times means that the conversions have to be done.

I did not see the "epoch" proposal as an interoperability with
Microsoft systems feature.  If this is the goal, a deeper analysis of
the Microsoft standards is in order.  For example, what is the valid
range of the floating point timestamp?  What is the range for which
fromepoch (float to datetime) translation is valid?  For example, if
all floats are valid timestamps, then fromepoch can be limited to +/-
2**31 or to a smaller range where a float has enough precision to
roundtrip microseconds.

> Is it better to have everybody re-implement this, with their own bugs, or to have a standard implementation?

As far as I know, interoperability with Microsoft systems requires
re-implementation of their bugs many of which are not documented.  For
example, OOXML requires that 1900 be treated as a leap year at least
in some cases.  When you write your own implementation, at least you
have the source code to your own bugs.

> I think it's clearly better to have it as a method on the object. Of course, we should put docs in describing the pitfalls of this approach...

Yes, having a well documented high resolution "time since epoch" to
"local datetime" method in the datetime module is helpful if
non-trivial timezones (such as the one Victor lives in) are supported.
 However, introducing floating point pitfalls into the already
overcomplicated realm of calendar calculations would be a mistake.

I believe the correct approach would be to extend fromtimestamp (and
utcfromtimestamp) to accept a (seconds, microseconds) tuple as an
alternative (and in addition) to the float timestamp.  Then
totimestamp can be implemented to return such tuple that
fromtimestamp(totimestamp(dt)) == dt for any datetime dt and
totimestamp(fromtimestamp((s,us))) == (s, us) for any s and us within
datetime valid range (note that s will have to be a long integer to
achieve that).

In addition exposing the system gettimeofday in the time module to
produce (s, us) tuples may be helpful to move away from float
timestamps produced by time.time(), but with totimestamp as proposed
above that would be equivalent to datetime.now().totimestamp().
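
The round-trip property can be sketched as follows. This is only an illustration under assumptions: the functions are module-level stand-ins for the proposed methods, naive datetimes are treated as UTC, and the float path goes through `timedelta.total_seconds()` rather than any platform call.

```python
from datetime import datetime, timedelta

# Assumption for this sketch: naive datetimes are interpreted as UTC.
EPOCH = datetime(1970, 1, 1)


def totimestamp(dt):
    """Return an exact (seconds, microseconds) pair of integers."""
    seconds, rest = divmod(dt - EPOCH, timedelta(seconds=1))
    return seconds, rest.microseconds


def fromtimestamp(ts):
    """Accept the (seconds, microseconds) pair produced above."""
    seconds, microseconds = ts
    return EPOCH + timedelta(seconds=seconds, microseconds=microseconds)


def float_roundtrip(dt):
    """Go through a float number of seconds and back."""
    seconds = (dt - EPOCH).total_seconds()
    return EPOCH + timedelta(seconds=seconds)


near = datetime(2008, 11, 24, 15, 17, 37, 1)
far = datetime(5000, 1, 1, 0, 0, 0, 1)

for dt in (near, far):
    tuple_ok = fromtimestamp(totimestamp(dt)) == dt
    float_ok = float_roundtrip(dt) == dt
    print(dt, "tuple:", tuple_ok, "float:", float_ok)
```

The tuple form round-trips both values, while the float form loses the single microsecond of the year-5000 value.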
msg76329 - (view) Author: STINNER Victor (haypo) Date: 2008-11-24 15:46
About the timestamp, there are many formats:

(a) UNIX: 32 bits signed integer, number of seconds since the 1st 
January 1970. 
 - file format: gzip header, Portable Executable (PE, Windows), 
compiled python script header (.pyc/.pyo)
 - file system: ext2 and ext3

(b) UNIX64: 64 bits signed integer, number of seconds since the 1st 
January 1970
 - file format: Gnome keyring

(c) UNIX: 32 bits unsigned integer, number of seconds since the 1st 
January 1904
 - file format: True Type Font (.ttf), iTunes database, AIFF, .MOV

(d) UUID60: 60 bits unsigned integer, number of 1/10 microseconds 
since the 15th October 1582
 - all formats using UUID version 1 (also known as "GUID" in the 
Microsoft world)

(e) Win64: 64 bits unsigned integer, number of 1/10 microseconds since 
the 1st January 1601
 - file format: Microsoft Office documents (.doc, .xls, etc), ASF 
video (.asf), Windows link (.lnk)
 - file system: NTFS

(f) MSDOS DateTime or TimeDate: bitfield with 16 bits for the date and 
16 bits for the time. Time precision is 2 seconds, year is in range 
[1980; 2107]
 - file format: Windows link (.lnk), CAB archive (.cab), Word document 
(.doc), ACE archive (.ace), ZIP archive (.zip), RAR archive (.rar)
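
For format (f), a minimal sketch of packing and unpacking the two 16-bit fields. The bit layout used here (7 bits for year minus 1980, 4 for month, 5 for day; 5 bits for hour, 6 for minute, 5 for seconds divided by two) is the conventional FAT/ZIP layout and is an assumption of this example, not something spelled out in this thread. The function names are illustrative.

```python
from datetime import datetime


def dos_pack(dt):
    """Return (date_bits, time_bits) for an MS-DOS style timestamp."""
    if not 1980 <= dt.year <= 2107:
        raise ValueError("year must be in the range [1980; 2107]")
    date_bits = ((dt.year - 1980) << 9) | (dt.month << 5) | dt.day
    # Seconds are stored halved, hence the 2-second precision.
    time_bits = (dt.hour << 11) | (dt.minute << 5) | (dt.second // 2)
    return date_bits, time_bits


def dos_unpack(date_bits, time_bits):
    """Inverse of dos_pack(); odd seconds come back rounded down."""
    return datetime(
        1980 + (date_bits >> 9),
        (date_bits >> 5) & 0x0F,
        date_bits & 0x1F,
        time_bits >> 11,
        (time_bits >> 5) & 0x3F,
        (time_bits & 0x1F) * 2,
    )


packed = dos_pack(datetime(2008, 11, 24, 18, 7, 51))
print(packed)
print(dos_unpack(*packed))
```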
msg76331 - (view) Author: STINNER Victor (haypo) Date: 2008-11-24 16:00
Timedelta formats:

(a) Win64: 64 bits unsigned integer, number of 1/10 microsecond
 - file format: Microsoft Word document (.doc), ASF video (.asf)

(b) 64 bits float, number of seconds
 - file format: AMF metadata used in Flash video (.flv)

Other file formats use multiple numbers to store a duration:

[AVI video]
 - 3 integers (32 bits unsigned): length, rate, scale
 - seconds = length / (rate / scale) 
 - (seconds = length * scale / rate)

[WAV audio]
 - 2 integers (32 bits unsigned): us_per_frame, total_frame
 - seconds = total_frame * us_per_frame / 1000000

[Ogg Vorbis]
 - 2 integers: sample_rate (32 bits unsigned), position (64 bits 
unsigned)
 - seconds = position / sample_rate
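
The AVI and Ogg Vorbis formulas can be sketched with exact rational arithmetic, which avoids committing to int or float too early. The function names and the sample values are made up for illustration.

```python
from fractions import Fraction


def avi_duration(length, rate, scale):
    """seconds = length * scale / rate, kept as an exact fraction."""
    return Fraction(length * scale, rate)


def ogg_duration(position, sample_rate):
    """seconds = position / sample_rate, kept as an exact fraction."""
    return Fraction(position, sample_rate)


# Sample values chosen only for illustration.
print(float(avi_duration(length=3000, rate=30000, scale=1001)))
print(float(ogg_duration(position=441000, sample_rate=44100)))
```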
msg76332 - (view) Author: Alexander Belopolsky (belopolsky) Date: 2008-11-24 16:14
That's an impressive summary, but what is your conclusion?  I don't
see any format that will benefit from a subsecond
timedelta.totimestamp().  Your examples have either multisecond or
submicrosecond resolution.

On Mon, Nov 24, 2008 at 11:00 AM, STINNER Victor <report@bugs.python.org> wrote:
>
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
>
> Timedelta formats:
>
> (a) Win64: 64 bits unsigned integer, number of 1/10 microsecond
>  - file format: Microsoft Word document (.doc), ASF video (.asf)
>
> (b) 64 bits float, number of seconds
>  - file format: AMF metadata used in Flash video (.flv)
>
> Other file formats use multiple numbers to store a duration:
>
> [AVI video]
>  - 3 integers (32 bits unsigned): length, rate, scale
>  - seconds = length / (rate / scale)
>  - (seconds = length * scale / rate)
>
> [WAV audio]
>  - 2 integers (32 bits unsigned): us_per_frame, total_frame
>  - seconds = total_frame * (1000000 / us_per_frame)
>
> [Ogg Vorbis]
>  - 2 integers: sample_rate (32 bits unsigned), position (64 bits
> unsigned)
>  - seconds = position / sample_rate
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue2736>
> _______________________________________
>
msg76340 - (view) Author: STINNER Victor (haypo) Date: 2008-11-24 17:13
Ooops, timestamp (c) is the *Mac* timestamp: seconds since the 1st 
January 1904.

> what is your conclusion?

Hum, it's maybe not possible to choose between integer and float. Why 
not support both? Example:
 - totimestamp()->int: truncate microseconds
 - totimestamp(microseconds=True)->float: with microseconds

Attached file (timestamp.py) is a module to import/export timestamp in 
all listed timestamp formats. It's written in pure Python.
----------------
>>> import timestamp
>>> from datetime import datetime
>>> now = datetime.now()
>>> now
datetime.datetime(2008, 11, 24, 18, 7, 50, 216762)

>>> timestamp.exportUnix(now)
1227550070
>>> timestamp.exportUnix(now, True)
1227550070.2167621
>>> timestamp.exportMac(now)
3310394870L
>>> timestamp.exportWin64(now)
128720236702167620L
>>> timestamp.exportUUID(now)
134468428702167620L

>>> timestamp.importMac(3310394870)
datetime.datetime(2008, 11, 24, 18, 7, 50)
>>> timestamp.importUnix(1227550070)
datetime.datetime(2008, 11, 24, 18, 7, 50)
>>> timestamp.importUnix(1227550070.2167621)
datetime.datetime(2008, 11, 24, 18, 7, 50, 216762)
----------------

It supports int and float types for import and export.
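
For readers without the attachment, here is a rough sketch of how such exporters could be written with plain timedelta arithmetic. It is not the attached timestamp.py: the function names are invented for this example, and naive datetimes are used as plain calendar values with no timezone conversion, which is what the outputs in the session above suggest.

```python
from datetime import datetime, timedelta

# Epochs of the integer formats listed earlier in this thread.
UNIX_EPOCH = datetime(1970, 1, 1)
MAC_EPOCH = datetime(1904, 1, 1)
WIN64_EPOCH = datetime(1601, 1, 1)
UUID_EPOCH = datetime(1582, 10, 15)


def export_seconds(dt, epoch):
    """Whole seconds elapsed since the given epoch (microseconds dropped)."""
    return (dt - epoch) // timedelta(seconds=1)


def export_ticks(dt, epoch):
    """Number of 1/10 microsecond units elapsed since the given epoch."""
    microseconds = (dt - epoch) // timedelta(microseconds=1)
    return microseconds * 10


def import_seconds(value, epoch):
    """Inverse of export_seconds()."""
    return epoch + timedelta(seconds=value)


now = datetime(2008, 11, 24, 18, 7, 50, 216762)
print(export_seconds(now, UNIX_EPOCH))
print(export_seconds(now, MAC_EPOCH))
print(export_ticks(now, WIN64_EPOCH))
print(export_ticks(now, UUID_EPOCH))
print(import_seconds(3310394870, MAC_EPOCH))
```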
msg76344 - (view) Author: Alexander Belopolsky (belopolsky) Date: 2008-11-24 17:33
On Mon, Nov 24, 2008 at 12:13 PM, STINNER Victor <report@bugs.python.org> wrote:
..
> Hum, it's maybe not possible to choose between integer and float. Why
> not supporting both? Example:
>  - totimestamp()->int: truncate microseconds
>  - totimestamp(microseconds=True)->float: with microseconds

I would still prefer totimestamp()->(int, int) returning (sec, usec)
tuple.  The important benefit is that such totimestamp() will not
lose information and will support more formats than either of your
->int or ->float variants.  The ->int can then be spelt simply as
totimestamp()[0] and on systems with numpy (which is likely for users
that deal with floats a lot), totimestamp(microseconds=True) is simply
dot([1, 1e-6], totimestamp()). (and s,us = totimestamp(); return s +
us * 1e-6 is not that hard either.)
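
Spelled out without numpy, the conversions from a (sec, usec) tuple might look like the sketch below. The helper names are hypothetical and the tuple value is taken from the example session earlier in the thread.

```python
def as_int(ts):
    """Integer seconds: simply the first element of the tuple."""
    return ts[0]


def as_float(ts):
    """Float seconds; precision is limited by the float type."""
    seconds, microseconds = ts
    return seconds + microseconds * 1e-6


def as_microseconds(ts):
    """Exact integer count of microseconds, with no information lost."""
    seconds, microseconds = ts
    return seconds * 1000000 + microseconds


ts = (1227550070, 216762)
print(as_int(ts))
print(as_float(ts))
print(as_microseconds(ts))
```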
msg76345 - (view) Author: STINNER Victor (haypo) Date: 2008-11-24 17:34
> > Hum, it's maybe not possible to choose between integer and float. Why
> > not supporting both? Example:
> >  - totimestamp()->int: truncate microseconds
> >  - totimestamp(microseconds=True)->float: with microseconds
>
> I would still prefer totimestamp()->(int, int) returning (sec, usec)
> tuple.  The important benefit is that such totimestamp() will not
> loose information

Right, I prefer your solution ;-)
msg76351 - (view) Author: Alexander Belopolsky (belopolsky) Date: 2008-11-24 18:07
On Mon, Nov 24, 2008 at 12:34 PM, STINNER Victor <report@bugs.python.org> wrote:
..
>> I would still prefer totimestamp()->(int, int) returning (sec, usec)
>> tuple.  The important benefit is that such totimestamp() will not
>> loose information
>
> Right, I prefer your solution ;-)
>

Great!  What do you think about extending fromtimestamp(timestamp[,
tz]) and utcfromtimestamp(timestamp) to accept a tuple for the
timestamp?

Also, are you motivated enough to bring this up on python-dev to get a
community and BDFL blessings?  I think this has a chance to be
approved.
msg76352 - (view) Author: David Fraser (davidfraser) Date: 2008-11-24 18:09
----- "STINNER Victor" <report@bugs.python.org> wrote:

> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
> 
> Timedelta formats:
> 
> (a) Win64: 64 bits unsigned integer, number of 1/10 microsecond
>  - file format: Microsoft Word document (.doc), ASF video (.asf)
> 
> (b) 64 bits float, number of seconds
>  - file format: AMF metadata used in Flash video (.flv)

There are also the PyWinTime objects returned by PythonWin COM calls which are basically FILETIMEs
I don't have time to get the details now but I recently submitted a patch to make them work with milliseconds - see http://sourceforge.net/tracker/index.php?func=detail&aid=2209864&group_id=78018&atid=551954 (yes I know this is a bit off-topic here)
msg77650 - (view) Author: STINNER Victor (haypo) Date: 2008-12-12 01:15
belopolsky will be happy to see this new version of my patch:
 - datetime.totimestamp() => (seconds, microseconds): two integers
 - the implementation of datetime.totimestamp() doesn't use Python's 
time.mktime() but calls the C mktime() directly, because time.mktime() 
creates a float value
 - fix time.mktime() to support the timestamp -1 (first second before 
the epoch) to make it consistent with datetime.totimestamp(), which 
also supports this value
 - fix documentation: it's microseconds (10^-6) and not milliseconds 
(10^-3)
msg77651 - (view) Author: STINNER Victor (haypo) Date: 2008-12-12 01:19
About mktime() -> -1: see the Issue1726687 (I found the fix in this 
issue).

Next job will be to patch datetime.(utc)fromtimestamp() to support 
(int, int). I tried to write such a patch but it's not easy because 
fromtimestamp() will support: int, long, float, (int, int), (int, 
long), (long, int) and (long, long). And I don't know if a "long" 
value can be converted to "time_t".
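In pure Python the dispatch over accepted types could be sketched as below. This is only an illustration, not the planned C patch: the function name is hypothetical, it covers the UTC variant only, and it sidesteps the time_t question by using timedelta arithmetic. In Python 3 there is a single int type, so the int/long combinations collapse into one case.

```python
import numbers
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def utcfromtimestamp_any(timestamp):
    """Accept an int, a float, or a (seconds, microseconds) tuple."""
    if isinstance(timestamp, tuple):
        seconds, microseconds = timestamp
        if not (isinstance(seconds, numbers.Integral)
                and isinstance(microseconds, numbers.Integral)):
            raise TypeError("tuple timestamp must contain two integers")
        return EPOCH + timedelta(seconds=seconds, microseconds=microseconds)
    if isinstance(timestamp, numbers.Real):
        # Covers both int and float; floats are rounded to microseconds.
        return EPOCH + timedelta(seconds=timestamp)
    raise TypeError("unsupported timestamp type: %r" % type(timestamp))


print(utcfromtimestamp_any(1227550070))
print(utcfromtimestamp_any(1227550070.5))
print(utcfromtimestamp_any((1227550070, 216762)))
print(utcfromtimestamp_any((-1, 0)))
```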
History
Date User Action Args
2009-01-13 23:42:58 haypo set files: - timestamp.py
2009-01-13 23:42:46 haypo set files: - datetime_totimestamp-2.patch
2009-01-13 23:42:41 haypo set files: - datetime_totimestamp.patch
2008-12-12 01:19:58 haypo set messages: + msg77651
2008-12-12 01:15:50 haypo set files: + datetime_totimestamp-3.patch; messages: + msg77650
2008-11-24 18:09:25 davidfraser set messages: + msg76352
2008-11-24 18:07:44 belopolsky set messages: + msg76351
2008-11-24 17:34:56 haypo set messages: + msg76345
2008-11-24 17:33:05 belopolsky set messages: + msg76344
2008-11-24 17:13:43 haypo set files: + timestamp.py; messages: + msg76340
2008-11-24 16:14:29 belopolsky set messages: + msg76332
2008-11-24 16:00:40 haypo set messages: + msg76331
2008-11-24 15:46:11 haypo set messages: + msg76329
2008-11-24 15:17:37 belopolsky set messages: + msg76327
2008-11-24 14:04:48 davidfraser set messages: + msg76324
2008-11-18 09:05:29 andersjm set nosy: + andersjm; messages: + msg76003
2008-11-15 12:19:37 haypo set messages: + msg75912
2008-11-15 03:17:48 belopolsky set messages: + msg75904
2008-11-15 01:37:40 haypo set messages: + msg75903
2008-11-15 01:15:28 belopolsky set nosy: + belopolsky; messages: + msg75902
2008-11-15 00:41:07 haypo set files: + datetime_totimestamp-2.patch; messages: + msg75900
2008-11-15 00:33:12 haypo set files: + datetime_totimestamp.patch; messages: + msg75899
2008-11-11 03:12:38 haypo set nosy: + haypo; messages: + msg75723
2008-05-11 08:25:30 Neil Muller set nosy: + Neil Muller; messages: + msg66610
2008-05-11 06:12:39 tebeka set messages: + msg66601
2008-05-10 20:48:17 davidfraser set nosy: + davidfraser
2008-05-10 15:54:41 hodgestar set files: + add-datetime-totimestamp-method-docs.diff; messages: + msg66539
2008-05-10 14:55:39 hodgestar set files: + add-datetime-totimestamp-method.diff; keywords: + patch; messages: + msg66532; nosy: + hodgestar
2008-05-03 02:18:59 werneck set nosy: + werneck; messages: + msg66140
2008-05-01 21:03:25 tebeka create