classification
Title: Combined behavior of datetime.datetime.timestamp() and datetime.datetime.utcnow() on non-UTC timezoned machines
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.7
process
Status: closed Resolution: duplicate
Dependencies: Superseder: Using datetime.datetime.utcnow().timestamp() in Python3.6.0 can't get correct UTC timestamp. (issue 33293)
Assigned To: Nosy List: Yi Luan, belopolsky, p-ganssle
Priority: normal Keywords:

Created on 2020-03-15 15:01 by Yi Luan, last changed 2020-03-17 09:12 by Yi Luan. This issue is now closed.

Messages (8)
msg364240 - Author: Yi Luan (Yi Luan) Date: 2020-03-15 15:01
Hello,

Apologies if this is a duplicate issue.

I guess the most concise way of saying this is that when doing:

>>> datetime.datetime.utcnow().timestamp()

on a machine whose local time zone is not UTC, the code above does not return the correct timestamp.

datetime.datetime.timestamp() and datetime.datetime.fromtimestamp() implicitly convert between timestamps and the running machine's local time. When they are fed data that has already been converted to UTC, they convert it a second time and therefore return an incorrect result.

For example:
On a machine set to CST (UTC+8):
>>> dt = datetime.datetime.utcnow()
>>> dt
datetime.datetime(2020, 3, 15, 14, 33, 10, 213664)
>>> datetime.datetime.fromtimestamp(dt.timestamp(), datetime.timezone.utc)
datetime.datetime(2020, 3, 15, 6, 33, 10, 213664)

Meanwhile, on a machine set to UTC:
>>> dt = datetime.datetime.utcnow()
>>> dt
datetime.datetime(2020, 3, 15, 14, 41, 2, 203275)
>>> datetime.datetime.fromtimestamp(dt.timestamp(), datetime.timezone.utc)
datetime.datetime(2020, 3, 15, 14, 41, 2, 203275)
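The double conversion can be reproduced on a single machine, so there is no need to compare two computers. Below is a minimal sketch using only the standard library.

- It builds the naive "UTC wall time" that `utcnow()` returns. It uses `datetime.now(timezone.utc).replace(tzinfo=None)`, which sidesteps the deprecation warning `utcnow()` emits as of Python 3.12.
- It then checks that `.timestamp()` on that naive value is off from the true epoch time by the machine's current UTC offset.

```python
from datetime import datetime, timezone

# The true current instant, as an aware datetime in UTC.
aware_utc = datetime.now(timezone.utc)

# Equivalent to what datetime.utcnow() returns: UTC wall-clock fields, no tzinfo.
naive_utc = aware_utc.replace(tzinfo=None)

# The machine's current offset from UTC in seconds (e.g. +28800 for UTC+8).
local_offset = aware_utc.astimezone().utcoffset().total_seconds()

true_ts = aware_utc.timestamp()
# .timestamp() treats the naive value as *local* time, so it is shifted by the offset.
naive_ts = naive_utc.timestamp()

print(f"local UTC offset: {local_offset:+.0f} s")
print(f"error in naive_utc.timestamp(): {naive_ts - true_ts:+.0f} s")
```

On a UTC machine both lines print +0. On a UTC+8 machine the error is -28800 s, the 8 hours described below.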

I understand that one should probably use datetime.datetime.fromtimestamp() to construct times, but the output of the code above is inconsistent across machines set to different time zones. The code explicitly asks for the current UTC time, takes its timestamp, and then converts that UTC timestamp back into a datetime object. The first machine should have produced the same round-trip result as the second, but it didn't.

From my point of view, timestamp() functions should not shift datetime objects, since utcnow() returns an object that is naive about tzinfo anyway. Timestamps generated by Python should be correct, and code should do what the programmer asked it to do. In the example above, datetime.datetime.utcnow().timestamp() should return the current UTC timestamp. On a machine set to CST, however, it returns a timestamp 8 hours earlier than that.

This implicit behavior of the timestamp() functions causes ambiguity in code. I therefore suggest that, unless they are used on tz-aware objects, timestamp() functions should not shift any datetime based on the running machine's local time.
msg364243 - Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2020-03-15 16:04
This is the intended behavior of these functions, and there is now a warning on both the utcnow and utcfromtimestamp functions to reflect this:

https://docs.python.org/3/library/datetime.html#datetime.datetime.utcnow

I would say that the correct answer here is to stop using utcnow and utcfromtimestamp (except possibly in very limited circumstances). I have written about it here:

https://blog.ganssle.io/articles/2019/11/utcnow.html

The preferred way to do this is `datetime.now(tz=timezone.utc)` or `datetime.fromtimestamp(ts, tz=timezone.utc)`.
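Here is a short round-trip check of the two aware replacements. The example timestamp 1546300800 (2019-01-01T00:00:00Z) is borrowed from the New York/Tokyo example later in this thread.

```python
from datetime import datetime, timezone

# Aware "now" in UTC: the replacement for datetime.utcnow().
now_utc = datetime.now(tz=timezone.utc)

# Aware conversion from an epoch timestamp: the replacement for datetime.utcfromtimestamp(ts).
ts = 1546300800
from_ts = datetime.fromtimestamp(ts, tz=timezone.utc)
print(from_ts)  # 2019-01-01 00:00:00+00:00

# Both results carry a tzinfo, so .timestamp() never has to guess a local zone.
print(from_ts.timestamp() == ts)  # True
```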

The main thing to internalize is that the result of `.timestamp()` always has a time zone, because it is an epoch time, meaning that it is the number of seconds in UTC since 1970-01-01T00:00:00Z.
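To make the "seconds since 1970-01-01T00:00:00Z" definition concrete, the sketch below computes it by hand for an aware datetime and compares the result with `.timestamp()`.

- `epoch_seconds` is a hypothetical helper written for illustration.
- The UTC+9 zone is a fixed-offset stand-in for Tokyo.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch as an aware datetime: 1970-01-01T00:00:00Z.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def epoch_seconds(dt):
    """Hypothetical helper: seconds elapsed since the Unix epoch for an aware datetime."""
    return (dt - EPOCH).total_seconds()

tokyo = timezone(timedelta(hours=9))           # fixed UTC+9 offset
dt = datetime(2019, 1, 1, 9, 0, tzinfo=tokyo)  # same instant as 2019-01-01T00:00Z
print(epoch_seconds(dt), dt.timestamp())       # 1546300800.0 1546300800.0
```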

In Python 2, any operations on naive datetimes that required them to represent absolute times were an error, but in Python 3 that was changed and they are treated as local times. Perhaps leaving that behavior as is and having a dedicated "local time" object would have been a good idea. There are actually some serious problems with doing it that way, though, because it's difficult to define "local time" in such a way that it cannot change over the course of an interpreter lifetime. That would cause major issues for an aware datetime, which is guaranteed not to change over the course of the interpreter lifetime. Treating naive times as local for operations that require localization (without changing their equality and comparison semantics, which is what would cause the problems) is a neat solution to that.
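The "local for localizing operations, unchanged for comparisons" split can be observed directly. A minimal sketch, assuming only the standard library:

- `astimezone()` with no argument attaches the machine's local zone to a naive value.
- Equality between naive and aware values returns False, while ordering raises TypeError.

```python
from datetime import datetime

naive = datetime(2019, 1, 1)   # no tzinfo
local = naive.astimezone()     # same wall time, machine's local zone attached

# For localizing operations, Python 3 treats the naive value as local time...
assert naive.timestamp() == local.timestamp()
# ...and the wall-clock fields are unchanged.
assert local.replace(tzinfo=None) == naive

# But naive and aware values still do not mix in comparisons.
print(naive == local)  # False: naive vs aware equality is never true
try:
    naive < local
except TypeError as exc:
    print("ordering raises:", exc)
```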

Sorry this causes confusion, perhaps in the future we can look into removing the `.utcnow()` and `.utcfromtimestamp()` functions, or renaming them to something else.

I'm going to set the status of this as wontfix because this is an intended behavior, but feel free to continue to use the ticket for discussion.

Thank you for taking the time to file an issue!
msg364244 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2020-03-15 16:07
I am sure this has been reported before – I will try to find the relevant issue.  This behavior is correct and documented.  The only improvement that we can consider is to make it more explicit that utcnow is deprecated and the correct way to obtain the UTC timestamp is

datetime.now(timezone.utc).timestamp()
msg364245 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2020-03-15 16:09
This is a duplicate of issue 33293.
msg364282 - Author: Yi Luan (Yi Luan) Date: 2020-03-16 04:04
Hi,

Thanks for taking the time to reply to my question.

I suppose the title I put was a bit confusing.

And the recommended way to generate time in UTC does solve this issue, in Python.

However, the point I was trying to make is that, from my point of view, timestamp() does one step too many.

Say, for example, I'm sending data generated on my local machine (in CST) to someone else, and for easy comparison and precision I use `datetime.datetime.now().timestamp()` as the value of the time fields. I would naturally think that timestamp() returned the timestamp of my local time. Suppose the recipient, instead of using Python, used Node.js to import those timestamps and was told by me that they were in CST. He or she would then get the wrong time, since the timestamps are actually in UTC.
As Scott Mayer, the author of Effective C++, once said, "APIs should be easy to use correctly and hard to use incorrectly". From my point of view, the timestamp() functions do one thing too many: they shouldn't shift any datetime object they are fed.
Also, the documentation of datetime.timestamp() has no warning about this behavior, only a note on getting the UTC time. It only says "Return POSIX timestamp corresponding to the datetime instance". From my understanding, a POSIX timestamp is the time elapsed since the epoch, not the time elapsed since the epoch minus 8 hours on machines set to CST.

By its nature, a timestamp can't and shouldn't incorporate any time zone information. When presented with only a naive datetime, the program should convert it directly to whatever it represents. It should not assume the machine is in some time zone, shift the value by some amount, and only then convert it.

Sorry if what I wrote does not make sense. I used utcnow() only to demonstrate the problem. From my point of view, utcnow() and utcfromtimestamp() did nothing wrong; the problem lies with the extra behavior of the timestamp() and fromtimestamp() functions.

Another example: say the current Unix timestamp is 1584330809. When I feed it into Python on my local machine, I get:
>>> datetime.datetime.fromtimestamp(1584330809)
datetime.datetime(2020, 3, 16, 11, 53, 29)

When I feed it into Node.js, I get:
> new Date(1584330809*1000)
2020-03-16T03:53:29.000Z

From my point of view, the Node's behavior is much more natural and intuitive.

Since we don't know what time zone 1584330809 is in, the returned datetime should simply reflect how many seconds have elapsed since the epoch, not the local time of my machine.
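The Node.js-style result is available in Python by asking for it explicitly. A minimal sketch using only the standard library: passing `tz=timezone.utc` makes `fromtimestamp` present the instant in UTC, which matches the Node.js output above (apart from the milliseconds and `Z` suffix).

```python
from datetime import datetime, timezone

ts = 1584330809

# Without tz: a naive value in the machine's local time (11:53:29 on the UTC+8 machine above).
local_dt = datetime.fromtimestamp(ts)

# With tz=timezone.utc: the same instant shown in UTC, like the Node.js output.
utc_dt = datetime.fromtimestamp(ts, tz=timezone.utc)
print(utc_dt.isoformat())  # 2020-03-16T03:53:29+00:00

# Both describe the same instant; only the presentation differs.
print(local_dt.timestamp() == utc_dt.timestamp())  # True, barring ambiguous local times at DST changes
```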
msg364283 - Author: Yi Luan (Yi Luan) Date: 2020-03-16 04:15
Sorry to make changes again but I typed his name wrong = =!
It's Scott Meyers. Apologies.
msg364323 - Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2020-03-16 14:05
@Yi Luan

I think you may misunderstand what the `.timestamp()` function does - it returns an epoch time, which is the amount of time (in seconds) elapsed since the Unix epoch: https://en.wikipedia.org/wiki/Unix_time

The number is not different depending on your time zone:

    >>> from datetime import *
    >>> from dateutil import tz

    >>> dt = datetime(2019, 1, 1, tzinfo=timezone.utc)
    >>> print(f"{dt}: {dt.timestamp()}")
    2019-01-01 00:00:00+00:00: 1546300800.0

    >>> dt = dt.astimezone(tz.gettz("America/New_York"))
    >>> print(f"{dt}: {dt.timestamp()}")
    2018-12-31 19:00:00-05:00: 1546300800.0

    >>> dt = dt.astimezone(tz.gettz("Asia/Tokyo"))
    >>> print(f"{dt}: {dt.timestamp()}")
    2019-01-01 09:00:00+09:00: 1546300800.0

Note how the timestamp number is always the same.
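The example above relies on the third-party dateutil package; on Python 3.9+ the stdlib zoneinfo module provides IANA zones as well. For a sketch that runs with no extra dependencies, fixed-offset zones show the same invariant.

Assumption: the fixed offsets ignore DST, unlike `tz.gettz`. That is harmless here, since Tokyo has no DST and 2019-01-01 is in New York's standard time.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-ins for the zones above (they do not model DST).
new_york_winter = timezone(timedelta(hours=-5))
tokyo = timezone(timedelta(hours=9))

dt = datetime(2019, 1, 1, tzinfo=timezone.utc)
for zone in (timezone.utc, new_york_winter, tokyo):
    local = dt.astimezone(zone)
    # Wall time and offset change; the epoch time does not.
    print(f"{local}: {local.timestamp()}")
```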

Alexander's suggestion of using `datetime.now(tz=timezone.utc).timestamp()` is slightly misleading because `datetime.now().timestamp()` and `datetime.now(tz=timezone.utc).timestamp()` will always return the same value. I think he was just using that as shorthand for "replace datetime.utcnow() with datetime.now(tz=timezone.utc) in all cases".
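A quick way to convince yourself that `datetime.now().timestamp()` and `datetime.now(tz=timezone.utc).timestamp()` agree is to compare both with `time.time()`. A minimal sketch: the tolerance only absorbs the microseconds between the calls, and ambiguous local times during a DST fall-back are ignored.

```python
import time
from datetime import datetime, timezone

naive_now = datetime.now()                 # local wall time, no tzinfo
aware_now = datetime.now(tz=timezone.utc)  # UTC wall time, tzinfo attached

# Different wall-clock fields, same instant, so the epoch times agree.
print(abs(aware_now.timestamp() - naive_now.timestamp()) < 1)  # True
print(abs(aware_now.timestamp() - time.time()) < 1)           # True
```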

When you have a naive datetime (with no tzinfo), the only options are to pick the time zone it represents and convert to UTC or to throw an error and say, "We don't know what time zone this represents, so we cannot do this operation." Python 2 used to throw an exception, but in Python 3 naive datetimes represent local times.
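If you prefer the "throw an error" option in your own code, it can be enforced with a thin wrapper. `strict_timestamp` is a hypothetical helper for illustration, not part of the datetime API. It refuses naive values instead of assuming local time.

```python
from datetime import datetime, timezone

def strict_timestamp(dt):
    """Hypothetical helper: refuse to guess a time zone for naive datetimes."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("naive datetime: attach a tzinfo before converting")
    return dt.timestamp()

print(strict_timestamp(datetime(2019, 1, 1, tzinfo=timezone.utc)))  # 1546300800.0
try:
    strict_timestamp(datetime(2019, 1, 1))
except ValueError as exc:
    print("rejected:", exc)
```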

If you want "nominal number of seconds since 1970-01-01T00:00:00 *in this time zone*", you want something more like this:

  def seconds_since(dt, epoch=datetime(1970, 1, 1)):
    return (dt.replace(tzinfo=None) - epoch).total_seconds()

That does not take into account total elapsed time from DST transitions and the like - to do that, you'll want something more like this:

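  def seconds_elapsed_since(dt, epoch=datetime(1970, 1, 1)):
      if epoch.tzinfo is None and dt.tzinfo is not None:
          epoch = epoch.replace(tzinfo=dt.tzinfo)
      if dt.tzinfo is not None:
          # Aware datetimes that share one tzinfo object subtract as wall
          # times, so convert both to UTC to count real elapsed seconds.
          dt, epoch = dt.astimezone(timezone.utc), epoch.astimezone(timezone.utc)
      return (dt - epoch).total_seconds()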
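One subtlety matters when measuring elapsed time across DST changes. When two aware datetimes share the same tzinfo object, Python subtracts their wall-clock fields and ignores `utcoffset()`; only with different tzinfo objects does it convert both to UTC first.

The sketch below shows this with `ToyDST`, a hypothetical zone made up for illustration: UTC+1, switching to UTC+2 on 2019-06-01.

```python
from datetime import datetime, timedelta, timezone, tzinfo

class ToyDST(tzinfo):
    """Hypothetical zone: UTC+1, or UTC+2 from 2019-06-01 (wall time) onward."""
    SWITCH = datetime(2019, 6, 1)

    def utcoffset(self, dt):
        return timedelta(hours=1) + self.dst(dt)

    def dst(self, dt):
        return timedelta(hours=1) if dt.replace(tzinfo=None) >= self.SWITCH else timedelta(0)

    def tzname(self, dt):
        return "TOY+2" if self.dst(dt) else "TOY+1"

zone = ToyDST()
start = datetime(2019, 1, 1, tzinfo=zone)
end = datetime(2019, 7, 1, tzinfo=zone)

# Same tzinfo object: offsets are ignored, giving the wall-clock difference.
wall = end - start
# Converting both to UTC first measures the real elapsed time across the offset change.
real = end.astimezone(timezone.utc) - start.astimezone(timezone.utc)
print(wall, real)  # 181 days, 0:00:00 180 days, 23:00:00
```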

I urge you not to do this in any sort of interop protocol, though, because integer timestamps are traditionally interpreted as Unix times, and if you start passing around an integer timestamp that represents "unix time plus or minus a few hours", you are likely to create bugs when someone mistakes it for a unix time.
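To make that risk concrete, the sketch below compares a real Unix time with the "nominal" count from `seconds_since` above for the same instant in a UTC+9 zone. The function is repeated so the snippet is self-contained, and the zone is a fixed offset chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

def seconds_since(dt, epoch=datetime(1970, 1, 1)):
    # Nominal seconds: drops the tzinfo and compares wall-clock fields only.
    return (dt.replace(tzinfo=None) - epoch).total_seconds()

tokyo = timezone(timedelta(hours=9))  # fixed UTC+9 offset
dt = datetime(2019, 1, 1, 9, 0, tzinfo=tokyo)

unix_time = dt.timestamp()   # 1546300800.0, a real Unix time
nominal = seconds_since(dt)  # 1546333200.0, i.e. "Unix time plus 9 hours"
print(nominal - unix_time)   # 32400.0
```

A consumer who reads `nominal` as a Unix time would be off by exactly the zone's offset.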
msg364398 - Author: Yi Luan (Yi Luan) Date: 2020-03-17 09:12
Hi Paul,

Yes, I totally agree with you, and I should follow your advice and not pass timestamps around as representations of arbitrary datetimes for interop usage. However, in my particular case, I'm not the person who makes that kind of decision.

Perhaps I'm being very picky here, but I think it would be more natural for `timestamp()`-type functions to return the nominal timestamp value (and vice versa) regardless of which time zone I'm in. When interop and international operations are involved, Python's behavior may just add another level of conversion to the process and create more unnecessary confusion.

I know I should always be aware of the time zone my machine is in and add a tzinfo in code for cross-time-zone operations in Python. But consider simple code like:
>>> datetime.datetime(2019, 1, 1, 0, 0, 0).timestamp()

From my point of view, the fact that this generates different results on machines in different time zones is confusing behavior. I can only be sure that I am being careful and putting a tzinfo in. If multiple people in different time zones leave the tzinfo out and use different programming tools, the results could be quite confusing (timestamps generated by Python fed into other tools, or vice versa). I do understand, though, that this is not sound practice in the first place.

Also, from my point of view, when no tzinfo is given, the machine should not guess which time zone the programmer wants datetime.datetime(2019, 1, 1, 0, 0, 0) to be in. Nor should it assume that, because my machine is in a particular time zone, the timestamp() of datetime.datetime(2019, 1, 1, 0, 0, 0) should be 2019/1/1 00:00:00 plus or minus some hours.

As you've mentioned:
<quote>
When you have a naive datetime (with no tzinfo), the only options are to pick the time zone it represents and convert to UTC or to throw an error and say, "We don't know what time zone this represents, so we cannot do this operation." Python 2 used to throw an exception, but in Python 3 naive datetimes represent local times.
</quote>

I think the appropriate option is not to pick any time zone at all. Just treat the value as "UTC" (return the nominal value from timestamp()) rather than "convert" it to UTC (shifting it by N hours based on the running machine's time zone). Since we have no knowledge of which time zone this datetime object represents, we can't know whether we have truly converted it to UTC. To me, why bother shifting it by some hours before calling timestamp() anyway? That's just another layer of calculation that could go wrong.
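The "view it as UTC" interpretation asked for here can already be requested explicitly with the standard library, so no behavior change is needed to get it.

- `naive_as_utc_timestamp` is a hypothetical helper name.
- `calendar.timegm` is the stdlib inverse of `time.gmtime`.
- For a naive datetime, `utctimetuple()` returns the fields unchanged.

```python
import calendar
from datetime import datetime, timezone

def naive_as_utc_timestamp(dt):
    """Hypothetical helper: interpret a naive datetime as UTC, never as local time."""
    if dt.tzinfo is not None:
        raise ValueError("expected a naive datetime")
    return dt.replace(tzinfo=timezone.utc).timestamp()

dt = datetime(2019, 1, 1, 0, 0, 0)
print(naive_as_utc_timestamp(dt))          # 1546300800.0 on every machine
print(calendar.timegm(dt.utctimetuple()))  # 1546300800, whole seconds only
```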

Anyway, the above is just my personal opinion; there is obviously nothing wrong with viewing an arbitrary datetime as local time.
Since you've mentioned that this behavior is intended, I assume and understand that it is the result of balancing a lot of other choices.
But from my point of view, as a clueless user, it can leave me confused about what timestamp I'm actually generating, or what datetime I've actually imported from an arbitrary timestamp. Generating timestamps from naive datetime objects regardless of the machine's time zone seems more straightforward and clear.

Again, thanks very much for bearing with me in this discussion.
History
Date | User | Action | Args
2020-03-17 09:12:56 | Yi Luan | set | messages: + msg364398
2020-03-16 14:05:08 | p-ganssle | set | messages: + msg364323
2020-03-16 04:15:14 | Yi Luan | set | messages: + msg364283
2020-03-16 04:04:11 | Yi Luan | set | messages: + msg364282
2020-03-15 16:10:19 | belopolsky | set | resolution: wont fix -> duplicate
2020-03-15 16:09:08 | belopolsky | set | superseder: Using datetime.datetime.utcnow().timestamp() in Python3.6.0 can't get correct UTC timestamp.; messages: + msg364245
2020-03-15 16:07:09 | belopolsky | set | messages: + msg364244
2020-03-15 16:04:37 | p-ganssle | set | status: open -> closed; resolution: wont fix; messages: + msg364243; stage: resolved
2020-03-15 15:08:26 | xtreak | set | nosy: + belopolsky, p-ganssle
2020-03-15 15:01:14 | Yi Luan | create |