classification
Title: mention explicitly that stdlib assumes gmtime(0) epoch is 1970
Type: behavior Stage:
Components: Documentation Versions: Python 3.4, Python 3.5, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: akira, belopolsky, cvrebert, docs@python
Priority: normal Keywords: patch

Created on 2014-09-07 17:20 by akira, last changed 2014-12-02 08:02 by akira.

Files
File name Uploaded Description Edit
docs-time-epoch_is_1970.diff akira, 2014-09-07 17:20 review
Messages (9)
msg226539 - (view) Author: Akira Li (akira) * Date: 2014-09-07 17:20
See discussion on Python-ideas
https://mail.python.org/pipermail/python-ideas/2014-September/029228.html
msg231954 - (view) Author: Chris Rebert (cvrebert) * Date: 2014-12-01 20:46
Ping. This small patch has been waiting nearly 3 months for a review.
msg231955 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2014-12-01 21:32
I don't like the proposed note.

1. It is not the job of the time module documentation to warn about "many functions in the stdlib."  What are these functions, BTW?

2. What is "calendar time in POSIX encoding"? This sounds like what time.asctime() returns.

I think an improvement would be to spell Epoch with a capital E and define it as "The time zero hours, zero minutes, zero seconds, on January 1, 1970 Coordinated Universal Time (UTC)."  See <http://pubs.opengroup.org/onlinepubs/9699919799>.
msg231957 - (view) Author: Akira Li (akira) * Date: 2014-12-01 22:36
> Alexander Belopolsky added the comment:
>
> 1. It is not the job of the time module documentation to warn about
> "many functions in the stdlib."  What are these functions, BTW?

The e-mail linked in the first message of this issue msg226539
enumerates some of the functions:

  https://mail.python.org/pipermail/python-ideas/2014-September/029228.html

> 2. What is "calendar time in POSIX encoding"? This sounds like what time.asctime() returns.

It is the language used by C standard for time() function:

  The time function determines the current calendar time. The encoding
  of the value is unspecified.

> I think an improvement would be to spell Epoch with a capital E and
> define it as "The time zero hours, zero minutes, zero seconds, on
> January 1, 1970 Coordinated Universal Time (UTC)."  See
> <http://pubs.opengroup.org/onlinepubs/9699919799>.
>

The word *epoch* (lowercase) is used by C standard.

It is not enough to say that the time module uses the POSIX epoch (Epoch):
e.g., a machine may use the "right" zoneinfo (the same epoch year, 1970),
but the timestamps for the same UTC time differ by the number of leap
seconds (10+25 since 2012).

POSIX encoding implies that the formula works:

  utc_time = datetime(1970, 1,  1) + timedelta(seconds=posix_timestamp)

If time.time() doesn't return posix_timestamp, then "many functions in
the stdlib" will break.
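
A minimal sketch of the formula above, assuming time.time() returns POSIX "seconds since the Epoch". The helper name utc_from_posix and the sample timestamp are illustrative only and are not part of the proposed patch.

```python
from datetime import datetime, timedelta, timezone


def utc_from_posix(posix_timestamp):
    """Return the naive UTC datetime for a POSIX timestamp.

    Hypothetical helper: it is a direct transcription of the formula
    quoted in the message, not a stdlib function.
    """
    return datetime(1970, 1, 1) + timedelta(seconds=posix_timestamp)


# Illustrative value: 1417507200 seconds after 1970-01-01T00:00:00Z.
sample = 1417507200
via_formula = utc_from_posix(sample)

# Cross-check against the stdlib conversion, with tzinfo dropped so
# that the two naive values can be compared.
via_stdlib = datetime.fromtimestamp(sample, timezone.utc).replace(tzinfo=None)
```

The two results agree only if the platform's timestamps follow the POSIX definition; with a "right" zoneinfo configuration they could differ by the accumulated leap seconds.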

It is possible to inspect all stdlib functions that use time module and
determine for some of them whether they will break if gmtime(0) is not
1970 or "right" zoneinfo is used or any non-POSIX time encoding is
used. But it is hard to maintain such a list because any future code
change may affect the behavior. I prefer a vague statement ("many
functions") over a possible lie (the documentation shouldn't make
promises that the implementation can't keep).

POSIX language is (intentionally) vague and avoids SI seconds vs. UT1
(mean solar) seconds distinction. I don't consider systems where
"seconds" doesn't mean SI seconds used by UTC time scale.
msg231964 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2014-12-01 23:18
In the context of Python library documentation, the word "encoding" strongly suggests that you are dealing with string/bytes.  The situation may be different in C. If you want to refer to something that is defined by the POSIX standard you should use the words that can actually be found in that standard.  

When I search for "encoding" at <http://pubs.opengroup.org/onlinepubs/9699919799/>, I get

crypt - string encoding function (CRYPT) 
encrypt - encoding function (CRYPT) 
setkey - set encoding key (CRYPT)

and nothing related to time.
msg231968 - (view) Author: Akira Li (akira) * Date: 2014-12-01 23:33
> Alexander Belopolsky added the comment:
>
> In the context of Python library documentation, the word "encoding"
> strongly suggests that you are dealing with string/bytes.  The
> situation may be different in C. If you want to refer to something
> that is defined by the POSIX standard you should use the words that
> can actually be found in that standard.
>
> When I search for "encoding" at <http://pubs.opengroup.org/onlinepubs/9699919799/>, I get
>
> crypt - string encoding function (CRYPT) 
> encrypt - encoding function (CRYPT) 
> setkey - set encoding key (CRYPT)
>
> and nothing related to time.
>

I've provided the direct quote from the *C* standard in my previous message, msg231957:

  > 2. What is "calendar time in POSIX encoding"? This sounds like what time.asctime() returns.

  It is the language used by C standard for time() function:

    The time function determines the current calendar time. The encoding
    of the value is unspecified.
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ <- from the C standard

notice the word *encoding* in the quote.
msg231969 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2014-12-02 00:07
> It is possible to inspect all stdlib functions that use time module and
> determine for some of them whether they will break if gmtime(0) is not
> 1970 or "right" zoneinfo is used or any non-POSIX time encoding is
> used. But it is hard to maintain such a list because any future code
> change may affect the behavior.

Let's not confuse the issue of gmtime(0) not being 1970-01-01T00 with the issue of localtime() expecting a non-POSIX time_t.  Since gmtime(0) is the same on all platforms supported by Python, it is fair game to rely on this fact in Python code.
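
A quick way to check the assumption on a given platform is sketched below. The function name epoch_is_1970 is hypothetical; only time.gmtime() comes from the stdlib.

```python
import time


def epoch_is_1970():
    """Return True if this platform's epoch is 1970-01-01T00:00:00 UTC."""
    # Compare year, month, day, hour, minute and second of gmtime(0).
    return time.gmtime(0)[:6] == (1970, 1, 1, 0, 0, 0)


if __name__ == "__main__":
    print("epoch is 1970:", epoch_is_1970())
```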

The issue of "right" zoneinfo is different: at least two major Python platforms (OS X and Linux) can be configured in a non-POSIX way.  The decision not to support these configurations in the datetime module was deliberate, but some partial support can be added.  For example, datetime.astimezone() cannot work correctly in the "right" timezone because datetime.second cannot be 60; but if it returns values that are off by some 20 seconds at other times, I would call it a bug, though many will disagree.
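
The restriction on datetime.second can be seen directly. This is only an illustration of the limitation mentioned above; the function name is hypothetical.

```python
from datetime import datetime


def datetime_accepts_second_60():
    """Return True if datetime can represent a leap second (second == 60)."""
    try:
        # 2012-06-30T23:59:60Z was a real leap second in UTC.
        datetime(2012, 6, 30, 23, 59, 60)
    except ValueError:
        return False
    return True
```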

I don't know how popular configurations with right timezones are, but testing Python stdlib in those configurations can only help the overall stdlib quality.
(Unfortunately, at the moment we have very few tests even for the mainstream timezones such as Europe/Moscow.)
msg231971 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2014-12-02 00:38
> I've provided the direct quote from the *C* standard ...

I understand that the C standard uses the word "encoding", but it does so for a reason that is completely unrelated to the choice of epoch.  "Encoding" is how the bytes in memory should be interpreted as "number of seconds" or some other notion of time.  For example, "two's complement little-endian 32-bit signed int" is a valid time_t encoding; another example would be IEEE 754 big-endian 64-bit double.  Note that these choices are valid for both the C and POSIX standards.
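
To illustrate this sense of "encoding", the sketch below packs the same number of seconds in the two representations named above, using the struct module. The sample value is arbitrary.

```python
import struct

seconds = 1417507200  # arbitrary sample value that fits in 32 bits

# Two's complement little-endian 32-bit signed int.
as_int32_le = struct.pack("<i", seconds)

# IEEE 754 big-endian 64-bit double.
as_double_be = struct.pack(">d", float(seconds))

# Both byte strings decode back to the same number of seconds,
# although the bytes themselves differ in length and layout.
decoded_int = struct.unpack("<i", as_int32_le)[0]
decoded_double = struct.unpack(">d", as_double_be)[0]
```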

If you google for your phrase "time in POSIX encoding", this issue is the only hit.  This strongly suggests that your choice of words is not the most natural.
msg231979 - (view) Author: Akira Li (akira) * Date: 2014-12-02 08:02
> Alexander Belopolsky added the comment:
>
>> I've provided the direct quote from the *C* standard ...
>
> I understand that C standard uses the word "encoding", but it does so
> for a reason that is completely unrelated to the choice of epoch.
> "Encoding" is how the bytes in memory should be interpreted as "number
> of seconds" or some other notion of time.  For example, "two's
> complement little-endian 32-bit signed int" is an example of valid
> time_t encoding, another example would be IEEE 754 big-endian 64-bit
> double.  Note that these choices are valid for both C and POSIX
> standards.

I agree one *part* of "encoding" is how time_t is *represented* in
memory but it is not the only part e.g.:

  The mktime function converts the broken-down time, expressed as local
  time, in the structure pointed to by timeptr into a calendar time
  value with the same encoding as that of the values returned by the
  time function.

notice: "the same encoding as ... returned by the time function".

The time() function can return values with a different epoch
(implementation-defined). mktime() is specified to use the *same*
encoding, i.e., the same epoch, etc.

i.e., [in simple words] we have calendar time (Gregorian date, time) and
we can convert it to a number (e.g., Python integer), we can call that
number "seconds" and we can represent that number as some (unspecified)
bit-pattern in C.

I consider the whole process of converting "time" to a bit-pattern in
memory as "encoding" i.e., "32/64, un/signed int/754 double" is just
*part* of it e.g.,

1. specify that 1970-01-01T00:00:00Z is zero (0)
2. specify 0 has time_t type
3. specify how time_t type is represented in memory.
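
The three steps can be sketched as follows. All names here are hypothetical, a Python int stands in for time_t, and the "<q" struct format is just one possible in-memory representation.

```python
import struct
from datetime import datetime, timedelta

POSIX_EPOCH = datetime(1970, 1, 1)  # step 1: this instant maps to zero


def encode_calendar_time(utc_dt, epoch=POSIX_EPOCH, fmt="<q"):
    """Encode a naive UTC datetime as bytes."""
    # Step 2: the count of whole seconds plays the role of a time_t value.
    seconds = int((utc_dt - epoch).total_seconds())
    # Step 3: choose a bit pattern (little-endian 64-bit signed int here).
    return struct.pack(fmt, seconds)


def decode_calendar_time(raw, epoch=POSIX_EPOCH, fmt="<q"):
    """Invert encode_calendar_time(); epoch and fmt must match."""
    (seconds,) = struct.unpack(fmt, raw)
    return epoch + timedelta(seconds=seconds)


moment = datetime(2014, 12, 2, 8, 2, 14)
posix_bytes = encode_calendar_time(moment)

# A different (hypothetical) epoch yields different bytes for the same
# calendar time, even with an identical memory representation.
other_bytes = encode_calendar_time(moment, epoch=datetime(2000, 1, 1))
```

Decoding with the wrong epoch returns the wrong calendar time, which is why the choice of epoch matters independently of the memory representation.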

I may be wrong that C standard includes the first item in time
"encoding".

> If you google for your phrase "time in POSIX encoding", this issue is
> the only hit.  This strongly suggests that your choice of words is not
> the most natural.

I've googled the phrase (no surrounding quotes) and the links talk about
time encoded as POSIX time [1] and some *literally* contain the phrase
*POSIX encoding* [2] because *Python* documentation for calendar.timegm
contains it [3]:

  [timegm] returns the corresponding Unix timestamp value, assuming an
  epoch of 1970, and the POSIX encoding. In fact, time.gmtime() and
  timegm() are each others’ inverse.
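
The inverse relationship in the quote can be checked with a short round trip. This is a sketch that assumes a platform where time.gmtime(0) is 1970-01-01; the sample timestamp is arbitrary.

```python
import calendar
import time

timestamp = 1417507200  # arbitrary sample value

# gmtime() converts seconds since the epoch to a broken-down UTC time...
broken_down = time.gmtime(timestamp)

# ...and timegm() converts the broken-down UTC time back to seconds.
round_trip = calendar.timegm(broken_down)
```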

In an effort to avoid personal influence, I've repeated the experiment
using Tor browser and other search engines -- the result is the same.

timegm() documentation might be the reason why I've used the phrase.

I agree "POSIX encoding" might be unclear. The patch could be replaced
by any phrase that expresses that some functions in stdlib assume that
time.time() returns (+/- fractional part) "seconds since the Epoch" as
defined by POSIX [4].

[1] http://en.wikipedia.org/wiki/Unix_time#Encoding_time_as_a_number
[2] http://ruslanspivak.com/2011/07/20/how-to-convert-python-utc-datetime-object-to-unix-timestamp/
[3] https://docs.python.org/3/library/calendar.html#calendar.timegm
[4] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_15
History
Date                 User        Action  Args
2014-12-02 08:02:14  akira       set     messages: + msg231979
2014-12-02 00:38:57  belopolsky  set     messages: + msg231971
2014-12-02 00:07:12  belopolsky  set     messages: + msg231969
2014-12-01 23:33:35  akira       set     messages: + msg231968
2014-12-01 23:18:03  belopolsky  set     messages: + msg231964
2014-12-01 22:36:03  akira       set     messages: + msg231957
2014-12-01 21:32:52  belopolsky  set     messages: + msg231955
2014-12-01 21:02:13  ned.deily   set     nosy: + belopolsky
2014-12-01 20:46:34  cvrebert    set     messages: + msg231954
2014-09-12 16:54:24  cvrebert    set     nosy: + cvrebert
2014-09-07 17:20:44  akira       create