classification
Title: %Z in strptime doesn't match EST and others
Type: behavior
Stage: patch review
Components: Library (Lib)
Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Alex.LordThorsen, SilentGhost, akira, belopolsky, berker.peksag, inglesp, karlcow, miss-islington, p-ganssle, r.david.murray
Priority: normal
Keywords: patch

Created on 2014-09-09 22:24 by cool-RR, last changed 2019-11-26 16:38 by miss-islington.

Files
File name Uploaded Description
draft-strptime-%Z-timezones.diff akira, 2014-09-17 03:54
Pull Requests
URL Status Linked
PR 16507 merged python-dev, 2019-10-01 07:43
Messages (18)
msg226668 - Author: Ram Rachum (cool-RR) Date: 2014-09-09 22:24
The documentation for %Z (https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior) says it matches `EST` among others, but in practice it doesn't:

    Python 3.4.0 (v3.4.0:04f714765c13, Mar 16 2014, 19:25:23) [MSC v.1600 64 bit (AMD64)] on win32
    Type "copyright", "credits" or "license()" for more information.
    DreamPie 1.2.1
    >>> import datetime
    >>> datetime.datetime.strptime('2016-12-04 08:00:00 UTC', '%Y-%m-%d %H:%M:%S %Z')
    0: datetime.datetime(2016, 12, 4, 8, 0)
    >>> datetime.datetime.strptime('2016-12-04 08:00:00 EST', '%Y-%m-%d %H:%M:%S %Z')
    Traceback (most recent call last):
      File "<pyshell#2>", line 1, in <module>
        datetime.datetime.strptime('2016-12-04 08:00:00 EST', '%Y-%m-%d %H:%M:%S %Z')
      File "C:\Python34\lib\_strptime.py", line 500, in _strptime_datetime
        tt, fraction = _strptime(data_string, format)
      File "C:\Python34\lib\_strptime.py", line 337, in _strptime
        (data_string, format))
    ValueError: time data '2016-12-04 08:00:00 EST' does not match format '%Y-%m-%d %H:%M:%S %Z'
    >>>
msg226670 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2014-09-10 00:57
Looking at the code, the only timezone strings it recognizes are utc, gmt, and whatever is in time.tzname (EST and EDT, in my case).

This seems...barely useful, although clearly not useless :)

And does not seem to be documented.
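
A minimal sketch for checking which names `%Z` accepts on a given machine. The helper `z_accepts` is a hypothetical name introduced here for illustration; results for anything other than UTC and GMT depend on the local `time.tzname`, so no output is shown.

```python
import time
from datetime import datetime


def z_accepts(name):
    """Return True if strptime's %Z directive matches `name` (hypothetical helper)."""
    try:
        datetime.strptime(name, "%Z")
    except ValueError:
        return False
    return True


# UTC and GMT are always recognized; any other name has to appear in
# time.tzname, so the EST line differs from machine to machine.
for candidate in ("UTC", "GMT", "EST", *time.tzname):
    print(candidate, z_accepts(candidate))
```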
msg226857 - Author: Akira Li (akira) Date: 2014-09-13 23:35
If PEP 431 is implemented (or anything that gives access to zoneinfo)
then strptime could extend the list of timezones it accepts (utc + 
local timezone names) to include names from the tz database:

  import pytz # $ pip install pytz

  {tzname for tz in map(pytz.timezone, pytz.all_timezones) 
   for _, _, tzname in getattr(tz, '_transition_info', [])}

It includes EST.
msg226858 - Author: Akira Li (akira) Date: 2014-09-13 23:40
Without %z (utc offset) strptime returns a naive datetime object that
is usually interpreted as utc or local time.

It might explain why %Z tries to match only utc and the local timezone
names.
msg226922 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2014-09-15 15:36
I think its existing behavior is because it doesn't have access to a list of recognized timezones.  As you say, this could be fixed by PEP 431.  It could also be fixed by adopting the "email standard" timezones (see email/_parseaddr.py), which is a de facto standard.

Regardless of whether something is done to expand the number of timezones it knows about, though, there's a current doc bug that should be fixed.  If someone wants to advocate for expanding the timezone list, that should be a separate issue.
msg226971 - Author: Berker Peksag (berker.peksag) (Python committer) Date: 2014-09-17 01:21
> if PEP 431 is implemented (or anything that gives access to zoneinfo)
> then strptime could extend the list of timezones it accepts (utc + 
> local timezone names) to include names from the tz database:

FTR, I have a WIP (and probably a bit outdated) branch to implement PEP 431 on GitHub:

    https://github.com/berkerpeksag/cpython/tree/pep431
msg226976 - Author: Akira Li (akira) Date: 2014-09-17 03:54
If the current implementation is considered correct (%Z not recognizing
EST) then indeed extending the list of recognized timezones is another
issue. And the docs should be changed to match the implementation.

The current behavior is broken, see also issue22426

If we assume that the docs are correct (%Z should match EST) even if it
is not implemented yet then it is this issue's responsibility to extend
the list of recognized timezones (even an incomplete hard-coded list
generated by the code from msg226857 would be fine).

The Lib/email/_parseaddr.py approach (tzname corresponds to a fixed utc
offset) is wrong: tzname may correspond to multiple utc offsets at the
same time (different timezones) and at different times (even within the
same timezone). Having the tz database won't fix it: *tzname alone is
not enough to determine UTC offset in _many_ cases.*


CST is ambiguous if %z is not given. Therefore, even if strptime() had
access to a larger list of recognized timezones, it is not clear what
the return value would be:

- aware datetime: which timezone to use?

- naive datetime: it might be misleading if the input timezone name
  doesn't correspond to utc or the local timezone

email._parseaddr._timezones is misleading if used globally: CST is also
used in Australia and China, with different utc offsets.
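
To make the ambiguity concrete, here is a small sketch. The table `CST_MEANINGS` is hand-written for illustration (it is not taken from `email._parseaddr`, pytz, or any other library) and lists three regions that have used "CST" for their standard time.

```python
from datetime import timedelta

# Hand-written, illustrative table: three meanings of the abbreviation "CST".
CST_MEANINGS = {
    "Central Standard Time (North America)": timedelta(hours=-6),
    "China Standard Time": timedelta(hours=8),
    "Australian Central Standard Time": timedelta(hours=9, minutes=30),
}

distinct_offsets = sorted(set(CST_MEANINGS.values()))

for meaning, offset in CST_MEANINGS.items():
    hours = offset.total_seconds() / 3600
    print(f"CST = {meaning}: UTC{hours:+.1f}h")

# More than one distinct offset means "CST" alone cannot select a tzinfo.
print("ambiguous:", len(distinct_offsets) > 1)
```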

One possible solution is to return aware datetime objects if a new
truthy *timezones* keyword-only argument is provided. It may contain a
mapping to disambiguate timezone abbreviations: {'EST': timedelta|tzinfo}.

If *timezones* is False then strptime() returns a naive datetime
object. The only difference from the current behavior is that a larger
list of timezones is supported to match the docs.

With bool(timezones) == True, strptime() could return an aware datetime
object in utc, local timezones, or any timezone in *timezones* if it is
a mapping.

The default *timezones* is None, which means timezones=False for backward
compatibility. A DeprecationWarning is added stating that timezones=True
will be the default in the next release (3.6) if no objections are
received until then, e.g.:

    if tzname and timezones is None: # %Z matches non-empty string
        warn("Default *timezones* parameter for "
             "%s.strptime() will be True in Python 3.6. "
             "Pass timezones=False to preserve the old behaviour" % (
                 cls.__qualname__,),
             category=DeprecationWarning, stacklevel=2)

I've uploaded the patch draft-strptime-%Z-timezones.diff that implements
this solution. It also contains tests and docs updates.
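
A rough sketch of the mapping form of the proposed *timezones* argument, written as a standalone function rather than as the patch itself. `strptime_tz` is a hypothetical name, and the sketch assumes the format ends in ` %Z` and that the abbreviation is the last whitespace-separated token. The attached diff may differ in all of these details, and the `timezones=True` case is not covered here.

```python
from datetime import datetime, timedelta, timezone


def strptime_tz(date_string, fmt, timezones=None):
    """Hypothetical helper: resolve a trailing abbreviation via a mapping.

    `timezones` maps abbreviations to timedelta or tzinfo objects.
    A falsy value keeps the naive result, mirroring timezones=False.
    """
    suffix = " %Z"
    if not fmt.endswith(suffix):
        raise ValueError("this sketch only handles formats ending in ' %Z'")
    # Split off the abbreviation and parse the rest without %Z.
    body, _, abbreviation = date_string.rpartition(" ")
    naive = datetime.strptime(body, fmt[:-len(suffix)])
    if not timezones:
        return naive
    value = timezones[abbreviation]  # KeyError for unknown abbreviations
    if isinstance(value, timedelta):
        value = timezone(value, abbreviation)
    return naive.replace(tzinfo=value)


dt = strptime_tz(
    "2016-12-04 08:00:00 EST",
    "%Y-%m-%d %H:%M:%S %Z",
    timezones={"EST": timedelta(hours=-5)},
)
print(dt.isoformat())  # 2016-12-04T08:00:00-05:00
```

The caller supplies the meaning of each abbreviation, so the function never has to guess between, say, the North American and Chinese CST.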
msg226983 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2014-09-17 07:12
I don't think we are going to support a timezone list like that without PEP 431.

You should attach your patch to a new issue.  When I said this should be the doc issue, that is because only a doc fix is acceptable for 3.4.  Adding more timezones to recognize would be an enhancement, given the complexity of the proposed solution.

On the other hand, if timezone names are ambiguous, I'm not sure there *is* a fix other than using the "de facto standard" names and offsets used by the email library.  Actually, isn't there a written standard that addresses this issue?  I seem to remember reading a discussion of the problem somewhere...
msg227141 - Author: Akira Li (akira) Date: 2014-09-20 02:56
> I don't think we are going to support a timezone list like that without PEP 431.

PEP 431 won't fix this issue. See below.

> You should attach your patch to a new issue.  When I said this should
> the doc issue, that is because only a doc fix is acceptable for 3.4.
> Adding more timezones to recognize would be an enhancement, given the
> complexity of the proposed solution.

The docs are correct (they imply that %Z should accept EST). It is the
implementation that is deficient.

The patch introduces a new parameter therefore I agree: it should be
applied only in 3.5+

> On the other hand, if timezone names are ambiguous, I'm not sure there
> *is* a fix other than using the "defacto standard" names and offsets
> used by the email library.  Actually, isn't there a written standard
> that addresses this issue?  I seem to remember reading a discussion of
> the problem somewhere...

Multi-timezone programming

email._parseaddr._timezones with CST=-600 is like US-ASCII (the
standard). 

Code that uses local timezone is bilingual (locale-based): CST=-600 in
Chicago but it is CST=+800 in China and it may be something else in
other parts of the world. The *timezones* parameter in my patch allows
specifying an encoding different from the current locale.

Code that uses the tz database is multilingual (Unicode): knowing the
encoding (zoneinfo name and the time) it is possible to decode almost
all encoded characters (to find out whether the timezone abbreviation is
valid with a given time and to find the correct UTC offset).

If you don't know the encoding then the support for Unicode (the
presence of the tz database (PEP 431)) alone won't allow you to decode a
byte sequence (time string). You need an encoding (timezone name, time)
to interpret the input correctly.

Given that the list is used to accept a string as a timezone
abbreviation, I don't think it should depend on PEP 431 e.g., old date
strings/people may use WST even if the new pytz timezones do not use it.

The initial list could be seeded using pytz as in my patch and then
expanded as necessary by hand (there is no official entity that tracks
timezone abbreviations).
msg265742 - Author: Peter Inglesby (inglesp) Date: 2016-05-17 01:14
Given the difference between the documented and the actual behaviours, and given that it's apparently not obvious what the correct fix should be, would a patch that updates the docs (to say that %Z only matches GMT and UTC) be welcome?
msg265768 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2016-05-17 16:07
Peter: yes, that is what I've been saying this issue is for :)  Anything else is a new issue.

Note that it *does* also recognize the strings in time.tzname in addition to UTC and GMT.
msg328842 - Author: Alexander Belopolsky (belopolsky) (Python committer) Date: 2018-10-29 17:13
I think strptime should only accept %Z when it comes together with %z and not do any validation.

This is close to the current behavior.  %Z by itself is useless because even when it is accepted, the value is discarded:

>>> print(datetime.strptime('UTC', '%Z'))
1900-01-01 00:00:00

You have to use %z to get an aware datetime instance: 

>>> print(datetime.strptime('UTC+0000', '%Z%z'))
1900-01-01 00:00:00+00:00


The validation is already fairly lax:

>>> print(datetime.strptime('UTC+1234', '%Z%z'))
1900-01-01 00:00:00+12:34

I don't think this issue has anything to do with the availability of the zoneinfo database.  Timezone abbreviations are often ambiguous and should only serve as a human-readable supplement to the UTC offset; they cannot by themselves be used as a TZ specification.
msg339672 - Author: Alex LordThorsen (Alex.LordThorsen) Date: 2019-04-08 20:25
This behavior is currently unchanged and the docs still state that `EST` is an acceptable value.

```
>>> datetime.strptime("2019-01-28 18:54:45 EST", "%Y-%m-%d %H:%M:%S %Z")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/_strptime.py", line 577, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "/usr/local/lib/python3.7/_strptime.py", line 359, in _strptime
    (data_string, format))
ValueError: time data '2019-01-28 18:54:45 EST' does not match format '%Y-%m-%d %H:%M:%S %Z'
```
msg339755 - Author: Paul Ganssle (p-ganssle) (Python committer) Date: 2019-04-09 14:29
@Alex LordThorsen: It will accept EST if EST is one of your "local" time zones, so whatever's in `time.tzname`.

In the short term, I think the right thing to do would be to update the documentation to remove the reference to "EST", and add an explanatory note in the section about %Z that explains that it accepts a few hard-coded values + whatever's in `time.tzname`.

In the long run, I think the best "out of the box" support we can provide would be supporting %Z when %z is present (per Alexander's suggestion), and possibly something akin to `dateutil`'s "tzinfos", where a mapping between abbreviations and `tzinfo` objects could be passed to `strptime` explicitly.
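
As a sketch of the "accept %Z when %z is present, without validation" idea: the helper below treats a trailing abbreviation as a label and takes the offset from the numeric part. `parse_with_abbreviation` and the regular expression are hypothetical and are not part of `datetime` or `dateutil`; the format passed in must not contain `%Z` or `%z` itself.

```python
import re
from datetime import datetime, timezone

# Trailing "<letters><+/-HHMM>", e.g. "EST-0500" (assumed input shape).
_ABBR_OFFSET = re.compile(r"(?P<abbr>[A-Za-z]+)\s*(?P<offset>[+-]\d{4})$")


def parse_with_abbreviation(date_string, fmt):
    """Hypothetical helper: parse '<date> <ABBR><+/-HHMM>' into an aware datetime."""
    match = _ABBR_OFFSET.search(date_string)
    if match is None:
        raise ValueError("expected a trailing '<abbreviation><+/-HHMM>'")
    body = date_string[:match.start()].rstrip()
    naive = datetime.strptime(body, fmt)
    # Let strptime's own %z handling convert the numeric offset.
    offset = datetime.strptime(match["offset"], "%z").utcoffset()
    # The abbreviation is kept only as a label; it is never validated.
    return naive.replace(tzinfo=timezone(offset, match["abbr"]))


dt = parse_with_abbreviation("2019-01-28 18:54:45 EST-0500", "%Y-%m-%d %H:%M:%S")
print(dt.isoformat(), dt.tzname())
```

Because the offset is explicit, the result does not depend on `time.tzname` or on any table of abbreviations.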
msg339912 - Author: Alex LordThorsen (Alex.LordThorsen) Date: 2019-04-11 01:13
It's been a while since I've committed a patch. Do I still upload a diff file here or should I open a PR for the doc changes on GitHub?
msg339977 - Author: SilentGhost (SilentGhost) (Python triager) Date: 2019-04-11 14:05
PR on GitHub, Alex
msg353639 - Author: karl (karlcow) Date: 2019-10-01 07:46
I created a PR following the recommendations of p-ganssle:
https://github.com/python/cpython/pull/16507
msg357509 - Author: miss-islington (miss-islington) Date: 2019-11-26 16:38
New changeset bc441ed7c1449f06df37905ee6289aa93b85d4cb by Miss Islington (bot) (Karl Dubost) in branch 'master':
bpo-22377: Fixes documentation for %Z in datetime (GH-16507)
https://github.com/python/cpython/commit/bc441ed7c1449f06df37905ee6289aa93b85d4cb
History
Date User Action Args
2019-11-26 16:38:48 miss-islington set nosy: + miss-islington; messages: + msg357509
2019-10-01 07:46:08 karlcow set nosy: + karlcow; messages: + msg353639
2019-10-01 07:43:53 python-dev set stage: needs patch -> patch review; pull_requests: + pull_request16096
2019-09-12 14:18:36 p-ganssle link issue38139 superseder
2019-09-12 14:14:48 p-ganssle set stage: needs patch; versions: + Python 3.7, Python 3.8, Python 3.9, - Python 3.5, Python 3.6
2019-04-11 14:05:12 SilentGhost set nosy: + SilentGhost; messages: + msg339977
2019-04-11 04:17:59 cool-RR set nosy: - cool-RR
2019-04-11 01:13:03 Alex.LordThorsen set messages: + msg339912
2019-04-09 14:29:01 p-ganssle set messages: + msg339755
2019-04-08 20:25:40 Alex.LordThorsen set nosy: + Alex.LordThorsen; messages: + msg339672
2018-10-29 17:24:53 belopolsky link issue33940 superseder
2018-10-29 17:13:09 belopolsky set messages: + msg328842
2018-07-05 15:59:27 p-ganssle set nosy: + p-ganssle
2016-05-17 16:08:08 r.david.murray set versions: + Python 3.5, Python 3.6, - Python 3.4
2016-05-17 16:07:56 r.david.murray set messages: + msg265768
2016-05-17 01:14:28 inglesp set nosy: + inglesp; messages: + msg265742
2014-09-20 02:56:29 akira set messages: + msg227141
2014-09-17 07:12:48 r.david.murray set messages: + msg226983
2014-09-17 03:54:51 akira set files: + draft-strptime-%Z-timezones.diff; keywords: + patch; messages: + msg226976
2014-09-17 01:21:57 berker.peksag set nosy: + berker.peksag; messages: + msg226971
2014-09-15 15:36:49 r.david.murray set messages: + msg226922
2014-09-13 23:40:06 akira set messages: + msg226858
2014-09-13 23:35:15 akira set nosy: + akira; messages: + msg226857
2014-09-10 00:57:50 r.david.murray set nosy: + r.david.murray, belopolsky; messages: + msg226670
2014-09-09 22:24:15 cool-RR create