classification
Title: datetime.strptime creates tz naive object from value containing a tzname
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: akeeman, belopolsky, p-ganssle
Priority: normal Keywords: patch

Created on 2018-01-05 12:24 by akeeman, last changed 2022-04-11 14:58 by admin.

Pull Requests
URL Status Linked Edit
PR 5106 closed akeeman, 2018-01-05 12:38
Messages (4)
msg309502 - (view) Author: Arjan Keeman (akeeman) * Date: 2018-01-05 12:24
Consider the following:

tz_naive_object = datetime.strptime("2018-01-05 13:10:00 CET", "%Y-%m-%d %H:%M:%S %Z")

Python's standard library is not capable of converting the timezone name CET to a tzinfo object. Therefore the case made above returns a timezone naive datetime object.
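A minimal sketch of the behaviour described above. It uses `UTC` instead of `CET` as an assumption for portability: for `%Z`, `strptime` only accepts `UTC`, `GMT` and the names in `time.tzname`, so any other name should raise `ValueError` unless it happens to be the local zone name of the machine.

```python
from datetime import datetime

# "UTC" is one of the names %Z always accepts, so this runs on any machine.
parsed = datetime.strptime("2018-01-05 13:10:00 UTC", "%Y-%m-%d %H:%M:%S %Z")

# The zone name is matched and consumed, but no tzinfo is attached.
print(parsed.tzinfo)  # None
```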

I propose to add an extra optional argument to _strptime.py's _strptime_datetime function, and to datetime.strptime: tzname_to_tzinfo:Optional[Callable[[str],Optional[tzinfo]]]=None. This parameter can be set with a function that accepts the timezone name and returns a tzinfo object or None (like pytz.timezone). None will mean that a timezone naive object will be created.

Usage:
tz_aware_object = datetime.strptime("2018-01-05 13:10:00 CET", "%Y-%m-%d %H:%M:%S %Z", pytz.timezone)
msg309509 - (view) Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2018-01-05 16:55
This is essentially what the `tzinfos` argument to `dateutil.parser.parse` does. I do think something *like* this is the only reasonable way to handle %Z->tzinfo mappings.

In `dateutil` (https://dateutil.readthedocs.io/en/latest/parser.html#dateutil.parser.parse), you can either pass a mapping or callable. Most of the problems we have in dateutil relate to the fact that we're both inferring what should or should not be interpreted as a time zone *and* passing it to the mapping or callable. Given that the first problem is solved by the format specifier already having an option for %Z, the implementation of this would be much easier.

I think the options for how this could be implemented are:

1. Mapping only
2. Callable only
3. Mapping or callable

Callable-only will probably lead to plenty of problems, since there's *already* a problem in this bug report, which is that `pytz.timezone` evidently doesn't do what Arjan thinks it does, because that function only *happens* to work. It would not work with, say, `CST` or `PST`. That said, callable is the most versatile way to do it, and if we don't include it, then people will probably end up having to work around it by creating mappings whose `.get` calls arbitrary functions.

#1 is probably the least convenient and #3 is the most convenient. Either way, I'd say that the primary documented interface should be mappings, since that's least error-prone (these mappings could be curated by third party libraries for a given local context). An advantage of using mappings is that if we ever have a C implementation of strptime, it can have a fast evaluation path for when the mapping is a `Dict`.
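As a rough illustration of option 3 (mapping or callable), the lookup step could be sketched as below. The helper name `resolve_tzinfo`, its `tzinfos` parameter and the `CENTRAL_EUROPE` table are hypothetical and not part of any proposed patch; the fixed offsets are for demonstration only.

```python
from collections.abc import Mapping
from datetime import timedelta, timezone


def resolve_tzinfo(tzname, tzinfos=None):
    """Hypothetical helper: look up a tzinfo for a parsed %Z string."""
    if tzinfos is None:
        return None  # the caller keeps the datetime naive
    if isinstance(tzinfos, Mapping):
        return tzinfos.get(tzname)  # unknown names map to None
    return tzinfos(tzname)  # otherwise treat the argument as a callable


# Illustrative fixed offsets only; a real zone is a rule set, not one offset.
CENTRAL_EUROPE = {
    "CET": timezone(timedelta(hours=1), "CET"),
    "CEST": timezone(timedelta(hours=2), "CEST"),
}

print(resolve_tzinfo("CET", CENTRAL_EUROPE))       # mapping lookup
print(resolve_tzinfo("PST", CENTRAL_EUROPE))       # name not in the mapping
print(resolve_tzinfo("CEST", CENTRAL_EUROPE.get))  # callable lookup
```

Passing a bound `.get` as the callable shows the workaround mentioned above in reverse: a mapping can always be turned into a callable, while the opposite direction needs a custom mapping class.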
msg309511 - (view) Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2018-01-05 17:12
By the way, one possibly significant problem with this interface is that it would tend to encourage the use of static timezone offsets rather than rule sets as intended by `tzinfo`. The main problem is that a simple mapping between tzname and tzinfo (whether done with a Mapping or a callable) will actually lose information about the fold that is encoded in the chosen tzname.

In dateutil, I solved this problem by attaching the timezone object and checking whether the `.tzname()` of the created datetime matches the string it was parsed from, and if not, set fold=1 and check again - if that one matches, use fold=1, otherwise just return it with fold=0. This is obviously a heuristic metric that will not always work.
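The heuristic described above might look roughly like the following. This is a sketch, not the dateutil code: `attach_with_fold` is a hypothetical name, and `ToyCentralEurope` is a deliberately simplified zone that only knows the 2018-10-28 autumn transition, so the example runs without any time zone database.

```python
from datetime import datetime, timedelta, tzinfo


class ToyCentralEurope(tzinfo):
    """Hypothetical zone that models only the 2018 fall-back transition."""

    # Wall-clock time at which the repeated hour (02:00-03:00) ends.
    _FALL_BACK = datetime(2018, 10, 28, 3)

    def _is_summer(self, dt):
        wall = dt.replace(tzinfo=None)
        if wall < self._FALL_BACK - timedelta(hours=1):
            return True  # before 02:00: treated as summer time in this toy
        if wall >= self._FALL_BACK:
            return False  # from 03:00 on: winter time
        return dt.fold == 0  # ambiguous hour: the first pass is summer time

    def utcoffset(self, dt):
        return timedelta(hours=2 if self._is_summer(dt) else 1)

    def dst(self, dt):
        return timedelta(hours=1 if self._is_summer(dt) else 0)

    def tzname(self, dt):
        return "CEST" if self._is_summer(dt) else "CET"


def attach_with_fold(dt, tz, tzname):
    """Attach tz, choosing fold=1 only if that makes tzname() match."""
    aware = dt.replace(tzinfo=tz)
    if aware.tzname() != tzname:
        folded = aware.replace(fold=1)
        if folded.tzname() == tzname:
            return folded
    return aware  # fall back to fold=0 when nothing matches


ambiguous = datetime(2018, 10, 28, 2, 30)
zone = ToyCentralEurope()
first = attach_with_fold(ambiguous, zone, "CEST")
second = attach_with_fold(ambiguous, zone, "CET")
print(first.fold, first.utcoffset())
print(second.fold, second.utcoffset())
```

When the parsed name matches neither side of the fold, the function silently returns the `fold=0` result, which is one way the heuristic can give a wrong answer.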

Three possible more general solutions to this problem:

1. have a variant of `strptime` that returns a `datetime` and the contents of `%Z` and let users or third party libraries handle converting the string into a timezone and attaching it to the datetime.
2. have `tzinfos` take a callable like `handle_tzinfo(dt, tzstr)` which returns the localized datetime.
3. have separate `tzinfos` and `apply_tzinfo` arguments, the first generating the `tzinfo` object, the second of the format `apply_tzinfo(dt, tz)` - if the second one doesn't exist, the default implementation is just `lambda dt, tz: dt.replace(tzinfo=tz)` (or equivalent)

#1 is a pretty significant (and possibly awkward) change to the interface, and #2 makes the implementation of these mappings less convenient for the downstream users, but is probably the most elegant from an API perspective. #3 is a somewhat reasonable marriage of #1 and #2, but it's ugly and I'm fairly certain it would lead to a lot of buggy code out there from people who don't realize why you would need to implement the apply function.
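To make options 1 and 2 more concrete, here is one possible sketch. Both function names and the `offsets` table are hypothetical, and the parser assumes the format ends in `" %Z"` with the zone name as the last space-separated token; a real implementation inside `_strptime` would not need that restriction.

```python
from datetime import datetime, timedelta, timezone


def strptime_with_tzname(date_string, fmt):
    """Hypothetical option 1: return (naive datetime, raw zone name)."""
    if not fmt.endswith(" %Z"):
        raise ValueError("this sketch only handles formats ending in ' %Z'")
    body, _, tzstr = date_string.rpartition(" ")
    # Parse everything except the zone name with the stock strptime.
    return datetime.strptime(body, fmt[:-3]), tzstr


def handle_tzinfo(dt, tzstr):
    """Hypothetical option 2 callable: return the localized datetime."""
    offsets = {"CET": 1, "CEST": 2}  # illustrative fixed offsets only
    return dt.replace(tzinfo=timezone(timedelta(hours=offsets[tzstr]), tzstr))


naive, tzstr = strptime_with_tzname(
    "2018-01-05 13:10:00 CET", "%Y-%m-%d %H:%M:%S %Z"
)
aware = handle_tzinfo(naive, tzstr)
print(naive.tzinfo, tzstr)
print(aware.isoformat())
```

Because `handle_tzinfo` receives both the datetime and the original string, it is free to apply fold resolution or any other localization logic before returning.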
msg309512 - (view) Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2018-01-05 17:13
Sorry, forgot to include the link to the dateutil implementation of the fold-resolution code: https://github.com/dateutil/dateutil/pull/517/files
History
Date User Action Args
2022-04-11 14:58:56 admin set github: 76678
2018-01-05 17:13:13 p-ganssle set messages: + msg309512
2018-01-05 17:12:21 p-ganssle set messages: + msg309511
2018-01-05 16:55:21 p-ganssle set nosy: + belopolsky, p-ganssle; messages: + msg309509
2018-01-05 12:38:37 akeeman set keywords: + patch; stage: patch review; pull_requests: + pull_request4973
2018-01-05 12:24:22 akeeman create