Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

%Z in strptime doesn't match EST and others #66571

Open
cool-RR mannequin opened this issue Sep 9, 2014 · 22 comments
Open

%Z in strptime doesn't match EST and others #66571

cool-RR mannequin opened this issue Sep 9, 2014 · 22 comments
Assignees
Labels
3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@cool-RR
Copy link
Mannequin

cool-RR mannequin commented Sep 9, 2014

BPO 22377
Nosy @abalkin, @bitdancer, @karlcow, @4kir4, @berkerpeksag, @pganssle, @miss-islington, @kkirsche
PRs
  • bpo-22377: Fixes documentation for %Z in datetime #16507
  • Files
  • draft-strptime-%Z-timezones.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2014-09-09.22:24:15.542>
    labels = ['3.7', '3.8', 'type-bug', 'library', '3.9']
    title = "%Z in strptime doesn't match EST and others"
    updated_at = <Date 2021-11-01.12:58:19.684>
    user = 'https://github.com/cool-RR'

    bugs.python.org fields:

    activity = <Date 2021-11-01.12:58:19.684>
    actor = 'kkirsche'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2014-09-09.22:24:15.542>
    creator = 'cool-RR'
    dependencies = []
    files = ['36634']
    hgrepos = []
    issue_num = 22377
    keywords = ['patch']
    message_count = 19.0
    messages = ['226668', '226670', '226857', '226858', '226922', '226971', '226976', '226983', '227141', '265742', '265768', '328842', '339672', '339755', '339912', '339977', '353639', '357509', '405439']
    nosy_count = 11.0
    nosy_names = ['belopolsky', 'r.david.murray', 'karlcow', 'SilentGhost', 'akira', 'inglesp', 'berker.peksag', 'Alex.LordThorsen', 'p-ganssle', 'miss-islington', 'kkirsche']
    pr_nums = ['16507']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue22377'
    versions = ['Python 3.7', 'Python 3.8', 'Python 3.9']

    @cool-RR
    Copy link
    Mannequin Author

    cool-RR mannequin commented Sep 9, 2014

    The documentation for %Z ( https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior ) says it matches EST among others, but in practice it doesn't:

        Python 3.4.0 (v3.4.0:04f714765c13, Mar 16 2014, 19:25:23) [MSC v.1600 64 bit (AMD64)] on win32
        Type "copyright", "credits" or "license()" for more information.
        DreamPie 1.2.1
        >>> import datetime
        >>> datetime.datetime.strptime('2016-12-04 08:00:00 UTC', '%Y-%m-%d %H:%M:%S %Z')
        0: datetime.datetime(2016, 12, 4, 8, 0)
        >>> datetime.datetime.strptime('2016-12-04 08:00:00 EST', '%Y-%m-%d %H:%M:%S %Z')
        Traceback (most recent call last):
          File "<pyshell#2>", line 1, in <module>
            datetime.datetime.strptime('2016-12-04 08:00:00 EST', '%Y-%m-%d %H:%M:%S %Z')
          File "C:\Python34\lib\_strptime.py", line 500, in _strptime_datetime
            tt, fraction = _strptime(data_string, format)
          File "C:\Python34\lib\_strptime.py", line 337, in _strptime
            (data_string, format))
        ValueError: time data '2016-12-04 08:00:00 EST' does not match format '%Y-%m-%d %H:%M:%S %Z'
        >>>

    @cool-RR cool-RR mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Sep 9, 2014
    @bitdancer
    Copy link
    Member

    Looking at the code, the only timezone strings it recognizes are utc, gmt, and whatever is in time.tzname (EST and EDT, in my case).

    This seems...barely useful, although clearly not useless :)

    And does not seem to be documented.

    @4kir4
    Copy link
    Mannequin

    4kir4 mannequin commented Sep 13, 2014

    if PEP-431 is implemented (or anything that gives access to zoneinfo)
    then strptime could extend the list of timezones it accepts (utc +
    local timezone names) to include names from the tz database:

      import pytz # $ pip install pytz

    {tzname for tz in map(pytz.timezone, pytz.all_timezones)
    for _, _, tzname in getattr(tz, '_transition_info', [])}

    It includes EST.

    @4kir4
    Copy link
    Mannequin

    4kir4 mannequin commented Sep 13, 2014

    Without %z (utc offset) strptime returns a naive datetime object that
    is interpreted as utc or local time usually.

    It might explain why %Z tries to match only utc and the local timezone
    names.

    @bitdancer
    Copy link
    Member

    I think its existing behavior is because it doesn't have access to a list of recognized timezones. As you say, this could be fixed by PEP-431. It could also be fixed by adopting the "email standard" timezones (see email/_parseaddr.py), which is a def-facto standard.

    Regardless of whether something is done to expand the number of timezone it knows about, though, there's a current doc bug that should be fixed. If someone wants to advocate for expanding the timezone list, that should be a separate issue.

    @berkerpeksag
    Copy link
    Member

    if PEP-431 is implemented (or anything that gives access to zoneinfo)
    then strptime could extend the list of timezones it accepts (utc +
    local timezone names) to include names from the tz database:

    FTR, I have a WIP(and probably a bit outdated) branch to implement PEP-431 on GitHub:

    https://github.com/berkerpeksag/cpython/tree/pep431
    

    @4kir4
    Copy link
    Mannequin

    4kir4 mannequin commented Sep 17, 2014

    If the current implementation is considered correct (%Z not recognizing
    EST) then indeed extending the list of recognized timezones is another
    issue. And the docs should be changed to match the implementation.

    The current behavior is broken, see also bpo-22426

    If we assume that the docs are correct (%Z should match EST) even if it
    is not implemented yet then it is this issue's responsibility to extend
    the list of recognized timezones (even an incomplete hard-coded list
    generated by the code from msg226857 would be fine).

    Lib/email/_parseaddr.py approach (tzname corresponds to a fixed utc
    offset) is wrong: tzname may correspond to multiple utc offsets at the
    same time (different timezones) and at different times (even within the
    same timezone). Having the tz database won't fix it: *tzname along is
    not enough to determine UTC offset in _many_ cases.*

    CST is ambiguous if %z is not given therefore even if strptime() had the
    access to a larger list of recognized timezones; it is not clear what
    the return value would be:

    • aware datetime: which timezone to use?

    • naive datetime: it might be misleading if the input timezone name
      doesn't correspond to utc or the local timezone

    email._parseaddr._timezones is misleading if used globally: CST is also
    used in Australia, China with different utc offsets.

    One of possible solutions is to return aware datetime objects if a new
    truthy *timezones* keyword-only argument is provided. It may contain a
    mapping to disambiguate timezone abbreviations: {'EST': timedelta|tzinfo}.

    If *timezones* is False then strptime() returns a naive datetime
    object. The only difference from the current behavior is that a larger
    list of timezones is supported to match the docs.

    With bool(timezones) == True, strptime() could return an aware datetime
    object in utc, local timezones, or any timezone in *timezones* if it is
    a mapping.

    Default *timezones* is None that means timezone=False for backward
    compatibility. DeprecationWarning is added that timezone=True will be
    default in the next release 3.6 if no objections are received
    until then e.g.,

        if tzname and timezones is None: # %Z matches non-empty string
            warn("Default *timezones* parameter for "
                 "%s.strptime() will be True in Python 3.6. "
                 "Pass timezones=False to preserve the old behaviour" % (
                     cls.__qualname__,),
                 category=DeprecationWarning, stacklevel=2)

    I've uploaded the patch draft-strptime-%Z-timezones.diff that implements
    this solution. It also contains tests and docs updates.

    @bitdancer
    Copy link
    Member

    I don't think we are going to support a timezone list like that without PEP-431.

    You should attach your patch to a new issue. When I said this should the doc issue, that is because only a doc fix is acceptable for 3.4. Adding more timezones to recognize would be an enhancement, given the complexity of the proposed solution.

    On the other hand, if timezone names are ambiguous, I'm not sure there *is* a fix other than using the "defacto standard" names and offsets used by the email library. Actually, isn't there a written standard that addresses this issue? I seem to remember reading a discussion of the problem somewhere...

    @4kir4
    Copy link
    Mannequin

    4kir4 mannequin commented Sep 20, 2014

    I don't think we are going to support a timezone list like that without PEP-431.

    PEP-431 won't fix this issue. See below.

    You should attach your patch to a new issue. When I said this should
    the doc issue, that is because only a doc fix is acceptable for 3.4.
    Adding more timezones to recognize would be an enhancement, given the
    complexity of the proposed solution.

    The docs are correct (they imply that %Z should accept EST). It is the
    implementation that is deficient.

    The patch introduces a new parameter therefore I agree: it should be
    applied only in 3.5+

    On the other hand, if timezone names are ambiguous, I'm not sure there
    *is* a fix other than using the "defacto standard" names and offsets
    used by the email library. Actually, isn't there a written standard
    that addresses this issue? I seem to remember reading a discussion of
    the problem somewhere...

    Multi-timezone programming

    email._parseaddr._timezones with CST=-600 is like US-ASCII (the
    standard).

    Code that uses local timezone is bilingual (locale-based): CST=-600 in
    Chicago but it is CST=+800 in China and it may be something else in
    other parts of the world. The *timezones* parameter in my patch allows
    to specify the encoding different from the current locale.

    Code that uses the tz database is multilingual (Unicode): knowing the
    encoding (zoneinfo name and the time) it is possible to decode almost
    all encoded characters (to find out whether the timezone abbreviation is
    valid with a given time and to find the correct UTC offset).

    If you don't know the encoding then the support for Unicode (the
    presence of the tz database (PEP-431)) along won't allow you to decode a
    byte sequence (time string). You need an encoding (timezone name, time)
    to interpret the input correctly.

    Given that the list is used to accept a string as a timezone
    abbreviation, I don't think it should depend on PEP-431 e.g., old date
    strings/people may use WST even if the new pytz timezones do not use it.

    The initial list could be seeded from using pytz as in my patch and then
    expanded as necessary by hand (there is no official entity that tracks
    timezone abbreviations).

    @inglesp
    Copy link
    Mannequin

    inglesp mannequin commented May 17, 2016

    Given the difference between the documented and the actual behaviours, and given that it's apparently not obvious what the correct fix should be, would a patch that updates the docs (to say that %Z only matched GMT and UTC) be welcome?

    @bitdancer
    Copy link
    Member

    Peter: yes, that is what I've been saying this issue is for :) Anything else is a new issue.

    Note that it *does* also recognize the strings in time.tzname in addition to UTC and GMT.

    @abalkin
    Copy link
    Member

    abalkin commented Oct 29, 2018

    I think strptime should only accept %Z when it comes together with %z and not do any validation.

    This is close to the current behavior. %Z by itself is useless because even when it is accepted, the value is discarded:

    >>> print(datetime.strptime('UTC', '%Z'))
    1900-01-01 00:00:00

    You have to use %z to get an aware datetime instance:

    >>> print(datetime.strptime('UTC+0000', '%Z%z'))
    1900-01-01 00:00:00+00:00

    The validation is already fairly lax:

    >>> print(datetime.strptime('UTC+1234', '%Z%z'))
    1900-01-01 00:00:00+12:34

    I don't think this issue has anything to do with the availability of zoneinfo database. Timezone abbreviations are often ambiguous and should only serve as a human-readable supplement to the UTC offset and cannot by itself be used as a TZ specification.

    @AlexLordThorsen
    Copy link
    Mannequin

    AlexLordThorsen mannequin commented Apr 8, 2019

    This behavior is currently unchanged and the docs still state that EST is an acceptable value.

    >>> datetime.strptime("2019-01-28 18:54:45 EST", "%Y-%m-%d %H:%M:%S %Z")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.7/_strptime.py", line 577, in _strptime_datetime
        tt, fraction, gmtoff_fraction = _strptime(data_string, format)
      File "/usr/local/lib/python3.7/_strptime.py", line 359, in _strptime
        (data_string, format))
    ValueError: time data '2019-01-28 18:54:45 EST' does not match format '%Y-%m-%d %H:%M:%S %Z'
    

    @pganssle
    Copy link
    Member

    pganssle commented Apr 9, 2019

    @alex LordThorsen: It will accept EST if EST is one of your "local" time zones, so whatever's in time.tzname.

    In the short term, I think the right thing to do would be to update the documentation to remove the reference to "EST", and add an explanatory note in the section about %Z that explains that it accepts a few hard-coded values + whatever's in time.tzname.

    In the long run, I think the best "out of the box" support we can provide would be supporting %Z when %z is present (per Alexander's suggestion), and possibly something akin to dateutil's "tzinfos", where a mapping between abbreviations and tzinfo objects could be passed to strptime explicitly.

    @AlexLordThorsen
    Copy link
    Mannequin

    AlexLordThorsen mannequin commented Apr 11, 2019

    It's been a while since I've committed a patch. Do I still upload a diff file here or should I open a PR for the doc changes on github?

    @SilentGhost
    Copy link
    Mannequin

    SilentGhost mannequin commented Apr 11, 2019

    PR on github, Alex

    @pganssle pganssle added 3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes labels Sep 12, 2019
    @karlcow
    Copy link
    Mannequin

    karlcow mannequin commented Oct 1, 2019

    I created a PR following the recommendations of p-ganssle
    #16507

    @miss-islington
    Copy link
    Contributor

    New changeset bc441ed by Miss Islington (bot) (Karl Dubost) in branch 'master':
    bpo-22377: Fixes documentation for %Z in datetime (GH-16507)
    bc441ed

    @kkirsche
    Copy link
    Mannequin

    kkirsche mannequin commented Nov 1, 2021

    With the introduction of PEP-0615 (https://www.python.org/dev/peps/pep-0615/) — Support for the IANA Time Zone Database in the Standard Library — should this be revisited to now leverage ZoneInfo to fully parse these time zone values in Python 3.9+ (or 3.11 with introduction of the feature if it is unreasonable to backport this)?

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @pganssle
    Copy link
    Member

    With the introduction of PEP-0615 (https://www.python.org/dev/peps/pep-0615/) — Support for the IANA Time Zone Database in the Standard Library — should this be revisited to now leverage ZoneInfo to fully parse these time zone values in Python 3.9+ (or 3.11 with introduction of the feature if it is unreasonable to backport this)?

    There are two major issues that aren't solved by the zoneinfo module:

    1. The IANA time zone database in general (and definitely zoneinfo specifically) does not provide any sort of way to look up a zone by its abbreviation, so it would not be a simple matter to accomplish this.
    2. Even if it were possible to look up zones by 3/4-letter abbreviation, a much bigger issue by far is the fact that the kind of thing that %Z outputs is not unique, so there's no way we can add a global parser for these. CST means 3 different things, IST means at least two different things, etc.

    As mentioned earlier in the thread, the best options we have for this are:

    1. Allow users to pass in a mapping between potential %Z results and tzinfo objects, which we would attach as part of the return value of strptime.
    2. Add a version of strptime that can return the value of %Z, so that users can pass that to their own parsing function.

    I am not sure that either option is entirely satisfactory. The second one is probably the better one, but I don't think people would like it as much 😅. Maybe something like this:

    >>> datetime.strptime.with_parse_info(""2022-01-01T12:00:00 EST", %Y-%m-%dT%H:%M:%S %Z")
    datetime._parse_Info(result=datetime.datetime(2022, 1, 1, 12, 0, 0, 0), z="EST")
    

    If %Z is the only thing where there's any uncertainty as to what was actually found in the string at the outset, maybe that could just be datetime.strptime.with_tzname(), which raises ValueError if the string doesn't contain a %Z and returns tuple[datetime.datetime, str].

    We might also be able to be more lax about what is accepted for %Z when %z is specified. That wouldn't help in the more general case, but in the case of something like this:

    >>> datetime.strptime("2020-01-01T00-06:00 CST", "%Y-%m-%-dT%H%z %Z")
    datetime.datetime(2020, 1, 1, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=68400), 'CST'))
    

    It would be possible for someone to implement their own tzname-to-IANA mapping, since they could key on dt.tzinfo.tzname() and possibly even dt.tzinfo.utcoffset(). The only problems with this are:

    1. Right now such a strategy only works if the value parsed from %Z is one of the local time zone abbreviations in your locale.
    2. This will not solve the problem for people who want to parse %Z and cannot control the string format. (Of course, such people could do something hacky like tacking on %z to the end of their format string and +00:00 to the end of their input string just to get access to dt.tzname(), and maybe that's Good Enough™).

    I'm inclined to start with this last solution since it's a minimal change and the current behavior is probably unexpected for most people anyway, and it makes it at least possible for most people to solve their own problem.

    @eugenetriguba
    Copy link
    Contributor

    @pganssle If I'm understanding correctly, the main crux of this problem is that %Z seems to be fundamentally flawed since the timezone abbreviations aren't unique. That has ripple effects in making it difficult to return a TZ-aware datetime with %Z alone and with adding support for parsing more timezone abbreviations because it would be ambiguous. CST, for instance, could mean two different things. To attempt to resolve this, we would need to put the onus on the caller to disambiguate it for us or simply give it back and have them parse it as they see fit and that is where those options you are proposing come from. Would you characterize that as a fair understanding?

    I'm inclined to start with this last solution since it's a minimal change and the current behavior is probably unexpected for most people anyway, and it makes it at least possible for most people to solve their own problem.

    Are you referring to this?

    1. Add a version of strptime that can return the value of %Z, so that users can pass that to their own parsing function.

    @pganssle
    Copy link
    Member

    I'm inclined to start with this last solution since it's a minimal change and the current behavior is probably unexpected for most people anyway, and it makes it at least possible for most people to solve their own problem.

    Are you referring to this?

    Sorry, I realize that was a bit confusing. I was actually referring to this:

    We might also be able to be more lax about what is accepted for %Z when %z is specified.

    I think the least controversial change here would probably be to make it so %Z accepts a wider range of inputs. Maybe just the current set of valid inputs plus any 3- and 4- letter abbreviations, or maybe it's feasible to have it match "until the end of the string or until the next token" (though that could get complicated if someone constructs "garden path" datetime strings, or if you have something like %Z followed by a non-fixed-width directive — though I don't actually know if we support any non-fixed-width directives). To be even less controversial (though possibly also less useful), we can change the behavior only if %z is also specified, since otherwise %Z would be ignored.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    Development

    Successfully merging a pull request may close this issue.

    6 participants