classification
Title: strptime accepts the wrong '2010-06-01 MSK' string but rejects the right '2010-06-01 MSD'
Type: enhancement Stage: needs patch
Components: Library (Lib) Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: belopolsky Nosy List: akira, belopolsky, lemburg, p-ganssle
Priority: normal Keywords:

Created on 2014-09-16 21:59 by akira, last changed 2018-07-05 15:59 by p-ganssle.

Messages (7)
msg226966 - (view) Author: Akira Li (akira) * Date: 2014-09-16 21:59
  >>> import os
  >>> import time
  >>> os.environ['TZ'] = 'Europe/Moscow'
  >>> time.tzset()
  >>> time.strptime('2010-06-01 MSK', '%Y-%m-%d %Z')
  time.struct_time(tm_year=2010, tm_mon=6, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=1, tm_yday=152, tm_isdst=0)
  >>> time.strptime('2010-06-01 MSD', '%Y-%m-%d %Z')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/usr/lib/python2.7/_strptime.py", line 467, in _strptime_time
      return _strptime(data_string, format)[0]
    File "/usr/lib/python2.7/_strptime.py", line 325, in _strptime
      (data_string, format))
  ValueError: time data '2010-06-01 MSD' does not match format '%Y-%m-%d %Z'

datetime.strptime() and Python 3 behave exactly the same way.

The correct name is MSD:

  >>> from datetime import datetime, timezone
  >>> dt = datetime(2010, 5, 31, 21, tzinfo=timezone.utc).astimezone()
  >>> dt.strftime('%Y-%m-%d %Z')
  '2010-06-01 MSD'

strptime() uses the current (wrong for the past date) time.tzname names
despite the correct name being known to the system (as the example above 
demonstrates).
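A minimal sketch of why this happens, assuming %Z is matched against a fixed set built from the current time.tzname and time.daylight values (plus "utc" and "gmt"). The helper name accepted_zone_names is hypothetical and the logic is a simplification, not a copy of Lib/_strptime.py:

```python
import time


def accepted_zone_names(tzname=None, daylight=None):
    """Approximate the lower-cased names that %Z would match.

    Hypothetical helper: it only looks at the *current* zone names,
    never at the date being parsed.
    """
    if tzname is None:
        tzname = time.tzname
    if daylight is None:
        daylight = time.daylight
    names = {"utc", "gmt", tzname[0].lower()}
    if daylight:
        # The DST name is only considered when the zone currently has DST.
        names.add(tzname[1].lower())
    return names


# Values reported for TZ=Europe/Moscow in 2014: no DST, both names are MSK.
moscow_now = accepted_zone_names(tzname=("MSK", "MSK"), daylight=0)
print("msk" in moscow_now)  # True
print("msd" in moscow_now)  # False
```

Under this assumption, 'MSD' cannot match for any date, because the set is fixed before the input string is examined.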

In general, it is impossible to validate a time zone abbreviation even if
the time zone database is available:

- tzname may be ambiguous -- multiple zoneinfo matches. Around one third
  of the tznames in the tz database correspond to multiple UTC offsets
  (at the same or different times), so it is not unusual. Any scheme that
  assumes that a tzname is enough to get the UTC offset, such as
  Lib/email/_parsedate.py, is wrong.

- and even if zoneinfo is known, it may be misleading, e.g.,
  HAST (Hawaii-Aleutian Standard Time) might be rejected
  because Pacific/Honolulu zoneinfo uses HST. HAST corresponds to 
  America/Adak (US/Aleutian) in tzdata (UTC offset may be the same).
  It might be too rare to care.
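
The ambiguity in the first point can be sketched with a lookup that refuses to guess. The table below is hand-written for illustration and deliberately incomplete; it is not generated from the tz database, and the names KNOWN_OFFSETS and offset_for are hypothetical:

```python
from datetime import timedelta

# Hand-picked examples (assumption: illustrative only, not exhaustive).
KNOWN_OFFSETS = {
    # Moscow time: UTC+3 for most years, UTC+4 between 2011 and 2014.
    "MSK": {timedelta(hours=3), timedelta(hours=4)},
    # India, Israel and Ireland all use the abbreviation IST.
    "IST": {
        timedelta(hours=5, minutes=30),
        timedelta(hours=2),
        timedelta(hours=1),
    },
    "UTC": {timedelta(0)},
}


def offset_for(abbr):
    """Return the UTC offset for abbr only if it is unambiguous."""
    offsets = KNOWN_OFFSETS.get(abbr.upper(), set())
    if len(offsets) != 1:
        raise ValueError(
            f"{abbr!r} maps to {len(offsets)} UTC offsets; cannot choose one"
        )
    return next(iter(offsets))


print(offset_for("UTC"))  # 0:00:00
try:
    offset_for("IST")
except ValueError as exc:
    print(exc)
```

A plain abbreviation-to-offset dictionary has to pick one entry silently, which is the failure mode described above.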

Related: issue22377
msg226967 - (view) Author: Akira Li (akira) * Date: 2014-09-16 22:05
Correction:

The correct offset is +0400:

  >>> dt = datetime(2010, 5, 31, 20, tzinfo=timezone.utc).astimezone()

And _timezones dict is defined in Lib/email/_parseaddr.py
msg226969 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2014-09-17 00:53
There is no daylight saving time in Moscow and Python detects this correctly:

$ TZ=Europe/Moscow python3
>>> import time
>>> time.daylight
0

Note that historically there was DST, but the time module cannot handle historical TZ changes.

(The Russian government compensates for the relative sanity of not moving the clocks twice a year by changing the UTC offset and TZ boundaries every 5 years or so. :-)
msg226970 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2014-09-17 01:12
I don't think there is anything we can do here. Without a TZ database, Python has to rely on time.tzname, which in the case of TZ=Europe/Moscow returns

>>> time.tzname
('MSK', 'MSK')

Hardcoding a timezones dictionary, as done in the email module, may work for a handful of American timezones, but will not work for TZs like Europe/Moscow.

$ zdump -v  Europe/Moscow| tail
Europe/Moscow  Sat Oct 24 22:59:59 2009 UTC = Sun Oct 25 02:59:59 2009 MSD isdst=1
Europe/Moscow  Sat Oct 24 23:00:00 2009 UTC = Sun Oct 25 02:00:00 2009 MSK isdst=0
Europe/Moscow  Sat Mar 27 22:59:59 2010 UTC = Sun Mar 28 01:59:59 2010 MSK isdst=0
Europe/Moscow  Sat Mar 27 23:00:00 2010 UTC = Sun Mar 28 03:00:00 2010 MSD isdst=1
Europe/Moscow  Sat Oct 30 22:59:59 2010 UTC = Sun Oct 31 02:59:59 2010 MSD isdst=1
Europe/Moscow  Sat Oct 30 23:00:00 2010 UTC = Sun Oct 31 02:00:00 2010 MSK isdst=0
Europe/Moscow  Sat Mar 26 22:59:59 2011 UTC = Sun Mar 27 01:59:59 2011 MSK isdst=0
Europe/Moscow  Sat Mar 26 23:00:00 2011 UTC = Sun Mar 27 03:00:00 2011 MSK isdst=0
Europe/Moscow  Mon Jan 18 03:14:07 2038 UTC = Mon Jan 18 07:14:07 2038 MSK isdst=0
Europe/Moscow  Tue Jan 19 03:14:07 2038 UTC = Tue Jan 19 07:14:07 2038 MSK isdst=0

(And it looks like the switch back to winter time planned for 2014-10-26 is not in my laptop's database yet.)
msg226972 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2014-09-17 01:22
On second thought, we can probably do the same guesswork as PyInit_timezone (see Modules/timemodule.c) in time.strptime, not for the current time but for the time parsed.
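
A minimal sketch of that idea, assuming the guesswork amounts to sampling the local zone name at a winter and a summer instant, here taken from the parsed year instead of the current one. The helper name tznames_for_year is hypothetical, and the result depends on the TZ setting and on whether the C library has historical zone data:

```python
import time


def tznames_for_year(year):
    """Collect the local zone names in effect in January and July of year.

    Hypothetical helper: asks the C library, via mktime/localtime, which
    abbreviation applied at noon on 1 January and on 1 July.
    """
    names = set()
    for month in (1, 7):
        # tm_isdst=-1 lets the C library decide whether DST was in effect.
        stamp = time.mktime((year, month, 1, 12, 0, 0, 0, 1, -1))
        names.add(time.localtime(stamp).tm_zone)
    return names


# With TZ=Europe/Moscow and a historical tz database, 2010 would be
# expected to yield both MSK and MSD; other systems may differ.
print(tznames_for_year(2010))
```

Matching %Z against such a per-year set, rather than against the current time.tzname, would let the parsed date determine which abbreviations are valid.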
msg226977 - (view) Author: Akira Li (akira) * Date: 2014-09-17 04:20
My patch for issue22377 also fixes this bug.

With the patch applied, both MSK and MSD are accepted if the new
timezones parameter is false (the default for Python 3.5; it will be
changed to True in Python 3.6).

If timezones is True, then MSD returns a correct aware datetime object and
MSK is rejected.
msg226978 - (view) Author: Akira Li (akira) * Date: 2014-09-17 04:24
The MSD variant works on my machine because the C library uses
the historical timezone database there. I'm not sure whether it
works on old Windows versions.
History
Date                 User        Action  Args
2018-07-05 15:59:37  p-ganssle   set     nosy: + p-ganssle
2016-09-13 15:48:43  belopolsky  set     versions: + Python 3.7, - Python 3.5
2014-09-17 04:24:32  akira       set     messages: + msg226978
2014-09-17 04:20:55  akira       set     messages: + msg226977
2014-09-17 01:34:06  belopolsky  set     stage: needs patch
                                         type: behavior -> enhancement
                                         versions: - Python 2.7, Python 3.4
2014-09-17 01:33:14  belopolsky  set     assignee: belopolsky
2014-09-17 01:22:36  belopolsky  set     messages: + msg226972
2014-09-17 01:12:31  belopolsky  set     messages: + msg226970
2014-09-17 00:53:58  belopolsky  set     messages: + msg226969
2014-09-17 00:39:13  ned.deily   set     nosy: + lemburg, belopolsky
2014-09-16 22:05:03  akira       set     messages: + msg226967
2014-09-16 21:59:06  akira       create