This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: datetime.strptime not able to recognize invalid date formats
Type: behavior
Stage:
Components: Library (Lib)
Versions: Python 3.6, Python 2.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Raghunath Lingutla, belopolsky, crwilcox, p-ganssle
Priority: normal
Keywords:

Created on 2018-06-22 13:03 by Raghunath Lingutla, last changed 2022-04-11 14:59 by admin.

Messages (3)
msg320233 - Author: Raghunath Lingutla (Raghunath Lingutla) Date: 2018-06-22 13:03
datetime.strptime cannot recognize invalid date values for %Y%m%d, %y%m%d, %Y%m%d %H:%M and a few other formats. In Java we have the setLenient option, which helps us validate input against the pattern and convert only values that are in a valid format.
Example: datetime.strptime('181223', '%Y%m%d')
For the above input I get 1812-02-03 00:00:00 as output, but the expected output is an error: ValueError: time data '181223' does not match format '%Y%m%d'

I tested the four parsers listed below; all of them give the same output.

1) datetime.strptime

2) timestring.Date

3) parser.parse from dateutil

4) dateparser.parse
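Python's datetime.strptime has no direct equivalent of Java's setLenient(false). One possible workaround, sketched here under stated assumptions, is to parse normally and then reject the input unless formatting the result with the same format string reproduces the input exactly. strict_strptime is a hypothetical helper, not a standard-library function, and the round-trip check only works for directives whose output is fixed-width and canonical (such as %m, %d, %H, %M, and %Y for four-digit years).

```python
from datetime import datetime


def strict_strptime(value: str, fmt: str) -> datetime:
    """Hypothetical strict wrapper around datetime.strptime.

    Assumption: every directive in fmt formats back to exactly the text it
    accepts (true for zero-padded numeric directives such as %m and %d, but
    not, for example, for %b with lowercase input).
    """
    parsed = datetime.strptime(value, fmt)  # lenient parse first
    # Reject input that is not the canonical, zero-padded rendering.
    if parsed.strftime(fmt) != value:
        raise ValueError(
            f"time data {value!r} does not strictly match format {fmt!r}"
        )
    return parsed


print(datetime.strptime("181223", "%Y%m%d"))  # lenient: 1812-02-03 00:00:00
print(strict_strptime("20181223", "%Y%m%d"))  # 2018-12-23 00:00:00
try:
    strict_strptime("181223", "%Y%m%d")
except ValueError as exc:
    print(exc)
```

The design choice here is to reuse the standard parser rather than write a new one, at the cost of only being correct for formats that round-trip.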
msg320245 - Author: Chris Wilcox (crwilcox) * Date: 2018-06-22 17:15
As %m and %d denote the zero-padded forms of month and day, it seems to me this shouldn't match. Executing a small C program, `char* ret = strptime("181223", "%Y%m%d", &tm);`, confirms that C considers this input invalid (strptime returns NULL). The datetime docs indicate that the behavior should match C89, so I would expect Python to raise ValueError here as well. https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior
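To illustrate why a C implementation can reject '181223', here is a simplified model of a C-style strptime for "%Y%m%d". It assumes that each numeric directive reads up to a fixed maximum number of digits (4 for %Y, 2 for %m and %d) without backtracking and only then range-checks the value; this is an illustrative assumption, not glibc's actual source. Under those rules %Y consumes "1812", %m consumes "23", and the parse fails because 23 is not a valid month. c_like_parse_ymd is a hypothetical name.

```python
DIGITS = "0123456789"


def c_like_parse_ymd(text: str) -> tuple:
    """Simplified, hypothetical model of a C strptime handling "%Y%m%d".

    Each field greedily reads up to max_width ASCII digits (no backtracking),
    then the value is range-checked. Like C, leftover text is not an error;
    it is returned so the caller can inspect it.
    """
    pos = 0
    fields = []
    # (max_width, low, high) for %Y, %m, %d respectively (assumed widths).
    for max_width, low, high in ((4, 0, 9999), (2, 1, 12), (2, 1, 31)):
        end = pos
        while end < len(text) and end - pos < max_width and text[end] in DIGITS:
            end += 1
        if end == pos:
            raise ValueError(f"expected digits at position {pos} in {text!r}")
        value = int(text[pos:end])
        if not low <= value <= high:
            raise ValueError(f"value {value} out of range in {text!r}")
        fields.append(value)
        pos = end
    return (*fields, text[pos:])


print(c_like_parse_ymd("20181223"))  # (2018, 12, 23, '')
try:
    c_like_parse_ymd("181223")  # %m reads "23", which is out of range
except ValueError as exc:
    print(exc)
```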
msg320291 - Author: Chris Wilcox (crwilcox) * Date: 2018-06-22 23:22
I looked a bit at _strptime.py and the corresponding tests and thought I would share my notes.

The regular expressions clearly allow non-zero-padded values for both %d and %m matches. There is one test that runs time.strptime("Mar 1", "%b %d"), so it seems intentional that %d and %m allow non-zero-padded values.
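The leniency described above can be observed through the public API alone. This short check assumes the default C locale for the %b month name and includes a year in each input for simplicity.

```python
import time
from datetime import datetime

# A single-digit day is accepted by %d (compare the "Mar 1" test case).
parsed = time.strptime("Mar 1 2018", "%b %d %Y")
print(parsed.tm_mon, parsed.tm_mday)  # 3 1

# Single-digit month and day are both accepted by %m and %d.
print(datetime.strptime("2018-1-5", "%Y-%m-%d").date())  # 2018-01-05
```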

It also just occurred to me that the example '181223' isn't ambiguous, as %Y requires 4 digits and months cannot be more than 12, so the remaining '23' can only split as month 2 and day 3. So it seems to me this could only be Y=1812, M=2, D=3.

There do exist cases that are truly ambiguous for non-zero-padded values. For instance, 2018111 could be either 2018-Nov-1 or 2018-Jan-11. Python deterministically prefers the two-digit match for each field in turn, so the month consumes '11' and this parses as November 1, 2018. However, I can't see any reason why that reading should be assumed.
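The ambiguity can be made concrete by enumerating every way the digits after a 4-digit year split into a 1- or 2-digit month and a 1- or 2-digit day. possible_dates is a hypothetical helper written for this illustration; it models the optional leading zero, not the exact regular expressions in _strptime.py. It finds a single reading for '181223' but two for '2018111', of which strptime picks November 1.

```python
from datetime import date, datetime


def possible_dates(digits: str) -> list:
    """Hypothetical helper: every valid date from a 4-digit year followed by
    a 1- or 2-digit month and a 1- or 2-digit day."""
    year, rest = int(digits[:4]), digits[4:]
    results = []
    for month_len in (1, 2):
        month_text, day_text = rest[:month_len], rest[month_len:]
        if not month_text or len(day_text) not in (1, 2):
            continue  # this split leaves no room for a 1- or 2-digit day
        try:
            results.append(date(year, int(month_text), int(day_text)))
        except ValueError:
            pass  # month or day out of range for this split
    return results


print(possible_dates("181223"))   # [datetime.date(1812, 2, 3)]
print(possible_dates("2018111"))  # [datetime.date(2018, 1, 11), datetime.date(2018, 11, 1)]
print(datetime.strptime("2018111", "%Y%m%d").date())  # 2018-11-01
```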

The edits required to stop allowing non-zero-padded values were pretty straightforward, and only one unit test (one that verifies 'Mar 1' comes after 'Feb 29') had to be altered. That may point more to a need for additional tests than to evidence that no one relies on single-digit day or month values.
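As a rough illustration of what dropping non-zero-padded values means for the "%Y%m%d" case, the sketch below uses month and day patterns with the single-digit alternatives removed. STRICT_YMD and parse_ymd_strict are hypothetical names, and the pattern only approximates the idea; it is not the actual change made to _strptime.py.

```python
import re
from datetime import datetime

# Hypothetical strict pattern: month and day must be exactly two digits,
# i.e. the single-digit alternatives of the lenient patterns are dropped.
STRICT_YMD = re.compile(
    r"(?P<Y>[0-9]{4})"
    r"(?P<m>1[0-2]|0[1-9])"
    r"(?P<d>3[01]|[12][0-9]|0[1-9])"
)


def parse_ymd_strict(text: str) -> datetime:
    """Parse '%Y%m%d' input, requiring a zero-padded month and day."""
    match = STRICT_YMD.fullmatch(text)
    if match is None:
        raise ValueError(f"time data {text!r} does not match format '%Y%m%d'")
    # datetime() still rejects impossible dates such as 2018-02-30.
    return datetime(int(match["Y"]), int(match["m"]), int(match["d"]))


print(parse_ymd_strict("20181223"))  # 2018-12-23 00:00:00
for text in ("181223", "2018111"):
    try:
        parse_ymd_strict(text)
    except ValueError as exc:
        print(exc)
```

With the single-digit alternatives gone, neither the '181223' nor the '2018111' input can match, so the ambiguity disappears instead of being resolved by alternation order.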
History
Date                 User                Action  Args
2022-04-11 14:59:02  admin               set     github: 78122
2018-07-05 15:00:50  p-ganssle           set     nosy: + p-ganssle
2018-06-22 23:22:55  crwilcox            set     messages: + msg320291
2018-06-22 20:50:35  ned.deily           set     nosy: + belopolsky
2018-06-22 17:15:18  crwilcox            set     versions: + Python 2.7; nosy: + crwilcox; messages: + msg320245; components: + Library (Lib), - Tests
2018-06-22 13:03:38  Raghunath Lingutla  create