Title: datetime.strptime not able to recognize invalid date formats
Components: Library (Lib) Versions: Python 3.6, Python 2.7
Assigned To: Nosy List: Raghunath Lingutla, belopolsky, crwilcox, p-ganssle
Created on 2018-06-22 13:03 by Raghunath Lingutla, last changed 2022-04-11 14:59 by admin.

msg320233 - Author: Raghunath Lingutla (Raghunath Lingutla) Date: 2018-06-22 13:03
datetime.strptime cannot recognize invalid date values for %Y%m%d, %y%m%d, %Y%m%d %H:%M and a few other formats. In Java, date formats have a setLenient option that lets us validate input against the pattern and convert only values that actually match it.
Example: datetime.strptime('181223', '%Y%m%d')
For this input I get 1812-02-03 00:00:00, but I expect an error: ValueError: time data '181223' does not match format '%Y%m%d'
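A minimal sketch of a strict mode in the spirit of Java's setLenient(false), assuming a purely numeric, zero-padded format such as '%Y%m%d': parse with datetime.strptime, then re-format the result with the same format and reject the input if it does not round-trip. strict_strptime is a hypothetical helper, not part of the datetime module. The round-trip check would misfire for formats whose output can legitimately differ from the input (for example %b with different letter case, or years below 1000 on platforms where strftime does not zero-pad %Y).

```python
from datetime import datetime


def strict_strptime(value: str, fmt: str) -> datetime:
    """Parse like datetime.strptime, but reject input that does not round-trip.

    Hypothetical helper: re-formatting the parsed result with the same format
    and comparing it to the input rejects non-zero-padded fields, such as the
    '2' and '3' that strptime accepts in '181223' for '%Y%m%d'.
    """
    parsed = datetime.strptime(value, fmt)
    if parsed.strftime(fmt) != value:
        raise ValueError(f"time data {value!r} does not match format {fmt!r} strictly")
    return parsed


# The reported behaviour: non-padded month and day are accepted.
print(datetime.strptime("181223", "%Y%m%d"))  # 1812-02-03 00:00:00

# The strict variant accepts the zero-padded spelling and rejects the other.
print(strict_strptime("18120203", "%Y%m%d"))  # 1812-02-03 00:00:00
try:
    strict_strptime("181223", "%Y%m%d")
except ValueError as exc:
    print(exc)
```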

I tested the four modules listed below, and all of them give the same output:

1) datetime.strptime

2) timestring.Date

3) parser.parse from dateutil

4) dateparser.parse
msg320245 - Author: Chris Wilcox (crwilcox) Date: 2018-06-22 17:15
Since %m and %d denote the zero-padded forms of month and day, it seems to me this shouldn't match. A small C program calling `char* ret = strptime("181223", "%Y%m%d", &tm);` confirms that C treats this input as invalid. The datetime docs indicate that the behavior should match C89, so I would expect Python to raise ValueError here as well.
msg320291 - Author: Chris Wilcox (crwilcox) Date: 2018-06-22 23:22
I looked a bit at and the corresponding tests and thought I would share my notes.

The regular expressions clearly allow non-zero-padded values for both %d and %m. One test runs time.strptime("Mar 1", "%b %d"), so accepting non-zero-padded %d and %m values appears to be intentional.
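To illustrate the point above, the snippet below reproduces (rather than imports, since _strptime is a private module) the %Y, %m and %d patterns as they appear in CPython's Lib/_strptime.py; treat them as an approximation that may differ between releases. The bare [1-9] alternatives are what let single-digit months and days through. The %b call assumes an English/C locale.

```python
import re
import time

# Approximation of the %Y, %m and %d patterns in CPython's Lib/_strptime.py.
# The trailing [1-9] alternatives accept non-zero-padded values.
Y = r"(?P<Y>\d\d\d\d)"
m = r"(?P<m>1[0-2]|0[1-9]|[1-9])"
d = r"(?P<d>3[0-1]|[1-2]\d|0[1-9]|[1-9]| [1-9])"
ymd = re.compile(Y + m + d)

# '181223' is split as year 1812, month '2', day '3'.
print(ymd.fullmatch("181223").groupdict())  # {'Y': '1812', 'm': '2', 'd': '3'}

# The existing test case: a single-digit day is accepted for %d.
t = time.strptime("Mar 1", "%b %d")
print(t.tm_mon, t.tm_mday)  # 3 1
```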

It also occurred to me that the example '181223' isn't ambiguous: %Y requires exactly 4 digits and months cannot exceed 12, so the only possible reading is Y=1812, M=2, D=3.

There are cases that are truly ambiguous for non-zero-padded values. For instance, 2018111 could be either 2018-Nov-1 or 2018-Jan-11. Python deterministically consumes as many digits as possible for each field in turn, so this is parsed as November 1, 2018. However, I can't see any reason why that reading should be assumed.
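A small sketch that enumerates every valid reading of a '%Y%m%d'-style string when month and day may each be written with one or two digits. candidate_dates is a hypothetical helper, not a library function; it shows both readings of 2018111 and the single reading of 181223, alongside the one datetime.strptime picks.

```python
from datetime import date, datetime
from typing import List


def candidate_dates(value: str) -> List[date]:
    """Hypothetical helper: every valid date for a '%Y%m%d' string whose
    month and day may each be one or two digits long."""
    year, rest = value[:4], value[4:]
    found = []
    for month_len in (1, 2):
        month, day = rest[:month_len], rest[month_len:]
        if not 1 <= len(day) <= 2:
            continue  # the day must be one or two digits
        try:
            found.append(date(int(year), int(month), int(day)))
        except ValueError:  # e.g. month 0 or 13, day 32, non-digits
            continue
    return found


print(candidate_dates("2018111"))  # [datetime.date(2018, 1, 11), datetime.date(2018, 11, 1)]
print(datetime.strptime("2018111", "%Y%m%d").date())  # 2018-11-01
```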

The edits required to stop accepting non-zero-padded values were straightforward, and only one unit test (one that verifies 'Mar 1' comes after 'Feb 29') had to be altered. That may point more to a need for additional tests than to evidence that no one relies on single-digit day or month values.
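One possible shape of that change, sketched outside the standard library: %m and %d patterns with the single-digit alternatives removed, applied to a bare '%Y%m%d' format. STRICT_YMD and parse_strict_ymd are hypothetical names, and this is not the patch described above; it only shows that both problem inputs stop matching while a zero-padded date still parses.

```python
import re
from datetime import datetime

# Hypothetical zero-padded-only variants of the %m and %d patterns: the
# single-digit alternatives ([1-9] and " [1-9]") are dropped.
STRICT_YMD = re.compile(
    r"(?P<Y>\d\d\d\d)"
    r"(?P<m>1[0-2]|0[1-9])"
    r"(?P<d>3[0-1]|[1-2]\d|0[1-9])"
)


def parse_strict_ymd(value: str) -> datetime:
    """Parse a '%Y%m%d' string, requiring two-digit month and day fields."""
    found = STRICT_YMD.fullmatch(value)
    if found is None:
        raise ValueError(f"time data {value!r} does not match format '%Y%m%d'")
    # Day-of-month range checks (e.g. 20180231) are still left to datetime().
    return datetime(int(found["Y"]), int(found["m"]), int(found["d"]))


print(parse_strict_ymd("20181101"))  # 2018-11-01 00:00:00
for bad in ("181223", "2018111"):
    try:
        parse_strict_ymd(bad)
    except ValueError as exc:
        print(exc)
```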
