classification
Title: strpdate('20141110', '%Y%m%d%H%S') returns wrong date
Type: enhancement Stage: resolved
Components: Documentation Versions: Python 3.5
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: belopolsky Nosy List: belopolsky, brett.cannon, dgorley, ethan.furman, lemburg
Priority: normal Keywords:

Created on 2014-11-10 22:01 by dgorley, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (11)
msg230977 - (view) Author: Doug Gorley (dgorley) Date: 2014-11-10 22:01
strptime() returns the wrong date if I try to parse today's date (2014-11-10) as a string with no separators and ask strptime() to look for nonexistent hour and minute fields.

>>> import datetime
>>> datetime.datetime.strptime('20141110', '%Y%m%d').isoformat()
'2014-11-10T00:00:00'
>>> datetime.datetime.strptime('20141110', '%Y%m%d%H%M').isoformat()
'2014-01-01T01:00:00'
msg230979 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-11-10 22:47
What result did you expect?
msg230980 - (view) Author: Doug Gorley (dgorley) Date: 2014-11-10 22:53
I expected the second call to strptime() to throw an exception, because %Y consumed '2014', %m consumed '11', and %d consumed '10', leaving nothing for %H and %M to match.  That would be consistent with the first call.
msg230983 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-11-10 23:29
The documentation certainly appears to say that %m, for example, will consume two digits, but it could just as easily be only for output (i.e. strftime).

I suspect this is simply a documentation issue as opposed to a bug, but let's see what the others think.
msg230986 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2014-11-10 23:50
I have recently closed a similar issue (#5979) as "won't fix".  The winning argument there was that Python behavior was consistent with C.  How does C strptime behave in this case?
msg230988 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2014-11-11 00:00
With the following C code:

#define _XOPEN_SOURCE 700  /* exposes the strptime() declaration in <time.h> on glibc */
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(){

  char buf[255];
  struct tm tm;

  memset(&tm, 0, sizeof(tm));
  strptime("20141110", "%Y%m%d%H%M", &tm);
  strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M", &tm);
  printf("%s\n", buf);

  return 0;
}

I get

$ ./a.out
2014-11-10 00:00

So I think Python behavior is wrong.
msg230989 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2014-11-11 00:03
Here is the case that I think illustrates the current logic better:

>>> from datetime import datetime
>>> datetime.strptime("20141234", "%Y%m%d%H%M")
datetime.datetime(2014, 1, 2, 3, 4)
msg230991 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2014-11-11 00:07
Looking at the POSIX standard

http://pubs.opengroup.org/onlinepubs/009695399/functions/strptime.html

It appears that Python may be compliant:

%H The hour (24-hour clock) [00,23]; leading zeros are permitted but not required.
%m The month number [01,12]; leading zeros are permitted but not required.
%M The minute [00,59]; leading zeros are permitted but not required.
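
As a quick check of the "leading zeros are permitted but not required" wording, here is a minimal sketch (the input strings are made up for illustration) that parses the same moment with and without zero padding. It assumes separators between fields so that each directive has exactly one possible match:

```python
from datetime import datetime

# Illustrative inputs only; both spell 2 January 2014, 03:04.
padded = datetime.strptime("2014-01-02 03:04", "%Y-%m-%d %H:%M")
unpadded = datetime.strptime("2014-1-2 3:4", "%Y-%m-%d %H:%M")

# With separators between directives, the optional leading zero
# does not introduce any ambiguity: both inputs give the same value.
same_moment = padded == unpadded
```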
msg230993 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2014-11-11 00:13
Here is another interesting bit from the standard: "The application shall ensure that there is white-space or other non-alphanumeric characters between any two conversion specifications."

This is how they get away with not specifying whether the parser of variable-width fields should be greedy or not.
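
To illustrate why the separator requirement sidesteps the greediness question, here is a small sketch. The helper name `try_parse` is hypothetical and the input strings are illustrative. With separators in the format, absent hour and minute fields are rejected instead of being filled from earlier digits:

```python
from datetime import datetime

def try_parse(text, fmt):
    """Hypothetical helper: return the parsed datetime, or None if strptime rejects the input."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None

# Separators pin each field, so the absent %H and %M have nothing to match.
rejected = try_parse("2014-11-10", "%Y-%m-%d %H:%M")

# The same date with the time fields actually present parses as expected.
accepted = try_parse("2014-11-10 00:00", "%Y-%m-%d %H:%M")
```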
msg231028 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2014-11-11 15:34
strptime very much follows the POSIX standard as I implemented strptime by reading that doc.

If you want to see how the behaviour is implemented you can look at https://hg.python.org/cpython/file/ac0334665459/Lib/_strptime.py#l178 . But the key thing here is that the OP has unused formatters. Since strptime uses regexes under the hood, the re module does its best to match the entire format. Since POSIX says that e.g. the leading 0 for %m is optional, the regex goes with the single-digit version to let the %H format match _something_ (same goes for %d and %M). So without rewriting strptime to not use regexes to support unused formatters and to stop being so POSIX-compliant, I don't see how to change the behaviour. Plus it would be backwards-incompatible, as this is how strptime has worked since 2002.
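
The backtracking described above can be sketched with a few hand-written patterns. This is only an approximation under stated assumptions: the patterns are simplified stand-ins written for this example rather than copied from _strptime.py, and `PATTERNS` and `sketch_parse` are hypothetical names.

```python
import re

# Simplified stand-ins for the per-directive patterns. The optional leading
# zero is modeled by offering a one-digit alternative after the two-digit ones.
PATTERNS = {
    "Y": r"(?P<Y>\d{4})",
    "m": r"(?P<m>1[0-2]|0[1-9]|[1-9])",
    "d": r"(?P<d>3[01]|[12]\d|0[1-9]|[1-9])",
    "H": r"(?P<H>2[0-3]|[01]\d|\d)",
    "M": r"(?P<M>[0-5]\d|\d)",
}

def sketch_parse(text, directives):
    """Hypothetical helper: match text against the concatenated patterns."""
    regex = "".join(PATTERNS[name] for name in directives)
    match = re.fullmatch(regex, text)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}

# Three directives: the two-digit alternatives for month and day fit.
date_only = sketch_parse("20141110", "Ymd")

# Five directives: the engine backtracks to one-digit fields so that
# every group, including H and M, gets at least one character.
with_time = sketch_parse("20141110", "YmdHM")
```

Under these assumptions, `date_only` comes out as month 11, day 10, while `with_time` ends up as month 1, day 1, hour 1, minute 0, which agrees with the result in the original report.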

It's Alexander's call, but I vote to close this as "not a bug".
msg231029 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2014-11-11 15:57
After reading the standard a few more times, I agree with Brett and Ethan that this is at most a call for better documentation.

I'll leave this open for a chance that someone will come up with a succinct description of what exactly datetime.strptime does. (Maybe we should just document the format to regexp translation implemented in _strptime.py.)

We may also include POSIX's directive "The application shall ensure that there is white-space or other non-alphanumeric characters between any two conversion specifications" as a recommendation.
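
If that recommendation were adopted, callers could check their own format strings before parsing. A rough sketch follows; the helper name `has_unseparated_directives` is hypothetical, and it assumes every directive is a percent sign followed by a single letter, ignoring flags and modifiers:

```python
import re

# Assumption: a directive is "%" plus one letter. Two directives "touch" when
# only alphanumeric characters (or nothing at all) sit between them.
_TOUCHING = re.compile(r"%[A-Za-z][A-Za-z0-9]*%[A-Za-z]")

def has_unseparated_directives(fmt):
    """Hypothetical check: True if two directives lack a non-alphanumeric separator."""
    # Treat the literal "%%" as an ordinary separator character.
    return _TOUCHING.search(fmt.replace("%%", " ")) is not None

compact = has_unseparated_directives("%Y%m%d%H%M")
separated = has_unseparated_directives("%Y-%m-%d %H:%M")
```

Here `compact` is True and `separated` is False, so a caller could warn or refuse before handing the compact format to strptime.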
History
Date User Action Args
2022-04-11 14:58:10 admin set github: 67029
2015-03-01 19:58:23 belopolsky set status: open -> closed; resolution: not a bug; stage: resolved
2014-11-11 15:57:28 belopolsky set versions: + Python 3.5, - Python 3.4; messages: + msg231029; assignee: belopolsky; components: + Documentation, - Library (Lib); type: behavior -> enhancement
2014-11-11 15:34:18 brett.cannon set nosy: + brett.cannon; messages: + msg231028
2014-11-11 00:13:30 belopolsky set messages: + msg230993
2014-11-11 00:07:44 belopolsky set messages: + msg230991
2014-11-11 00:03:51 belopolsky set messages: + msg230989
2014-11-11 00:00:02 belopolsky set messages: + msg230988
2014-11-10 23:50:28 belopolsky set messages: + msg230986
2014-11-10 23:29:29 ethan.furman set nosy: + lemburg, belopolsky; messages: + msg230983
2014-11-10 22:53:18 dgorley set messages: + msg230980
2014-11-10 22:47:43 ethan.furman set nosy: + ethan.furman; messages: + msg230979
2014-11-10 22:01:31 dgorley create