classification
Title: BUG in how _strptime() handles week 0
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.5, Python 3.4, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Erik.Cederstrand, Saimadhav.Heblikar, aganders3, belopolsky, jamercee, lemburg, python-dev, serhiy.storchaka, torm
Priority: normal Keywords: patch

Created on 2014-12-30 21:04 by jamercee, last changed 2015-03-19 19:25 by jamercee. This issue is now closed.

Files
File name Uploaded Description Edit
patch.txt jamercee, 2014-12-30 21:04 Bug fix
example.c Saimadhav.Heblikar, 2014-12-31 16:32
strptimetest.c serhiy.storchaka, 2014-12-31 16:51 Utility for testing C strptime
strptime_julian_none.patch serhiy.storchaka, 2014-12-31 16:51 Use None as signal value for julian review
strptime_check_valid_week.patch serhiy.storchaka, 2014-12-31 17:30 Check if week and weekday fit in the year review
strptimetest2.c belopolsky, 2014-12-31 18:05
strptime_julian_none_2.patch serhiy.storchaka, 2015-01-11 11:13 Added tests review
Messages (24)
msg233219 - (view) Author: Jim Carroll (jamercee) Date: 2014-12-30 21:04
The following bit of code demonstrates a bug in how _strptime handles week 0. The bug is an edge condition that specifically affects how it handles two days before the first day of the new year

>>> for i in range(7):
...     datetime.strptime('%s %s %s' % (0, 2015, i), '%W %Y %w').date()
... 
datetime.date(2015, 1, 4)
datetime.date(2014, 12, 29)
datetime.date(2015, 1, 1)   # <-- ERROR: should be '2014, 12, 30'
datetime.date(2014, 12, 31)
datetime.date(2015, 1, 1)
datetime.date(2015, 1, 2)
datetime.date(2015, 1, 3)

The issue stems from the fact that calls to _calc_julian_from_U_or_W() can return a -1, which under normal circumstances is invalid. The strptime() function tests and corrects when julian values are -1...but it should not do this when the week_of_year is zero.

The fact that it tests for ONLY -1 is why we see the problem in 2015. The error condition only exists when the first day of the year is exactly two days ahead of the date being tested for.

Patch is attached
msg233248 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2014-12-31 16:25
What's a New Year without fixing a calendar bug! :-)

Jim's logic looks right to me.  The patch needs a test before it can be applied.
msg233250 - (view) Author: Saimadhav Heblikar (Saimadhav.Heblikar) * Date: 2014-12-31 16:32
Hi, 

Here is my understanding of it.
I have used the following C example of strptime and strftime to show that the string '0 2015 2' along with the format specifier '%W %Y %w' is invalid.

For any valid string, strptime followed by strftime should return the same string. This is easily verified. However, when you try the same thing(as in the attached example) with '0 2015 2', we get a different string after strptime followed by strftime, indicating the string is invalid.

Do let me know if I have made any mistake in understanding this.
msg233251 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2014-12-31 16:41
In a comment for #23134, Serhiy suggested that this may not be a bug (msg233215) because C strptime behavior for this case is undefined.

I disagree.  On Mac OSX, strptime man page states:

"""
     The %U and %W format specifiers accept any value within the range 00 to 53 without validat-
     ing against other values supplied (like month or day of the year, for example).
"""

And for the data="0 2015 2", format="%W %Y %w" case, it produces 2015-12-30 00:00:00 as expected.
msg233252 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2014-12-31 16:48
See also #12006.
msg233253 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-12-31 16:51
Thank you Jim for your patch. That all right and the patch fixes the issue. But 
the code looks a little tricky, and it would be more robust to use other 
signal value for julian (e.g. None).

However I'm not sure that strptime should accept weekdays before the start of 
year. In C the strptime function doesn't return valid date for such input.

$ ./strptimetest "0 2015 1" "%W %Y %w"
2015-00--2 00:00:00
$ ./strptimetest "0 2015 2" "%W %Y %w"
2015-00--1 00:00:00
$ ./strptimetest "0 2015 3" "%W %Y %w"
2015-00-00 00:00:00
$ ./strptimetest "0 2015 4" "%W %Y %w"
2015-01-01 00:00:00
$ ./strptimetest "0 2015 5" "%W %Y %w"
2015-01-02 00:00:00
$ ./strptimetest "0 2015 6" "%W %Y %w"
2015-01-03 00:00:00
$ ./strptimetest "0 2015 7" "%W %Y %w"
2015-01-00 00:00:00

May be Python strptime should raise an exception for weekdays before the start 
of year as well as it raise an exception for weekdays out of range 0-6.

I'm not sure that any patch should be applied to maintained releases.
msg233255 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-12-31 17:30
Here is alternative patch which rejects week-weekday combinations out of specified year.
msg233256 - (view) Author: Jim Carroll (jamercee) Date: 2014-12-31 17:35
I understand. Actually, raising an exception would be perfectly acceptable as well (possibly even more desirable). 

I too experimented with the c-lib strptime() and discovered the negative values of tm_mday. These results are good too -- as they clearly show the number of days before the new year.

I don't think it's important for the python version to return the same values, but it would be better for it to at least be predictable.
msg233257 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2014-12-31 17:40
> For any valid string, strptime followed by strftime should return
> the same string.

Not necessarily.  String to datetime mapping implemented by strptime can be many to one.  For example,

>>> datetime.strptime("2014 366", "%Y %j") == datetime.strptime("2015 1", "%Y %j")
True

in this case, strptime-strftime may not round-trip.

>>> datetime.strptime("2014 366", "%Y %j").strftime("%Y %j")
'2015 001'
msg233258 - (view) Author: Saimadhav Heblikar (Saimadhav.Heblikar) * Date: 2014-12-31 18:00
>>Not necessarily.  String to datetime mapping implemented by strptime can be many to one.  For example,

I was referring to C strptime and strftime. 
But thanks for posting the round trip example. I was unaware of it.
msg233259 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2014-12-31 18:05
I would prefer to accept "denormalized" "%Y %W %w" combinations.  Note that while Mac OSX and glibc versions of strptime produce different results, neither implementation signals an error by returning NULL.

> In C the strptime function doesn't return valid date for such input.

This is not true.  In C, struct tm is not required to contain normalized data.  I am attaching a modified version of Serhiy's strptimetest.c:

$ diff -u strptimetest.c strptimetest2.c
--- strptimetest.c	2014-12-30 13:45:17.000000000 -0500
+++ strptimetest2.c	2014-12-31 12:56:17.000000000 -0500
@@ -16,6 +16,7 @@
         exit(1);
     }
     strptime(argv[1], argv[2], &tm);
+    mktime(&tm);
     strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S", &tm);
     puts(buf);
     return 0;

With this modification, I get on Linux

$ ./a.out "0 2015 2" "%W %Y %w"
2014-11-29 00:00:00

which is still wrong, but I think this is a glibc issue: mktime accepted strptime result as valid even though it failed to normalize it correctly.
msg233265 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2014-12-31 18:19
Also, given that "2015 0 0", "2015 0 1", and  "2015 0 3", all currently work with  '%Y %W %w' format specification and produce dates in the year 2014, I think raising an error in all those cases is more likely to break user code than correcting the value in the edge case.
msg233323 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2015-01-02 19:47
Note that round-trip also fails in weeks 52-53.  For example,

>>> datetime.strptime('2014 53 6', '%Y %W %w')
datetime.datetime(2015, 1, 10, 0, 0)
>>> datetime.strptime('2014 53 6', '%Y %W %w').strftime('%Y %W %w')
'2015 01 6'

If we decide to make "2015 0 0" invalid for format '%Y %W %w', we should probably invalidate the entire week 53 in 2014 for consistency.  However, I don't think there are any C implementations that would have a problem with such dates.

Overall, I am inclined to accept Jim's solution for 3.5, but I am not sure about the maintenance branches.
msg233328 - (view) Author: Jim Carroll (jamercee) Date: 2015-01-02 21:50
All the proposed patches are acceptable from my point of view, but it would be really great if we could get the fix in the 2.x line.  It seems unlikely there could be any legacy code that would depend on the bug to exist (ie: to only be wrong when then date requested is exactly two days before the first date of the new year).
msg238540 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2015-03-19 16:38
Serhiy,

Your latest patch LGTM.
msg238542 - (view) Author: Roundup Robot (python-dev) Date: 2015-03-19 17:15
New changeset 350ea309953e by Serhiy Storchaka in branch '2.7':
Issue #23136: _strptime now uniformly handles all days in week 0, including
https://hg.python.org/cpython/rev/350ea309953e

New changeset 93e1da502338 by Serhiy Storchaka in branch '3.4':
Issue #23136: _strptime now uniformly handles all days in week 0, including
https://hg.python.org/cpython/rev/93e1da502338

New changeset 750c6d53890a by Serhiy Storchaka in branch 'default':
Issue #23136: _strptime now uniformly handles all days in week 0, including
https://hg.python.org/cpython/rev/750c6d53890a
msg238543 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-19 17:16
Thank you for the review Alexander. Thank you for your contribution Jim.
msg238544 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2015-03-19 17:18
"including Jan 30" - did you mean Dec 31?
msg238545 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-19 17:51
Oh, I meant Dec 30.
msg238546 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2015-03-19 17:52
I don't think it is worth the effort to try to fix the commit message, but it would be nice to have the NEWS text corrected.
msg238547 - (view) Author: Roundup Robot (python-dev) Date: 2015-03-19 17:55
New changeset d9f3dd1d6b48 by Serhiy Storchaka in branch '2.7':
Fixed Misc/NEWS entry for issue #23136.
https://hg.python.org/cpython/rev/d9f3dd1d6b48

New changeset afca9867e1bb by Serhiy Storchaka in branch '3.4':
Fixed Misc/NEWS entry for issue #23136.
https://hg.python.org/cpython/rev/afca9867e1bb

New changeset 18bc81ce6090 by Serhiy Storchaka in branch 'default':
Fixed Misc/NEWS entry for issue #23136.
https://hg.python.org/cpython/rev/18bc81ce6090
msg238548 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-19 17:59
Is it possible to fix the commit message without breaking any of cloned 
repositories that already have pulled it?
msg238550 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2015-03-19 18:04
Probably not.
msg238551 - (view) Author: Jim Carroll (jamercee) Date: 2015-03-19 19:25
Thanks for all your work !
History
Date User Action Args
2015-03-19 19:25:55jamerceesetmessages: + msg238551
2015-03-19 18:04:25belopolskysetmessages: + msg238550
2015-03-19 17:59:01serhiy.storchakasetmessages: + msg238548
2015-03-19 17:55:39python-devsetmessages: + msg238547
2015-03-19 17:52:56belopolskysetmessages: + msg238546
2015-03-19 17:51:14serhiy.storchakasetmessages: + msg238545
2015-03-19 17:18:13belopolskysetmessages: + msg238544
2015-03-19 17:16:33serhiy.storchakasetstatus: open -> closed
resolution: fixed
messages: + msg238543

stage: commit review -> resolved
2015-03-19 17:15:06python-devsetnosy: + python-dev
messages: + msg238542
2015-03-19 16:38:55belopolskysettype: behavior
messages: + msg238540
stage: patch review -> commit review
2015-01-11 11:13:08serhiy.storchakasetfiles: + strptime_julian_none_2.patch
2015-01-02 21:50:40jamerceesetmessages: + msg233328
2015-01-02 19:47:53belopolskysetmessages: + msg233323
2014-12-31 19:20:11belopolskysetnosy: + Erik.Cederstrand, aganders3
2014-12-31 18:19:55belopolskysetmessages: + msg233265
2014-12-31 18:05:48belopolskysetfiles: + strptimetest2.c

messages: + msg233259
2014-12-31 18:00:19Saimadhav.Heblikarsetmessages: + msg233258
2014-12-31 17:40:25belopolskysetmessages: + msg233257
2014-12-31 17:35:28jamerceesetmessages: + msg233256
2014-12-31 17:30:15serhiy.storchakasetfiles: + strptime_check_valid_week.patch

messages: + msg233255
2014-12-31 16:51:57tormsetnosy: + torm
2014-12-31 16:51:29serhiy.storchakasetfiles: + strptimetest.c, strptime_julian_none.patch
keywords: + patch
messages: + msg233253
2014-12-31 16:48:25belopolskysetmessages: + msg233252
2014-12-31 16:41:49belopolskysetmessages: + msg233251
2014-12-31 16:32:31Saimadhav.Heblikarsetfiles: + example.c

messages: + msg233250
2014-12-31 16:30:33belopolskylinkissue23134 superseder
2014-12-31 16:25:13belopolskysetmessages: + msg233248
2014-12-31 15:37:22Saimadhav.Heblikarsetnosy: + Saimadhav.Heblikar
2014-12-31 12:58:26pitrousetnosy: + serhiy.storchaka
2014-12-31 09:00:31pitrousetnosy: + lemburg, belopolsky
stage: patch review

versions: + Python 3.4, Python 3.5
2014-12-30 21:04:37jamerceecreate