This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: _strptime.TimeRE should not enforce range in regex
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Catherine.Devlin, Steve Yeung, catherinedevlin, p-ganssle, r.david.murray
Priority: low Keywords: patch

Created on 2015-08-24 21:39 by Steve Yeung, last changed 2022-04-11 14:58 by admin.

Files
File name Uploaded Description Edit
file Steve Yeung, 2015-09-01 22:19
Pull Requests
URL Status Linked Edit
PR 26215 open Catherine.Devlin, 2021-05-18 19:08
Messages (8)
msg249074 - (view) Author: Steve Yeung (Steve Yeung) Date: 2015-08-24 21:39
Currently, the regex in TimeRE enforces the numeric ranges. For example:
    'm': r"(?P<m>1[0-2]|0[1-9]|[1-9])",

As a result, an invalid month will cause an generic regex error:
    ValueError: time data '2015/16/5' does not match format '%Y/%m/%d'

However, if we relax the regex to not check the range and allow datetime to handle it:
    'm': r"(?P<m>\d{1,2})"

The error will be handle in datetime instead and the error will be much more helpful:
    ValueError: month must be in 1..12

Please consider relaxing the regex for numeric ranges in _strptime.TimeRE.
msg249133 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2015-08-25 16:22
Do realize that the strptime code is shared with time.strptime() and so this change would have to make sense in both contexts.

This change can also only happen in Python 3.6 because it is backwards-incompatible due to people potentially already catching the previous exception.
msg249141 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-08-25 17:27
And it does not make sense for time, since time.strptime does no additional validation, it just returns the result returned by _strptime.
msg249518 - (view) Author: Steve Yeung (Steve Yeung) Date: 2015-09-01 22:19
I'm not sure what format I'm supposed to provide the test in. I attached a file that has the diff of the changes I made, and how the error message is changed (and improved!) in both datetime and time.
msg249541 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-09-02 13:23
A unit test in test_strptime.
msg393904 - (view) Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2021-05-18 20:25
I also commented on GH-26215 ( https://github.com/python/cpython/pull/26215 ), but for posterity, I'll note a few things:

1. It seems that (and this may have changed since 2015), `_strptime._strptime` now has a stage that (unconditionally?) constructs a temporary `datetime_date`, which means it does do this particular validation in both `time.strptime` and `datetime.strptime`. That said, both flavors of `strptime` are *way* slower than I'd like them to be, and constructing an unnecessary `date`/`datetime` is a pretty good way to slow down your function, so if we ever go around optimizing this function, that may be one of the first bits on the chopping block.

2. The logic for `strptime` is very complicated and it's very hard to test the full input space of the function (particularly since we're not  using property tests (yet)...). This makes me somewhat uneasy about moving the validation stage from the beginning of the function (in parsing the regular expression) to the very *end* of the function (in the datetime constructor).

It's *probably* safe to do so, but it may also be worth exploring the possibility of validating this directly in `_strptime` (possibly immediately after the string is parsed by the regex), and raising a cleaner error message on failure.

Probably not worth spending a ton of time on that compared to improving the testing around this so that we can feel confident making changes under the hood. `.strptime` is really quite slow, and I wouldn't be surprised if we pulled out its guts and replaced most of the regex stuff with a fast C parser at some point in the future. Having good tests will both give us confidence to make this change (and that making this change won't lead to regressions in the future) and help with any future project to replace `_strptime._strptime` with a faster version, so I'd say that's the most important thing to do here.
msg393948 - (view) Author: Catherine Devlin (Catherine.Devlin) * Date: 2021-05-19 11:48
Can we close the ticket?  It doesn't sound like we're going to go with anything like this approach.

If there's a good place to record the general ambitions (better strptime tests, more clear strptime errors), that would be good, but I don't think we use tickets for that.
msg394146 - (view) Author: Catherine Devlin (catherinedevlin) * Date: 2021-05-21 19:44
Thinking about this a little more...

I suggest that slowing down execution shouldn't be a worry in this case, because trying to parse a garbage string into a date shouldn't be something code is doing zillions of times, and if it is, being slow isn't a terrible thing.  It should be the exception (so to speak) and not the rule, and thus not something we should worry about optimizing for.

That said, I can see not wanting to rely on throwing low-level errors that are outside Python's direct control.  (In fact, the PR I wrote didn't account for the OS-dependent nature of those messages, so if we decide to go ahead after all I'll need to make my PR account for that.)  If that is enough reason to reject the change, I'm content with that.

One way or the other, I would love to get this out of "Open" status - if we want to just close it as-is, I'll be satisfied.
History
Date User Action Args
2022-04-11 14:58:20adminsetgithub: 69117
2021-05-21 19:44:02catherinedevlinsetnosy: + catherinedevlin
messages: + msg394146
2021-05-19 11:48:15Catherine.Devlinsetmessages: + msg393948
2021-05-18 20:25:06p-gansslesetnosy: + p-ganssle
messages: + msg393904
2021-05-18 19:08:11Catherine.Devlinsetkeywords: + patch
nosy: + Catherine.Devlin

pull_requests: + pull_request24832
stage: test needed -> patch review
2020-06-05 18:38:36brett.cannonsetnosy: - brett.cannon
2015-09-02 13:23:46r.david.murraysetmessages: + msg249541
2015-09-01 22:19:47Steve Yeungsetfiles: + file

messages: + msg249518
2015-08-25 17:27:15r.david.murraysetnosy: + r.david.murray
messages: + msg249141
2015-08-25 16:22:38brett.cannonsetpriority: normal -> low
versions: + Python 3.6, - Python 2.7
nosy: + brett.cannon

messages: + msg249133

stage: test needed
2015-08-24 21:39:09Steve Yeungcreate