classification
Title: datetime.strptime emits IndexError on parsing 'z' as %z
Type: behavior
Stage: patch review
Components: Library (Lib)
Versions: Python 3.9
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: belopolsky, itchyny, miss-islington, noormichael, p-ganssle
Priority: normal
Keywords: patch

Created on 2021-02-22 13:58 by itchyny, last changed 2021-04-28 16:23 by miss-islington.

Pull Requests
URL Status Linked Edit
PR 24627 merged noormichael, 2021-02-23 05:08
PR 24728 closed miss-islington, 2021-03-03 16:59
PR 24729 closed miss-islington, 2021-03-03 16:59
PR 25695 open miss-islington, 2021-04-28 16:23
Messages (5)
msg387514 - (view) Author: itchyny (itchyny) Date: 2021-02-22 13:58
In Python 3.9.2, parsing 'z' (lowercase) as '%z' (time zone offset) using datetime.strptime raises an IndexError.

>>> from datetime import datetime
>>> datetime.strptime('z', '%z')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "/usr/local/lib/python3.9/_strptime.py", line 453, in _strptime
    if z[3] == ':':
IndexError: string index out of range

I expect a ValueError (or some other useful error), as follows.
ValueError: time data 'z' does not match format '%z'

This happens because '%z' is compiled to a pattern that contains 'Z' (for UTC) and is matched with the IGNORECASE flag, so the regexp also accepts 'z'. The code then accesses z[3] without accounting for that one-character match.
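The mechanism described above can be reproduced outside of datetime. The following is a minimal sketch, not the code from _strptime.py: the two patterns and the parse_offset helper are hypothetical, simplified stand-ins. It shows how a case-insensitive 'Z' lets 'z' through to the z[3] check, and how switching case-insensitivity off for that one literal turns the failure into a ValueError.

```python
import re

# Hypothetical, simplified patterns for illustration only; they are not
# copied from _strptime.py and omit seconds and fractional offsets.
LOOSE = re.compile(r"(?P<z>[+-]\d\d:?[0-5]\d|Z)", re.IGNORECASE)
# (?-i:Z) turns case-insensitive matching off for the literal Z only.
STRICT = re.compile(r"(?P<z>[+-]\d\d:?[0-5]\d|(?-i:Z))", re.IGNORECASE)


def parse_offset(text, pattern):
    """Return the UTC offset in minutes (hypothetical helper)."""
    match = pattern.fullmatch(text)
    if match is None:
        raise ValueError(f"time data {text!r} does not match format '%z'")
    z = match.group("z")
    if z == "Z":
        return 0
    # With LOOSE, a lowercase 'z' reaches this line and z[3] is out of range.
    if z[3] == ":":
        z = z[:3] + z[4:]
    sign = -1 if z[0] == "-" else 1
    return sign * (int(z[1:3]) * 60 + int(z[3:5]))


for pattern in (LOOSE, STRICT):
    try:
        parse_offset("z", pattern)
    except (IndexError, ValueError) as exc:
        print(type(exc).__name__, exc)
```

With the loose pattern the loop reports an IndexError, and with the strict pattern a ValueError. Uppercase 'Z' and numeric offsets such as '+05:30' parse the same way under both.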
msg387548 - (view) Author: itchyny (itchyny) Date: 2021-02-23 00:44
I noticed another unexpected effect of the IGNORECASE flag: it lets some non-ASCII characters match ASCII letters.

>>> from datetime import datetime
>>> datetime.strptime("Apr\u0130l", "%B")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "/usr/local/lib/python3.9/_strptime.py", line 391, in _strptime
    month = locale_time.f_month.index(found_dict['B'].lower())
ValueError: 'apri̇l' is not in list

I expect a "time data ... does not match format" error. The ASCII flag would prevent these unexpected Unicode characters from matching.
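The effect of the two flags can be checked with the re module alone. This is a small sketch under the assumption of a hand-written month pattern (MONTHS is hypothetical; the real pattern is built from the locale's month names).

```python
import re

# Hypothetical, shortened month pattern; the real one comes from the locale.
MONTHS = r"(?P<B>january|february|march|april)"

text = "Apr\u0130l"  # U+0130 (capital I with dot above) in place of ASCII 'i'

# Unicode case-insensitive matching treats U+0130 as a case variant of 'i'.
unicode_match = re.fullmatch(MONTHS, text, re.IGNORECASE)

# Adding re.ASCII restricts case-insensitive matching to ASCII letters.
ascii_match = re.fullmatch(MONTHS, text, re.IGNORECASE | re.ASCII)

print("IGNORECASE only:   ", unicode_match is not None)
print("IGNORECASE | ASCII:", ascii_match is not None)
```

The first match succeeds and the second does not, so with the ASCII flag the input would be rejected at the pattern-matching stage instead of failing later in the month lookup.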
msg387550 - (view) Author: Noor Michael (noormichael) * Date: 2021-02-23 04:16
I will address the original issue regarding '%z', but the second issue actually has to do with the Unicode representation of Turkish characters. In Turkish, the letter I ('\u0049') is the capital form of ı ('\u0131') and the letter İ ('\u0130') is the capital form of i ('\u0069'). In Python, however, the lowercase of I is i, as in English.

>>> '\u0049'.lower()
'i'
>>> '\u0130'.lower()
'i̇'

We see that I lowercases to a plain i, consistent with English, while İ lowercases to an i followed by a combining dot above (U+0307), which renders as a dotted i, consistent with Turkish.
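To make the difference between the two results visible, the lowercased strings can be broken down into code points. This is a minimal sketch using only str.lower() and the standard unicodedata module.

```python
import unicodedata

lowered_ascii = "\u0049".lower()   # capital I
lowered_dotted = "\u0130".lower()  # capital I with dot above

for label, value in (("U+0049", lowered_ascii), ("U+0130", lowered_dotted)):
    # List each code point in the lowercased result with its Unicode name.
    parts = [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in value]
    print(label, "->", parts)
```

The second result contains two code points, so it does not compare equal to a plain 'i'. That is why 'apri̇l' is not found in the list of month names in the traceback above.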
msg387701 - (view) Author: itchyny (itchyny) Date: 2021-02-26 02:11
@noormichael Thank you for submitting a patch. I confirmed that the original issue is fixed, and I'm fine with this ticket being closed. Regarding the second issue, I learned that it is a Turkish character (thanks!), and since the error raised is of the same type (ValueError), it is not a critical issue.
msg388034 - (view) Author: miss-islington (miss-islington) Date: 2021-03-03 16:59
New changeset 04f6fbb6969e9860783b9ab4dc24b6fe3c6dab8d by Noor Michael in branch 'master':
bpo-43295: Fix error handling of datetime.strptime format string '%z' (GH-24627)
https://github.com/python/cpython/commit/04f6fbb6969e9860783b9ab4dc24b6fe3c6dab8d
History
Date User Action Args
2021-04-28 16:23:33 miss-islington set pull_requests: + pull_request24385
2021-03-03 16:59:29 miss-islington set pull_requests: + pull_request23501
2021-03-03 16:59:17 miss-islington set pull_requests: + pull_request23500
2021-03-03 16:59:05 miss-islington set nosy: + miss-islington; messages: + msg388034
2021-02-26 02:11:43 itchyny set messages: + msg387701
2021-02-23 05:08:08 noormichael set keywords: + patch; stage: patch review; pull_requests: + pull_request23411
2021-02-23 04:16:11 noormichael set nosy: + noormichael; messages: + msg387550
2021-02-23 00:44:08 itchyny set messages: + msg387548
2021-02-22 15:06:15 xtreak set nosy: + belopolsky, p-ganssle
2021-02-22 13:58:19 itchyny create