datetime.strptime emits IndexError on parsing 'z' as %z #87461
In Python 3.9.2, parsing 'z' (lowercase) as '%z' (time zone offset) with datetime.strptime raises an IndexError.
>>> from datetime import datetime
>>> datetime.strptime('z', '%z')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "/usr/local/lib/python3.9/_strptime.py", line 453, in _strptime
    if z[3] == ':':
IndexError: string index out of range
I expect a ValueError (or some other useful error) instead. The cause is that '%z' is compiled to a pattern containing 'Z' (for UTC) with the IGNORECASE flag, so the regexp also accepts 'z'. The code then accesses z[3] without accounting for that one-character match.
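To illustrate the cause described above, here is a minimal sketch. The two patterns are simplified stand-ins for the '%z' regex, not the actual expressions in _strptime.py, and the names LOOSE, STRICT, and offset_has_colon are hypothetical. It shows how a case-insensitive 'Z' alternative lets 'z' reach the z[3] check, and one possible way to avoid that by making only the 'Z' alternative case-sensitive.

```python
import re

# Simplified stand-ins for the '%z' pattern (assumption: the real stdlib
# regex also handles seconds and fractional offsets, omitted here).
LOOSE = re.compile(r"(?P<z>[+-]\d\d:?[0-5]\d|Z)", re.IGNORECASE)
# Same pattern, but the 'Z' alternative is scoped as case-sensitive.
STRICT = re.compile(r"(?P<z>[+-]\d\d:?[0-5]\d|(?-i:Z))", re.IGNORECASE)


def offset_has_colon(pattern, text):
    """Hypothetical helper mirroring the z[3] == ':' check from the traceback."""
    m = pattern.fullmatch(text)
    if m is None:
        raise ValueError(f"time data {text!r} does not match format '%z'")
    z = m.group("z")
    if z == "Z":
        return False  # UTC designator, no numeric offset to inspect
    # With LOOSE, z can be the one-character string 'z' here,
    # so indexing position 3 raises IndexError.
    return z[3] == ":"


for pattern in (LOOSE, STRICT):
    try:
        offset_has_colon(pattern, "z")
    except (IndexError, ValueError) as exc:
        print(type(exc).__name__)
```

With the loose pattern the lowercase 'z' matches and the index check fails; with the scoped pattern the input is rejected before any indexing happens.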
I noticed another unexpected effect of the IGNORECASE flag: it lets some non-ASCII characters match ASCII letters.
>>> from datetime import datetime
>>> datetime.strptime("Apr\u0130l", "%B")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "/usr/local/lib/python3.9/_strptime.py", line 391, in _strptime
    month = locale_time.f_month.index(found_dict['B'].lower())
ValueError: 'apri̇l' is not in list
I expected a "time data does not match format" error. Adding the ASCII flag would prevent these unexpected Unicode characters from matching.
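The effect of the ASCII flag can be checked directly with the re module. The sketch below is not the pattern _strptime builds for '%B'; it uses a single literal month name, and the variable names are hypothetical.

```python
import re

MONTH = "april"  # stand-in for one entry of the locale's month list
text = "Apr\u0130l"  # contains U+0130, LATIN CAPITAL LETTER I WITH DOT ABOVE

# Unicode case-insensitive matching: 'i' in the pattern also matches U+0130.
unicode_pat = re.compile(MONTH, re.IGNORECASE)
# ASCII-only case-insensitive matching: only a-z / A-Z fold onto each other.
ascii_pat = re.compile(MONTH, re.IGNORECASE | re.ASCII)

unicode_match = unicode_pat.fullmatch(text) is not None
ascii_match = ascii_pat.fullmatch(text) is not None

print(unicode_match, ascii_match)
```

Ordinary ASCII spellings such as "APRIL" still match under both flag combinations, so the ASCII flag only narrows the non-ASCII cases.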
I will address the original issue regarding '%z', but the second issue actually has to do with the Unicode representation of Turkish characters. In Turkish, the letter I ('\u0049') is the capital of ı ('\u0131') and the letter İ ('\u0130') is the capital of i ('\u0069'). In Python, however, the lowercase of I is i, as in English.
>>> '\u0049'.lower()
'i'
>>> '\u0130'.lower()
'i̇'
We see that I lowercases to i, and İ lowercases to i followed by a combining dot above (U+0307), consistent with English in one case and Turkish in the other.
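The two lowercase results look almost identical on screen, so it can help to inspect the code points. This is a small sketch using only str.lower() and unicodedata; the variable names are hypothetical.

```python
import unicodedata

lower_plain = "\u0049".lower()   # LATIN CAPITAL LETTER I
lower_dotted = "\u0130".lower()  # LATIN CAPITAL LETTER I WITH DOT ABOVE

# List the Unicode names of each code point in the lowercased strings.
plain_names = [unicodedata.name(ch) for ch in lower_plain]
dotted_names = [unicodedata.name(ch) for ch in lower_dotted]

print(plain_names)
print(dotted_names)
```

The second result is two code points long, which is why the lowercased 'apri̇l' in the traceback above is not found in the list of month names.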
@noormichael Thank you for submitting a patch. I confirmed that the original issue is fixed, and I'm fine with closing this ticket. Regarding the second issue, I learned that it involves a Turkish character (thanks!). The error raised is still a ValueError, so it should not cause a critical problem.
It seems the issue is fixed, so I am closing it.