classification
Title: datetime.fromisoformat(): Omitted colon in timezone suffix raises ValueError
Type: enhancement
Stage:
Components: Library (Lib)
Versions: Python 3.10
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Bengt.Lüers, belopolsky, p-ganssle
Priority: normal
Keywords:

Created on 2020-11-16 15:00 by Bengt.Lüers, last changed 2020-11-16 17:16 by p-ganssle.

Messages (2)
msg381102 - Author: Bengt Lüers (Bengt.Lüers) Date: 2020-11-16 15:00
I am trying to parse ISO8601-formatted datetime strings with timezones.

This works fine when there is a colon separating the hour and minute digits:

>>> import datetime
>>> datetime.datetime.fromisoformat('2020-11-16T11:00:00+00:00')
datetime.datetime(2020, 11, 16, 11, 0, tzinfo=datetime.timezone.utc)

However, this fails when there is no colon between the hour and minute digits:

>>> import datetime
>>> datetime.datetime.fromisoformat('2020-11-16T11:00:00+0000')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Invalid isoformat string: '2020-11-16T11:00:00+0000'

This behavior is unexpected, as the ISO8601 standard allows omitting the colon in the string and defining the timezone as "<time>±hhmm":

https://en.wikipedia.org/wiki/ISO_8601#Time_offsets_from_UTC

As a workaround, I normalized the timezone suffixes before parsing:

def normalize_utc_suffix(iso8601_string):
    # Rewrite a colon-less UTC offset suffix to the '+00:00' form
    # that fromisoformat() accepts; other strings pass through unchanged.
    for suffix in ('+0000', '+00', '-0000', '-00'):
        if iso8601_string.endswith(suffix):
            return iso8601_string[:-len(suffix)] + '+00:00'
    return iso8601_string

This only works for the UTC timezone. It would be nice to have a more general solution which can handle any timezone.
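A more general version of the workaround might look like the sketch below. It is not part of the original report: the names `normalize_offset` and `parse_iso` are hypothetical, and the regular expression assumes the string contains a time of the form hh:mm or hh:mm:ss (optionally with fractional seconds) immediately before a trailing ±hh or ±hhmm offset. Strings that already use ±hh:mm, or that have no offset, pass through unchanged.

```python
import re
from datetime import datetime, timedelta

# Trailing offset written as ±hh or ±hhmm, directly after a time component.
# Requiring the time part keeps date-only strings like '2020-11-16' untouched.
_OFFSET_RE = re.compile(
    r'(?P<time>\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?)'
    r'(?P<sign>[+-])(?P<hh>\d{2})(?P<mm>\d{2})?$'
)


def normalize_offset(iso8601_string):
    """Rewrite a trailing ±hh or ±hhmm offset as ±hh:mm (hypothetical helper)."""
    def _with_colon(match):
        minutes = match['mm'] or '00'
        return f"{match['time']}{match['sign']}{match['hh']}:{minutes}"
    return _OFFSET_RE.sub(_with_colon, iso8601_string)


def parse_iso(iso8601_string):
    """Normalize the offset, then hand the string to fromisoformat()."""
    return datetime.fromisoformat(normalize_offset(iso8601_string))


parsed = parse_iso('2020-11-16T11:00:00-0530')
print(parsed.utcoffset() == timedelta(hours=-5, minutes=-30))
```

This only addresses the offset suffix; other ISO 8601 variants (for example a "Z" suffix or the basic format without separators) are outside the scope of the sketch.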

I tested this with CPython 3.8. `.fromisoformat()` was added in 3.7, so earlier versions should not be affected by this:

https://docs.python.org/3/library/datetime.html#datetime.date.fromisoformat
msg381127 - Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2020-11-16 17:16
This is the expected behavior of `.fromisoformat()`. A similar issue is https://bugs.python.org/issue35829, which asks for the "Z" suffix to be supported.

There is a note about this in the documentation: https://docs.python.org/3/library/datetime.html#datetime.datetime.fromisoformat

"Caution This does not support parsing arbitrary ISO 8601 strings - it is only intended as the inverse operation of datetime.isoformat(). A more full-featured ISO 8601 parser, dateutil.parser.isoparse is available in the third-party package dateutil."
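The "inverse operation" wording in that note can be illustrated with a short round trip. This is only a sketch with an arbitrary example value: whatever `datetime.isoformat()` emits, including the colon-separated offset, is what `.fromisoformat()` is designed to read back.

```python
from datetime import datetime, timezone

original = datetime(2020, 11, 16, 11, 0, tzinfo=timezone.utc)

# isoformat() always writes the offset with a colon: '+00:00'
text = original.isoformat()
print(text)

# fromisoformat() reads back exactly what isoformat() produced
restored = datetime.fromisoformat(text)
print(restored == original)
```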

At some point we will work out the kinks in offering as full an ISO 8601 datetime parser as possible, but the ISO 8601 datetime spec is very complicated and includes many optional features. We deliberately chose to keep the scope of `.fromisoformat()` minimal at first, whereas `dateutil.parser.isoparse` attempts to be a full-featured ISO8601 parser.
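For readers who want to stay within the standard library, one alternative that is not mentioned in this thread is `datetime.strptime()` with the `%z` directive, which on Python 3.7 and later accepts offsets both with and without a colon. The sketch below assumes the input always has the fixed layout given in `FORMAT` (a name chosen here for illustration); it is far less flexible than a full ISO 8601 parser.

```python
from datetime import datetime, timedelta

# Fixed layout: date, 'T', time with seconds, then a UTC offset.
FORMAT = '%Y-%m-%dT%H:%M:%S%z'

with_colon = datetime.strptime('2020-11-16T11:00:00+00:00', FORMAT)
without_colon = datetime.strptime('2020-11-16T11:00:00+0000', FORMAT)

# Both spellings of the offset yield the same aware datetime.
print(with_colon == without_colon)
print(without_colon.utcoffset() == timedelta(0))
```

The trade-off is that the format string must match the input exactly, so inputs with fractional seconds or without seconds would need a different format string.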

Changing the version affected to 3.10, since this is a feature request.
History
Date                 User         Action  Args
2020-11-16 17:16:24  p-ganssle    set     type: behavior -> enhancement
                                          messages: + msg381127
                                          versions: + Python 3.10, - Python 3.8
2020-11-16 16:34:29  xtreak       set     nosy: + belopolsky, p-ganssle
2020-11-16 15:02:08  Bengt.Lüers  set     title: datetime.fromisoformat(): Missing colon in timezone suffix raises ValueError -> datetime.fromisoformat(): Omitted colon in timezone suffix raises ValueError
2020-11-16 15:01:44  Bengt.Lüers  set     title: Missing colon in timezone suffix raises ValueError -> datetime.fromisoformat(): Missing colon in timezone suffix raises ValueError
2020-11-16 15:00:17  Bengt.Lüers  create