classification
Title: datetime: parse "Z" timezone suffix in fromisoformat()
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: belopolsky, hongweipeng, jwilk, mehaase, p-ganssle, rdb
Priority: normal Keywords:

Created on 2019-01-25 19:36 by rdb, last changed 2019-08-27 19:23 by p-ganssle.

Messages (8)
msg334365 - (view) Author: rdb (rdb) * Date: 2019-01-25 19:36
The fromisoformat() function added in 3.7 is a very welcome addition.  But one quite noticeable omission is the ability to parse Z instead of +00:00 as the timezone suffix.

Its absence is particularly noticeable given how ubiquitous use of Z is in ISO 8601 timestamps on the web; it is also part of the RFC 3339 subset.  In particular, JavaScript produces it in its canonical ISO 8601 format, so it is quite common in JSON APIs; this would be the only piece missing to parse ISO dates produced by JavaScript correctly.

I realise that the function was not intended to be able to parse *all* timestamps.  But given the triviality of this change, the ubiquity of this particular formatting feature, and the fact that this change is designed in particular for interoperability with the widely-used JavaScript date format, I don't think this is a slippery slope, and I would personally see no harm in accepting a 'Z' instead of a timezone.

I am happy to follow up with a patch for this, but would first like confirmation that there is any chance that such a change would be accepted.  Thanks for your consideration!
msg334368 - (view) Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2019-01-25 20:59
You can see the discussion in bpo-15873 for the full rationale of why "Z" was omitted - to quote from https://bugs.python.org/issue15873#msg307607 :

> We can have further discussion later about what exactly should be supported in Python 3.8,
> but even in the pre-release discussions I'm already seeing pushback about some of the more
> unusual 8601 formats, and it's a *lot* easier to explain (in documentation) that `fromisoformat()`
> is intended to be the inverse of `isoformat()` than it is to explain which variations of ISO 8601
> are and are not supported (fractional minutes? if you're following the standard, the separator has
> to be a T, so what other variations of the standard are allowed?)

With the current implementation, the contract of the function is very simple to explain: datetime.fromisoformat() is the inverse operation of datetime.isoformat(), which is to say that every valid input to datetime.fromisoformat() is a possible output of datetime.isoformat(), and every possible output of datetime.isoformat() is a valid input to datetime.fromisoformat().
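As a minimal sketch of that contract (the sample datetime and the set of `timespec` values are chosen here purely for illustration), each string produced by `isoformat()` parses, and serializing the result with the same options gives the same string back:

```python
from datetime import datetime, timedelta, timezone

# Arbitrary sample value with microseconds and a non-UTC offset.
original = datetime(2019, 1, 25, 19, 36, 47, 123456,
                    tzinfo=timezone(timedelta(hours=-5)))

for timespec in ("auto", "seconds", "milliseconds", "microseconds"):
    text = original.isoformat(timespec=timespec)
    parsed = datetime.fromisoformat(text)  # every isoformat() output is accepted
    # Re-serializing with the same options reproduces the input string,
    # even when the timespec truncated the sub-second part.
    assert parsed.isoformat(timespec=timespec) == text

# Without truncation the datetime itself round-trips.
round_tripped = datetime.fromisoformat(original.isoformat())
```

With a truncating `timespec` such as `"seconds"`, the parsed value differs from `original`, but it still round-trips through its own string.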

With that as the background - fromisoformat() was designed to be a conservative API because scope is a one-way ratchet, and it's better to under-commit than over-commit. We do have the option going forward of widening the scope of the function in a backwards-compatible way. The main problem I see is that I think we should maintain the property that it should be dead simple to explain what a function does, and having to enumerate edge cases is a code smell. So "it is the inverse operation of isoformat(), but it also supports specifying using Z for UTC" fails that test in my opinion.

I see a few rational choices here:

1. Support the full ISO 8601 datetime spec and all outputs from datetime.isoformat() (these inputs mostly but not completely overlap). We would then just have to decide on a simple policy for how to deal with the optional portions of the spec.

2. Support only the rfc3339 standard + the outputs of datetime.isoformat(), with the option to switch to #1 later.

3. Add the ability for `datetime.isoformat()` to output 'Z' instead of `+00:00`, which would allow us to support it as an input and also keep the scope of `datetime.fromisoformat` unchanged.

4. Add a separate function (either a classmethod or a bare function) for parsing exactly the ISO 8601 standard, maybe `parse_iso8601`, so both `parse_iso8601` and `fromisoformat` have a clean, rational explanation for what they do.

5. Leave the current scope alone and don't add anything.

5a. Leave the current scope alone and point people in the direction of `dateutil.parser.isoparse` in the documentation.
msg334370 - (view) Author: rdb (rdb) * Date: 2019-01-25 22:15
I'm a fan of "be lenient in what you accept" but I can see your point in not causing confusion about what this method is meant to be used for.

Because what I'm trying to use it for technically falls outside the intended use, I say it would make the most sense to expand the intended use a bit.  From a cursory glance at the RFC3339 spec it looks like the only other change needed to fully support RFC3339 would be to support an arbitrary number of sub-second digits, whereas fromisoformat() currently requires either exactly 3 or 6.
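Until such a change lands, one possible workaround is to normalize the fractional part to six digits before parsing. The sketch below is only an illustration: `pad_fraction` is a hypothetical helper, the regular expression is an assumption about where the fraction appears, and extra digits are truncated rather than rounded.

```python
import re
from datetime import datetime

# Assumed shape: a "." or "," directly after the seconds field, then digits.
_FRACTION = re.compile(r"(?<=:\d{2})[.,](\d+)")

def pad_fraction(text):
    """Rewrite the sub-second part of `text` to exactly six digits."""
    def fix(match):
        # Truncate beyond microseconds, pad shorter fractions with zeros.
        return "." + match.group(1)[:6].ljust(6, "0")
    return _FRACTION.sub(fix, text, count=1)

parsed = datetime.fromisoformat(pad_fraction("2019-01-25T19:36:47.5+00:00"))
```

Strings without a fractional part pass through unchanged.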

So, I can bundle this together with a change making it more lenient about the number of decimal places for seconds, and we can change the docs for `fromisoformat()` to be "it accepts any RFC3339 timestamp, including those generated by isoformat()".

Does this seem acceptable?  We can always expand further to allow any ISO 8601 timestamp later, but RFC3339 would already make this function immensely more useful.  I really think that parsing RFC3339 dates is a feature Python needs to have in the standard library given their ubiquity on the web.

Alternatively I am happy to consider adding something like a utc=True flag to isoformat(), but I would personally feel reluctant to add any features that I can't think of a solid use case for.
msg334372 - (view) Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2019-01-25 22:35
>  I can see your point in not causing confusion about what this method is meant to be used for.

In this case, making it easy to explain what it does is less important than making the scope and contract of the function clear so that we don't have to argue about what should and should not be supported. Having a narrowly-scoped function is also useful for other reasons:

1. The API is clearer - there are no options to configure on this function. If you start supporting a bunch of features, people will inevitably want to turn some of them *off*, because they only want to accept a subset of the valid inputs.

2. The interface to test is clear - we can exhaustively test the entire contract of the function if desired.

3. Development will not get stalled in decision-making about which features to support or how they might interfere with one another.

> From a cursory glance at the RFC3339 spec it looks like the only other change needed to fully support RFC3339 would be to support an arbitrary number of sub-second digits, whereas fromisoformat() currently requires either exactly 3 or 6.

There are other differences, for example a comma can be used in place of a dot as the delimiter for fractional seconds. Looking at the grammar in the RFC, it seems that it might also support datetimes like 2018-W03-D4, but I don't see any mention of that in the text.

> So, I can bundle this together with a change making it more lenient about the number of decimal places for seconds, and we can change the docs for `fromisoformat()` to be "it accepts any RFC3339 timestamp, including those generated by isoformat()".

No, because the isoformat outputs are not a subset of RFC 3339. For example, 2015-01-01T00:00:00 is not a valid RFC 3339 datetime string, nor is 2015-01-01Q00:00:00, but they are valid outputs of datetime.isoformat(). datetime.fromisoformat() also supports fractional seconds on time zone offsets, which is not part of ISO 8601.
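To illustrate, the snippet below generates three such outputs and checks them against a deliberately simplified pattern. `RFC3339_LIKE` and `looks_like_rfc3339` are hypothetical names, and the pattern is an approximation of the RFC 3339 date-time grammar, not a full validator.

```python
import re
from datetime import datetime, timedelta, timezone

# Simplified, assumed pattern for an RFC 3339 date-time (offset is mandatory).
RFC3339_LIKE = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})"
)

def looks_like_rfc3339(text):
    return RFC3339_LIKE.fullmatch(text) is not None

naive = datetime(2015, 1, 1)
odd_offset = datetime(2015, 1, 1,
                      tzinfo=timezone(timedelta(hours=1, seconds=30)))

samples = [
    naive.isoformat(),         # no UTC offset at all
    naive.isoformat(sep="Q"),  # arbitrary separator character
    odd_offset.isoformat(),    # offset with a seconds component
]

for text in samples:
    datetime.fromisoformat(text)          # accepted by fromisoformat()
    assert not looks_like_rfc3339(text)   # but rejected by the pattern
```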

> Because what I'm trying to use it for technically falls outside the intended use, I say it would make the most sense to expand the intended use a bit. 

Is there a reason you can't use `dateutil.parser.isoparse`? The contract of that function is to parse any valid ISO8601 datetime, and fromisoformat is adapted from it.
msg334378 - (view) Author: rdb (rdb) * Date: 2019-01-25 23:17
> > From a cursory glance at the RFC3339 spec it looks like the only other change needed to fully support RFC3339 would be to support an arbitrary number of sub-second digits, whereas fromisoformat() currently requires either exactly 3 or 6.
>
> There are other differences, for example a comma can be used in place of a dot as the delimiter for fractional seconds. Looking at the grammar in the RFC, it seems that it might also support datetimes like 2018-W03-D4, but I don't see any mention of that in the text.

I think you're looking at the appendix, which collects the ABNF from
ISO 8601, but this is not part of RFC3339.  The grammar for RFC3339 is
purposefully very restrictive to make parsing it simple.  The comma
for delimiter is in though, good catch; also a trivial change.

> > So, I can bundle this together with a change making it more lenient about the number of decimal places for seconds, and we can change the docs for `fromisoformat()` to be "it accepts any RFC3339 timestamp, including those generated by isoformat()".
>
> No, because the isoformat outputs are not a subset of RFC 3339. For example, 2015-01-01T00:00:00 is not a valid RFC 3339 datetime string, nor is 2015-01-01Q00:00:00, but they are valid outputs of datetime.isoformat(). datetime.fromisoformat() also supports fractional seconds on time zone offsets, which is not part of ISO 8601.

Fair enough (though I'd say "isoformat()" is a misnomer then).  I was
just going by your option #2.  We would change the wording to imply
"supports RFC 3339 or anything produced by isoformat()"

>
> > Because what I'm trying to use it for technically falls outside the intended use, I say it would make the most sense to expand the intended use a bit.
>
> Is there a reason you can't use `dateutil.parser.isoparse`? The contract of that function is to parse any valid ISO8601 datetime, and fromisoformat is adapted from it.

It seems a little odd to need to pull in a third-party library for
this; it seems far more tempting for me to just do
"datetime.fromisoformat(str.replace('Z', '+00:00'))" instead since I
know my dates are produced by a JSON API.
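A minimal sketch of that workaround follows. `parse_js_timestamp` is a hypothetical helper name, and the sketch rewrites only a trailing `Z` rather than every `Z` in the string; it assumes input shaped like the output of JavaScript's `Date.prototype.toISOString()`.

```python
from datetime import datetime, timezone

def parse_js_timestamp(text):
    """Parse a timestamp such as '2019-01-25T19:36:47.123Z'."""
    if text.endswith("Z"):
        # Rewrite the UTC designator into the offset form fromisoformat() accepts.
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)

stamp = parse_js_timestamp("2019-01-25T19:36:47.123Z")
```

The result is an aware datetime in UTC; strings that already carry a numeric offset are passed to `fromisoformat()` untouched.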

I don't intend to get argumentative about whether supporting RFC3339
belongs in the standard library; that is clearly a decision for the
Python maintainers, and I'm not sure what criteria they follow on
this.  I just find it odd to point people to a third-party library for
parsing a simple but ubiquitous date standard when there are many
modules in the standard library for far more specific use cases.

FWIW, I do think that fromisoformat() is the right function to provide
RFC3339 support.  I don't think users would benefit from having to
choose between several different functions that parse similar but
subtly different date formats; this seems likely to cause confusion.

Thanks for your consideration!
msg334430 - (view) Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2019-01-27 16:07
> It seems a little odd to need to pull in a third-party library for this; it seems far more tempting for me to just do "datetime.fromisoformat(str.replace('Z', '+00:00'))" instead since I know my dates are produced by a JSON API.

Yes, this is also a viable solution. Generally speaking, third party libraries are less onerous these days than they have been in the past, and there are many things that are delegated to third party libraries because staying out of the standard library gives more flexibility in release cycles and the APIs don't need to be quite as stable.

> FWIW, I do think that fromisoformat() is the right function to provide RFC3339 support.  I don't think
> users would benefit from having to choose between several different functions that parse similar but
> subtly different date formats; this seems likely to cause confusion.

This is in fact one of the reasons to proceed with caution here, because ISO 8601, RFC 3339 and datetime.isoformat() are three slightly different and in some senses *incompatible* datetime serialization formats. If I had the choice, I would probably either not have named `isoformat` the way it is named, or I would have stuck to the standard, but what's done is done. As it is now, all the "fromX" alternate constructors are simply the inverse operation of the corresponding "X" method. If we make fromisoformat accept the RFC 3339 subset of ISO 8601, people will find it confusing that it doesn't support even some of the most common *other* ISO 8601 formats, considering it's called `fromisoformat` not `fromrfcformat`.

To give you an idea of why this sort of thing is a problem: each minor expansion of the scope sounds reasonable on its own, but every one comes with a maintenance burden. People start to rely on the specific behavior of the function, and eventually you get into a position where someone asks for a very reasonable expansion of the scope that is incompatible with the way people are already using the function. This leads you to either stop developing the function at some arbitrary point or to start tacking on a configuration API to resolve these incompatibilities.

If instead we design the function from the beginning with a very clear scope, we can also design the configuration API (and the default values) from the beginning as well. I definitely believe there is a place for a function that parses at least the timestamp portions of the ISO 8601 spec in CPython. I think I would prefer it to be a separate function from fromisoformat. I also think that it's worth letting it marinate in dateutil a bit, so that we can get a sense of what works and what doesn't work as a configuration API so that it's at least *easier* for people to select which of the subtly different datetime formats they're intending to parse.
msg349163 - (view) Author: Mark Haase (mehaase) * Date: 2019-08-07 11:07
Defining isoformat() and fromisoformat() as functional inverses is misguided. Indeed, it's not even true:

```
Python 3.7.2 (default, Dec 28 2018, 14:27:11)
[Clang 10.0.0 (clang-1000.11.45.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from datetime import datetime
>>> s = '2019-08-07t10:44:00+00:00'
>>> assert s == datetime.isoformat(datetime.fromisoformat(s))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError
```

I agree with rdb that not parsing "Z" is inconvenient and counter intuitive. We have the same use case: parsing ISO strings created by JavaScript (or created by systems that interoperate with JavaScript). We have also memorized the same `.replace("Z", "+00:00")` hack, but this feels like a missing battery in the stdlib.

As Paul points out the legacy of isoformat() complicates the situation. A new pair of functions for RFC-3339 sounds reasonable to me, either rfcformat()/fromrfcformat() or more boldly inetformat()/frominetformat(). The contracts for these functions are simple: fromrfcformat() parses RFC-3339 strings, and rfcformat() produces an RFC-3339 string. The docs for the ISO functions should be updated to point towards the RFC-compliant functions.

I'd be willing to work on a PR, but a change of this size probably needs to go through python-ideas first?
msg350643 - (view) Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2019-08-27 19:23
> Defining isoformat() and fromisoformat() as functional inverses is misguided. Indeed, it's not even true:

`isoformat()` is not the inverse of `fromisoformat()`; that doesn't work because there are multiple strings that isoformat() can create from any given datetime. There is, however, only one datetime that is represented by any given string (assuming you consider truncation to create a new datetime), so it is fine for fromisoformat() to be the inverse of isoformat().
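The asymmetry can be sketched with the timestamp from the earlier message (the choice of separators is an illustration only): several `isoformat()` outputs map to the same datetime, so going from string to datetime and back does not always return the original string.

```python
from datetime import datetime, timezone

dt = datetime(2019, 8, 7, 10, 44, tzinfo=timezone.utc)

# Three different strings that isoformat() can produce for the same datetime.
variants = [dt.isoformat(), dt.isoformat(sep="t"), dt.isoformat(sep=" ")]

# datetime -> string -> datetime always returns the starting value...
assert all(datetime.fromisoformat(s) == dt for s in variants)

# ...but string -> datetime -> string only holds for the default serialization.
survivors = [s for s in variants
             if datetime.fromisoformat(s).isoformat() == s]
```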

I have explained the reason that was chosen for the contract in several places (including in this thread), so I won't bother to repeat it. I think from a practical point of view we should eventually grow more generalized ISO 8601 parsing functionality, and the main question is what the API will look like. In dateutil.parser.isoparse, I still haven't figured out a good way to do feature flags.

> I'd be willing to work on a PR, but a change of this size probably needs to go through python-ideas first?

I don't think it *needs* to go to python-ideas, though it's probably a good idea to try and work out the optimal API in a post on the discourse (discuss.python.org), and the "ideas" category seems like the right one there. Please CC me (pganssle) if you propose modifications to the fromisoformat API on the discourse.
History
Date | User | Action | Args
2019-08-27 19:23:31 | p-ganssle | set | messages: + msg350643
2019-08-27 14:36:37 | p-ganssle | link | issue37962 superseder
2019-08-26 15:29:03 | hongweipeng | set | nosy: + hongweipeng
2019-08-07 11:07:54 | mehaase | set | nosy: + mehaase; messages: + msg349163
2019-02-01 18:26:16 | jwilk | set | nosy: + jwilk
2019-01-27 16:07:48 | p-ganssle | set | messages: + msg334430
2019-01-25 23:17:55 | rdb | set | messages: + msg334378
2019-01-25 22:35:52 | p-ganssle | set | messages: + msg334372
2019-01-25 22:15:22 | rdb | set | messages: + msg334370
2019-01-25 20:59:32 | p-ganssle | set | messages: + msg334368
2019-01-25 19:55:56 | xtreak | set | nosy: + belopolsky, p-ganssle
2019-01-25 19:36:47 | rdb | create |