
Author p-ganssle
Recipients cool-RR, methane, p-ganssle, steven.daprano, vstinner
Date 2020-01-10.15:15:50
Message-id <1578669351.32.0.699821128583.issue39280@roundup.psfhosted.org>
Content
> Yes, but not within the same format. If someone were to choose the format '2014-04-10T24:00:00', they would have a reasonable expectation that there is only one unique string that corresponds with that datetime

That's a particularly bad example, because it represents exactly the same datetime as another string in the exact same format:

  2014-04-11T00:00:00

ISO 8601 allows you to specify midnight (and only midnight) as the previous day plus 24:00. Admittedly, that is the only such ambiguity *for a given format* that I know of offhand (though it's a huge spec), but ISO 8601 also doesn't really have a concept of format specifiers, so there's no way to unambiguously specify the format you intend to use.
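To make the 24:00 case concrete, here is a minimal sketch of a parser that accepts the ISO 8601 end-of-day form. `parse_iso_seconds` is a hypothetical helper for illustration only: it handles just `YYYY-MM-DDTHH:MM:SS` and isn't meant to reflect any standard-library parser.

```python
from datetime import datetime, timedelta


def parse_iso_seconds(s: str) -> datetime:
    """Hypothetical helper: parse 'YYYY-MM-DDTHH:MM:SS', also accepting 'T24:00:00'."""
    date_part, sep, time_part = s.partition("T")
    if sep and time_part == "24:00:00":
        # 24:00:00 means "midnight at the end of this day",
        # i.e. 00:00:00 on the following day.
        return datetime.strptime(date_part, "%Y-%m-%d") + timedelta(days=1)
    # Everything else goes through strptime; %H only allows 00-23 here.
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%S")


a = parse_iso_seconds("2014-04-10T24:00:00")
b = parse_iso_seconds("2014-04-11T00:00:00")
print(a == b)  # True
```

Two different strings, one format, one datetime.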

Either way, I think we can explicitly dispense with "there will be an exact mapping between a given (format_str, datetime_str) pair and the datetime it produces" as a goal here. I can't think of any good reason you'd want that property, and as far as I can see we have never indicated that we provide it (probably the opposite, since there are some formats that explicitly ignore whitespace).
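For what it's worth, `strptime` already maps several distinct strings to the same datetime under a single format. This sketch assumes CPython's pure-Python `_strptime` implementation, where `%m`, `%d` and `%H` accept unpadded values and a space in the format matches a run of whitespace; I haven't checked every version, so treat it as illustrative.

```python
from datetime import datetime

fmt = "%Y-%m-%d %H:%M"
inputs = [
    "2020-01-05 09:30",
    "2020-1-5 9:30",        # %m, %d and %H also accept unpadded values
    "2020-01-05    09:30",  # a space in the format matches one or more whitespace chars
]

# Collect the results in a set: if all inputs agree, only one element remains.
parsed = {datetime.strptime(s, fmt) for s in inputs}
print(parsed == {datetime(2020, 1, 5, 9, 30)})  # True
```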

> Okay, since it seems like I'm the only one who wants this change, I'll let it go. Thanks for your input.

I wouldn't go that far. I think I am +0 or +1 on this change, I just wanted to be absolutely clear *why* we're doing this. I don't want someone pointing at this thread in the future and saying, "Core dev says that it's a bug in their code if they don't follow X standard / if more than one string produces the same datetime / etc".

I think the strongest argument for making this or a similar change is that I'm fairly certain we don't have the bandwidth to handle internationalized dates, and I don't think we have much to gain from a half-assed version of that where we accept Unicode transliterations of numerals and call it a day. There are tons of edge cases here that could bite people, and if we don't support this *now*, I'd rather give people an error message early in the process and point them at a library that is designed to handle datetime localization issues. If all we're going to do is switch [0-9] to \d (which, mind you, won't work in the places where the pattern is actually [1-9]), I think people will get a better version of that with something like:

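  def normalize_dt_str(dt_str):
      # isdecimal() (not isdigit()) so that only characters int() can convert,
      # e.g. full-width digits, are mapped to ASCII; int("²") would raise.
      return "".join(str(int(x)) if x.isdecimal() else x
                     for x in dt_str)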

There are probably more robust and/or faster versions of this, but it's probably roughly equivalent to what we'd be doing here *anyway*, and at least people would have to opt in to it.
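A slightly fuller sketch of that opt-in normalization, wired up to `strptime`. It uses `str.isdecimal()` rather than `str.isdigit()`: `isdecimal()` is true exactly for Unicode category Nd characters, which `int()` can convert, whereas `isdigit()` also accepts characters like superscript "²" that make `int()` raise. This is an illustration, not a proposed stdlib API.

```python
from datetime import datetime


def normalize_dt_str(dt_str: str) -> str:
    """Replace every Unicode decimal digit (category Nd) with its ASCII equivalent."""
    # Non-digit characters (separators, letters, "²", ...) pass through unchanged.
    return "".join(str(int(ch)) if ch.isdecimal() else ch for ch in dt_str)


# Full-width digits, as sometimes used in Japanese and Korean text.
raw = "２０２０-０２-０２"
ascii_str = normalize_dt_str(raw)
print(ascii_str)                                 # 2020-02-02
print(datetime.strptime(ascii_str, "%Y-%m-%d"))  # 2020-02-02 00:00:00
```

Keeping this as a separate step means the caller decides to normalize, rather than the parser silently doing it.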

I am definitely open to supporting non-ASCII digits in strptime if it would be useful at the level of support we could provide, but given that it's currently broken for any reasonable use case and, as far as I know, no one has complained, we're better off resolving the inconsistency by requiring ASCII digits and treating non-ASCII support as a separate feature request.
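From the caller's side, the "error message early" approach could look roughly like this. `strptime_ascii_only` is a hypothetical wrapper whose name and message are made up for illustration; it isn't necessarily what a patch to `_strptime` would look like. It rejects any non-ASCII decimal digit (the characters a str-pattern `\d` matches) before handing off to `strptime`.

```python
from datetime import datetime


def strptime_ascii_only(dt_str: str, fmt: str) -> datetime:
    """Hypothetical wrapper: fail fast on non-ASCII digits, then call strptime."""
    non_ascii = sorted({ch for ch in dt_str if ch.isdecimal() and not ch.isascii()})
    if non_ascii:
        raise ValueError(
            f"non-ASCII digits {''.join(non_ascii)!r} are not supported; "
            "normalize the string first or use a localization library"
        )
    return datetime.strptime(dt_str, fmt)


print(strptime_ascii_only("2020-02-02", "%Y-%m-%d"))  # 2020-02-02 00:00:00
try:
    strptime_ascii_only("２０２０-０２-０２", "%Y-%m-%d")
except ValueError as exc:
    print(exc)
```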

CC-ing Inada on this as our Unicode guru, and because he might have some intuition about how useful non-ASCII support would be. The only place I've seen non-ASCII dates is in Japanese graveyards, and those tend to use Chinese numerals (which don't match \d anyway), though Japanese and Korean also tend to make heavier use of the "full-width numerals" block, so maybe parsing something like "2020-02-02" written in full-width digits is an actual pain point that this change would help with (though, again, I suspect this is just the beginning of the required changes, and we may never get a decent implementation that supports Unicode numerals).
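To illustrate the difference between full-width digits and Chinese numerals, this snippet inspects them with `unicodedata` and `re`. Full-width digits are decimal digits (category Nd) just like ASCII ones, so `\d` and `int()` accept them; Chinese numerals are ideographs that `\d` doesn't match, even though Unicode records a numeric value for them. The last line shows the `re.ASCII` flag, which is one way to restrict `\d` to `[0-9]`.

```python
import re
import unicodedata

# Full-width digits: general category Nd, so str-pattern \d and int() accept them.
print(unicodedata.name("２"))                    # FULLWIDTH DIGIT TWO
print(unicodedata.category("２"))                # Nd
print(bool(re.fullmatch(r"\d{4}", "２０２０")))  # True
print(int("２０２０"))                           # 2020

# Chinese numerals: ideographs (category Lo), not matched by \d,
# although the Unicode database still carries a numeric value.
print(unicodedata.category("二"))                # Lo
print(bool(re.fullmatch(r"\d", "二")))           # False
print(unicodedata.numeric("二"))                 # 2.0

# With re.ASCII, \d means [0-9] only.
print(bool(re.fullmatch(r"\d{4}", "２０２０", re.ASCII)))  # False
```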
History
Date                 User       Action  Args
2020-01-10 15:15:51  p-ganssle  set     recipients: + p-ganssle, vstinner, steven.daprano, methane, cool-RR
2020-01-10 15:15:51  p-ganssle  set     messageid: <1578669351.32.0.699821128583.issue39280@roundup.psfhosted.org>
2020-01-10 15:15:51  p-ganssle  link    issue39280 messages
2020-01-10 15:15:50  p-ganssle  create