email.utils.parsedate_to_datetime() should return None when date cannot be parsed #74866

timb07 · 2017-06-15T23:12:43Z

BPO	30681
Nosy	@warsaw, @ncoghlan, @bitdancer, @ambv, @serhiy-storchaka, @sim0nx, @timb07, @miss-islington
PRs	bpo-30681: Change error handling to return None in case of invalid date #2229 bpo-30681: Support invalid date format or value #2254 bpo-30681: Support invalid date format or value in email Date header #10783 bpo-30681: Support invalid date format or value in email Date header #22090

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2017-06-15.23:12:43.396>
labels = ['type-bug', 'expert-email', '3.10']
title = 'email.utils.parsedate_to_datetime() should return None when date cannot be parsed'
updated_at = <Date 2020-10-27.07:37:48.458>
user = 'https://github.com/timb07'

bugs.python.org fields:

activity = <Date 2020-10-27.07:37:48.458>
actor = 'serhiy.storchaka'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['email']
creation = <Date 2017-06-15.23:12:43.396>
creator = 'timb07'
dependencies = []
files = []
hgrepos = []
issue_num = 30681
keywords = ['patch']
message_count = 18.0
messages = ['296137', '296141', '296153', '296154', '296215', '296226', '296231', '296235', '296361', '301618', '330628', '330648', '376384', '379381', '379412', '379464', '379706', '379742']
nosy_count = 8.0
nosy_names = ['barry', 'ncoghlan', 'r.david.murray', 'lukasz.langa', 'serhiy.storchaka', 'sim0n', 'timb07', 'miss-islington']
pr_nums = ['2229', '2254', '10783', '22090']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue30681'
versions = ['Python 3.10']

timb07 · 2017-06-15T23:12:43Z

Python 3.6 documentation for email.utils.parsedate_to_datetime() says "Performs the same function as parsedate(), but on success returns a datetime." The docs for parsedate() say "If it succeeds in parsing the date...; otherwise None will be returned." By implication, parsedate_to_datetime() should return None when the date can't be parsed.

There are two different failure modes for parsedate_to_datetime():

When _parsedate_tz() fails to parse the date and returns None:

>>> from email.utils import parsedate_to_datetime
>>> parsedate_to_datetime('0')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/email/utils.py", line 210, in parsedate_to_datetime
    *dtuple, tz = _parsedate_tz(data)
TypeError: 'NoneType' object is not iterable

When _parsedate_tz() succeeds, but conversion to datetime.datetime fails:

>>> parsedate_to_datetime('Tue, 06 Jun 2017 27:39:33 +0600')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/email/utils.py", line 214, in parsedate_to_datetime
    tzinfo=datetime.timezone(datetime.timedelta(seconds=tz)))
ValueError: hour must be in 0..23
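
Both failure modes above can be absorbed by one small wrapper. This is a minimal sketch, not part of the standard library: `parse_date_or_none` is a hypothetical helper name, and it catches both `TypeError` and `ValueError` because which one is raised for unparseable input depends on the Python version.

```python
from email.utils import parsedate_to_datetime


def parse_date_or_none(value):
    """Return a datetime for a parseable date string, otherwise None.

    Hypothetical helper: mirrors the None-on-failure contract of
    email.utils.parsedate() on top of parsedate_to_datetime().
    """
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # TypeError: older versions fail while unpacking a None parse result.
        # ValueError: out-of-range fields (e.g. hour 27) or, on newer
        # versions, any unparseable input.
        return None


for candidate in ("0",
                  "Tue, 06 Jun 2017 27:39:33 +0600",
                  "Tue, 06 Jun 2017 17:39:33 +0600"):
    print(repr(candidate), "->", parse_date_or_none(candidate))
```

The trade-off is the one debated later in this thread: the caller no longer learns why the date was rejected.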

Note that this second case is the one that led me to this issue. I am using the email package to parse spam emails for subsequent analysis, and a certain group of spam emails contain invalid hour fields in their Date header. I don't require the invalid Date header to be converted to a datetime.datetime, but accessing email_message['date'] to access the header value as a string triggers the ValueError exception. I can work around this with a custom email policy, but the observed behaviour does seem to contradict the documented behaviour.

Also, in relation to https://bugs.python.org/issue15925, r.david.murray commented "Oh, and I'm purposely allowing parsedate_to_datetime throw exceptions. I suppose that should be documented, but that's a separate issue." However, no argument was given for why parsedate_to_datetime should throw exceptions.

bitdancer · 2017-06-16T01:25:02Z

The problem is that if it returns None on parse failure, then you can't tell the difference between the header not existing and the date not being parseable. I don't have a solution for this problem. Suggestions welcome. (Note that this is only a problem in the new policy, where the parsing is done automatically; in the compat32 policy you have to apply parsedate yourself, so you can tell the difference between a non-existent header and a failed date parse).
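
The compat32 case mentioned above can be illustrated as follows. This is a sketch only: `classify_date_header` and the sample messages are made up for illustration. Under compat32 the header value comes back as a plain string (or None when absent), so the caller applies `parsedate()` and can tell the two situations apart.

```python
import email
from email import policy
from email.utils import parsedate


def classify_date_header(raw_message):
    """Return 'missing', 'unparseable', or 'ok' for the Date header."""
    msg = email.message_from_string(raw_message, policy=policy.compat32)
    raw_date = msg["Date"]  # plain str, or None when the header is absent
    if raw_date is None:
        return "missing"
    if parsedate(raw_date) is None:  # parsedate() signals failure with None
        return "unparseable"
    return "ok"


print(classify_date_header("Subject: hi\n\nbody\n"))
print(classify_date_header("Date: not a date\nSubject: hi\n\nbody\n"))
print(classify_date_header("Date: Tue, 06 Jun 2017 17:39:33 +0600\n\nbody\n"))
```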

warsaw · 2017-06-16T03:17:23Z

I'm not sure it would be any better, but what about defining something like a DateFormatDefect and returning that?

timb07 · 2017-06-16T03:19:15Z

My proposed solution (in #2229) is two-part:

change parsedate_to_datetime() to return None rather than raising an exception; and
change headerregistry.DateHeader.parse() to check for None being returned from parsedate_to_datetime(), and to add a defect; the datetime attribute is set to None (as if the Date header were missing), but the header still evaluates as a string to the supplied header value.

I'm not sure what the use case is for distinguishing between a missing Date header and an invalid date value, but can't that be distinguished by the different defects added to the header?

In any case, if I'm not fully grasping the context and parsedate_to_datetime() should continue to throw exceptions, then a slightly different modification to DateHeader to catch those exceptions would seem sensible, and would address my use case.

bitdancer · 2017-06-16T17:44:50Z

OK, I think I've reloaded my brain at least partially on this topic.

I think my original reason for having parsedate_to_datetime raise errors is that it was my opinion that that is the way it should work, and that parsedate should work the same way (raise errors, not return None). The logic is that parsedate is not itself part of the *parser* and it is the parser that has a contract to not raise errors but instead register defects. When you call parsedate from your code (that is, not as part of the parser), it ought to raise an error, IMO, and so I made parsedate_to_datetime do that.

I think I understand the logic behind the original behavior: None as the 'error value', thus being consistent with the parser in not raising errors. But I think our understanding of Python best practices has evolved (solidified?) since the days when the parsedate API was designed, and raising errors is better.

*However*, consistency is also important, so if the consensus is that parsedate_to_datetime should parallel the parsedate API, I'm not going to argue with it.

Regardless of that, however, I think your notion, Tim, that the *string* value of a date header with an invalid date in it should be the invalid string is a good one. One can check the validity by inspecting the datetime attribute. Regardless of whether errors are reported via None or an exception, the headerregistry method should catch the error and set the value accordingly (to the invalid string on error, to the normalized string if valid).

A couple of notes on the PR you submitted. (1) this change affects only the new policies, so the test should go somewhere in the new tests, not in test_email, which means you don't need to muck with the test support infrastructure in that file. There are already date header tests in test_headerregistry, so add the new test there. (2) I'm moving us away from putting 'test emails' in separate files, so include the text under test in the test file. You only need the date string in the date header test, but you can add your sample (changed to meet Brett's child filter, although I bet any children who will be looking at the python source code will already have seen many such spam emails) to test_inversion (which currently only contains one test message in msg_params, add yours to that list and make it two :)

As for the decision on the return value vs exception, let's see which side Barry comes down on :)

timb07 · 2017-06-17T01:52:50Z

Thanks for the feedback. I've made a new pull request which addresses the points raised.

warsaw · 2017-06-17T03:53:47Z

Thanks for all the great detailed background, and the suggested approaches. I think there are a couple of constraints that would be good to resolve.

parsedate_to_datetime() is documented as "performing the same function as parsedate()" with an explicit difference in the good path return value, but no explicit difference in the bad path. So the implication is pretty strong that it should return None when the date cannot be parsed. Having a consistent API with parsedate() is important, and documented, so I think it's reasonable that the implementation should match.
Clearly, header parsing can't raise exceptions.
It should be easy to tell the difference between a missing Date header and a bogus date header. Yes, this is an important use case. For example, Mailman may do certain things when the Date header is missing (e.g. it could reject the message, or it could clobber the header with the server's time, etc.). Yet if the header exists and is bogus, then you might want to preserve the bogus header for forensic or idempotency reasons.

It seems to me that the way to resolve this is to fix parsedate_to_datetime() so that it returns None on failure, but to add a (new) defect in DateHeader.parse() when that happens, e.g. InvalidDateDefect. Then, as Tim suggests and RDM seems to agree, the invalid string value should be used as the string value of the header in that case.

Thoughts?

timb07 · 2017-06-17T07:44:10Z

I've updated the pull request to incorporate Barry's suggestion of a new defect for this situation, InvalidDateDefect.

bitdancer · 2017-06-19T17:33:23Z

I'll make one argument in favor of retaining the exception, and if that doesn't fly then I agree to the solution and will try to review the PR soon.

The argument is this: if parsedate_to_datetime raises an error, you get information about *why* the date was invalid, which you don't get from a 'None' return. It is my thought that this would be the most useful behavior for the cases where you call it directly (otherwise, why call it directly?)
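
The point about diagnostic information can be seen by surfacing the exception text. This is a sketch; `explain_bad_date` is a hypothetical helper, and the exact wording of the message comes from the datetime constructor and may differ between Python versions.

```python
from email.utils import parsedate_to_datetime


def explain_bad_date(value):
    """Return the parser's error text, or None if the date parsed."""
    try:
        parsedate_to_datetime(value)
    except (TypeError, ValueError) as exc:
        # The exception says *why* the date was rejected,
        # which a bare None return value cannot convey.
        return f"{type(exc).__name__}: {exc}"
    return None


print(explain_bad_date("Tue, 06 Jun 2017 27:39:33 +0600"))
print(explain_bad_date("Tue, 06 Jun 2017 17:39:33 +0600"))
```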

(And as far as the doc issue goes, you are correct Barry that the current docs don't document the difference in the error case; I noted in another issue that that "should be fixed"...which is only the case now if you agree to my argument above :)

warsaw · 2017-09-07T19:34:06Z

So, while we do have a conflict between consistency and utility, I think @r.david.murray's last comment has convinced me that raising the exception is more helpful. I think we should do that, fixing the documentation and giving up on the consistency issue.

bitdancer · 2018-11-28T18:23:34Z

Reported again in issue bpo-35342.

The existing PR is close to complete, but needs to be adjusted for the fact that we want (and want to document) that the utility raises errors (i.e. catch the error in the header parser rather than having the utility return None).

timb07 · 2018-11-29T00:52:08Z

I've addressed the points in the last few comments and created a new PR (10783).

sim0nx · 2020-09-04T16:24:46Z

As I think it is still important to have this fixed and considering the original PR was closed, I have created a new PR based on the original one while implementing the requested changes.

#22090

warsaw · 2020-10-22T23:27:10Z

@sim0n - I added a comment to your open PR.

My main question for the rest of the group is whether we can and should backport this. Given the new defect class being introduced, it seems like this should only land in 3.10. Thoughts?

sim0nx · 2020-10-23T08:39:29Z

@barry Thank you for your input on the PR.

From what I understood this PR was nearly ready and only missing a small addition to the documentation which I added. So it took me a bit to go through it all :-).

I actually don't see how *parsedate_to_datetime* would ever return None. It is *_parsedate_tz* which returns None on bogus input, in which case *parsedate_to_datetime* raises a TypeError.
This is also covered in the tests, so those should be fine.

In order to continue, I suggest fixing the documentation on *parsedate_to_datetime*: removing the mention of it returning None and replacing it with a note that it may raise TypeError on invalid date input.

Does that make sense ?

Regarding the backporting, as a user of this I must admit that it would be much appreciated if this could be backported :-).

warsaw · 2020-10-23T18:42:33Z

Aside: I noticed that on _parseaddr.py:68, there's a bare return. That should really be return None (EIBTI). Can you fix that in your PR?

I think it's confusing to raise both TypeError and ValueError. I suggest we check the None return from _parsedate_tz() and raise ValueError explicitly in that case, avoiding the implicit TypeError on the failing tuple unpack.

+1 on removing the mention of returning None from the documentation. Then with the above, it would document raising ValueError on invalid date input.
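
The explicit check suggested above could look something like the following. This is an illustrative sketch, not the code merged for this issue: `parse_date_strict` is a hypothetical name, and it imports the private helper `_parsedate_tz` from `email._parseaddr`, which application code should not normally depend on.

```python
import datetime
from email._parseaddr import _parsedate_tz  # private helper, illustration only


def parse_date_strict(data):
    """Parse an RFC 2822 style date, raising ValueError for any bad input."""
    parsed = _parsedate_tz(data)
    if parsed is None:
        # Explicit ValueError instead of the implicit TypeError that
        # unpacking None would produce.
        raise ValueError(f"Invalid date value or format {data!r}")
    *dtuple, tz = parsed
    if tz is None:
        # No usable UTC offset: return a naive datetime.
        return datetime.datetime(*dtuple[:6])
    return datetime.datetime(
        *dtuple[:6],
        tzinfo=datetime.timezone(datetime.timedelta(seconds=tz)),
    )


print(parse_date_strict("Tue, 06 Jun 2017 17:39:33 +0600"))
```

With this shape, both unparseable input and out-of-range fields surface as ValueError, so callers need only one except clause.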

As for backporting, I'm nosing Ned and Łukasz to weigh in. Given that the patch is adding a new defect class (which it should), this won't break existing code, but it does mean that existing code would have different semantics depending on the patch version release of 3.9, 3.8, and 3.7. I'm not completely comfortable with that, but let's see what the RMs say. I guess I'm currently -0 on backporting.

miss-islington · 2020-10-27T00:31:14Z

New changeset 303aac8 by Georges Toth in branch 'master':
bpo-30681: Support invalid date format or value in email Date header (GH-22090)
303aac8

serhiy-storchaka · 2020-10-27T07:37:48Z

>>> email.utils.parsedate_to_datetime(None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/serhiy/py/cpython/Lib/email/utils.py", line 200, in parsedate_to_datetime
    raise ValueError('Invalid date value or format "%s"' % str(data))
ValueError: Invalid date value or format "None"

First, the date value is None, not "None".

Second, why not just return None? parsedate() can be used in code like:

   parsedate(headers.get('Date'))

None is an expected argument if the header "Date" is absent. parsedate_to_datetime() is not compatible with parsedate() in this case.

It was a regression introduced in bpo-15925 (#60129). Before that parsedate_to_datetime(None) returned None.
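
Until that question is settled, callers that pass `headers.get('Date')` straight through can guard the None case themselves. This is a minimal sketch; `parse_optional_date` and the `headers` dict are hypothetical.

```python
from email.utils import parsedate, parsedate_to_datetime


def parse_optional_date(value):
    """Return None for an absent header, otherwise parse the value.

    Invalid (non-None) input still raises, as parsedate_to_datetime() does.
    """
    if value is None:
        return None
    return parsedate_to_datetime(value)


headers = {"Date": "Tue, 06 Jun 2017 17:39:33 +0600"}

print(parsedate(headers.get("X-Missing")))            # parsedate() accepts None
print(parse_optional_date(headers.get("X-Missing")))  # guarded wrapper does too
print(parse_optional_date(headers.get("Date")))
```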

serhiy-storchaka · 2022-06-17T15:40:10Z

@warsaw, @bitdancer, what is your opinion? Is it worth making parsedate_to_datetime(None) return None, or is it too late for this?

timb07 mannequin added topic-email type-bug An unexpected behavior, bug, or error labels Jun 15, 2017

bitdancer added 3.7 (EOL) end of life 3.8 only security fixes labels Nov 28, 2018

sim0nx mannequin added 3.9 only security fixes 3.10 only security fixes labels Sep 4, 2020

ned-deily removed the 3.7 (EOL) end of life label Oct 26, 2020

warsaw removed 3.8 only security fixes 3.9 only security fixes labels Oct 27, 2020

warsaw closed this as completed Oct 27, 2020

serhiy-storchaka reopened this Oct 27, 2020

ezio-melotti transferred this issue from another repository Apr 10, 2022

iritkatriel added a commit to iritkatriel/cpython that referenced this issue Jun 17, 2022

pythonGH-74866: Fix utils.parsedate_to_datetime raising exception on …

25412df

…None

iritkatriel mentioned this issue Jun 17, 2022

GH-74866: Fix utils.parsedate_to_datetime raising exception on None #93945

Closed

iritkatriel added the stdlib Python modules in the Lib dir label Nov 23, 2023

dhvcc added a commit to dhvcc/rss-parser that referenced this issue Feb 15, 2024

Expect TypeError from email.utils

7ee5de4
