Author r.david.murray
Recipients Cyril Nicodème, barry, jwilk, msapiro, r.david.murray
Date 2018-11-06.19:23:24
Message-id <1541532204.68.0.788709270274.issue34155@psf.upfronthosting.co.za>
In-reply-to
Content
>>> from email import message_from_string
>>> from email.policy import default
>>> m = message_from_string("From: John Doe jdoe@example.com <other@example.net>\n\n", policy=default)
>>> m['From'].addresses
(Address(display_name='', username='John Doe jdoe', domain='example.com'),)

The new policies have more error recovery for non-RFC-compliant addresses than decode_header, but the two agree in this case.  Three rules are at work here: (1) an unquoted, unencoded '@' is not allowed in a display name; (2) if the address is not enclosed in '<>', everything before the '@' is the username; and (3) in the absence of a comma after the end of the FQDN (which is not allowed to contain blanks), any additional tokens are discarded.
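
A minimal sketch of how those rules can be observed from the outside, assuming the standard library email package with policy=default. The variable names are illustrative only. It prints the parsed fields and the defects recorded on the header, then parses a variant in which the display name is quoted so that the '@' inside it is legal.

```python
from email import message_from_string
from email.policy import default

# The malformed header from the report: an unquoted '@' before the '<>' part.
raw = "From: John Doe jdoe@example.com <other@example.net>\n\n"
msg = message_from_string(raw, policy=default)
addresses = msg['From'].addresses

for addr in addresses:
    print(repr(addr.display_name), repr(addr.username), repr(addr.domain))

# The parser records what it had to recover from as defects on the header.
for defect in msg['From'].defects:
    print(type(defect).__name__, defect)

# Quoting the display name makes the '@' legal there, so the '<>' address is used.
quoted_raw = 'From: "John Doe jdoe@example.com" <other@example.net>\n\n'
quoted = message_from_string(quoted_raw, policy=default)['From'].addresses
print(quoted)
```

The exact defect classes and messages may differ between Python versions, so the sketch prints them rather than assuming particular ones.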

One could argue that we could treat the blank after the FQDN as a "missing comma", and there would be some merit to that argument.  You could also argue that a '<>' quoted string should trump the occurrence of the '@' earlier in the token list.  However, the RFC 822 grammar is designed to be parsed character by character, so that would not be a typical way for an RFC 822 parser to do Postel-style error recovery.
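
For comparison, here is a small sketch of what the "missing comma" reading would amount to: the same header with the comma written explicitly. This only shows the result such a recovery would aim for; it is not a proposed change to the parser, and the variable names are illustrative.

```python
from email import message_from_string
from email.policy import default

# Same header as in the report, but with the comma written explicitly.
raw = "From: John Doe jdoe@example.com, <other@example.net>\n\n"
with_comma = message_from_string(raw, policy=default)['From'].addresses

# Collect the domain of each parsed address, in order.
domains = [addr.domain for addr in with_comma]
print(domains)
```

With the comma present, the second token is parsed as its own address rather than being discarded.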

So, I don't think there is a bug here, but I'd be curious what other email address parsing libraries do, and that could influence whether extensions to the "make a guess when the string doesn't conform to the RFC" code would be acceptable.