classification
Title: email 3.0+ stops parsing headers prematurely
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.1
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: r.david.murray Nosy List: barry, msapiro, r.david.murray, terry.reedy
Priority: normal Keywords:

Created on 2006-03-06 04:10 by msapiro, last changed 2010-08-06 20:39 by r.david.murray. This issue is now closed.

Files
File name Uploaded Description
junk.txt msapiro, 2006-03-06 04:10 Example message
Messages (3)
msg27686 - Author: Mark Sapiro (msapiro) Date: 2006-03-06 04:10
Given the following input:

Received: by main.example.com; Sun Nov  4 02:45:50 2001
X-From_: postmaster@example.com  Sun Nov  4 02:45:50 2001
>From bob  Sun Nov  4 02:45:50 2001
Return-Path: <postmaster@example.com>
Delivered-To: bob@example.com

followed by more headers and message body, the email
3.0+ parser parses everything beginning with the

>From bob  Sun Nov  4 02:45:50 2001

line as the body of the message with only the first two
lines as the header. RFC 2822 is clear that the message
headers are separated from the body by an empty line,
so I think the parser should continue parsing
everything as headers until an empty line or the end of
input is encountered, and should consider lines such as

>From bob  Sun Nov  4 02:45:50 2001

or

Some arbitrary text

encountered in the headers to be a MalformedHeaderDefect.
A complete example message is attached.
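
The reported behavior can be reproduced with a short script. This is a minimal sketch run against a recent Python 3 release, not the email 3.0 code the report was filed against; the blank line and the "Message body" text at the end of the sample are placeholders standing in for the attached junk.txt, which is not reproduced here.

```python
from email import message_from_string
from email.errors import MissingHeaderBodySeparatorDefect

# Header block from the report; the trailing blank line and body text
# are placeholders, not the contents of the attached junk.txt.
raw = (
    "Received: by main.example.com; Sun Nov  4 02:45:50 2001\n"
    "X-From_: postmaster@example.com  Sun Nov  4 02:45:50 2001\n"
    ">From bob  Sun Nov  4 02:45:50 2001\n"
    "Return-Path: <postmaster@example.com>\n"
    "Delivered-To: bob@example.com\n"
    "\n"
    "Message body\n"
)

msg = message_from_string(raw)

# Only the lines before ">From" are parsed as headers.
print(msg.keys())                         # ['Received', 'X-From_']

# Everything from the ">From" line onward becomes the payload.
print(msg.get_payload().splitlines()[0])  # >From bob  Sun Nov  4 02:45:50 2001

# Recent releases record the missing separator as a defect on the message.
has_defect = any(
    isinstance(d, MissingHeaderBodySeparatorDefect) for d in msg.defects
)
print(has_defect)
```

With the default policy the parser does not raise; the defect is only recorded on `msg.defects`, so callers that care have to inspect that list themselves.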
msg113047 - Author: Terry J. Reedy (terry.reedy) (Python committer) Date: 2010-08-05 20:38
The email module has several problems. RDM is working on overhauling the email module for 3.2. Existing issues may not get individual attention.
msg113134 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2010-08-06 20:39
It's not clear to me that this is a valid bug.  It is true that the RFC says that a blank line precedes the body.  However, the line in question is not a valid header line.  Mail parsers trying to implement the "be liberal in what you accept" portion of Postel's law should parse messages where the blank line between the headers and body is missing.  With the input given, there are three valid Postel interpretations: the body starts at the >From line; the >From line is missing a folding indent and is part of the value of the preceding header; or the >From line is garbage and should be discarded.
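
To make the three interpretations concrete, here is a rough sketch of how each one would split the sample input. The `split_headers` function, its `strategy` argument, and the simplified header pattern are all hypothetical illustrations; none of them is part of the email package.

```python
import re

# Assumption: a header line starts with a field name made of printable
# ASCII characters other than ':' followed by a colon (simplified from
# RFC 2822 section 2.2).
HEADER_RE = re.compile(r"^[\x21-\x39\x3b-\x7e]+:")


def split_headers(lines, strategy):
    """Split lines into (headers, body) under one of three strategies.

    strategy is "body", "fold", or "discard" (hypothetical names for
    the three interpretations described in the message above).
    """
    headers = []
    for i, line in enumerate(lines):
        if line == "":
            # Proper RFC separator: the body is everything after it.
            return headers, lines[i + 1:]
        if HEADER_RE.match(line):
            headers.append(line)
        elif line[0] in " \t" and headers:
            # Correctly folded continuation of the previous header.
            headers[-1] += "\n" + line
        elif strategy == "body":
            # Interpretation 1: the body starts at the bad line.
            return headers, lines[i:]
        elif strategy == "fold" and headers:
            # Interpretation 2: treat it as a continuation that lost
            # its folding indent.
            headers[-1] += " " + line
        # Interpretation 3 ("discard"): drop the line and keep going.
    return headers, []


sample = [
    "Received: by main.example.com; Sun Nov  4 02:45:50 2001",
    "X-From_: postmaster@example.com  Sun Nov  4 02:45:50 2001",
    ">From bob  Sun Nov  4 02:45:50 2001",
    "Return-Path: <postmaster@example.com>",
    "Delivered-To: bob@example.com",
    "",
    "Message body",
]

for strategy in ("body", "fold", "discard"):
    headers, body = split_headers(sample, strategy)
    print(strategy, len(headers), len(body))
```

Only the "body" strategy corresponds to what the parser currently does; the other two are shown purely for comparison.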

Since a leading >From is a token that occurs validly with reasonable frequency in message bodies and is never valid in message headers, I think the current choice is a sane one.  A smarter heuristic might look at the subsequent lines and note that they look like headers, but headers can occur in the body of messages, so that heuristic would probably be wrong more often than it was right, especially considering that putting headers in a message body is when you are most likely to see the leading '>From ' token, since it would be quoting the mbox 'From ' header.

So, I'm closing this bug as rejected.  (Rejected rather than invalid, since reasonable people can certainly disagree about the best heuristics for handling invalid data.)
History
Date                 User            Action  Args
2010-08-06 20:39:35  r.david.murray  set     status: open -> closed; resolution: rejected; messages: + msg113134; stage: test needed -> resolved
2010-08-05 20:38:57  terry.reedy     set     nosy: + terry.reedy; messages: + msg113047; versions: + Python 3.1, - Python 2.6, Python 3.0
2010-05-05 13:47:53  barry           set     assignee: barry -> r.david.murray; nosy: + r.david.murray
2009-03-21 00:45:47  ajaksu2         set     stage: test needed; type: behavior; versions: + Python 2.6, Python 3.0, - Python 2.4
2006-03-06 04:10:02  msapiro         create