Message163902
Actually, you're right. Sorry for overlooking the RFC. But that said, the RFC itself refers to the same manpage as a reference that's "mostly authoritative for those variations that are otherwise only documented in anecdotal form". So I guess it's quite a good reference after all :)
In Appendix A, RFC 4155 defines a set of rules for a "default" mbox format that maximizes interoperability between different mbox implementations.
The important things in the RFC concerning this issue are:
* There MUST be an empty line after each message.
* The RFC does not specify any escape syntax for message body lines starting with "From ". It says: "Recipient systems are expected to parse full separator lines as they are documented above."
Because the RFC states that there must be an empty line after each message, and it aims for maximum interoperability, I think we can assume that there always is an empty line there. But looking for "\n\nFrom " is not enough to find the starting points of messages: since "From " lines in the body are not escaped, a body paragraph that begins with "From " would match as well. We should actually parse the whole separator line, which consists of "From ", an email address (addr-spec in RFC 2822), a timestamp (in UNIX ctime format without timezone), and a newline character.
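To make the idea concrete, here is a rough sketch of what parsing the full separator line could look like. It is not the mailbox module's actual code. The names `is_separator` and `split_messages` are made up for illustration, the address pattern is a deliberately simplified stand-in for the real addr-spec grammar, and single spaces between the fields are assumed.

```python
import re
from datetime import datetime

_MONTHS = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()

# Assumption: a simplified stand-in for RFC 2822 addr-spec
# (no whitespace, exactly one "@"); the real grammar is richer.
_SEPARATOR_RE = re.compile(
    r"From (?P<addr>[^\s@]+@[^\s@]+) "
    r"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    r"(?P<mon>" + "|".join(_MONTHS) + r") "
    r"(?P<day>[ \d]\d) "  # ctime pads single-digit days with a space
    r"(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2}) "
    r"(?P<year>\d{4})"
)


def is_separator(line):
    """Return True if line is a full "From " separator line (hypothetical helper)."""
    match = _SEPARATOR_RE.fullmatch(line.rstrip("\r\n"))
    if match is None:
        return False
    try:
        # Reject impossible dates and times such as "Feb 30" or "25:00:00".
        datetime(
            int(match.group("year")),
            _MONTHS.index(match.group("mon")) + 1,
            int(match.group("day")),
            int(match.group("h")),
            int(match.group("m")),
            int(match.group("s")),
        )
    except ValueError:
        return False
    return True


def split_messages(lines):
    """Yield each message as a list of lines, separator line first."""
    current = None
    prev_blank = True  # the start of the file counts as a boundary
    for line in lines:
        if prev_blank and is_separator(line):
            if current is not None:
                yield current
            current = [line]
        elif current is not None:
            current.append(line)
        prev_blank = line.strip("\r\n") == ""
    if current is not None:
        yield current
```

With this approach a body line such as "From here on, ..." that follows an empty line is kept as part of the message, because it does not parse as a complete separator line.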
I think this should be the default mode for reading mbox files. See #13698 for adding support for other formats.
Date | User | Action | Args
2012-06-25 06:15:55 | petri.lehtinen | set | recipients: + petri.lehtinen, barry, r.david.murray, sdaoden, wally1980
2012-06-25 06:15:55 | petri.lehtinen | set | messageid: <1340604955.15.0.808570501197.issue11728@psf.upfronthosting.co.za>
2012-06-25 06:15:54 | petri.lehtinen | link | issue11728 messages
2012-06-25 06:15:53 | petri.lehtinen | create |