Title: email modifies the message structure when the parsed email is invalid without registering defects
Components: email Versions: Python 3.3
Status: closed Resolution: fixed
sample.tgz xavierd, 2011-07-07 16:37 python script to reproduce the issue
orig.eml xavierd, 2011-08-12 09:53 email to reproduce the issue xavierd, 2011-08-12 09:54 python script to test the patch
email.patch xavierd, 2011-08-17 13:04
orig.eml xavierd, 2011-08-17 13:05 email without a header/body separator
msg139982 - Author: xavierd Date: 2011-07-07 16:37
The function 'email.message_from_file' modifies the message structure when the parsed email is invalid (for example, when a closing boundary is missing). The defects attribute is also empty.

In the attachment (sample.tgz) you will find:
   - orig.eml: an email with an invalid structure. The boundary "000101020201080900040301" isn't closed.
   - after_parsing.eml: the same email after calling email.message_from_file(). The boundary is now closed, and the defects attribute is empty.
   - python script to reproduce.
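
The attached script is not reproduced in this thread. As a minimal sketch of an equivalent reproduction, the snippet below parses an inline message whose boundary is never closed and then serializes it again. The message text and the names RAW, BOUNDARY, and serialized are assumptions made for illustration, not the contents of sample.tgz. On the versions affected by this report the defects list printed at the end is empty; on Python 3.3 and later it should contain a CloseBoundaryNotFoundDefect.

```python
import email
from email import errors

BOUNDARY = "000101020201080900040301"

# Hypothetical stand-in for orig.eml: the multipart body is never
# terminated by a "--BOUNDARY--" line.
RAW = (
    "From: sender@example.com\n"
    "To: recipient@example.com\n"
    "Subject: unclosed boundary\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="' + BOUNDARY + '"\n'
    "\n"
    "--" + BOUNDARY + "\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
)

msg = email.message_from_string(RAW)
serialized = msg.as_string()
closing = "--" + BOUNDARY + "--"

# The parser accepts the message, and the generator writes a closing
# boundary that the input never had.
print("closing boundary in input: ", closing in RAW)
print("closing boundary in output:", closing in serialized)

# Defects are recorded on the multipart container itself.
print("defects:", msg.defects)
```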
msg141947 - Author: xavierd Date: 2011-08-12 09:52
This patch does the following:
 - when a close boundary isn't found, the defect
'email.errors.CloseBoundaryNotFoundDefect' is added to the defects list.
 - it doesn't modify the current behaviour of the feedparser
(e.g. the function email.message_from_file still modifies the message structure).
msg141948 - Author: xavierd Date: 2011-08-12 09:54
with the patch applied: 

$ ./
defects found !
[<email.errors.CloseBoundaryNotFoundDefect instance at 0x7f41421c0488>]
msg142273 - Author: xavierd Date: 2011-08-17 13:04
I also noticed that 'email' modifies the message structure when the header/body separator is missing. And nothing is added to the defect list.
In the attachment, you'll find:
 - email.patch: this patch adds the following errors to the defects list:
   - the error 'email.errors.CloseBoundaryNotFoundDefect' when a boundary isn't closed.
   - the error 'email.errors.MissingHeaderBodySeparator' when the header/body separator isn't found
(patch for python 2.7.2)
 - an email without a header/body separator
msg161509 - Author: R. David Murray Date: 2012-05-24 14:20
Thanks for the patch.  I haven't forgotten about it, but it will probably still be a while yet before I get to it.  Hopefully before 3.3 is released, though.
msg161750 - Author: Roundup Robot Date: 2012-05-28 02:20
New changeset 81e008f13b4f by R David Murray in branch 'default':
#12515: email now registers a defect if the MIME end boundary is missing.
msg161751 - Author: R. David Murray Date: 2012-05-28 02:23
I didn't wind up using your patch (for one thing I forgot that there were two separate issues in this patch and independently rediscovered and fixed the MissingHeaderBodySeparatorDefect one).  However, this is now fixed in 3.3.  Unfortunately, since it introduces a new defect, it is an enhancement and by our rules can't be backported.
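
For the second case, a minimal sketch of how the missing header/body separator shows up on Python 3.3 and later. The message text and the names RAW_NO_SEPARATOR, msg2, and found are assumptions made for illustration, not the attached orig.eml. The parser still treats the first non-header line as the start of the body, so the structure is changed on output, but the defect is now recorded.

```python
import email
from email import errors

# Hypothetical message: the headers run straight into the body with no
# blank line in between.
RAW_NO_SEPARATOR = (
    "From: sender@example.com\n"
    "Subject: no separator\n"
    "This line is not a header and no blank line precedes it.\n"
    "Second body line.\n"
)

msg2 = email.message_from_string(RAW_NO_SEPARATOR)

# Collect only the defect discussed in this issue.
found = [
    d for d in msg2.defects
    if isinstance(d, errors.MissingHeaderBodySeparatorDefect)
]

print("defects:", msg2.defects)
print("body starts with:", msg2.get_payload().splitlines()[0])
```

Checking msg2.defects after parsing is the only way to notice the problem with the default policy, since no exception is raised.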
