classification
Title: email.message_from_string() is unable to find the headers for the .msg files
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: barry, jpatel, maxking, r.david.murray
Priority: normal Keywords:

Created on 2020-06-29 10:51 by jpatel, last changed 2020-07-02 04:57 by SilentGhost.

Files
File name Uploaded Description
extract_mail.py jpatel, 2020-06-29 10:51
msgfile_not_working_correctly.msg jpatel, 2020-06-29 10:51
Messages (1)
msg372560 - Author: Jay Patel (jpatel) Date: 2020-06-29 10:51
I need to extract email data from an MSG file on Python v2.7. Since Python v2.7 has reached end of life, I reproduced the scenario on Python v3.8 and hit the same issue.
I am extracting the message using the "Message" class of the "extract_msg" module. After extracting the text from the "Message" object, I use the email.message_from_string() function to separate the headers from the body (or payload). The same workflow can be seen in the attached "extract_mail.py" file.
The issue with the attached file, "msgfile_not_working_correctly.msg", is that its header block begins with the line "Microsoft Mail Internet Headers Version 2.0". That line is not in the standard "Name: value" header format (for example, "To: receiver@gmail.com"), so it and everything after it is interpreted as body rather than as headers.
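The behaviour described above can be reproduced without the attachment. The following is a minimal sketch only: the sample text is made up for illustration (it is not the content of "msgfile_not_working_correctly.msg"), and the variable names are hypothetical.

```python
import email

# Hypothetical header text, standing in for what extract_msg returns.
# Only the first line is taken from the report; the rest is invented.
raw_with_banner = (
    "Microsoft Mail Internet Headers Version 2.0\n"
    "To: receiver@gmail.com\n"
    "Subject: Test\n"
    "\n"
    "Body text\n"
)

parsed = email.message_from_string(raw_with_banner)

# The first line has no "Name:" prefix, so the parser stops looking for
# headers right away and treats the whole text as the payload.
print(parsed.keys())                      # header names found by the parser
print(parsed.get_payload())               # everything, including "To: ..."
print([type(d).__name__ for d in parsed.defects])  # defects recorded, if any
```

With the default policy the parser does not raise here; it records the problem on the message object and continues, which is why the result looks like a message with no headers.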
According to this link (https://support.microsoft.com/en-us/office/view-internet-message-headers-in-outlook-cd039382-dc6e-4264-ac74-c048563d212c), message headers in Outlook begin with "Microsoft Mail Internet Headers Version 2.0", which is added by Outlook (see the "Interpreting email headers" section of that page).
The email data can be observed in the "email_data.txt" file.

I have tried omitting the first line, and the headers are then parsed as expected. Can this scenario be handled at the module level (in the email module), or is there another way to extract the headers from .msg files?
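The workaround of omitting the first line could look roughly like the sketch below. It is an illustration under assumptions, not a fix in the email module: the helper name parse_without_outlook_banner and the sample text are hypothetical, and it only handles the case where the Outlook line is the very first line of the text.

```python
import email

# Prefix of the line Outlook puts in front of the real headers.
OUTLOOK_BANNER_PREFIX = "Microsoft Mail Internet Headers Version"


def parse_without_outlook_banner(raw_text):
    """Drop a leading Outlook banner line, then parse as a normal message."""
    first_line, _, rest = raw_text.partition("\n")
    # strip() also removes a trailing "\r" when the text uses CRLF endings.
    if first_line.strip().startswith(OUTLOOK_BANNER_PREFIX):
        raw_text = rest
    return email.message_from_string(raw_text)


# Hypothetical sample; only the first line comes from the report.
sample = (
    "Microsoft Mail Internet Headers Version 2.0\n"
    "To: receiver@gmail.com\n"
    "Subject: Test\n"
    "\n"
    "Body text\n"
)

msg = parse_without_outlook_banner(sample)
print(msg["To"])
print(msg["Subject"])
print(msg.get_payload())
```

Text that does not start with the banner is passed to email.message_from_string() unchanged, so the helper can be applied to every extracted message.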
History
Date User Action Args
2020-07-02 04:57:07 SilentGhost set components: + Library (Lib)
2020-07-02 04:56:52 SilentGhost set nosy: + barry, r.david.murray, maxking
2020-06-29 10:51:34 jpatel set files: + msgfile_not_working_correctly.msg
2020-06-29 10:51:06 jpatel create