Title: Email long headers parsing/serialization
Type: Stage:
Components: email Versions: Python 3.4, Python 3.5
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: barry, mmasztalerczuk, r.david.murray, Константин Волков
Priority: normal Keywords:

Created on 2016-10-17 17:08 by Константин Волков, last changed 2016-10-18 16:49 by r.david.murray.

File name | Uploaded | Description
 | Константин Волков, 2016-10-17 17:12 | Failing example
Messages (7)
msg278820 - Author: Константин Волков (Константин Волков) Date: 2016-10-17 17:08
There is a strange thing with long headers: when serialized, they get a \n prefix. The following example fails on Python 3.4/3.5:

from email.message import Message
from email import message_from_bytes

x = '<147672320775.19544.6718708004153358411@mkren-spb.root.devdomain.local>'
header = 'Message-ID'
msg = Message()
msg[header] = x

data = msg.as_bytes()

msg2 = message_from_bytes(data)
assert msg2[header] == x

The Message-ID was generated by the email.utils.make_msgid function.
msg278821 - Author: Константин Волков (Константин Волков) Date: 2016-10-17 17:10
Something went wrong with the copy-paste. The value of x should be:
x = '<147672320775.19544.6718708004153358411@mkren-spb.root.devdomain.local>'
msg278822 - Author: Константин Волков (Константин Волков) Date: 2016-10-17 17:12
Something goes wrong when inserting long strings here; the text gets duplicated for some reason.
Adding the example as an attachment.
msg278823 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-10-17 17:22
Ah, interesting case.  Both the old folder/parser and the new folder/parser fail, in slightly different ways.  I'll have to add this test case to the tests as I finish rewriting the folder.  Thanks for the report.
msg278894 - Author: Mariusz Masztalerczuk (mmasztalerczuk) * Date: 2016-10-18 15:57
I think that it is not a bug; it is just the RFC ;) It says:

A message consists of header fields, optionally followed by a message
   body.  Lines in a message MUST be a maximum of 998 characters
   excluding the CRLF, but it is RECOMMENDED that lines be limited to 78
   characters excluding the CRLF

Because your line (header name plus value) is longer than 78 characters, Python tries to break it into two.
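To make the arithmetic concrete, a small sketch using the value from the report; the 78-character limit is the RFC recommendation quoted above, and RECOMMENDED_LIMIT is only an illustrative name:

```python
# Value taken from the report; 78 is the RFC's recommended line limit.
x = '<147672320775.19544.6718708004153358411@mkren-spb.root.devdomain.local>'
line = 'Message-ID: ' + x

RECOMMENDED_LIMIT = 78
print(len(x))                        # 71: the msg-id alone
print(len(line))                     # 83: the whole unfolded header line
print(len(line) > RECOMMENDED_LIMIT) # True: so the folder tries to wrap it
```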

Maybe there should be an option to increase this limit beyond 78? (The RFC maximum is 998.)
msg278904 - Author: Константин Волков (Константин Волков) Date: 2016-10-18 16:22
But Message-ID has its own syntax:

3.6.4. Identification fields

message-id      =       "Message-ID:" msg-id CRLF
msg-id          =       [CFWS] "<" id-left "@" id-right ">" [CFWS]
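As a rough illustration of that grammar, here is a hypothetical checker, is_simple_msg_id, that accepts only the dot-atom-text forms of id-left and id-right. It deliberately ignores the optional CFWS and the quoted, literal and obsolete alternatives, so it is a sketch, not a full RFC parser:

```python
import re

# Simplified grammar: only dot-atom-text is accepted for id-left and id-right;
# surrounding CFWS, quoted/literal and obsolete forms are NOT handled.
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
DOT_ATOM_TEXT = rf"{ATEXT}+(?:\.{ATEXT}+)*"
MSG_ID_RE = re.compile(rf"<{DOT_ATOM_TEXT}@{DOT_ATOM_TEXT}>")


def is_simple_msg_id(value: str) -> bool:
    """Return True if value is exactly '<id-left@id-right>' (no CFWS)."""
    return MSG_ID_RE.fullmatch(value) is not None
```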

3.2.3. Folding white space and comments

However, where CFWS occurs in this standard, it MUST NOT be inserted
   in such a way that any line of a folded header field is made up
   entirely of WSP characters and nothing else.

It's not obvious, but it seems that there must be no CRLF before the Message-ID value.
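One way to read the quoted rule is per physical line of the folded field. A minimal sketch with a hypothetical helper, has_whitespace_only_line, shows that moving the msg-id onto a continuation line does not by itself break the rule, because the first line still carries the header label:

```python
import re

WSP = ' \t'


def has_whitespace_only_line(folded: str) -> bool:
    """Return True if any physical line of a folded header field is only WSP."""
    # Split on CRLF or bare LF; ignore the empty piece after a final line break.
    lines = re.split(r'\r?\n', folded)
    if lines and lines[-1] == '':
        lines.pop()
    return any(line.strip(WSP) == '' for line in lines)
```

For example, 'Message-ID:\n <a@b>\n' passes the check, while 'Message-ID:\n \n <a@b>\n' (a continuation line holding only a space) does not.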
msg278909 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-10-18 16:49
It is a bug, but it is not a bug that the message-id body gets put on a second line.  The old (compat32) folder introduces an extra space while folding, which is then preserved when the message is re-parsed.  The new folder (policy=default) folds correctly (putting the id on a separate line), but the parser fails to remove the leading blank from the value when it is parsed.  It should remove the leading blank because that blank "belongs" to the header label (the "Message-Id:" part).  The RFC caution about whitespace-only lines applies to whole lines; the first line in the present example is not blank because it has the header label on it.
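The parsing side of this can be sketched with a hypothetical helper, parse_header_line (not the email package's actual parser): unfold first, then drop the whitespace that follows the colon, since it belongs to the label. Under that assumption, both the new folder's output and the compat32 output with the extra space parse back to the bare ID:

```python
import re


def parse_header_line(raw: str) -> tuple[str, str]:
    """Unfold a raw header field and split it into (name, value)."""
    # Unfolding: delete each CRLF (or LF) that is immediately followed by WSP,
    # keeping the WSP itself; then drop the final line break.
    unfolded = re.sub(r'\r?\n(?=[ \t])', '', raw).rstrip('\r\n')
    name, _, value = unfolded.partition(':')
    # Whitespace right after the colon belongs to the label, not the value.
    return name, value.lstrip(' \t')


new_style = 'Message-ID:\n <a@b.c>\n'     # id folded onto a continuation line
compat_style = 'Message-ID: \n <a@b.c>\n' # same, with the extra space
print(parse_header_line(new_style))
print(parse_header_line(compat_style))
```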

I also need to add a test with a Message-Id that is in itself longer than 77 characters.  Such a header can't be folded, so it will have to be emitted with a length longer than the default.  (And yes, the default can be changed to any value you like; see Policy.max_line_length.)
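For the original reproducer, a sketch of raising the limit: clone() on the compat32 policy (the default for a plain Message) returns a copy with max_line_length changed, and as_bytes() accepts a policy argument. Choosing 998, the RFC's hard maximum, is an assumption made here for illustration:

```python
from email import message_from_bytes
from email.message import Message
from email.policy import compat32

x = '<147672320775.19544.6718708004153358411@mkren-spb.root.devdomain.local>'
header = 'Message-ID'

msg = Message()
msg[header] = x

# Raise the folding limit so the 83-character header line is emitted unfolded.
wide = compat32.clone(max_line_length=998)
data = msg.as_bytes(policy=wide)

# Re-parse with the default (compat32) parser; the value survives intact.
msg2 = message_from_bytes(data)
assert msg2[header] == x
```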
Date | User | Action | Args
2016-10-18 16:49:07 | r.david.murray | set | messages: + msg278909
2016-10-18 16:22:45 | Константин Волков | set | messages: + msg278904
2016-10-18 15:57:06 | mmasztalerczuk | set | nosy: + mmasztalerczuk; messages: + msg278894
2016-10-17 17:22:02 | r.david.murray | set | messages: + msg278823
2016-10-17 17:12:07 | Константин Волков | set | files: +; messages: + msg278822
2016-10-17 17:10:23 | Константин Волков | set | messages: + msg278821
2016-10-17 17:08:59 | Константин Волков | create |