classification
Title: EmailPolicy not followed
Type:
Stage: resolved
Components: email
Versions: Python 3.7, Python 3.6

process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: barry, bryced, r.david.murray
Priority: normal
Keywords:

Created on 2018-07-30 05:32 by bryced, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (7)
msg322653 - Author: Bryce Drennan (bryced) Date: 2018-07-30 05:32
Starting in 3.6.4, header flattening ignores the EmailPolicy.utf8 attribute if a header is longer than maxlen.  I believe this was introduced in https://github.com/python/cpython/pull/4693.  Specifically this part: https://github.com/miss-islington/cpython/blob/8085ac188785ad0301760869a08b83c2945257a4/Lib/email/_header_value_parser.py#L2668-L2673

This causes problems as the dkim-signature header of parsed email messages gets mangled when they are flattened.

It should look like this:

    DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s2048; t=1532918961; bh=AwLeVe/FpXHJ9+VNy8QKwz2N5wuNF5ZkyXE3tLVBrFY=; h=Date:From:Reply-To:To:Subject:References:From:Subject; b=rSWZ7vyWIZqflUJS9ysVQvDxeoMxepEqPr/EoVkqpilCP1ryvci6/jCsFe75M2Jr5NJjzg6yJ6Xew8rpq8SMnZeNhTMmCK8jy\r\n WwSamcZ14t0LUZEt30+9Ump0KbPq+WRQK2rM9NnBVhE6pyvANfgsKMqgXlYzAmHk7P8cZ7ztJMSrtOeOr3u5RRNwvYJ+OYHZSFHiQZrPopNDKovVBcAc+6yVBI3YsI1qsgDmoQ/F5NszOLsBit2IkcvWr7z [...]

but instead gets output like this:

    DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s2048;\r\n t=1532918961; bh=AwLeVe/FpXHJ9+VNy8QKwz2N5wuNF5ZkyXE3tLVBrFY=;\r\n h=Date:From:Reply-To:To:Subject:References:From:Subject; =?utf-8?q?b=3DrSWZ?=\r\n =?utf-8?q?7vyWIZqflUJS9ysVQvDxeoMxepEqPr/EoVkqpilCP1ryvci6/jCsFe75M2Jr5NJjz?=\r\n =?utf-8?q?g6yJ6Xew8rpq8SMnZeN [...]

Attached is a test that passes in 3.6.3 and fails in 3.6.4.
msg322691 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-07-30 21:56
I don't see you asserting utf8=True in your example, so I don't see what it has to do with the utf8 flag, since that is False by default.

Maybe you are running up against the default value of refold_source, which is 'long'?
msg322692 - Author: Bryce Drennan (bryced) Date: 2018-07-30 22:06
Yes, utf8 is set to false. Despite that, the dkim-signature header, which contains no unicode characters, is getting filled with ?utf-8?q? values.

My reading of the documentation of the utf8 flag is that headers should not be encoded like this if it's set to False.  It's possible I am misunderstanding.
msg322707 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-07-31 01:33
You are indeed misunderstanding.  The docs say:

If False, follow RFC 5322, supporting non-ASCII characters in headers by encoding them as “encoded words”. If True, follow RFC 6532 and use utf-8 encoding for headers. Messages formatted in this way may be passed to SMTP servers that support the SMTPUTF8 extension (RFC 6531).

That is, when the flag is False, encoded words may be used, which is what the =?utf-8?q?....?= constructs are.  If it is True, those are *not* used, but instead utf8 character encoding is used, which would look on your terminal like the international characters themselves, not encoded stuff.
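To make the difference between the two settings concrete, here is a minimal sketch. The subject text and variable names are made up for illustration; only the `utf8` policy attribute comes from the discussion above.

```python
from email import policy
from email.message import EmailMessage

# Illustrative message with one non-ASCII header value.
msg = EmailMessage()
msg["Subject"] = "Café"

# utf8=False (the default): the non-ASCII text is carried in an
# RFC 2047 "encoded word" so the serialized header stays ASCII.
encoded_word_output = msg.as_bytes(policy=policy.default)

# utf8=True: the header is written as raw UTF-8 bytes (RFC 6532 style),
# with no encoded-word wrapper.
raw_utf8_output = msg.as_bytes(policy=policy.default.clone(utf8=True))

print(encoded_word_output)
print(raw_utf8_output)
```

Neither setting promises that encoded words never appear in ASCII-only headers; the flag only selects how non-ASCII characters are represented.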

So, what you are seeing is the DKIM header getting re-encoded using encoded words in order to make it fit in the standard line length for email headers (78 characters max).  The fact that that wasn't happening before was actually a bug in the folder that was fixed by the changeset you cite.

You can get the behavior you want by setting the policy control 'refold_source' to 'none', or changing max_line_length to some value larger than you expect DKIM headers to be (or to None, which means don't fold).
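A minimal sketch of both workarounds, assuming the message is parsed from source with `policy.default`. The header value here is a made-up stand-in for a real DKIM signature (a long token with no internal whitespace), not the one from the report.

```python
from email import message_from_string, policy

# Made-up long header value; the "b=" part has no whitespace to fold on.
long_value = "v=1; a=rsa-sha256; b=" + "A" * 120
raw = f"DKIM-Signature: {long_value}\n\nbody\n"

msg = message_from_string(raw, policy=policy.default)

# Option 1: never refold headers that were parsed from source.
no_refold = msg.as_string(policy=policy.default.clone(refold_source="none"))

# Option 2: disable line wrapping entirely (None or 0 means no limit).
no_wrap = msg.as_string(policy=policy.default.clone(max_line_length=None))

print(no_refold)
print(no_wrap)
```

Both options apply to every header in the message, not just DKIM-Signature, so other over-long source headers are also emitted unchanged.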

If the standard DKIM headers really are not respecting the standard default email header line length, that is a very sad thing.

I think perhaps the default value of refold_source was a poor choice, and we should have gone with none.  Changing that could be discussed, but since it changes behavior it may be controversial.
msg322708 - Author: Bryce Drennan (bryced) Date: 2018-07-31 01:48
That makes sense. Apologies for my misreading. Thanks for taking time to explain that. 

I think there is still something strange here since it's unnecessarily using encoded words when it could just "fold" without them. My tests with gmail show that it accepts multi-line dkim-signature headers but does not handle the encoded words syntax.

While it is not Python's job to maintain compatibility with gmail, I suspect many DKIM implementations don't expect encoded words syntax and thus this change could cause many email handling systems to break.

I'll dig in more and open a separate ticket. Thank you again for your time.
msg322709 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-07-31 02:29
Well, it can't fold them and have them fit in the 78 character limit without adding whitespace that isn't in the original headers (unless there's a more subtle bug :)

The email package has the possibility of having special behavior based on the name of the header, so if DKIM headers have special rules, there is the possibility of implementing those special rules.  Basically, you can implement a parser that recognizes dkim headers and represents what parts can legally be folded using the resulting parse tree.  So it may be possible to fix this without changing the refold_source default.  It is also possible to specify that encoded words may not be used in a given header (that's a simple toggle), which may be all that is needed here.
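As a rough illustration of per-header behavior, here is a sketch that is much cruder than the parser-based approach described above: it overrides the public `fold()` hook of a policy subclass so that one named header is written back as it was received. The class name `DKIMPassthroughPolicy` is hypothetical, and this is an assumption about one workable approach, not the toggle or parser mentioned in the message. It only affects string generation; `fold_binary()` would need the same treatment for `as_bytes()`.

```python
from email import message_from_string
from email.policy import EmailPolicy


class DKIMPassthroughPolicy(EmailPolicy):
    """Hypothetical policy: emit DKIM-Signature headers without refolding."""

    def fold(self, name, value):
        if name.lower() == "dkim-signature":
            # Keep the source line breaks, only normalizing the line separator.
            lines = str(value).splitlines()
            return name + ": " + self.linesep.join(lines) + self.linesep
        # Every other header follows the normal EmailPolicy folding rules.
        return super().fold(name, value)


# Made-up stand-in for a real signature: long, with no whitespace in "b=".
dkim_value = "v=1; a=rsa-sha256; b=" + "B" * 120
raw = f"Subject: hello\nDKIM-Signature: {dkim_value}\n\nbody\n"

msg = message_from_string(raw, policy=DKIMPassthroughPolicy())
flattened = msg.as_string()
print(flattened)
```

The trade-off is that the header is passed through verbatim, so it can exceed max_line_length; folding it legally at its own separators would need the header-specific parser described above.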
msg322712 - Author: Bryce Drennan (bryced) Date: 2018-07-31 02:38
As far as I can tell in my manual tests with gmail, extra whitespace is fine.  The addition of `=?utf-8?q?` is what trips both gmail and the python dkim library up.

I agree that the paths you propose are viable.  For now my email projects will be pinned to 3.6.3.  

If I find time, the new ticket will perhaps focus just on being able to "fold" without the encoded-words syntax.

History
Date                 User            Action  Args
2022-04-11 14:59:04  admin           set     github: 78458
2020-07-31 16:38:01  bryced          set     files: - test_header_folding.py
2018-07-31 02:38:29  bryced          set     messages: + msg322712
2018-07-31 02:29:10  r.david.murray  set     messages: + msg322709
2018-07-31 01:48:53  bryced          set     status: open -> closed; resolution: not a bug; messages: + msg322708; stage: resolved
2018-07-31 01:33:58  r.david.murray  set     messages: + msg322707
2018-07-30 22:06:56  bryced          set     messages: + msg322692
2018-07-30 21:56:36  r.david.murray  set     messages: + msg322691
2018-07-30 05:32:53  bryced          create