classification
Title: unnecessary encoded-words usage breaks DKIM signatures
Type:
Stage:
Components: email
Versions: Python 3.7, Python 3.6
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: barry, bryced, maxking, r.david.murray
Priority: normal
Keywords:

Created on 2018-10-03 07:06 by bryced, last changed 2019-05-17 18:35 by maxking.

Messages (4)
msg326943 - Author: Bryce Drennan (bryced) Date: 2018-10-03 07:06
Since Python 3.6.4, folding of unstructured headers uses the encoded-word syntax even when the value contains no special characters.

This produces DKIM-Signature headers that Google's Gmail servers cannot read. Encoded words may not be valid in this header at all; I don't see them mentioned here: https://tools.ietf.org/html/rfc6376#page-8

The smallest test case I could create to demonstrate the issue is included at the end of this message.

One solution would be to add DKIM-Signature to the HeaderRegistry, but I'm not yet expert enough to execute this. I went down that path for a few hours and didn't see a straightforward way to disable encoded words.

Setting EmailPolicy(max_line_length=None) does produce output without encoded words, but I worry that it will cause other incompatibility issues.


from email.headerregistry import HeaderRegistry
from email.policy import SMTP

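def test_unstructured_encoded_word_folding():
    # 85 ASCII characters with no whitespace: too long for the default 78-character line
    header = HeaderRegistry()('DKIM-Signature', 'a' * 85)
    folded = header.fold(policy=SMTP.clone(refold_source=None))
    print(f'\nDKIM-Signature: {header}')
    print(folded)
    # The value is pure ASCII, so folding should not need encoded words
    assert '=?utf-8?q?' not in folded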


Output:

DKIM-Signature: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
DKIM-Signature: =?utf-8?q?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa?=
 =?utf-8?q?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa?=

AssertionError()!
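
For comparison, the max_line_length=None workaround mentioned above can be sketched as follows. This is only an illustration of that workaround, not a recommended fix; the helper name fold_without_length_limit is made up for this example. With line wrapping disabled, the folder has no reason to split the value, so it should not fall back to encoded words. The trade-off is that very long values are emitted on a single line, which can run past the 998-character line limit of RFC 5322.

```python
from email.headerregistry import HeaderRegistry
from email.policy import SMTP


def fold_without_length_limit(name, value):
    """Fold a header with line wrapping disabled (hypothetical helper)."""
    header = HeaderRegistry()(name, value)
    # max_line_length=None tells the policy not to wrap lines at all
    return header.fold(policy=SMTP.clone(max_line_length=None))


folded = fold_without_length_limit('DKIM-Signature', 'a' * 85)
print(repr(folded))
```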
msg326996 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2018-10-03 17:51
See also issue 34277 for a previous discussion.

I thought I had included a header-level toggle for encoded words, but that doesn't actually make sense, since by default a header value is treated as unstructured (which means encoded words are allowed).

To implement this you need to add a new TokenList class to _header_value_parser, say DKIMHeaderValue, with the class attribute 'as_ew_allowed' set to False.

Do you have a BNF description of the DKIM header?  I could probably sketch the implementation of the DKIM header value parser from that.

Then you create a header in headerregistry that uses that parser as its value_parser.
msg326998 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2018-10-03 17:59
You could also play with just making a parser that is a simplified version of get_unstructured, producing a token list (maybe call it ASCIIOnlyUnstructuredTokenList) that has as_ew_allowed set to False. That might not produce optimal results, but it would be better than the current situation.
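
A rough sketch of that simplified approach, under several assumptions: it relies on the private module email._header_value_parser, whose internals may change between Python versions; it reuses get_unstructured rather than writing a new parser; and the names NoEncodedWordTokenList, get_unstructured_no_ew and NoEncodedWordHeader are hypothetical. It is only meant for ASCII values and does not implement the DKIM grammar from RFC 6376.

```python
from email import _header_value_parser as parser  # private module, may change
from email.headerregistry import HeaderRegistry, UnstructuredHeader
from email.policy import SMTP


class NoEncodedWordTokenList(parser.UnstructuredTokenList):
    # Hypothetical class: the folding code consults this flag before
    # wrapping the contained tokens as encoded words.
    as_ew_allowed = False


def get_unstructured_no_ew(value):
    """Parse like get_unstructured, but return a token list that blocks encoded words."""
    parsed = parser.get_unstructured(value)
    tokens = NoEncodedWordTokenList(parsed)
    # Carry over defects recorded on the original top-level token list
    tokens.defects = list(parsed.defects)
    return tokens


class NoEncodedWordHeader(UnstructuredHeader):
    # UnstructuredHeader.parse calls cls.value_parser, so overriding it is enough
    value_parser = staticmethod(get_unstructured_no_ew)


registry = HeaderRegistry()
registry.map_to_type('DKIM-Signature', NoEncodedWordHeader)

header = registry('DKIM-Signature', 'a' * 85)
folded = header.fold(policy=SMTP)
print(repr(folded))
```

Because the value has no whitespace to fold at, the folder can only move it onto a continuation line as-is, so the line may still exceed max_line_length. To use this when building messages, the registry would be passed to a policy, for example SMTP.clone(header_factory=registry).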
msg342754 - Author: Abhilash Raj (maxking) (Python committer) Date: 2019-05-17 18:35
Just for reference, the DKIM-Signature header is defined in RFC 6376, and the BNF description for the header is given here (https://tools.ietf.org/html/rfc6376#section-3.5).

It is a bit long so I am not copy-pasting it here.

I might take a stab at writing a value_parser for this, but only after I have written some simpler ones. This one seems overly complicated, with many optional and required fields.
History
Date                 User            Action  Args
2019-05-17 18:35:26  maxking         set     nosy: + maxking; messages: + msg342754
2018-10-03 17:59:47  r.david.murray  set     messages: + msg326998
2018-10-03 17:51:47  r.david.murray  set     messages: + msg326996
2018-10-03 07:06:28  bryced          create