This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Title: email: folding of quoted string in display_name violates RFC
Type: behavior Stage: patch review
Components: email Versions: Python 3.7, Python 3.6, Python 3.5
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: aaryn.startmail, barry, r.david.murray, ronaldevers
Priority: normal Keywords: patch

Created on 2019-02-19 16:45 by aaryn.startmail, last changed 2022-04-11 14:59 by admin.

File name Uploaded Description Edit aaryn.startmail, 2019-02-19 16:45 Illustration of issue, monkey patch fix of issue, illustration of fix aaryn.startmail, 2019-02-20 08:32
Pull Requests
URL Status Linked Edit
PR 12054 open python-dev, 2019-02-26 13:27
Messages (7)
msg335975 - (view) Author: Aaryn Tonita (aaryn.startmail) * Date: 2019-02-19 16:45
When using a policy for an EmailMessage that triggers folding (during serialization) of a fairly long display_name in an address field, the folding process removes the quotes from the display name breaking the semantics of the field.

In particular, for a From address's display name like r' ' + 'a' * MAX_LINE_LEN the folding puts unquoted immediately after the From: header. For applications that do sender verification inside and then send it to an internal SMTP server that does not perform its own sender verification this could be considered a security issue since it enables sender spoofing. Receiving mail servers might be able to detect the broken header, but experiments show that the mail gets delivered.

Simple demonstration (reproduced in attachment) of issue:

SMTP_POLICY = email.policy.default.clone(linesep='\r\n', max_line_length=72)
address = Address(display_name=r' ' + 'a' * 72, addr_spec='')

message = EmailMessage(policy=SMTP_POLICY)
message['From'] = Address(display_name=display_name, addr_spec=addr_spec)

# Trigger folding (via as_string()), then parse it back in.
msg_string = message.as_string()
msg_bytes = msg_string.encode('utf-8')
msg_deserialized = BytesParser(policy=SMTP_POLICY).parsebytes(msg_bytes)

# Verify badness
from_hdr = msg_deserialized['From']
assert from_hdr != str(address)  # But they should be equal...
msg336010 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2019-02-19 19:55
Since Address itself renders it correctly (str(address)), the problem is going to take a bit of digging to find.  I'm guessing the quoted_string atom is getting transformed incorrectly into something else at some point during the folding.
msg336049 - (view) Author: Aaryn Tonita (aaryn.startmail) * Date: 2019-02-20 08:32
Hi David, the problem is in email._header_value_parser._refold_parse_tree.

Specifically, when the parsetree renders too long, it recursively gets newparts = list(part) (the children). When it does this to a BareQuotedString node, the child nodes are unquoted and unescaped and it just happily serializes these.

I thought I had attached a file that monkey patches the _refold_parse_tree function with a fixed version... let me try again.
msg336093 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2019-02-20 14:40
I'm afraid I don't have time to parse through the file you uploaded.  Can you produce a pull request or a diff showing your fix?  And ideally some added tests :)  But whatever you can do is great, if you don't have time maybe someone else will pick it up (I unfortunately don't have time, though I should be able to do a review of a PR).
msg336109 - (view) Author: Aaryn Tonita (aaryn.startmail) * Date: 2019-02-20 16:07
Sure thing, I'll try to produce something tomorrow.
msg336667 - (view) Author: Aaryn Tonita (aaryn.startmail) * Date: 2019-02-26 13:34
Sorry about the delay. I opened pull request for this. Let me know if you need anything else.
msg336668 - (view) Author: Aaryn Tonita (aaryn.startmail) * Date: 2019-02-26 13:36
Although I am not personally interested in backporting a fix for this issue, anyone that experiences this issue in python 3.5 can execute the following monkey patch to solve the issue:

def _fix_issue_36041_3_5():
    from email._header_value_parser import QuotedString, ValueTerminal, quote_string
    import email._header_value_parser

    class BareQuotedString(QuotedString):

        token_type = 'bare-quoted-string'

        def __str__(self):
            return quote_string(''.join(str(x) for x in self))

        def value(self):
            return ''.join(str(x) for x in self)

        def parts(self):
            parts = list(self)
            escaped_parts = []
            for part in parts:
                if isinstance(part, ValueTerminal):
                    escaped = quote_string(str(part))[1:-1]
                    escaped_parts.append(ValueTerminal(escaped, 'ptext'))
            # Add quotes to the first and last parts.
            escaped_parts[0] = ValueTerminal('"' + str(escaped_parts[0]), 'ptext')
            escaped_parts[-1] = ValueTerminal(str(escaped_parts[-1] + '"'), 'ptext')
            return escaped_parts

    email._header_value_parser.BareQuotedString = BareQuotedString
Date User Action Args
2022-04-11 14:59:11adminsetgithub: 80222
2019-11-20 08:30:29ronaldeverssetnosy: + ronaldevers
2019-02-26 13:36:11aaryn.startmailsetmessages: + msg336668
2019-02-26 13:34:33aaryn.startmailsetmessages: + msg336667
2019-02-26 13:27:03python-devsetkeywords: + patch
stage: patch review
pull_requests: + pull_request12080
2019-02-20 16:07:20aaryn.startmailsetmessages: + msg336109
2019-02-20 14:40:09r.david.murraysetmessages: + msg336093
2019-02-20 08:32:07aaryn.startmailsetfiles: +

messages: + msg336049
2019-02-19 19:55:24r.david.murraysetmessages: + msg336010
2019-02-19 16:45:39aaryn.startmailcreate