Title: Quoting issue on header Reply-To
Created on 2021-07-14 12:44 by Abridbus, last changed 2021-08-11 13:39 by Julien Castiaux.

Author: Baptiste (Abridbus) Date: 2021-07-14 12:44

When using as_string() on a Reply-To header like the following:
msg['Reply-To'] = '"foo Research, Inc. Foofoo BarBar on Summer Special Friday: 0.50 days (2021-02-31)" <>'

The double quotes disappear, which leads to a wrong header value.

See attached file for example
Author: R. David Murray (r.david.murray) Date: 2021-07-14 13:47
There is definitely a problem here, though I see a different problem when I run it (AttributeError: 'Group' object has no attribute 'local_part', presumably because of the ':' not getting escaped correctly).  I believe it applies to any address header, not just Reply-To.  Unfortunately I don't have time to investigate the cause, at least right now.  An interesting first step in diagnosing it might be to produce a minimal example: start deleting special characters from inside that quoted string until you find the one (or ones) that is triggering it.
Author: Baptiste (Abridbus) Date: 2021-07-15 08:11
Thanks David,

Here are some other tests I ran:
- msg['Reply-To'] = '"foo Research Inc Foofoo BarBar on Summer Special Friday 050 days (2021-02-31" <>'

- msg['Reply-To'] = '"foo Research Inc Foofoo BarBar on Summer Special Friday 050 days 20210231   " <>'

msg['Reply-To'] = '"foo Research Inc Foofoo BarBar on Summer Special Friday 050 days 20210231  " <>'

worked. It looks more related to the length of the name than to the characters used.
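The length experiment above can be scripted. The following is only a sketch: the helper name quotes_survive is hypothetical, and it assumes the SMTP policy and an EmailMessage, which may differ from the attached script. The result for long names depends on the Python version in use, so no output is shown.

```python
from email.message import EmailMessage
from email import policy


def quotes_survive(display_name, pol=policy.SMTP):
    """Return True if both double quotes are still present after as_string().

    Hypothetical helper for this thread; not part of the email package.
    """
    msg = EmailMessage(policy=pol)
    msg['Reply-To'] = '"{}" <>'.format(display_name)
    rendered = msg.as_string()
    return rendered.count('"') == 2


# Display name taken from the tests above, without special characters.
name = ("foo Research Inc Foofoo BarBar on Summer Special Friday "
        "050 days 20210231")

# Pad the name one space at a time and report whether the quotes survive.
for extra in range(6):
    candidate = name + " " * extra
    print(len(candidate), quotes_survive(candidate))
```

A short name such as "foo" fits on one line and is never folded, so it makes a useful control case.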
Author: R. David Murray (r.david.murray) Date: 2021-07-15 12:35
Forget what I said about my different error, I made a mistake running the test script.

Interesting.  If it is related to the length of the name, then the problem is most likely in the folding algorithm, specifically in what happens when the "display-name" token is wrapped across lines.  And indeed, if we clone the SMTP policy and set max_line_length to 1000 in your sample script, it renders the header correctly.
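For reference, the policy experiment can be reproduced roughly as follows. This is a sketch that assumes the sample script builds an EmailMessage with the SMTP policy (the attachment is not reproduced here). With the line limit raised the header is not folded; the default rendering depends on the Python version in use.

```python
from email.message import EmailMessage
from email import policy

# Header value from the original report.
value = ('"foo Research, Inc. Foofoo BarBar on Summer Special Friday: '
         '0.50 days (2021-02-31)" <>')

msg = EmailMessage(policy=policy.SMTP)
msg['Reply-To'] = value

# Same policy, but with a line limit large enough that no folding happens.
wide = policy.SMTP.clone(max_line_length=1000)

default_rendering = msg.as_string()
wide_rendering = msg.as_string(policy=wide)

print(default_rendering)
print(wide_rendering)
```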

The problem here is that the surrounding quotation marks are added by the 'value' property of DisplayName, but that property isn't invoked when handling parts of the display name separately during multi-line folding.  I was always bothered by the handling of the quotation marks in the part of the parser and folder dealing with quoted strings, but I never hit on a better way to do it.  This, unfortunately, is going to be a non-trivial problem to solve.  It is probably going to require an ugly hack in the folding code :(
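The fact that the quotation marks are not kept with the parsed display name, and are only re-added when a value is rendered, can also be observed through the public API. This is a sketch using a shortened header value chosen for illustration; it does not touch the internal DisplayName token itself.

```python
from email.message import EmailMessage
from email import policy

msg = EmailMessage(policy=policy.SMTP)
# Shortened variant of the reported value, used here only for illustration.
msg['Reply-To'] = '"foo Research, Inc." <>'

address = msg['Reply-To'].addresses[0]

# The parsed display name carries no surrounding quotes...
print(repr(address.display_name))
# ...they are added back when the address is rendered as a string.
print(str(address))
```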

Really, the handling of quoted strings throughout the _header_value_parser code is...a hack :(  There are probably other places where it breaks down during multi-line folding.  If we are lucky the hack can just add special handling for the quoted-string token type in the folder.  If we aren't it will get messier :(

Glancing at the folder code (it's been a long time since I worked on it), one possible approach (not necessarily the best one) would be to mark the first and last sub-tokens in a quoted-string so that the folder knows to put in a leading or trailing quote mark, respectively, during folding.
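The marking idea can be illustrated with a toy model. Everything below (the Word class, mark_quoted, render, fold) is hypothetical and unrelated to the actual classes in _header_value_parser; it only shows how markers on the first and last sub-tokens let a folder restore the quotes when it splits a quoted string across lines.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Toy sub-token of a quoted string (hypothetical, not the stdlib class)."""
    text: str
    opens_quote: bool = False   # folder must emit a leading '"'
    closes_quote: bool = False  # folder must emit a trailing '"'


def mark_quoted(words):
    """Turn the words of a quoted string into tokens, marking first and last."""
    tokens = [Word(w) for w in words]
    if tokens:
        tokens[0].opens_quote = True
        tokens[-1].closes_quote = True
    return tokens


def render(tok):
    """Render one token, adding the quote marks its markers ask for."""
    return ('"' if tok.opens_quote else '') + tok.text + ('"' if tok.closes_quote else '')


def fold(tokens, maxlen):
    """Greedy fold: start a continuation line when the next token would not fit."""
    lines, current = [], ''
    for tok in tokens:
        piece = render(tok)
        candidate = piece if not current else current + ' ' + piece
        if current and len(candidate) > maxlen:
            lines.append(current)
            current = ' ' + piece  # continuation lines start with whitespace
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


lines = fold(mark_quoted("foo Research Inc Foofoo BarBar".split()), 15)
print(lines)
```

Joining the lines back together gives the original quoted string, even though the value was split into several lines.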
Author: Julien Castiaux (Julien Castiaux) Date: 2021-07-15 13:15
Hello David,

I work at the same company as Baptiste and I'm trying to solve the problem. The issue is indeed related to the folding algorithm: the DBQUOTE character is lost in the parse_tree AST, so when the folding algorithm splits the children to find a sweet spot to break the line, it doesn't reintroduce the DBQUOTE and instead injects the content of the BareQuotedString directly.

I'm working on a fix which consists of adding two DBQUOTE, one at the beginning and one at the end of the BareQuotedString token, when it is created. (I was inspired by how the angle brackets < and > are injected around the AddrSpec token in an AngleAddr token.)

Right now my fix isn't correct; there are some unit tests failing. I'm trying to get it working and hopefully get back to you with a nice pull request :)

Author: Julien Castiaux (Julien Castiaux) Date: 2021-07-15 14:48
Update, it works fine with the compat32 policy
Author: R. David Murray (r.david.murray) Date: 2021-07-15 16:15
Yes, compat32 uses a different parser and folder (the legacy ones), that have a lot of small bugs relative to the RFCs (which is why I rewrote it).
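For comparison, the compat32 rendering can be checked with a sketch like the following, which assumes a plain Message object created with the compat32 policy and the header value from the original report.

```python
from email.message import Message
from email import policy

# Header value from the original report.
value = ('"foo Research, Inc. Foofoo BarBar on Summer Special Friday: '
         '0.50 days (2021-02-31)" <>')

msg = Message(policy=policy.compat32)
msg['Reply-To'] = value

# The legacy folder may wrap the header, but it keeps both double quotes.
rendered = msg.as_string()
print(rendered)
```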
Author: STINNER Victor (vstinner) Date: 2021-08-11 12:57
I changed the issue type to security. The bug can be abused to send emails to the wrong email address.
Author: Julien Castiaux (Julien Castiaux) Date: 2021-08-11 13:39
Hello David, Victor,

Thank you for the triage, it reminded me of this issue. David, the 
solution I tried last month was wrong, it was breaking (for good 
reasons) tons of unit tests. It seems to me that there is indeed no other 
solution than to bloat the re-folding function a bit more and to fix the 
dbquotes there as your last email suggested.

I agree with you that the code will be even messier. Honestly, I spent 
quite some time understanding the _refold_parse_tree function and I 
don't feel like patching it.


