This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Quoting issue on header Reply-To and other address headers
Type: security Stage: patch review
Components: email Versions: Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Abridbus, barry, drlazor8, r.david.murray, thehesiod
Priority: normal Keywords: patch

Created on 2021-07-14 12:44 by Abridbus, last changed 2022-04-11 14:59 by admin.

Files
File name    Uploaded
Reply-To.py  Abridbus, 2021-07-14 12:44
Pull Requests
URL Status Linked
PR 29881 open drlazor8, 2021-12-01 16:29
Messages (12)
msg397478 - Author: Baptiste (Abridbus) Date: 2021-07-14 12:44
Hello,

When using as_string() on a Reply-To header like the following:
msg['Reply-To'] = '"foo Research, Inc. Foofoo BarBar on Summer Special Friday: 0.50 days (2021-02-31)" <catchall@foobar.exchange.com>'

The double quotes disappear, which leads to a wrong header value.

See attached file for example
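
The attached Reply-To.py is not reproduced on this page. A minimal sketch of the reproduction, assuming the script builds an EmailMessage under the SMTP policy (the policy choice and the names REPLY_TO, address and rendered are assumptions), might look like this:

```python
from email import policy
from email.message import EmailMessage

# Header value from the report: a long quoted display name plus an address.
REPLY_TO = (
    '"foo Research, Inc. Foofoo BarBar on Summer Special Friday: '
    '0.50 days (2021-02-31)" <catchall@foobar.exchange.com>'
)

# Assumption: the message uses the modern SMTP policy (78-column folding).
msg = EmailMessage(policy=policy.SMTP)
msg['Reply-To'] = REPLY_TO

# The parsed header object still knows the display name and the address.
address = msg['Reply-To'].addresses[0]
print(repr(address.display_name), address.addr_spec)

# Serializing folds the long header; on affected versions the double quotes
# around the display name are missing from this output.
rendered = msg.as_string()
print(rendered)
```

Comparing the printed display name with the serialized header shows whether the quotes were lost during parsing or only during serialization.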
msg397480 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2021-07-14 13:47
There is definitely a problem here, though I see a different problem when I run it (AttributeError: 'Group' object has no attribute 'local_part', presumably because of the ':' not getting escaped correctly).  I believe it applies to any address header, not just Reply-To.  Unfortunately I don't have time to investigate the cause, at least right now.  A useful first step in diagnosing it might be to produce a minimal example: start deleting special characters from inside that quoted string until you find the one (or ones) that is triggering it.
msg397529 - Author: Baptiste (Abridbus) Date: 2021-07-15 08:11
Thanks David,

Here are some other tests I ran.
These two still show the issue:
- msg['Reply-To'] = '"foo Research Inc Foofoo BarBar on Summer Special Friday 050 days (2021-02-31" <catchall@foobar.exchange.com>'

- msg['Reply-To'] = '"foo Research Inc Foofoo BarBar on Summer Special Friday 050 days 20210231   " <catchall@foobar.exchange.com>'

But:
msg['Reply-To'] = '"foo Research Inc Foofoo BarBar on Summer Special Friday 050 days 20210231  " <catchall@foobar.exchange.com>'

worked. It looks more related to the length of the name than to the characters used.
msg397545 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2021-07-15 12:35
Forget what I said about my different error, I made a mistake running the test script.

Interesting.  If it is related to the length of the name, then the problem is most likely in the folding algorithm, specifically in what happens when the "display-name" token is wrapped across lines.  And indeed, if we clone the SMTP policy and set max_line_length to 1000 in your sample script, it renders the header correctly.
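
As a sketch of that experiment, assuming the sample script creates an EmailMessage with the SMTP policy (the names REPLY_TO, no_fold and rendered are illustrative), the policy can be cloned with a line limit long enough that no fold is needed:

```python
from email import policy
from email.message import EmailMessage

REPLY_TO = (
    '"foo Research, Inc. Foofoo BarBar on Summer Special Friday: '
    '0.50 days (2021-02-31)" <catchall@foobar.exchange.com>'
)

# Clone the SMTP policy with a limit longer than the header, so the folder
# emits the whole header on one line instead of splitting the display name.
no_fold = policy.SMTP.clone(max_line_length=1000)

msg = EmailMessage(policy=no_fold)
msg['Reply-To'] = REPLY_TO

rendered = msg.as_string()
print(rendered)
```

The value 1000 is the one mentioned above. RFC 5322 caps lines at 998 characters excluding the CRLF, so a value at or below that may be preferable outside of a quick test.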

The problem here is that the surrounding quotation marks are added by the 'value' property of DisplayName, but that property isn't invoked when handling parts of the display name separately during multi-line folding.  I was always bothered by the handling of the quotation marks in the part of the parser and folder dealing with quoted strings, but I never hit on a better way to do it.  This, unfortunately, is going to be a non-trivial problem to solve.  It is probably going to require an ugly hack in the folding code :(

Really, the handling of quoted strings throughout the _header_value_parser code is...a hack :(  There are probably other places where it breaks down during multi-line folding.  If we are lucky the hack can just add special handling for the quoted-string token type in the folder.  If we aren't it will get messier :(

Glancing at the folder code (it's been a long time since I worked on it), one possible approach (not necessarily the best one) would be to mark the first and last sub-tokens in a quoted-string so that the folder knows to put in a leading or trailing quote mark, respectively, during folding.
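
To make the idea concrete, here is a toy model only. Word, mark_quoted and fold are hypothetical names and do not exist in _header_value_parser; the real folder is considerably more involved. The sketch shows that folding unmarked sub-tokens drops the quotes, while marking the first and last sub-token lets the folder restore them:

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Toy sub-token of a quoted-string (hypothetical, not an email class)."""
    text: str
    opens_quote: bool = False   # set on the first sub-token
    closes_quote: bool = False  # set on the last sub-token

    def render(self) -> str:
        # Emit the quote marks only where the markers ask for them.
        prefix = '"' if self.opens_quote else ''
        suffix = '"' if self.closes_quote else ''
        return prefix + self.text + suffix


def mark_quoted(words):
    """Mark the first and last sub-token so the folder can re-add quotes."""
    if words:
        words[0].opens_quote = True
        words[-1].closes_quote = True
    return words


def fold(words, maxlen):
    """Greedy fold: start a continuation line when the next word won't fit."""
    lines = ['']
    for word in words:
        piece = word.render()
        candidate = piece if not lines[-1] else lines[-1] + ' ' + piece
        if lines[-1] and len(candidate) > maxlen:
            lines.append(' ' + piece)
        else:
            lines[-1] = candidate
    return lines


name = 'foo Research Inc Foofoo BarBar'
unmarked = fold([Word(w) for w in name.split()], maxlen=16)
marked = fold(mark_quoted([Word(w) for w in name.split()]), maxlen=16)
print(unmarked)
print(marked)
```

The markers travel with the sub-tokens, so the quotes survive no matter where the line breaks fall.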
msg397546 - Author: Julien Castiaux (drlazor8) Date: 2021-07-15 13:15
Hello David,

I work at the same company as Baptiste and I'm trying to solve the problem. The issue is indeed related to the folding algorithm: the DBQUOTE character is lost in the parse_tree AST, so when the folding algorithm splits the children to find a sweet spot to break the line, it doesn't re-introduce the DBQUOTE and instead injects the content of the BareQuotedString right away.

I'm working on a fix which consists of adding two DBQUOTE tokens, one at the beginning and one at the end of the BareQuotedString token, when it is created (_header_value_parser.py@get_bare_quoted_string()). I was inspired by how the angle brackets < and > are injected around the AddrSpec token in an AngleAddr token.

Right now my fix isn't correct, there are some unit tests failing. I'm trying to get it working and hopefully get back to you with a nice pull request :)

Regards,
Julien
msg397555 - Author: Julien Castiaux (drlazor8) Date: 2021-07-15 14:48
Update, it works fine with the compat32 policy
msg397563 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2021-07-15 16:15
Yes, compat32 uses a different parser and folder (the legacy ones), which have a lot of small bugs relative to the RFCs (which is why I rewrote them).
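
For comparison, a small sketch of the same header under the legacy policy, assuming the plain email.message.Message class is used (the names REPLY_TO and rendered are illustrative). Here the value is folded as an ordinary string, so the quote characters are never separated from the text:

```python
from email import policy
from email.message import Message

REPLY_TO = (
    '"foo Research, Inc. Foofoo BarBar on Summer Special Friday: '
    '0.50 days (2021-02-31)" <catchall@foobar.exchange.com>'
)

# Message already defaults to compat32; it is passed explicitly for clarity.
msg = Message(policy=policy.compat32)
msg['Reply-To'] = REPLY_TO

# The legacy folder wraps the header at whitespace and keeps both quotes.
rendered = msg.as_string()
print(rendered)
```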
msg399390 - Author: STINNER Victor (vstinner) (Python committer) Date: 2021-08-11 12:57
I changed the issue type to security. The bug can be abused to send emails to the wrong email address.
msg399393 - Author: Julien Castiaux (drlazor8) Date: 2021-08-11 13:39
Hello David, Victor,

Thank you for the triage, it reminded me of this issue. David, the solution I tried last month was wrong, it was breaking (for good reasons) tons of unit tests. It seems to me that there is indeed no other solution than to bloat the re-folding function a bit more and to fix the dbquotes there, as your last email suggested.

I agree with you that the code will be even messier. Honestly, I spent quite some time understanding the _refold_parse_tree function and I don't feel like patching it.

Regards,

msg407545 - Author: Alexander Mohr (thehesiod) Date: 2021-12-02 20:44
btw, my workaround was to set maxheaderlen=sys.maxsize, which worked for AWS SES at least
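
The comment does not say where the keyword was passed. One possible reading, sketched here as an assumption, is the maxheaderlen argument of as_string(), which overrides the policy's line limit for that one serialization (the names REPLY_TO and rendered are illustrative):

```python
import sys
from email import policy
from email.message import EmailMessage

REPLY_TO = (
    '"foo Research, Inc. Foofoo BarBar on Summer Special Friday: '
    '0.50 days (2021-02-31)" <catchall@foobar.exchange.com>'
)

msg = EmailMessage(policy=policy.SMTP)
msg['Reply-To'] = REPLY_TO

# Assumption: the workaround is applied at serialization time. An effectively
# unlimited header length means the display name is never folded.
rendered = msg.as_string(maxheaderlen=sys.maxsize)
print(rendered)
```

This disables folding for every header in the message, so very long headers go out on a single line.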
msg407901 - Author: Julien Castiaux (drlazor8) Date: 2021-12-07 08:56
Hello there,

There is a pull request on GitHub. I had to modify `_refold_parse_tree`, but I could keep the diff quite small. It is properly tested and is awaiting review :)

We have a patch at work so it is *absolutely not* urgent, feel free to review it *anytime*. Since we are using the Ubuntu LTS version of Python, we might be interested in a backport down to 3.7. Quite honestly, I'm happy it was flagged as a security issue :D
msg411492 - Author: Julien Castiaux (drlazor8) Date: 2022-01-24 16:37
Hello there,

Friendly reminder that this issue is still open and that there is a pull request ready. We continue to face the issue in production and our customers are getting upset.

Can you provide us with a schedule for when this issue will be addressed, so that we can decide either to wait or to start thinking about possible mitigations on our side?

Regards,
Julien
History
Date                 User            Action  Args
2022-04-11 14:59:47  admin           set     github: 88803
2022-01-24 16:37:33  drlazor8        set     messages: + msg411492
2021-12-07 08:56:53  drlazor8        set     messages: + msg407901
2021-12-02 20:44:28  thehesiod       set     messages: + msg407545
2021-12-01 16:29:24  drlazor8        set     keywords: + patch
                                             stage: patch review
                                             pull_requests: + pull_request28107
2021-11-30 15:19:04  vstinner        set     nosy: - vstinner
2021-11-30 14:59:41  r.david.murray  set     nosy: + thehesiod
                                             title: Quoting issue on header Reply-To -> Quoting issue on header Reply-To and other address headers
2021-08-11 13:39:30  drlazor8        set     messages: + msg399393
2021-08-11 12:57:41  vstinner        set     type: behavior -> security
                                             messages: + msg399390
                                             nosy: + vstinner
2021-07-15 16:15:18  r.david.murray  set     messages: + msg397563
2021-07-15 14:48:35  drlazor8        set     messages: + msg397555
2021-07-15 13:15:28  drlazor8        set     messages: + msg397546
2021-07-15 12:35:42  r.david.murray  set     messages: + msg397545
2021-07-15 08:11:45  Abridbus        set     messages: + msg397529
2021-07-14 13:47:17  r.david.murray  set     messages: + msg397480
2021-07-14 12:44:57  Abridbus        create