classification
Title: SMTP.send_message() does from mangling when it should not
Type: behavior Stage: needs patch
Components: Library (Lib) Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: andrei.avk, grant.b.edwards, r.david.murray
Priority: normal Keywords:

Created on 2021-09-27 05:02 by grant.b.edwards, last changed 2021-11-25 17:03 by andrei.avk.

Messages (8)
msg402690 - (view) Author: Grant Edwards (grant.b.edwards) Date: 2021-09-27 05:02
SMTP.send_message() does from mangling even when the message's policy
has that disabled. The problem is in the send_messsage() function
shown below:


   912      def send_message(self, msg, from_addr=None, to_addrs=None,
   913                       mail_options=(), rcpt_options=()):
   914          """Converts message to a bytestring and passes it to sendmail.
   ...
   963          # Make a local copy so we can delete the bcc headers.
   964          msg_copy = copy.copy(msg)
   ...
   977          with io.BytesIO() as bytesmsg:
   978              if international:
   979                  g = email.generator.BytesGenerator(
   980                      bytesmsg, policy=msg.policy.clone(utf8=True))
   981                  mail_options = (*mail_options, 'SMTPUTF8', 'BODY=8BITMIME')
   982              else:
   983                  g = email.generator.BytesGenerator(bytesmsg)
   984              g.flatten(msg_copy, linesep='\r\n')
   985              flatmsg = bytesmsg.getvalue()

If 'international' is True, then the BytesGenerator is passed
msg.policy with utf8 added, and from mangling is only done if the
message's policy has from mangling enabled. This is correct behavior.

If 'international' is False, then the generator always does from
mangling regardless of the message'spolicy. From mangling is only used
when writing message to mbox format files. When sending a message via
SMTP It is wrong to do it when the message's policy says not to.

This needs to be fixed by passing the message's policy to BytesGenerator()
in the 'else' clause also.  I would suggest changing

   983                  g = email.generator.BytesGenerator(bytesmsg)
to

   983                  g = email.generator.BytesGenerator(bytesmsg, policy=msg.policy)


The problem is that if neither mangle_from_ nor policy arguments are
passed to email.generator.BytesGenerator(), then mangle_from_ defaults
to True, and the mangle_from_ setting in the message is ignored. This is
not really documented:

  https://docs.python.org/3/library/email.generator.html#email.generator.BytesGenerator

    If optional mangle_from_ is True, put a > character in front of
    any line in the body that starts with the exact string "From ",
    that is From followed by a space at the beginning of a
    line. mangle_from_ defaults to the value of the mangle_from_
    setting of the policy.
   
Where "the policy" refers to the one passed to BytesGenerator(). Note
that this section does _not_ state what happens if neither
mangle_from_ nor policy are passed to BytesGenerator(). What actually
happens is that in that case mangle_from_ defaults to True. (Not a
good choice for default, since that's only useful for the one case
where you're writing to an mbox file.)

However, there is also a misleading sentence later in that same
section:

    If policy is None (the default), use the policy associated with
    the Message or EmailMessage object passed to flatten to control
    the message generation.

That's only partially true. If you don't pass a policy to
BytesGenerator(), only _some_ of the settings from the message's
policy will be used. Some policy settings (e.g. mangle_from_) are
obeyed when passed to BytesGenerator(), but ignored in the message's
policy even if there was no policy passed to BytesGenerator().

The documentation needs to be changed to state that mangle_from_
defaults to True if neitehr mangle_from_ nor policy are passed to
BytesGenerator(), and that last sentence needs to be changed to state
that when no policy is passed to BytesGenerator() only _some_ of the
fields in the message's policy are used to control the message
generation. (An actual list of policy fields are obeyed from the
message's policy would be nice.)
msg406899 - (view) Author: Andrei Kulakov (andrei.avk) * (Python triager) Date: 2021-11-23 23:19
Seems like a reasonable request to me.

I can make the PR+test.

To minimize backwards-incompatible change, we can pass `_mangle_from=policy._mangle_from` argument instead of passing the entire policy. Is that a good idea or passing the policy argument should be fine too?
msg406900 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2021-11-24 01:46
In this case the docs are correct and the code has a bug.  The intent was that if the message passed in to BytesGenerator has a policy, that policy should be followed.  If it is not being followed, that's a bug in BytesGenerator.  The tricky part of course is backward compatibility.  Is there code out there depending on this bug?  Anyone want to hazard a guess?

Are there things other than mangle_from_ that are being ignored? 

If we decide it is too risky to fix in BytesGenerator (or maybe only to fix it in a feature release), then I'd pass the whole policy in the else clause, with a comment about what bug it is working around.
msg406903 - (view) Author: Andrei Kulakov (andrei.avk) * (Python triager) Date: 2021-11-24 05:58
There are 3 policy settings that are also being passed as parameters to Generator/BytesGenerator:

- mangle_from_
- linesep
- maxheaderlen

Both linesep and maxheaderlen are being inserted into policy after self.policy (if there is one) overrides message policy.

The docs for both linesep and maxheaderlen match the code.

So this only leaves us the buggy `mangle_from_`.

I think it makes sense to fix it in `send_message()` rather than in BytesGenerator because:

- less backwards compatibility breakage
- there's already a workaround for BytesGenerator (provide the parameter)
- there were no reports from users of BytesGenerator

We have two ways to fix it in send_message() --
- provide policy as an arg
- provide mangle_from_ as an arg

I think the 2nd choice is better because it's more direct and easier to understand. If we use msg.policy as an arg, it looks like we're using msg.policy to override msg.policy, which wouldn't make any sense except that mangle_from_ is being set from policy arg rather than msg.policy.

If there's code out there that relies on this bug in send_message(), I would guess it's more likely to be test suites that compare output to version where *from* is mangled.

Docs for BytesGenerator should be fixed to warn about this issue.
msg406941 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2021-11-24 17:44
Your backward compatibility argument is persuasive.  As you say, that means the BytesGenerate docs would need to be updated to note that that parameter is the exception to the rule for backward compatibility reasons.  (If it is the only exception I have to wonder if I had a backward compatibility reason for doing it that way in the first place and just forgot to document it.  It is too long ago to remember.  It is even possible that effectively changing the default broke mbox and that's why it is an exception :)

As for the send_message change, if mangle_from_ is the only exception then I think just passing it does make sense, maybe with a comment referencing the BytesGenerator docs for mangle_from_ to explain why it is needed.
msg407000 - (view) Author: Andrei Kulakov (andrei.avk) * (Python triager) Date: 2021-11-25 14:44
R. David: `mangle_from_` is the only exception; I agree it seems likely it was done this way for backwards compatibility.

Grant: do you agree with the fix to logic? Also do you agree that mangle_from_ is the only setting that's not being applied to msg generation from message policy? Would you like to work on the PR? If not, I can create the PR.
msg407007 - (view) Author: Grant Edwards (grant.b.edwards) Date: 2021-11-25 16:18
Yes, passing the mangle_from value to BytesGenerator seems like the safest fix. It's unfortunate that BytesGenerator defaults to doing "the wrong thing" in the absence of a policy argument, but there might be code that depends on it.

I've never done a PR before, but I could work on it next week.
msg407008 - (view) Author: Andrei Kulakov (andrei.avk) * (Python triager) Date: 2021-11-25 17:03
Grant: sounds good! I can do the initial PR review. Note that the PR will need a test and a news entry. See the link below on authoring PRs:

https://devguide.python.org/pullrequest/
History
Date User Action Args
2021-11-25 17:03:27andrei.avksetmessages: + msg407008
2021-11-25 16:18:39grant.b.edwardssetmessages: + msg407007
2021-11-25 14:44:30andrei.avksetmessages: + msg407000
2021-11-24 17:44:28r.david.murraysetmessages: + msg406941
2021-11-24 05:58:44andrei.avksetmessages: + msg406903
2021-11-24 01:46:58r.david.murraysetmessages: + msg406900
2021-11-23 23:19:55andrei.avksetversions: + Python 3.10, Python 3.11
nosy: + r.david.murray, andrei.avk

messages: + msg406899

type: behavior
stage: needs patch
2021-09-27 05:02:29grant.b.edwardscreate