This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: email.generator should always add newlines after closing boundaries
Type: behavior Stage: resolved
Components: email Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: barry, dkg, mitya57, python-dev, r.david.murray, sjt
Priority: normal Keywords: patch

Created on 2012-06-02 06:52 by mitya57, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
always_add_newlines.patch mitya57, 2012-06-02 06:52 PATCH: Always add newline after ending boundary in multipart messages review
Messages (17)
msg162126 - (view) Author: Dmitry Shachnev (mitya57) * Date: 2012-06-02 06:52
Trying to write a email-sending script with PGP-signing functionality, I stumbled upon a problem (see [1]): it was impossible to sign mutlipart emails (actually the signing was performed, but the verifying programs thought that the signature is bad).

After comparing messages produced by email.generator and popular mail clients (Evolution, KMail), I've found out that the mail clients always add line breaks after ending boundaries.

The attached patch makes email.generator behave like all email clients. After applying it, it's possible to sign even complicated mails like "multipart/alternate with attachments".

An illustration:

 --====1==     # Part 1 (base message) begin
 ...
 --====2==     # Part 1.1 begin
 ...
 --====2==     # Part 1.2 begin
 ...
 --====2==--   # Part 1 end
               # There should be empty line here
 --====1==     # Part 2 (signature) begin
 ...
 --====1==--   # End of the message

[1]: http://stackoverflow.com/questions/10496902/pgp-signing-multipart-e-mails-with-python
msg162136 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-06-02 14:48
Then either the signer or the verifier (or both) are broken per RFC 2046 (unless there has been an update that isn't referenced from the RFC).  Section http://tools.ietf.org/html/rfc2046#section-5.1.1 clearly indicates that the ending delimiter ends at the trailing --.  In particular, *transports* are allowed to add whitespace before the trailing CRLF, and receivers are to handle this case.

You might want to report bugs about that to the appropriate projects.

That said, there seems to be little harm in adding a CRLF, and some argument in its favor, since the RFC also says that the CRLF before the next boundary is conceptually part of it, and by that interpretation a CRLF could be considered missing (though I think technically it isn't...the reason for the rule is to disambiguate preceding *body* parts that end or don't end in a CRLF, and here we are dealing with an unambiguous ending boundary delimiter).

We'll need some tests for this.
msg162138 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-06-02 14:55
Looking at your stackoverflow post, you might be able to fix this by doing an rstrip on the string body before signing it.  But then if we add a CRLF between the boundaries, the verifiers might start failing again.  What do you think?  Probably adding the CRLF is the way to go, but I wonder, what are the controlling RFCs for signing/verifying and what do they say?
msg162148 - (view) Author: Dmitry Shachnev (mitya57) * Date: 2012-06-02 16:50
> Looking at your stackoverflow post, you might be able to fix this by doing an rstrip on the string body before signing it.

My body doesn't end with \n, so that doesn't help. If you suggest me any (easy) way to fix this on the level of my script, I will be happy.

> Probably adding the CRLF is the way to go, but I wonder, what are the controlling RFCs for signing/verifying and what do they say?

The standard is RFC 1847, it doesn't say anything about multipart emails. I've just looked at what other mail clients do, it works, but I can't say anything about its correctness. Also, looking at some multipart signed emails in my inbox, they *all* have the blank line before the signature part.
msg162149 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-06-02 16:55
Sorry, I wasn't clear.  By 'body' I actually meant the multipart part you are signing.  I haven't looked at your script really, but I was thinking of something along the lines of make_sig(str(fullmsg.get_payload(0)).rstrip()).  But like I said I didn't actually look at your script, so that may not be applicable.
msg162151 - (view) Author: Dmitry Shachnev (mitya57) * Date: 2012-06-02 17:24
> By 'body' I actually meant the multipart part you are signing.

Yes, I've understood you, and I mean the same :) The signature is created against the not-ending-with-newline string, in any case.
msg162163 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-06-02 18:47
Hmm.  So that means the verifiers are not paying attention to the MIME RFC?  That's unfortunate.
msg162210 - (view) Author: Dmitry Shachnev (mitya57) * Date: 2012-06-03 13:45
> Hmm.  So that means the verifiers are not paying attention to the MIME > RFC?  That's unfortunate.

It seems that's true...
msg198213 - (view) Author: Stephen J. Turnbull (sjt) * (Python triager) Date: 2013-09-21 18:03
Seems this hasn't been resolved.  I have to disagree with David's interpretation of RFC 2046.  The definition of a boundary says that it is "terminated" with a CRLF.  It also clarifies that the introducing CRLF is "conceptually part" of the boundary.  Thus each boundary contains both the leading and the trailing CRLF.  There is no exception for the final boundary that I can see.

This implies that when two boundaries abut, they need to be separated by *two* CRLFs, the trailing CRLF on the ending boundary of the inner multipart and the leading CRLF on the next boundary (which might be a separator or the ending boundary) of the containing multipart.
msg198214 - (view) Author: Daniel Kahn Gillmor (dkg) * Date: 2013-09-21 18:20
I think the relevant specification for PGP/MIME-signed messages is RFC 3156:

 https://tools.ietf.org/html/rfc3156#page-5

in particular:

      Note: The accepted OpenPGP convention is for signed data to end
      with a <CR><LF> sequence.  Note that the <CR><LF> sequence
      immediately preceding a MIME boundary delimiter line is considered
      to be part of the delimiter in [3], 5.1.  Thus, it is not part of
      the signed data preceding the delimiter line.  An implementation
      which elects to adhere to the OpenPGP convention has to make sure
      it inserts a <CR><LF> pair on the last line of the data to be
      signed and transmitted (signed message and transmitted message
      MUST be identical).
msg198215 - (view) Author: Stephen J. Turnbull (sjt) * (Python triager) Date: 2013-09-21 19:19
Following OpenPGP convention is clearly optional (or maybe a SHOULD, but the word "elect" makes it a pretty weak SHOULD).  RFC 2046 is a MUST, it's not a matter of "convention".

The problem is that a parser that works forward in the message will swallow the terminating CRLF of the boundary of the signed multipart, and then not find a CRLF to introduce the boundary that separates the content from the signature.  By MIME rules it will treat the signature (including the unrecognized boundary) as an epilogue, and ignore it.  This is not at all special to multipart/signed.
msg198216 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-09-21 19:21
Well, there are two problems here, I think (it's been a while since I looked at this): we should indeed be adding a crlf between mime boundary lines.  But also the clients should technically be handling it not being there, as well as the case of extra whitespace being added between the end of the delimiter and the following crlf (which I'm guessing they don't, since the logical way to handle that would be to ignore anything after the end of the delimiter string, in which case this problem wouldn't arise).

Specifically:

   The boundary may be followed by zero or more characters of linear
   whitespace. It is then terminated by either another CRLF...

and:

   NOTE TO IMPLEMENTORS:  Boundary string comparisons must compare the
   boundary value with the beginning of each candidate line.  An exact
   match of the entire candidate line is not required; it is sufficient
   that the boundary appear in its entirety following the CRLF.

So it seems to me that the PGP spec is not RFC conformant...but if we were following the RFC it wouldn't be an issue.
msg198218 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-09-21 19:23
Heh, rather than "not conformant" I should have said that the two RFCs are in conflict, in my opinion.
msg198219 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-09-21 19:27
Stephen: my post crossed yours.  Yes, I agree with your logic, having re-read the spec (the trailing CR is clearly part of the boundary).  But I still think the logic of the signing/validation is an invitation for running into problems like this.
msg210698 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-02-08 22:56
New changeset d579866d6419 by R David Murray in branch '2.7':
#14983: always add a line end after a MIME boundary marker.
http://hg.python.org/cpython/rev/d579866d6419

New changeset c2538789c8cf by R David Murray in branch '3.3':
#14983: always add a line end after a MIME boundary marker.
http://hg.python.org/cpython/rev/c2538789c8cf

New changeset 7486c45eb53f by R David Murray in branch 'default':
Merge: #14983: always add a line end after a MIME boundary marker.
http://hg.python.org/cpython/rev/7486c45eb53f
msg210699 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-02-08 22:58
The existing tests that need to be changed to reflect the new behavior are sufficient for testing this, I think.

There is small amount of backward compatibility concern here, but since it is an RFC compliance bug and it fixes a problem with verification, I decided that the small risk is worth it.
msg210736 - (view) Author: Dmitry Shachnev (mitya57) * Date: 2014-02-09 09:37
Thank you for committing this!
History
Date User Action Args
2022-04-11 14:57:31adminsetgithub: 59188
2014-02-09 09:37:01mitya57setmessages: + msg210736
2014-02-08 22:58:22r.david.murraysetstatus: open -> closed
resolution: fixed
messages: + msg210699

stage: needs patch -> resolved
2014-02-08 22:56:37python-devsetnosy: + python-dev
messages: + msg210698
2013-09-21 19:27:37r.david.murraysetmessages: + msg198219
2013-09-21 19:23:09r.david.murraysetmessages: + msg198218
2013-09-21 19:21:09r.david.murraysetmessages: + msg198216
2013-09-21 19:19:03sjtsetmessages: + msg198215
2013-09-21 18:20:06dkgsetnosy: + dkg
messages: + msg198214
2013-09-21 18:03:37sjtsetnosy: + sjt
messages: + msg198213
2012-06-03 13:45:54mitya57setmessages: + msg162210
2012-06-02 18:47:40r.david.murraysetmessages: + msg162163
2012-06-02 17:24:27mitya57setmessages: + msg162151
2012-06-02 16:55:25r.david.murraysetmessages: + msg162149
2012-06-02 16:50:58mitya57setmessages: + msg162148
2012-06-02 14:56:33r.david.murraysettitle: [patch] email.generator should always add newlines after closing boundaries -> email.generator should always add newlines after closing boundaries
2012-06-02 14:55:53r.david.murraysetmessages: + msg162138
2012-06-02 14:48:38r.david.murraysetversions: + Python 2.7, Python 3.3
2012-06-02 14:48:23r.david.murraysetmessages: + msg162136
stage: needs patch
2012-06-02 06:52:21mitya57create