Message 391698 - Python tracker

Author	drlazor8
Recipients	barry, drlazor8, r.david.murray
Date	2021-04-23.14:36:49
Message-id	<1619188610.43.0.088106557205.issue43922@roundup.psfhosted.org>

Content

Hello,

We received multiple bug reports about broken links in rich HTML emails. Sometimes, in some emails, a link like <a href="https://example.com"> would become <a href="https://example..com">; note the double dot.

After investigating both the Python email source code and the RFCs, we found that Python implements the standards correctly, but the remote (non-Python) SMTP server used by some of our customers does not.

The relevant email standards state the following:

1) Because a line consisting of a single dot (".", chr(0x2e)) ends the SMTP data transmission, dots that are part of the message must be escaped. RFC 5321, section 4.5.2, requires the sending side to escape every dot that appears at the beginning of a line by prepending another dot (dot-stuffing). That is, when the user message contains "\r\n.\r\n", it is transmitted as "\r\n..\r\n". The receiving SMTP server is responsible for removing the extra dot.

2) When the email body is transported using the quoted-printable encoding, RFC 2045 limits the length of encoded lines (to 76 characters; Python's email package folds at the policy's max_line_length, 78 by default) and defines a single equals sign ("=", chr(0x3d)) at the end of a line as a soft line break for folding lines that are too long. The RFC only requires that a line not be split inside an encoded character (for example, in the middle of "=2E"). Like any other character, a dot may therefore appear at the start of a folded line.
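A quick way to see the soft line break in action is the standard library's quoted-printable decoder, `binascii.a2b_qp`. This minimal sketch decodes hand-written byte strings (they are illustrative, not output captured from the `email` package) and shows that a trailing "=" disappears together with the line ending, so a dot at the start of the continuation line is ordinary content.

```python
import binascii

# Hand-written folded line in the style of the example in this report.
folded = b"example=\r\n.com"

# The trailing "=" is a soft line break: the decoder drops it together with
# the CRLF, so the dot that begins the next line is just part of the text.
decoded = binascii.a2b_qp(folded)
print(decoded)  # b'example.com'

# "=2E" is the quoted-printable escape for "." (0x2E). The RFC forbids
# folding inside such an escape, e.g. writing "=2=\r\nE".
print(binascii.a2b_qp(b"example=2Ecom"))  # b'example.com'
```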

Take the following example:

from email.message import EmailMessage
from email.policy import SMTP

msg = EmailMessage(policy=SMTP)
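# The "." in "example.com" is the 78th character of this line, so the
# quoted-printable encoder, which folds encoded lines at 78 characters
# including the trailing "=", breaks the line right before it.
msg.set_content("Hello there, just need some text to reach that seventy-six character, example.com")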

print(msg.as_string())
# Content-Type: text/plain; charset="utf-8"
# Content-Transfer-Encoding: quoted-printable
# MIME-Version: 1.0
#
# Hello there, just need some text to reach that seventy-six character, example=
# .com

When the message is sent over SMTP, smtplib escapes the line ".com" as "..com", as the RFC requires. So there is no problem in the Python implementation; the bug is on the receiving side, which does not remove the extra dot.
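The dot-stuffing step, and what each kind of receiver does with it, can be sketched in a few lines. `dot_stuff` and `dot_unstuff` are hypothetical helpers written for this illustration (smtplib performs the equivalent client-side step inside `SMTP.data()`); the body is the quoted-printable text from the example above, assumed to use CRLF line endings. A receiver that skips the un-stuffing step reproduces the reported "example..com".

```python
import binascii
import re


def dot_stuff(data: bytes) -> bytes:
    """Client side (RFC 5321, 4.5.2): double a dot that starts any line."""
    return re.sub(rb"(?m)^\.", b"..", data)


def dot_unstuff(data: bytes) -> bytes:
    """Compliant server side: remove one leading dot from each such line."""
    return re.sub(rb"(?m)^\.", b"", data)


# Quoted-printable body from the example, with CRLF line endings.
body = (b"Hello there, just need some text to reach that seventy-six "
        b"character, example=\r\n.com\r\n")

wire = dot_stuff(body)    # sent over the wire with "..com"
good = dot_unstuff(wire)  # a compliant server restores ".com"
bad = wire                # a buggy server keeps the extra dot

print(binascii.a2b_qp(good))  # ends with b'example.com\r\n'
print(binascii.a2b_qp(bad))   # ends with b'example..com\r\n' (the reported bug)
```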

However, we can still work around the bug on the other side: it does not correctly parse lines starting with a dot, so the fix is to ensure that no line starts with the dot character. There are two ways to do this: (1) quoted-printable encode dots when they are at the beginning of a line, or (2) prevent the line-folding code from splitting a line right before a dot.

(1) is allowed by the RFC: any character may be quoted-printable encoded, even one that already has a safe ASCII representation. In the "example=\n.com" example above, we can encode the dot: "example=\n=2Ecom". The continuation line then starts with the escape "=2E" instead of a literal dot, and the decoded content is the same.
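Solution (1) can be sketched as a post-processing step on an already encoded body. `escape_leading_dots` is a hypothetical helper, not part of the `email` package, and it assumes its input is quoted-printable text (here with CRLF line endings, as in the example).

```python
import binascii
import re


def escape_leading_dots(qp_body: str) -> str:
    """Hypothetical helper: re-encode a "." that starts any line of an
    already quoted-printable encoded body as the escape "=2E"."""
    return re.sub(r"(?m)^\.", "=2E", qp_body)


qp_body = ("Hello there, just need some text to reach that seventy-six "
           "character, example=\r\n.com\r\n")
fixed = escape_leading_dots(qp_body)
print(fixed.splitlines()[1])  # =2Ecom

# Both encodings decode to exactly the same bytes.
assert binascii.a2b_qp(fixed.encode("ascii")) == binascii.a2b_qp(qp_body.encode("ascii"))
```

Because "=2E" is two characters longer than ".", a continuation line that is already at the length limit would need to be refolded; this sketch does not handle that case.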

(2) is also allowed by the RFC: it only sets a maximum line length and allows a line to be folded anywhere except inside a quoted-printable escape sequence, so it is safe to fold a line before reaching the limit. In the "example=\n.com" example above, we could fold the line one character earlier, moving the 77th character to the next line: "exampl=\ne.com". The continuation line then starts with "e" instead of a dot, and the decoded content is the same.
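Solution (2) can be sketched as a folding routine for a single, already encoded quoted-printable line. `fold_qp_line` and `_safe_stop` are hypothetical helpers written for this illustration; they are not the email package's encoder, nor the change in the announced pull request. `max_len=78` mirrors the fold width seen in the example above.

```python
import binascii


def _safe_stop(line: str, stop: int) -> int:
    """Move a break position left so it never falls inside an "=XX" escape."""
    if line[stop - 1] == "=":   # break would separate "=" from "XX"
        return stop - 1
    if line[stop - 2] == "=":   # break would separate "=X" from "X"
        return stop - 2
    return stop


def fold_qp_line(line: str, max_len: int = 78) -> list[str]:
    """Hypothetical folder: split one encoded line into pieces of at most
    max_len characters (including the "=" soft break), preferring an earlier
    break so that no continuation line starts with "."."""
    pieces = []
    start = 0
    while len(line) - start > max_len:
        stop = _safe_stop(line, start + max_len - 1)  # leave room for "="
        candidate = stop
        # Walk the break left while the next line would begin with a dot.
        while candidate > start + 3 and line[candidate] == ".":
            candidate = _safe_stop(line, candidate - 1)
        if line[candidate] != ".":
            stop = candidate  # otherwise keep the regular break
        pieces.append(line[start:stop] + "=")
        start = stop
    pieces.append(line[start:])
    return pieces


line = "Hello there, just need some text to reach that seventy-six character, example.com"
for piece in fold_qp_line(line):
    print(piece)
```

Applied to the example body, it yields a first line ending in "exampl=" followed by "e.com". If no earlier break point avoids a leading dot (for example, a long run of dots), the sketch falls back to the regular break.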

A pull request is coming shortly.

History
Date	User	Action	Args
2021-04-23 14:36:50	drlazor8	set	recipients: + drlazor8, barry, r.david.murray
2021-04-23 14:36:50	drlazor8	set	messageid: <1619188610.43.0.088106557205.issue43922@roundup.psfhosted.org>
2021-04-23 14:36:50	drlazor8	link	issue43922 messages
2021-04-23 14:36:49	drlazor8	create