EmailMessage.add_attachment(filename="long or spécial") crashes or produces invalid output #76012
The following code excerpt demonstrates a crash:

import email.message
mail = email.message.EmailMessage()
mail.add_attachment(
    b"test",
    maintype="text",
    subtype="plain",
    filename="I thought I could put a few words in the filename but apparently it does not go so well.txt"
)
print(mail)

Output on Python 3.7.0a1: https://gist.github.com/altendky/33c235e8a693235acd0551affee0a4f6

Additionally, a behavioral issue is demonstrated by replacing in the above:

Which results in the following output (headers):

Content-Type: text/plain

Instead of, for example, this correct output (by Mozilla Thunderbird here):

Content-Type: text/plain; charset=UTF-8;

Issues to note here:

The relevant standard is exemplified in section 4.1 of https://tools.ietf.org/html/rfc2231#page-5

Python 3.4.6 and 3.5.4 simply do not wrap anything, which works with but is not conformant to standards.

Solving all of the above would imply correctly splitting any header. Unfortunately I do not understand what's going on there very well.

As yet an additional misbehaviour to note, try to repeat the above print statement twice.

Content-Type: text/plain

It would appear that "filename" has disappeared.

PS: The above output also illustrates this (way more minor) issue: https://bugs.python.org/issue25235
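A minimal sketch of how the long-filename case could be checked on a Python version that includes the fix, assuming the default policy is used throughout. The helper name `build_message` is only illustrative; the check serializes with `as_string()` and parses the result back to see whether the filename survives the header folding.

```python
import email
import email.message
import email.policy

LONG_NAME = (
    "I thought I could put a few words in the filename "
    "but apparently it does not go so well.txt"
)


def build_message(filename):
    """Hypothetical helper: one message with a single small attachment."""
    mail = email.message.EmailMessage()
    mail.add_attachment(
        b"test", maintype="text", subtype="plain", filename=filename
    )
    return mail


# Serialize with the message's own (default) policy, then parse it back.
wire = build_message(LONG_NAME).as_string()
parsed = email.message_from_string(wire, policy=email.policy.default)

# Read the filename parameter from each attachment's Content-Disposition.
recovered = [
    part["Content-Disposition"].params["filename"]
    for part in parsed.iter_attachments()
]
print(recovered == [LONG_NAME])
```

On an affected version the `as_string()` call is where the crash or the broken header would be expected to show up.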
Erratum: the output generated by Python 3.5 and 3.4 causes line wraps in the SMTP delivery chain, which cause exactly the same breakage as in later versions: the crucially needed indentation of one space ends up being absent.
Does the patch in #47738 fix this? I think it should; if it doesn't, that's a bug in the PR patch.
I confirm that, as for the crash, the patch in #47738 fixes it.

Content-Type: text/plain

However, when Unicode is added and the filename is short, things don't look right. This code:

import email.message
mail = email.message.EmailMessage()
mail.add_attachment(b"test", maintype="text", subtype="plain", filename="é.txt")
print(mail)

Results in these headers:

Content-Type: text/plain

To begin with, there is no way for a reader of these headers to know that this 'é' character is UTF-8. Quoting https://tools.ietf.org/html/rfc2231#page-4:

The 8-bit character goes straight through instead of undergoing the encoding process, which seems required in my interpretation of RFC 2231.
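For reference, a small sketch of what the RFC 2231 extended form of such a filename looks like, using `email.utils.encode_rfc2231` from the standard library. The `header` string is assembled by hand purely for illustration and is not how the email package builds it internally.

```python
import email.utils

# RFC 2231 extended value: charset'language'percent-encoded-octets
encoded = email.utils.encode_rfc2231("é.txt", charset="utf-8")

# Hand-built illustration of a parameter carrying that value.
header = "Content-Disposition: attachment; filename*=" + encoded
print(header)
```

The trailing `*` on the parameter name is what signals to a receiver that the value carries a charset and is percent-encoded.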
You are correct, that is a bug. Presumably I forgot to check for non-ASCII when the parameter value doesn't need to be folded. I'm not sure when I'll have time to look at this, unfortunately :(. If you can see how to fix it, you could submit a PR against my PR branch, I think.
In the end there is no bug; I was just confused by the output of print() on the EmailMessage. I noticed that in email/_header_value_parser.py policy.utf8 was True.

def __str__(self):
    return self.as_string(policy=self.policy.clone(utf8=True))

print() will use __str__() and this is why it happens. I didn't dig out the exact reason since there are so many delegated calls. Sorry for the false alert.

After additional fuzzing, checking the output with EmailMessage.as_string(), everything seems OK. That's a +1 for #47738, which fixes this bug.
Great, thank you for that research. And yes, that's exactly why __str__ uses utf8=True: the "picture" of the message is much more readable. I will commit that PR soon.
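A short sketch of the difference discussed above, assuming a Python version that includes the fix: `str()` gives the readable utf8=True "picture", while `as_string()` with the default policy gives the ASCII-only form in which the filename is percent-encoded.

```python
import email.message

mail = email.message.EmailMessage()
mail.add_attachment(b"test", maintype="text", subtype="plain", filename="é.txt")

picture = str(mail)      # __str__ clones the policy with utf8=True
wire = mail.as_string()  # default policy, utf8=False

print("é" in picture)    # raw character is kept in the readable form
print("é" in wire)       # not present in the serialized form
print("%C3%A9" in wire)  # percent-encoded UTF-8 bytes of 'é' instead
```

When checking what would actually be sent, `as_string()` (or `as_bytes()`) is the output to inspect rather than `print(mail)`.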
The PR has been committed.