Author calimeroteknik
Recipients barry, calimeroteknik, r.david.murray
Date 2017-10-21.13:49:54
Message-id <1508593794.56.0.213398074469.issue31831@psf.upfronthosting.co.za>
Content
I confirm that the patch in gh-3488 fixes the crash.
The first code excerpt in my initial report now outputs the following, valid headers:

Content-Type: text/plain
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename*0*=utf-8''I%20thought%20I%20could%20put%20a%20few%20words%20in%20th;
 filename*1*=e%20filename%20but%20apparently%20it%20does%20not%20go%20so%20we;
 filename*2*=ll.txt
MIME-Version: 1.0
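
As a sanity check, the three continuation segments above can be reassembled by hand. This is only a minimal sketch of the RFC 2231 decoding steps (join the segments, split off charset and language, percent-decode), not what the email package does internally; the variable names are mine.

```python
from urllib.parse import unquote

# The three filename*N* segments, copied from the headers above.
segments = [
    "utf-8''I%20thought%20I%20could%20put%20a%20few%20words%20in%20th",
    "e%20filename%20but%20apparently%20it%20does%20not%20go%20so%20we",
    "ll.txt",
]

# Concatenate the segments in order, then split the
# charset'language'value prefix carried by the first segment.
charset, language, encoded = "".join(segments).split("'", 2)

# Percent-decode the value using the declared charset.
filename = unquote(encoded, encoding=charset)
print(filename)
```

This gives back the original long filename, so the folded header round-trips.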


However, when the filename is short and contains non-ASCII characters, things don't look right. This code:

import email.message
mail = email.message.EmailMessage()
mail.add_attachment(b"test", maintype="text", subtype="plain", filename="é.txt")
print(mail)

results in these headers:

Content-Type: text/plain
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="é.txt"
MIME-Version: 1.0

To begin with, nothing in this header tells the recipient that the 'é' character is encoded as UTF-8.
It is also two 8-bit values, at least one of which is detectably outside of 7-bit US-ASCII.
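
The byte-level claim is easy to check. A small sketch, assuming the 'é' in the filename is the precomposed code point U+00E9 (a decomposed 'e' plus combining accent would give different bytes):

```python
# Assumption: the filename uses the precomposed character U+00E9.
raw = "\u00e9".encode("utf-8")

print(raw)                      # b'\xc3\xa9'
print([b > 0x7F for b in raw])  # [True, True]
```

Under that assumption both bytes have the high bit set, so neither fits in 7-bit US-ASCII.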


Quoting https://tools.ietf.org/html/rfc2231#page-4:
>a lightweight encoding mechanism is needed to accommodate 8-bit information in parameter values.

The 8-bit value goes straight through instead of undergoing the encoding process, which RFC 2231 seems to require, in my interpretation.
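
For comparison, here is a sketch of the form I would have expected for the short filename, built with email.utils.encode_rfc2231. The header line is assembled by hand purely for illustration; it is not the output of the patched library, and "\u00e9" again assumes the precomposed 'é'.

```python
from email.utils import encode_rfc2231

# Encode the short filename with an explicit charset, as RFC 2231 describes.
encoded = encode_rfc2231("\u00e9.txt", charset="utf-8")

# Hand-assembled header, for illustration only.
header = "Content-Disposition: attachment; filename*=" + encoded
print(header)
# Content-Disposition: attachment; filename*=utf-8''%C3%A9.txt
```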