Message 304705 - Python tracker


Author	calimeroteknik
Recipients	barry, calimeroteknik, r.david.murray
Date	2017-10-21.13:49:54
Message-id	<1508593794.56.0.213398074469.issue31831@psf.upfronthosting.co.za>

Content
As for the crash, I confirm that the patch in gh-3488 fixes it.
The first code excerpt in my initial report now outputs the following valid headers:

Content-Type: text/plain
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename*0*=utf-8''I%20thought%20I%20could%20put%20a%20few%20words%20in%20th;
 filename*1*=e%20filename%20but%20apparently%20it%20does%20not%20go%20so%20we;
 filename*2*=ll.txt
MIME-Version: 1.0


However, when the filename is short and contains non-ASCII characters, things don't look right. This code:

import email.message
mail = email.message.EmailMessage()
mail.add_attachment(b"test", maintype="text", subtype="plain", filename="é.txt")
print(mail)

Results in these headers:

Content-Type: text/plain
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="é.txt"
MIME-Version: 1.0

To begin with, nothing in this header tells the recipient that the 'é' character is encoded as UTF-8.
And it is two 8-bit values, at least one of which is detectably outside of 7-bit US-ASCII.
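
To make the byte-level claim concrete, here is a minimal sketch that inspects the filename, assuming it is serialised as UTF-8 (the variable names are only illustrative):

```python
# Encode the filename from the example above as UTF-8 (an assumption
# about how the header ends up on the wire) and look at the raw bytes.
filename = "é.txt"
raw = filename.encode("utf-8")

# Collect every byte that falls outside the 7-bit US-ASCII range.
non_ascii = [b for b in raw if b > 0x7F]

print(raw)
print([hex(b) for b in non_ascii])
```

Any byte reported here cannot legally appear unencoded in a header parameter value that is restricted to US-ASCII.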


Quoting https://tools.ietf.org/html/rfc2231#page-4:
>a lightweight encoding mechanism is needed to accommodate 8-bit information in parameter values.

The 8-bit data goes straight through instead of undergoing the encoding process, which seems to be required by my interpretation of RFC 2231.
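
For comparison, this is roughly what I would expect an RFC 2231 encoded parameter to look like for the short filename. It is only a sketch using `email.utils.encode_rfc2231` directly. The header line is assembled by hand for illustration and is not the output of `add_attachment`, patched or otherwise:

```python
from email.utils import encode_rfc2231

filename = "é.txt"

# Percent-encode the UTF-8 bytes and prefix the charset; the language
# tag between the two apostrophes is left empty.
encoded = encode_rfc2231(filename, charset="utf-8")

# Hand-built header line, for illustration only: the extended parameter
# name "filename*" signals that the value carries charset information.
header = "Content-Disposition: attachment; filename*=" + encoded
print(header)
```

With this form the charset is stated explicitly and the value itself stays within 7-bit US-ASCII.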
