Message 92853 - Python tracker



Author	rpatterson
Recipients	rpatterson
Date	2009-09-18.23:53:50
Message-id	<1253318032.63.0.187743524666.issue6942@psf.upfronthosting.co.za>
In-reply-to

Content

Due to repeated use of StringIO as a way to "look ahead" into subparts 
while checking that multipart boundaries are unique, memory consumption 
during email.generator.Generator.flatten() can be up to 3 times the 
original message size.

I implemented a subclass of email.generator.Generator that works around 
this using email.message.Message.walk() to check message headers and 
string (final) payloads for the boundary without duplicating their 
contents into a StringIO.
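
To illustrate the idea (this is not the linked implementation, just a 
minimal Python 3 sketch), the check can be written as a walk over the 
parts that looks at each header value and each string payload in place.  
The names boundary_collides and choose_boundary and the ".N" suffix scheme 
are hypothetical, and the substring test is deliberately conservative: it 
flags "--" + boundary anywhere, not only at the start of a line.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def boundary_collides(msg, boundary):
    """Return True if '--' + boundary occurs in any single part.

    Hypothetical helper: inspects headers and string payloads in place,
    so nothing is serialized into an intermediate buffer.
    """
    delimiter = "--" + boundary
    for part in msg.walk():
        # Check every header value of this part.
        for _name, value in part.items():
            if delimiter in str(value):
                return True
        # Only leaf parts carry a string payload; multipart containers
        # carry a list of subparts, which walk() visits on its own.
        payload = part.get_payload()
        if isinstance(payload, str) and delimiter in payload:
            return True
    return False


def choose_boundary(msg, base="BOUNDARY"):
    """Append a counter to `base` until no single part contains it."""
    candidate = base
    counter = 0
    while boundary_collides(msg, candidate):
        candidate = "%s.%d" % (base, counter)
        counter += 1
    return candidate


# Small demonstration: the first part contains a line that would clash
# with the default candidate.
demo = MIMEMultipart()
demo.attach(MIMEText("first line\n--BOUNDARY\nlast line\n"))
demo.attach(MIMEText("nothing suspicious here\n"))
print(choose_boundary(demo))
```

Note that this only inspects one part at a time, so it relies on the 
assumption about single parts discussed in this message.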

It assumes that the boundary can only ever be duplicated within a single 
part's headers, or within a single part's payload when that payload is 
a string.  In other words, it assumes that the boundary will not be 
produced by some combination of the headers and string payloads of 
several parts and their recursive subparts.

If this assumption is safe, then this implementation should work.  If 
this assumption is not safe, then perhaps a different boundary format 
can be used which will make this assumption safe?

You can find my implementation at 
http://gitorious.org/rpatterson-imappipe/rpatterson-imappipe/blobs/master/rpatterson/imappipe/generator.py
