Message 77807 - Python tracker

Author	dato
Recipients	dato
Date	2008-12-14.16:55:44
Message-id	<1229273746.12.0.577914878535.issue4661@psf.upfronthosting.co.za>
In-reply-to

Content

Currently, email.parser/feedparser can only parse messages that come 
as a string, or from a file opened in text mode.

Email messages, however, can contain 8-bit characters in an encoding
other than the local one (and still be valid e-mails, of course), so I
think the parser needs a method that accepts bytes.
At the moment, as far as I can see, it's not possible to parse
some perfectly valid messages with Python 3.0.
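
For illustration only: Python 3.2 and later ship a bytes-accepting feed parser, `email.feedparser.BytesFeedParser`, which is not available in the 3.0 release discussed here. The sketch below assumes such a later version; the sample message and the 16-byte chunk size are made up for the example.

```python
from email.feedparser import BytesFeedParser

# Hypothetical raw message: a Latin-1 body containing one 8-bit byte (0xE9).
raw = (
    b"From: a@example.com\n"
    b"Content-Type: text/plain; charset=iso-8859-1\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"caf\xe9\n"
)

parser = BytesFeedParser()
# Feed arbitrary fixed-size chunks, as if reading from a socket or a
# file opened in binary mode. No encoding has to be chosen up front.
for i in range(0, len(raw), 16):
    parser.feed(raw[i:i + 16])
msg = parser.close()

# Only after parsing do we know the declared charset of the body.
body_bytes = msg.get_payload(decode=True)
body_text = body_bytes.decode(msg.get_content_charset())
```

The body is decoded only once the headers have been parsed and the declared charset is known.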

I don't think it's appropriate to ask that files be opened with the 
proper encoding, and then for the parser to read them. First, it is 
not possible to know what encoding that would be without parsing the 
message. Second, a message could contain parts in different
encodings, and many mailboxes with random messages most certainly do.
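
A small sketch of the mixed-encoding case, again assuming Python 3.2 or later, where `email.message_from_bytes` exists. The two-part message and its boundary string are invented for the example. No single text-mode encoding handles it correctly, but parsing the bytes and decoding each part with its own declared charset does.

```python
import email

# Hypothetical multipart message: one part in Latin-1, one in UTF-8.
raw = b"\r\n".join([
    b"MIME-Version: 1.0",
    b'Content-Type: multipart/mixed; boundary="XX"',
    b"",
    b"--XX",
    b"Content-Type: text/plain; charset=iso-8859-1",
    b"Content-Transfer-Encoding: 8bit",
    b"",
    b"caf\xe9",
    b"--XX",
    b"Content-Type: text/plain; charset=utf-8",
    b"Content-Transfer-Encoding: 8bit",
    b"",
    b"caf\xc3\xa9",
    b"--XX--",
    b"",
])

# Decoding the whole message as UTF-8 fails on the Latin-1 part.
try:
    raw.decode("utf-8")
    whole_decode_failed = False
except UnicodeDecodeError:
    whole_decode_failed = True

# Parsing bytes lets each part be decoded with its own charset.
msg = email.message_from_bytes(raw)
texts = []
for part in msg.walk():
    if part.get_content_maintype() == "text":
        charset = part.get_content_charset() or "ascii"
        texts.append(part.get_payload(decode=True).decode(charset))
```

Decoding the whole file as Latin-1 instead would not raise, but it would silently garble the UTF-8 part.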

Also, message objects will need a way to return a bytes representation,
for the reasons explained above, and particularly if one wants to
write back the message without modifying it.
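
As a rough sketch of the write-back case: later Python versions (3.4 and up) provide `Message.as_bytes()`, which is not part of the 3.0 API this report is about. The message below is a made-up example with `\n` line endings; for a simple message like this one, the serialized bytes should match the input.

```python
import email

# Hypothetical raw message with an 8-bit Latin-1 body.
raw = (
    b"From: a@example.com\n"
    b"Subject: test\n"
    b"Content-Type: text/plain; charset=iso-8859-1\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"caf\xe9\n"
)

msg = email.message_from_bytes(raw)

# Serialize back to bytes without ever picking a text encoding.
out = msg.as_bytes()
```

Note that the default policy writes `\n` line separators, so a message stored with `\r\n` endings would not come back byte-for-byte identical without a different policy.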

History
Date	User	Action	Args
2008-12-14 16:55:46	dato	set	recipients: + dato
2008-12-14 16:55:46	dato	set	messageid: <1229273746.12.0.577914878535.issue4661@psf.upfronthosting.co.za>
2008-12-14 16:55:45	dato	link	issue4661 messages
2008-12-14 16:55:44	dato	create