classification
Title: email documentation needs to be precise about strings/bytes
Type: behavior Stage: resolved
Components: Documentation Versions: Python 3.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: r.david.murray Nosy List: barry, beazley, georg.brandl, r.david.murray
Priority: normal Keywords:

Created on 2008-12-29 15:21 by beazley, last changed 2010-12-12 18:08 by r.david.murray. This issue is now closed.

Messages (3)
msg78456 - Author: David M. Beazley (beazley) Date: 2008-12-29 15:21
Documentation for the email package needs to be more clear about the 
usage of strings and bytes.  In particular:

1.  All operations that parse email messages such as message_from_file()
    or message_from_string() operate on *text*, not binary data.  So,
    the file must be opened in text mode.  Strings must be text strings,
    not binary strings.

2.  All operations that set/get the payload of a message operate on
    byte strings.   For example, using m.get_payload() on a Message
    object returns binary data as a byte string.
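
The two points above can be checked with a short sketch. It assumes a current Python 3 interpreter rather than 3.0, and the sample message and variable names are made up for illustration. On the versions I am aware of, get_payload() without arguments hands back a text string for a simple message, and bytes only come back with decode=True, so the behavior described in point 2 may be specific to the release the report was filed against.

```python
import email

# Hypothetical sample message, held as a text (str) object.
raw_text = "From: a@example.com\nSubject: hi\n\nhello\n"

# Point 1: the string-based parser accepts text.
msg = email.message_from_string(raw_text)

# Point 2: compare the payload type with and without decode=True.
payload_text = msg.get_payload()              # str on current Python 3
payload_bytes = msg.get_payload(decode=True)  # bytes

# Passing a byte string to the text parser is rejected with TypeError.
try:
    email.message_from_string(raw_text.encode("ascii"))
    rejected = False
except TypeError:
    rejected = True
```

Checking type(payload_text) and type(payload_bytes) on the interpreter in question is the quickest way to see which behavior the docs need to describe.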

Opinion:  There might be other bug reports about this, but I'm not 
advocating that the email module should support reading messages from 
binary mode files or byte strings.  Email and MIME were originally 
developed with the assumption that messages would always be handled as 
text. Minimally, this assumed that messages would stay intact even if 
processed as 7-bit ASCII.   By extension, everything should still work 
if processed as Unicode.  So, I think the use of text-mode files is 
entirely consistent with this if you wanted to keep the module "as is." 

There may be some confusion on this matter because if you're reading or 
writing email messages (or sending them across a socket), you may 
encounter messages stored in the form of byte strings instead of text.   
People will then wonder why a byte string can't be parsed by this module 
(especially given that email messages only use character values in the 
range of 0-127).
msg97362 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-01-07 19:27
We've concluded that the email package does need to be able to read bytes.  To do this and still support reading text, we're going to have to change the API.  So the docs will get fixed when they get rewritten for the new API.

Patches to the 3.1 docs to clarify the current situation would probably be accepted and applied, but I suspect the email team is not going to provide them due to lack of time :(.
msg123844 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-12-12 18:08
The wording was clarified for 3.2 as part of the fix for issue 4661.  This does not help the 3.1 docs, so if someone wants to suggest a patch for the 3.1 docs we can reopen the issue.
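
For readers on 3.2 or later, here is a minimal sketch of reading a message from bytes. It assumes email.message_from_bytes() and email.message_from_binary_file(), which the standard library docs mark as new in 3.2; whether they came in with issue 4661 specifically is my assumption, not something stated in this thread. The sample data and variable names are hypothetical.

```python
import email
import io

# Hypothetical sample message as raw bytes, e.g. as read from a socket.
raw_bytes = b"From: a@example.com\nSubject: hi\n\nhello\n"

# Parse directly from a bytes object.
msg_from_bytes = email.message_from_bytes(raw_bytes)

# Parse from a file-like object opened in binary mode
# (io.BytesIO stands in for open(path, "rb") here).
msg_from_file = email.message_from_binary_file(io.BytesIO(raw_bytes))

subject = msg_from_bytes["Subject"]
body = msg_from_file.get_payload()
```

The text-based message_from_string() and message_from_file() remain available alongside these, so existing code that opens files in text mode should keep working.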
History
Date                 User            Action  Args
2010-12-12 18:08:18  r.david.murray  set     status: open -> closed
                                             resolution: postponed -> fixed
                                             messages: + msg123844
                                             stage: resolved
2010-05-05 13:39:06  barry           set     assignee: barry -> r.david.murray
2010-01-07 19:27:47  r.david.murray  set     priority: normal
                                             versions: + Python 3.2, - Python 3.0
                                             nosy: + r.david.murray
                                             messages: + msg97362
                                             resolution: postponed
2008-12-29 15:34:34  georg.brandl    set     assignee: georg.brandl -> barry
                                             nosy: + barry
2008-12-29 15:21:48  beazley         create