classification
Title: imaplib: Internaldate2tuple and ParseFlags require (and latter returns) bytes arrays; should allow/return str
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.3
process
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: belopolsky, eric.smith, lavajoe, r.david.murray
Priority: normal Keywords: patch

Created on 2011-01-19 15:35 by lavajoe, last changed 2011-01-28 18:42 by lavajoe. This issue is now closed.

Files
File name Uploaded Description Edit
imaplib_string_compat.patch lavajoe, 2011-01-19 15:35
Messages (7)
msg126532 - (view) Author: Joe Peterson (lavajoe) Date: 2011-01-19 15:35
In imaplib, there is currently a mix of bytes array and str use.  Time2Internaldate(), e.g., returns (and accepts) str.  Internaldate2tuple() and ParseFlags() only accept a bytes object, and the latter returns a tuple of bytes objects.  Of course, these were all strings in Python 2.

To regain compatibility with Python 2 behavior and make things more consistent, this patch changes the return type of ParseFlags() to a str tuple, and allow it and Internaldate2tuple() to accept either a bytes object or a str.

The unit test has been expanded to test these.  Note that applying this patch assumes that the the patches from issue 10939 and issue 10941 have been applied.

Also, I am not sure it is proper to accept *both* bytes array and str.  Perhaps only str should be allowed.  Comments?
msg126574 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2011-01-19 23:00
David,

Can you weigh in on this?

I am -1 on improving Internaldate2tuple(). I think the module should provide a function that would return an aware datetime object instead of a timetuple.

The proposed patch will make error reporting from Internaldate2tuple() inconsistent: invalid bytes result in None return while non-ASCII characters in strings would raise an encoding error.  While I don't like the "if not mo: return None" logic, if compatibility with 2.x is the goal, encoding errors should be caught and converted to return None.
msg126686 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-01-21 03:08
I think the module should be reviewed and made consistent, as Antoine did for nntplib.  I don't know enough about the IMAP protocol to do such a review, or even weigh in meaningfully on the immediate question, nor am I likely to have time to learn enough before 3.2, even assuming we wanted to make such changes at this point in the release cycle.  Putting changes off to 3.3 makes backward compatibility more important, so we may wind up stuck with warts.

Yes a function that returns a datetime would probably make sense.  That would be one way to clear up the warts: introduce a new function and deprecate the old.

A relatively uniformed opinion: it sounds like Internaldate2tuple should take bytes (since it sounds like Internaldate is most likely to be manipulated as a wire-protocol level object), and that Time2Internaldate should correspondingly return bytes(*).  What ParseFlags should return I couldn't guess without learning how the returned result is used in a typical program.

(*) The asymetry in the names of these two functions is already a wart.
msg126687 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2011-01-21 03:21
On Thu, Jan 20, 2011 at 10:08 PM, R. David Murray
<report@bugs.python.org> wrote:
..
> (*) The asymetry in the names of these two functions is already a wart.

Not to mention the use of CamelCase.
msg126697 - (view) Author: Joe Peterson (lavajoe) Date: 2011-01-21 06:32
These are all good comments.  And I agree that the naming of the functions is not good (and the CamelCase).

Overall, yes, it should be made consistent and the types returned and passed in should be appropriate.  I did a little experimenting, using more imaplib functions in Python3, and I found the following...

In the case of Internaldate2tuple(), I use it as follows (and I suspect most would do the same):

    typ, data = src_box.fetch(src_msg, '(INTERNALDATE)')
    ...check typ here for error...
    src_internal_date = data[0]
    src_date_tuple = myInternaldate2tuple(src_internal_date)

So data[0] is a bytes array, returned by the fetch() method, and it is indeed therefore compatible with Internaldate2tuple().

In fact, it seems that the data, in general, is returned as bytes arrays in imaplib.  Given that we are dealing with raw email headers and bodies, this is perhaps exactly what should be done.  There is some grey area here.  For example, for headers, RFC 2047 discusses using special 'encoded-words' that are regular ASCII in, e.g., subjects (but I've encountered a few cases of UTF-8 in headers, even though this is a dubious practice).  IMAP flags are ASCII as well.

On the other hand, what you get from fetching pieces of emails are thing that you usually want to deal with as strings (e.g., you will want to do find() and startswith(), etc. on them).  But I guess it depends on what is really proper to use for email data.
msg126747 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-01-21 15:17
Well, we recently added support for parsing binary data streams to the email module (or added back, if you are looking at it from a python2 perspective), exactly because "in the wild" headers are not always RFC compliant ASCII, and because bodies often are RFC *compliant* 8bit.  So yes, the appropriate type for imaplib to be returning is bytes, and the application has to figure out how to decode it to string using the information contained internally in the email message.  The email package is the obvious way to do this, and if there are helper routines that an imaplib programmer would want in order to make that job easier, those would be very legitimate feature requests (but most of what one needs should already be there; specifically decode_header and friends, although there's an outstanding feature request for a convenience wrapper around that).
msg126750 - (view) Author: Joe Peterson (lavajoe) Date: 2011-01-21 15:48
Yep, I agree, and in light of this, we should probably just close this issue and work toward reviewing/improving imaplib in the ways you are suggesting.

As I migrate my imap stuff more to Python3, I'll see if I run into any problems with using bytes throughout (beyond what is addressed in issue 10939).
History
Date User Action Args
2011-01-28 18:42:10lavajoesetstatus: open -> closed
nosy: belopolsky, eric.smith, r.david.murray, lavajoe
2011-01-21 15:48:59lavajoesetnosy: belopolsky, eric.smith, r.david.murray, lavajoe
messages: + msg126750
2011-01-21 15:17:39r.david.murraysetnosy: + eric.smith
messages: + msg126747
2011-01-21 06:32:21lavajoesetnosy: belopolsky, r.david.murray, lavajoe
messages: + msg126697
2011-01-21 03:21:07belopolskysetnosy: belopolsky, r.david.murray, lavajoe
messages: + msg126687
2011-01-21 03:08:30r.david.murraysetnosy: belopolsky, r.david.murray, lavajoe
messages: + msg126686
2011-01-19 23:01:32belopolskysetnosy: belopolsky, r.david.murray, lavajoe
stage: patch review
2011-01-19 23:00:42belopolskysetversions: + Python 3.3, - Python 3.2
nosy: + r.david.murray, belopolsky

messages: + msg126574

type: behavior -> enhancement
2011-01-19 15:35:30lavajoecreate