classification
Title: imaplib does not run under Python 3
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.0
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: benjamin.peterson Nosy List: barry, benjamin.peterson, christian.heimes, donmez, janssen, loewis, nnorwitz, rtmq, vstinner
Priority: release blocker Keywords: patch

Created on 2007-09-27 05:49 by rtmq, last changed 2008-11-05 19:40 by christian.heimes. This issue is now closed.

Files
File name Uploaded Description
imaplib_bytes-4.patch vstinner, 2008-11-04 18:34
Messages (26)
msg56154 - Author: Robert T McQuaid (rtmq) Date: 2007-09-27 05:49
imaplib does not run under Python 3.

The following two-line Python program, named testimap.py,
works when run from a Windows XP system shell prompt
using Python 2.5.1, but fails with Python 3.0.  It
appears that the logic does not follow the distinction
between characters and bytes in Python 3.


import imaplib
mail=imaplib.IMAP4("mail.rtmq.infosathse.com")


e:\python25\python   testimap.py
e:\python30\python   testimap.py 2>f:syserr


The last line produced the trace:


Traceback (most recent call last):
  File "testimap.py", line 10, in <module>
    mail=imaplib.IMAP4("mail.rtmq.infosathse.com")
  File "e:\python30\lib\imaplib.py", line 184, in __init__
    self.welcome = self._get_response()
  File "e:\python30\lib\imaplib.py", line 962, in _get_response
    self._append_untagged(typ, dat)
  File "e:\python30\lib\imaplib.py", line 800, in _append_untagged
    if typ in ur:
TypeError: unhashable type: 'bytes'
msg56156 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-09-27 06:10
Would you like to work on a patch?
msg56163 - Author: Raghuram Devarakonda (draghuram) (Python triager) Date: 2007-09-27 14:39
Just to further understand the issue, I added "imaplib.Debug=5", and here
is the output preceding the exception stack trace (I replaced the real
IMAP server name):

***************
  20:19.52 imaplib version 2.58
  20:19.52 new IMAP4 connection, tag=LOLD
  20:19.52 < * OK Microsoft Exchange Server 2003 IMAP4rev1 server version 6.5.7638.1 (imapserver.com) ready.
  20:19.52      matched r'\* (?P<type>[A-Z-]+)( (?P<data>.*))?' => (b'OK', b' Microsoft Exchange Server 2003 IMAP4rev1 server version 6.5.7638.1 (imapserver.com) ready.', b'Microsoft Exchange Server 2003 IMAP4rev1 server version 6.5.7638.1 (imapserver.com) ready.')
***************

So it appears that the response is of type "bytes" which in turn is due
to reading the socket in binary mode (self.file =
self.sock.makefile('rb')). 

I would like to look into how the problem can be fixed; any pointers are
appreciated.
msg56193 - Author: Raghuram Devarakonda (draghuram) (Python triager) Date: 2007-09-28 18:41
I have gone through the python-3000 discussions about similar problems
in other stdlib modules (email, imghdr, sndhdr etc) and found PEP 3137
(Immutable Bytes and Mutable Buffer). Since that work is in progress, I
don't think it is worthwhile to fix this problem at this point.
msg57242 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-11-08 13:53
The transition is done. Can you work on a patch and maybe add some
tests, too? It helps when you start Python with the -bb flag:

$ ./python -bb -c 'import imaplib; imaplib.Debug=5; imaplib.IMAP4("mail.rtmq.infosathse.com")'
  52:01.86 imaplib version 2.58
  52:01.86 new IMAP4 connection, tag=PNFO
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/heimes/dev/python/py3k/Lib/imaplib.py", line 184, in __init__
    self.welcome = self._get_response()
  File "/home/heimes/dev/python/py3k/Lib/imaplib.py", line 907, in _get_response
    resp = self._get_line()
  File "/home/heimes/dev/python/py3k/Lib/imaplib.py", line 1009, in _get_line
    self._mesg('< %s' % line)
  File "/home/heimes/dev/python/py3k/Lib/warnings.py", line 62, in warn
    globals)
  File "/home/heimes/dev/python/py3k/Lib/warnings.py", line 102, in warn_explicit
    raise message
BytesWarning: str() on a bytes instance
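The -bb flag turns implicit str() conversions of bytes into errors, which is what the `self._mesg('< %s' % line)` call trips over. A minimal sketch that reproduces this in a child interpreter; the sample line and the ASCII decode used as the "fix" are assumptions for illustration, not the change that was eventually committed.

```python
import subprocess
import sys

# The debug-message pattern from the traceback: formatting bytes with %s
# calls str() on them, which -bb turns into a BytesWarning error.
BROKEN = "line = b'* OK ready'; msg = '< %s' % line"
# Decoding the protocol line first avoids the implicit str() on bytes.
FIXED = "line = b'* OK ready'; msg = '< %s' % line.decode('ascii')"


def run_with_bb(code):
    """Run `code` in a fresh interpreter started with -bb."""
    return subprocess.run([sys.executable, '-bb', '-c', code],
                          capture_output=True, text=True)


broken = run_with_bb(BROKEN)
fixed = run_with_bb(FIXED)
print(broken.returncode, 'BytesWarning' in broken.stderr)
print(fixed.returncode)
```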
msg57254 - Author: Raghuram Devarakonda (draghuram) (Python triager) Date: 2007-11-08 14:59
I will see what I can do but it may take a while.
msg57430 - Author: Raghuram Devarakonda (draghuram) (Python triager) Date: 2007-11-12 21:42
Index: Lib/imaplib.py
===================================================================
--- Lib/imaplib.py      (revision 58956)
+++ Lib/imaplib.py      (working copy)
@@ -228,7 +228,7 @@
         self.port = port
         self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
         self.sock.connect((host, port))
-        self.file = self.sock.makefile('rb')
+        self.file = self.sock.makefile('r', encoding='ASCII', newline='')
 
 
     def read(self, size):

-------------

This patch fixes the issue, but I am not entirely sure that it is
correct. I quickly looked at the IMAP RFC, and there does seem to be a spec
for CHARSET, in which case that will have to be used instead of ASCII. It
requires more research and IMAP knowledge than I can claim.

As for the tests, we need an IMAP server to connect to. Perhaps Google
wouldn't mind being used for this purpose?
msg59609 - Author: Jean-Paul Calderone (exarkun) * (Python committer) Date: 2008-01-09 16:17
You're correct in pointing out that IMAP4 supports arbitrary encodings,
so simply hard-coding ASCII is not correct.  The encoding isn't
connection-level, but applies to particular sequences of bytes in the
connection stream.  To correctly interpret the bytes as characters,
decoding must be integrated with the rest of the protocol implementation.
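To see why hard-coding ASCII on the whole connection is fragile, here is a sketch that puts a text layer over a byte stream the way `makefile('r', encoding='ASCII', newline='')` in the patch above does. `io.BytesIO` stands in for the socket, and the sample server output (including an 8-bit message literal) is invented for the example.

```python
import io


def read_all_as_ascii(raw):
    """Decode a byte stream through an ASCII text layer, similar to what
    makefile('r', encoding='ASCII', newline='') does for a socket."""
    stream = io.TextIOWrapper(io.BytesIO(raw), encoding='ascii', newline='')
    return stream.read()


# Plain protocol lines are pure ASCII and decode fine; newline='' keeps CRLF.
greeting = read_all_as_ascii(b'* OK IMAP4rev1 ready\r\n')

# A FETCH response carrying a 6-byte literal with one 8-bit byte (0xE9).
fetch = b'* 1 FETCH (BODY[] {6}\r\ncaf\xe9\r\n)\r\n'
try:
    read_all_as_ascii(fetch)
    failed = False
except UnicodeDecodeError:
    failed = True
```

The surrounding response is ASCII, yet the whole read fails on the literal's bytes, which is consistent with the point that decoding belongs to individual protocol elements rather than to the connection.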
msg61918 - Author: Bill Janssen (janssen) * (Python committer) Date: 2008-01-31 18:03
IMAP doesn't really support multiple charsets (I just looked at RFC 3501).
There are two places where character sets other than ASCII are used.
One is the SEARCH command; there's an optional parameter which can
indicate that the search strings are in a non-ASCII character set. The
other is the transmission of message literals (email messages) back and
forth.

So setting the default encoding at this level probably isn't quite
right, as you should definitely be reading raw bytes from the socket,
not characters, but it isn't too far off. It looks like _command() needs a
bit of work (it shouldn't try to quote bytes, only strings), and the
documentation needs to be improved to say that non-ASCII search strings
and message bodies should be passed as bytes encoded according to the
specified CHARSET, but with those fixes it should work, assuming that
bytes are hashable in Python 3K.
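A minimal sketch of the rule suggested for `_command()`: quote `str` arguments, pass `bytes` through untouched. `encode_argument()` is a hypothetical helper, not part of imaplib, and for simplicity it always quotes strings, whereas imaplib only quotes an argument when it needs it.

```python
def encode_argument(arg):
    """Hypothetical helper: turn one IMAP command argument into bytes.

    str arguments are protocol text: backslashes and double quotes are
    escaped, the result is wrapped in double quotes and encoded as ASCII.
    bytes arguments are assumed to be already encoded by the caller
    (for example in the CHARSET given to SEARCH) and are passed through.
    """
    if isinstance(arg, bytes):
        return arg
    escaped = arg.replace('\\', '\\\\').replace('"', '\\"')
    return ('"' + escaped + '"').encode('ascii')


print(encode_argument('INBOX'))
print(encode_argument(b'caf\xc3\xa9'))
```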
msg71894 - Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2008-08-24 22:22
Is this still a problem?
msg71989 - Author: Ismail Donmez (donmez) * Date: 2008-08-26 17:50
Still fails with beta2:

>>> import imaplib
>>> mail=imaplib.IMAP4("mail.rtmq.infosathse.com")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.0/imaplib.py", line 185, in __init__
    self.welcome = self._get_response()
  File "/usr/local/lib/python3.0/imaplib.py", line 912, in _get_response
    if self._match(self.tagre, resp):
  File "/usr/local/lib/python3.0/imaplib.py", line 1021, in _match
    self.mo = cre.match(s)
TypeError: can't use a string pattern on a bytes-like object
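The beta2 traceback comes from matching a `str` regular expression against a `bytes` line read from a socket file opened with `makefile('rb')`. A minimal reproduction with the `re` module, using the untagged-response pattern from the earlier debug output and an invented greeting line:

```python
import re

line = b'* OK IMAP4rev1 server ready'  # what a binary socket file returns

str_pattern = re.compile(r'\* (?P<type>[A-Z-]+)( (?P<data>.*))?')
bytes_pattern = re.compile(br'\* (?P<type>[A-Z-]+)( (?P<data>.*))?')

# A str pattern refuses to match bytes input.
try:
    str_pattern.match(line)
    error = None
except TypeError as exc:
    error = str(exc)

# A bytes pattern matches, and every group it returns is bytes as well.
mo = bytes_pattern.match(line)
typ, data = mo.group('type'), mo.group('data')
```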
msg71992 - Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2008-08-26 18:37
This may not be a real release blocker, but I want to raise the
priority.  It is a regression and we should try to fix it, especially if
it's easy.
msg72459 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2008-09-04 02:12
This should be fixed but it's not a release blocker.
msg72479 - Author: Bill Janssen (janssen) * (Python committer) Date: 2008-09-04 04:58
Take a look at the thread here:

http://mailman2.u.washington.edu/mailman/htdig/imap-protocol/2008-February/000811.html

I think the summary is: arbitrary bytes may occur in some places, but
they're likely to be UTF-8. Otherwise, it's mainly ASCII, but it was purposely
left vague to see what convention developed.
msg74731 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-10-14 11:27
Here is a patch for imaplib:
 - add an encoding attribute to the IMAP4 class (as in ftplib; see also issue 3727 for my poplib patch)
 - use makefile('r', encoding=self.encoding) instead of a binary file (mode='rb')
 - remove duplicate code in IMAP4_SSL

I chose ISO-8859-1 as the default charset. I tested the library on
my local IMAP4 server using the IMAP4 and IMAP4_SSL classes, but the
library needs more unit tests, as was done for poplib.
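One property of ISO-8859-1 that may have motivated this default (the message does not give a reason, so this is an assumption): it maps each of the 256 byte values to exactly one code point, so decoding and re-encoding never loses data, whereas ASCII and UTF-8 both reject some byte sequences. A quick check:

```python
ALL_BYTES = bytes(range(256))  # every possible byte value once


def roundtrips(codec, data=ALL_BYTES):
    """True if data survives data.decode(codec).encode(codec) unchanged."""
    try:
        return data.decode(codec).encode(codec) == data
    except UnicodeDecodeError:
        return False


for codec in ('iso-8859-1', 'ascii', 'utf-8'):
    print(codec, roundtrips(codec))
```

Lossless decoding does not make the result meaningful text, though: a binary attachment decoded this way is still binary data, just stored in a `str`.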
msg74752 - Author: Bill Janssen (janssen) * (Python committer) Date: 2008-10-14 15:57
Victor, what kind of content have you tried this with? For instance, have
you passed unencoded (Content-Transfer-Encoding: binary) binary data through
it, say by mailing a JPEG? These things are really strings only at the
application level; the data is still bytes. In addition, doesn't the use of
Latin-1 go against the explicit directives of the IMAP group?
They're pushing UTF-8.

Bill
msg74760 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-10-14 18:14
IMAP4_stream is also broken because it uses os.popen2(), which has long
been deprecated and is now replaced by subprocess.

Here is a patch replacing os.popen2() with subprocess, which also
converts transparently to and from Unicode using io.TextIOWrapper().
msg74761 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-10-14 18:21
> what kind of content have you tried this with?

I only tried the most basic commands, like capability(). I retried with
search() and... hey, search() has a charset argument!? It should reuse
self.encoding. Same for sort().

Then I tried to get the content of an email, but fetch(num, '(RFC822)')
fails with "imaplib.abort: command: FETCH => unexpected
response: 'Return-Path: <example@example.com'". Is RFC822 not
supported by imaplib? The test also fails with Python 2.5.
msg74767 - Author: Bill Janssen (janssen) * (Python committer) Date: 2008-10-14 19:31
Maybe the first thing to do is to expand the Lib/test/test_imaplib.py
file, which right now is pretty darn minimal.  We really need an IMAP
server somewhere to test against, with a standard library of varied
messages.

Perhaps Python.org is running an IMAP server?
msg74775 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-10-14 22:14
The server can send raw 8-bit email in any charset (the charset is
specified in the email headers). That's why I think it's better
to keep bytes instead of converting to Unicode with a fixed charset:
each email can use a different charset.
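Because each message declares its own charset, a client can keep the FETCH result as bytes and decode the body only after reading the headers. A sketch using the standard `email` package; the two raw messages are invented, `body_text()` is a hypothetical helper, and falling back to ASCII when no charset is declared is an assumption.

```python
import email

# Same text, two different declared charsets (invented sample messages).
RAW_LATIN1 = (b'Content-Type: text/plain; charset="iso-8859-1"\r\n'
              b'\r\n'
              b'caf\xe9\r\n')
RAW_UTF8 = (b'Content-Type: text/plain; charset="utf-8"\r\n'
            b'\r\n'
            b'caf\xc3\xa9\r\n')


def body_text(raw):
    """Parse raw message bytes and decode the body with its own charset."""
    msg = email.message_from_bytes(raw)
    charset = msg.get_content_charset() or 'ascii'
    # decode=True returns the body as bytes (no transfer encoding here).
    return msg.get_payload(decode=True).decode(charset)


print(body_text(RAW_LATIN1).strip(), body_text(RAW_UTF8).strip())
```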

Types used in my new patch:
 - unicode:
   * IMAP commands (charset=ASCII)
   * untagged_responses keys (charset=ASCII)
 - bytes:
   * answer
   * regex
   * tagre attribute
   * untagged_responses values
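The split above (ASCII `str` keys, `bytes` values, `bytes` regex) can be sketched as follows. `append_untagged()` and `UNTAGGED` are hypothetical names loosely modeled on imaplib's `_append_untagged()`; this is not the patch's code.

```python
import re

# bytes pattern, because lines read from the server are bytes
UNTAGGED = re.compile(br'\* (?P<type>[A-Z-]+)( (?P<data>.*))?')


def append_untagged(untagged_responses, line):
    """Record one untagged response line; return False if it isn't one."""
    mo = UNTAGGED.match(line)
    if mo is None:
        return False
    typ = mo.group('type').decode('ascii')  # key: ASCII str, easy to look up
    dat = mo.group('data') or b''           # value: raw bytes, charset unknown
    untagged_responses.setdefault(typ, []).append(dat)
    return True


responses = {}
append_untagged(responses, b'* CAPABILITY IMAP4rev1 IDLE')
append_untagged(responses, b'* FLAGS (\\Seen)')
tagged = append_untagged(responses, b'A001 OK done')
```

Callers can look up `responses['CAPABILITY']` with an ordinary string while the payload stays undecoded.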

I chose to keep Unicode for some variables to minimize the changes
to the imaplib library and to keep the code readable.

Patch TODO:
 - Remove the asserts (added for quicker debugging)
 - Test more functions
 - Restore _checkquote() in the _command() method, or use _quote()/_checkquote() in the methods which need it. login() already quotes the password (but why not the login?)

I also wrote a patch for a "pure bytes string" version, but the patch
is complex and long, and the resulting module source code is hard to read.
msg74778 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-10-14 22:34
New version of my bytes patch:
 - fix IMAP4_stream: use subprocess.Popen() as in my previous imap_stream.patch, but use bytes instead of characters
 - fix IMAP4_SSL: sslobj was used (for example, in the read() method) but never set in IMAP4_SSL.open(); remove the duplicate methods (simplifies the code)
 - IMAP4.read(): call file.read() multiple times if the result is smaller than size (needed especially for the SSL version); FIXME: does this function raise an error on EOF or just loop forever? Should we stop the loop if data is b''?
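Replacing `os.popen2()` with `subprocess.Popen()` while keeping both pipes binary can be sketched like this. `open_stream()` is a hypothetical helper, the echo child process stands in for a real IMAP server command, and passing an argument list instead of a shell command string is an assumption made to keep the example portable.

```python
import subprocess
import sys


def open_stream(command):
    """Start `command`; return (process, writefile, readfile) as bytes pipes.

    Both pipes are binary, so IMAP data stays as bytes end to end.
    """
    process = subprocess.Popen(command, stdin=subprocess.PIPE,
                               stdout=subprocess.PIPE)
    return process, process.stdin, process.stdout


# Stand-in "server": echoes one line from stdin back to stdout as raw bytes.
ECHO = [sys.executable, '-c',
        'import sys; sys.stdout.buffer.write(sys.stdin.buffer.readline())']

process, writefile, readfile = open_stream(ECHO)
writefile.write(b'* OK fake server ready\r\n')
writefile.flush()
reply = readfile.readline()
writefile.close()
readfile.close()
process.wait()
```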
msg74779 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-10-14 22:43
Oops, my previous patch didn't include the documentation changes.
New patch changes:
 - fix the documentation: os.popen2() => subprocess.Popen(); there is no more ssl() method: use socket()
 - use a buffer of 4096 bytes in the read() method (as suggested in the socket documentation)
 - break out of the read() loop if read() returns an empty bytes string
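The `read()` loop described in the last two messages, sketched as a standalone function: read in chunks of at most 4096 bytes until `size` bytes have arrived, and stop early on an empty result (EOF) instead of looping forever. `read_exactly()` and `TrickleFile` are hypothetical; the latter simulates a connection (such as SSL) that returns fewer bytes than requested.

```python
import io


class TrickleFile:
    """Wraps bytes and returns at most `step` bytes per read() call."""

    def __init__(self, data, step=3):
        self._buf = io.BytesIO(data)
        self._step = step

    def read(self, size):
        return self._buf.read(min(size, self._step))


def read_exactly(fileobj, size, chunk_size=4096):
    """Read up to `size` bytes, retrying on short reads and stopping at EOF."""
    chunks = []
    remaining = size
    while remaining > 0:
        data = fileobj.read(min(chunk_size, remaining))
        if not data:  # EOF: return what we have rather than spin
            break
        chunks.append(data)
        remaining -= len(data)
    return b''.join(chunks)


print(read_exactly(TrickleFile(b'0123456789'), 8))
print(read_exactly(TrickleFile(b'0123'), 8))
```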
msg75282 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-10-28 15:02
Can anyone review my last patch (imaplib_bytes-3.patch)?
msg75479 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2008-11-03 23:58
The assertion on line 813 is indented incorrectly.  Please fix that.

I'm concerned we really need better test coverage for this code, but
it's doubtful we'll get that before 3.0 final is released.  I think this
is the best we're going to do, and nothing else about the code jumps out
at me.

Go ahead and land it after that minor fix.
msg75501 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-11-04 18:34
On Tuesday 04 November 2008 00:59:02, Barry A. Warsaw wrote:
> The assertion on line 813 is indented incorrectly. Please fix that.

Oops. I'm using the following command because my editor is configured to
remove trailing spaces:
   svn diff --diff-cmd="/usr/bin/diff" -x "-ub"

Line 813 was an assertion. I added many assertions to check types (for
easier debugging), but they are not needed anymore (my code is bug-free, haha,
no, it's a joke). The newly attached patch has no more assertions.
msg75527 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-11-05 19:40
Committed in r67107
History
Date User Action Args
2008-11-05 19:40:11 christian.heimes set status: open -> closed
    resolution: accepted -> fixed
    messages: + msg75527
2008-11-04 18:34:03 vstinner set files: + imaplib_bytes-4.patch
    messages: + msg75501
2008-11-04 18:30:11 draghuram set nosy: - draghuram
2008-11-04 18:29:07 exarkun set nosy: - exarkun
2008-11-04 18:27:16 vstinner set files: - imaplib_bytes-3.patch
2008-11-04 02:41:35 benjamin.peterson set assignee: benjamin.peterson
    nosy: + benjamin.peterson
2008-11-03 23:59:01 barry set keywords: - needs review
    messages: + msg75479
2008-10-28 15:02:02 vstinner set keywords: + needs review
    messages: + msg75282
2008-10-14 22:43:31 vstinner set files: + imaplib_bytes-3.patch
    messages: + msg74779
2008-10-14 22:41:32 vstinner set files: - imaplib_bytes-2.patch
2008-10-14 22:34:57 vstinner set files: - imaplib_bytes.patch
2008-10-14 22:34:51 vstinner set files: + imaplib_bytes-2.patch
    messages: + msg74778
2008-10-14 22:20:33 vstinner set files: - imaplib_stream.patch
2008-10-14 22:14:05 vstinner set files: + imaplib_bytes.patch
    messages: + msg74775
2008-10-14 21:55:01 vstinner set files: - imaplib_unicode.patch
2008-10-14 19:31:11 janssen set messages: + msg74767
2008-10-14 18:21:13 vstinner set messages: + msg74761
2008-10-14 18:14:21 vstinner set files: + imaplib_stream.patch
    messages: + msg74760
2008-10-14 17:36:08 vstinner set files: - unnamed
2008-10-14 15:57:11 janssen set files: + unnamed
    messages: + msg74752
2008-10-14 11:27:46 vstinner set files: + imaplib_unicode.patch
    nosy: + vstinner
    messages: + msg74731
    keywords: + patch
2008-10-02 12:54:03 barry set priority: deferred blocker -> release blocker
2008-09-26 22:18:07 barry set priority: release blocker -> deferred blocker
2008-09-18 05:42:32 barry set priority: deferred blocker -> release blocker
2008-09-04 04:58:21 janssen set messages: + msg72479
2008-09-04 02:12:11 barry set priority: release blocker -> deferred blocker
    nosy: + barry
    messages: + msg72459
2008-08-26 18:37:30 nnorwitz set priority: normal -> release blocker
    messages: + msg71992
2008-08-26 17:50:39 donmez set nosy: + donmez
    messages: + msg71989
2008-08-24 22:22:34 nnorwitz set nosy: + nnorwitz
    type: crash -> behavior
    messages: + msg71894
2008-01-31 18:03:17 janssen set nosy: + janssen
    messages: + msg61918
2008-01-09 16:17:32 exarkun set nosy: + exarkun
    messages: + msg59609
2008-01-06 22:29:45 admin set keywords: - py3k
    versions: Python 3.0
2007-11-12 21:42:33 draghuram set messages: + msg57430
2007-11-08 14:59:57 draghuram set messages: + msg57254
2007-11-08 13:53:28 christian.heimes set nosy: + christian.heimes
    messages: + msg57242
2007-11-04 13:49:32 christian.heimes set priority: normal
    keywords: + py3k
    resolution: accepted
2007-09-28 18:41:35 draghuram set messages: + msg56193
2007-09-27 14:39:47 draghuram set messages: + msg56163
2007-09-27 14:22:43 draghuram set nosy: + draghuram
2007-09-27 06:10:00 loewis set nosy: + loewis
    messages: + msg56156
2007-09-27 05:49:34 rtmq create