Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

imaplib does not run under Python 3 #45551

Closed
rtmq mannequin opened this issue Sep 27, 2007 · 26 comments
Closed

imaplib does not run under Python 3 #45551

rtmq mannequin opened this issue Sep 27, 2007 · 26 comments
Assignees
Labels
release-blocker stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@rtmq
Copy link
Mannequin

rtmq mannequin commented Sep 27, 2007

BPO 1210
Nosy @loewis, @warsaw, @vstinner, @tiran, @benjaminp
Files
  • imaplib_bytes-4.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/benjaminp'
    closed_at = <Date 2008-11-05.19:40:11.059>
    created_at = <Date 2007-09-27.05:49:34.615>
    labels = ['type-bug', 'library', 'release-blocker']
    title = 'imaplib does not run under Python 3'
    updated_at = <Date 2008-11-05.19:40:11.058>
    user = 'https://bugs.python.org/rtmq'

    bugs.python.org fields:

    activity = <Date 2008-11-05.19:40:11.058>
    actor = 'christian.heimes'
    assignee = 'benjamin.peterson'
    closed = True
    closed_date = <Date 2008-11-05.19:40:11.059>
    closer = 'christian.heimes'
    components = ['Library (Lib)']
    creation = <Date 2007-09-27.05:49:34.615>
    creator = 'rtmq'
    dependencies = []
    files = ['11942']
    hgrepos = []
    issue_num = 1210
    keywords = ['patch']
    message_count = 26.0
    messages = ['56154', '56156', '56163', '56193', '57242', '57254', '57430', '59609', '61918', '71894', '71989', '71992', '72459', '72479', '74731', '74752', '74760', '74761', '74767', '74775', '74778', '74779', '75282', '75479', '75501', '75527']
    nosy_count = 9.0
    nosy_names = ['loewis', 'barry', 'nnorwitz', 'janssen', 'vstinner', 'christian.heimes', 'donmez', 'rtmq', 'benjamin.peterson']
    pr_nums = []
    priority = 'release blocker'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue1210'
    versions = ['Python 3.0']

    @rtmq
    Copy link
    Mannequin Author

    rtmq mannequin commented Sep 27, 2007

    imaplib does not run under Python 3.

    The following two-line python program, named testimap.py,
    works when run from a Windows XP system shell prompt
    using Python 2.5.1, but fails with Python 3.0. It
    appears that the logic does not follow the distinction
    between characters and bytes in Python 3.

    import imaplib
    mail=imaplib.IMAP4("mail.rtmq.infosathse.com")

    e:\python25\python testimap.py
    e:\python30\python testimap.py 2>f:syserr

    The last line produced the trace:

    Traceback (most recent call last):
      File "testimap.py", line 10, in <module>
        mail=imaplib.IMAP4("mail.rtmq.infosathse.com")
      File "e:\python30\lib\imaplib.py", line 184, in __init__
        self.welcome = self._get_response()
      File "e:\python30\lib\imaplib.py", line 962, in _get_response
        self._append_untagged(typ, dat)
      File "e:\python30\lib\imaplib.py", line 800, in _append_untagged
        if typ in ur:
    TypeError: unhashable type: 'bytes'

    @rtmq rtmq mannequin added type-crash A hard crash of the interpreter, possibly with a core dump stdlib Python modules in the Lib dir labels Sep 27, 2007
    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Sep 27, 2007

    Would you like to work on a patch?

    @draghuram
    Copy link
    Mannequin

    draghuram mannequin commented Sep 27, 2007

    Just to further understand the issue, I added "imaplib.Debug=5" and here
    is the output preceding the exception stack trace(I replaced the real
    IMAP server name)


    20:19.52 imaplib version 2.58
    20:19.52 new IMAP4 connection, tag=LOLD
    20:19.52 < * OK Microsoft Exchange Server 2003 IMAP4rev1 server
    version 6.5.7638.1 (imapserver.com) ready.
    20:19.52 matched r'\* (?P<type>[A-Z-]+)( (?P<data>.*))?' =>
    (b'OK', b' Microsoft Exchange Server 2003 IMAP4rev1 server version
    6.5.7638.1 (imapserver.com) ready.', b'Microsoft Exchange Server 2003
    IMAP4rev1 server version 6.5.7638.1 (imapserver.com) ready.')


    So it appears that the response is of type "bytes" which in turn is due
    to reading the socket in binary mode (self.file =
    self.sock.makefile('rb')).

    I would like to see how the problem can be fixed but any pointers are
    appreciated.

    @draghuram
    Copy link
    Mannequin

    draghuram mannequin commented Sep 28, 2007

    I have gone through the python-3000 discussions about similar problems
    in other stdlib modules (email, imghdr, sndhdr etc) and found PEP-3137
    (Immutable Bytes and Mutable Buffer). Since that work is in progress, I
    don't think it is worthwhile to fix this problem at this point.

    @tiran
    Copy link
    Member

    tiran commented Nov 8, 2007

    The transition is done. Can you work on a patch and maybe add some
    tests, too? It helps when you start Python with the -bb flag:

    $ ./python -bb -c 'import imaplib; imaplib.Debug=5;
    imaplib.IMAP4("mail.rtmq.infosathse.com")'
      52:01.86 imaplib version 2.58
      52:01.86 new IMAP4 connection, tag=PNFO
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/heimes/dev/python/py3k/Lib/imaplib.py", line 184, in __init__
        self.welcome = self._get_response()
      File "/home/heimes/dev/python/py3k/Lib/imaplib.py", line 907, in
    _get_response
        resp = self._get_line()
      File "/home/heimes/dev/python/py3k/Lib/imaplib.py", line 1009, in
    _get_line
        self._mesg('< %s' % line)
      File "/home/heimes/dev/python/py3k/Lib/warnings.py", line 62, in warn
        globals)
      File "/home/heimes/dev/python/py3k/Lib/warnings.py", line 102, in
    warn_explicit
        raise message
    BytesWarning: str() on a bytes instance

    @draghuram
    Copy link
    Mannequin

    draghuram mannequin commented Nov 8, 2007

    I will see what I can do but it may take a while.

    @draghuram
    Copy link
    Mannequin

    draghuram mannequin commented Nov 12, 2007

    Index: Lib/imaplib.py
    ===================================================================

    --- Lib/imaplib.py      (revision 58956)
    +++ Lib/imaplib.py      (working copy)
    @@ -228,7 +228,7 @@
             self.port = port
             self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
             self.sock.connect((host, port))
    -        self.file = self.sock.makefile('rb')
    +        self.file = self.sock.makefile('r', encoding='ASCII', newline='')
     
     
         def read(self, size):

    This patch fixes the issue but I am not entirely sure that it is
    correct. I quickly looked at IMAP RFC and there does seem to be spec for
    CHARSET in which case, that will have to be used instead of ASCII. It
    requires more research and imap knowledge which I can't claim.

    As for the tests, we need a imap server to connect to. Perhaps, google
    wouldn't mind being used for this purpose?

    @exarkun
    Copy link
    Mannequin

    exarkun mannequin commented Jan 9, 2008

    You're correct in pointing out that IMAP4 supports arbitrary encodings,
    so simply hard-coding ASCII is not correct. The encoding isn't
    connection-level, but applies to particular sequences of bytes in the
    connection stream. To correctly interpret the bytes as characters,
    decoding must be integrated with the rest of the protocol implementation.

    @janssen
    Copy link
    Mannequin

    janssen mannequin commented Jan 31, 2008

    IMAP doesn't really support multiple charsets (just looked at RFC 3501).
    There are two places where character sets other than ASCII is used.
    One is in the SEARCH command; there's an optional parameter which can
    indicate that the search strings are in a non-ASCII character set. The
    other is in transmission of message literals (email messages) back and
    forth.

    So probably setting the default encoding at this level isn't quite
    right, as you should definitely be reading raw bytes from the socket,
    not characters, but it isn't too far off. Looks like _command() needs a
    bit of work (it shouldn't try to quote bytes, only strings), and the
    documentation need to be improved, to say that non-ASCII search strings
    and message bodies should be passed as bytes encoded according to the
    specified CHARSET, but with those fixes it should work. Assuming that
    bytes are hashable in Python 3K.

    @nnorwitz
    Copy link
    Mannequin

    nnorwitz mannequin commented Aug 24, 2008

    Is this still a problem?

    @nnorwitz nnorwitz mannequin added type-bug An unexpected behavior, bug, or error and removed type-crash A hard crash of the interpreter, possibly with a core dump labels Aug 24, 2008
    @donmez
    Copy link
    Mannequin

    donmez mannequin commented Aug 26, 2008

    Still fails with beta2:

    >>> import imaplib
    >>> mail=imaplib.IMAP4("mail.rtmq.infosathse.com")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.0/imaplib.py", line 185, in __init__
        self.welcome = self._get_response()
      File "/usr/local/lib/python3.0/imaplib.py", line 912, in _get_response
        if self._match(self.tagre, resp):
      File "/usr/local/lib/python3.0/imaplib.py", line 1021, in _match
        self.mo = cre.match(s)
    TypeError: can't use a string pattern on a bytes-like object

    @nnorwitz
    Copy link
    Mannequin

    nnorwitz mannequin commented Aug 26, 2008

    This may not be a real release blocker, but I want to raise the
    priority. It is a regression and we should try to fix it, especially if
    it's easy.

    @nnorwitz nnorwitz mannequin added the release-blocker label Aug 26, 2008
    @warsaw
    Copy link
    Member

    warsaw commented Sep 4, 2008

    This should be fixed but it's not a release blocker.

    @janssen
    Copy link
    Mannequin

    janssen mannequin commented Sep 4, 2008

    Take a look at the thread here:

    http://mailman2.u.washington.edu/mailman/htdig/imap-protocol/2008-February/000811.html

    I think the summary is, arbitrary bytes may occur in some places, but
    they're likely to be UTF-8. Otherwise, it's mainly ASCII, but purposely
    left vague to see what convention developed.

    @vstinner
    Copy link
    Member

    Here is a patch for imaplib:

    • add encoding attribute to IMAP4 class (as ftplib and see also issue
      3727 for my poplib patch)
    • use makefile('r', encoding=self.encoding) instead of a binary file
      (mode='rb')
    • remove duplicate code in IMAP4_SSL

    I choosed ISO-8859-1 as the default charset. I tested the library on
    my local IMAP4 server using IMAP4 and IMAP4_SSL classes. But the
    library needs more unit tests as done for poplib.

    @janssen
    Copy link
    Mannequin

    janssen mannequin commented Oct 14, 2008

    Victor, what kind of content have you tried this with? For instance, have
    you passed unencoded (Content-Transfer-Encoding: binary) binary data through
    it, by mailing a JPEG, for instance? These things are strings really only
    at the application level; the data is still bytes. In addition, the use of
    Latin-1 goes against the explicit directives of the IMAP group, doesn't it?
    They're pushing UTF-8.

    Bill

    On Tue, Oct 14, 2008 at 4:27 AM, STINNER Victor <report@bugs.python.org>wrote:

    STINNER Victor <victor.stinner@haypocalc.com> added the comment:

    Here is a patch for imaplib:

    • add encoding attribute to IMAP4 class (as ftplib and see also issue
      3727 for my poplib patch)
    • use makefile('r', encoding=self.encoding) instead of a binary file
      (mode='rb')
    • remove duplicate code in IMAP4_SSL

    I choosed ISO-8859-1 as the default charset. I tested the library on
    my local IMAP4 server using IMAP4 and IMAP4_SSL classes. But the
    library needs more unit tests as done for poplib.

    ----------
    keywords: +patch
    nosy: +haypo
    Added file: http://bugs.python.org/file11786/imaplib_unicode.patch


    Python tracker <report@bugs.python.org>
    <http://bugs.python.org/issue1210\>


    @vstinner
    Copy link
    Member

    IMAP_stream() is also broken because it uses os.popen2() which has
    been deprecated since long time and now replaced by subprocess.

    Here is a patch replacing os.popen2() by subprocess, but also using
    transparent conversion from/to unicode using io.TextIOWrapper().

    @vstinner
    Copy link
    Member

    what kind of content have you tried this with?

    I only tried the most basic commands like capability(). I retried with
    search() and... hey, search() has a charset argument!? It should reuse
    self.encoding. Same for sort().

    Then I tried to get the content of an email but fetch(num, '(RFC822)')
    fails with "imaplib.abort: command: FETCH => unexpected
    response: 'Return-Path: <example@example.com'". RFC822 is not
    supported by imaplib? The test also fails with Python 2.5.

    @janssen
    Copy link
    Mannequin

    janssen mannequin commented Oct 14, 2008

    Maybe the first thing to do is to expand the Lib/test/test_imaplib.py
    file, which right now is pretty darn minimal. We really need an IMAP
    server somewhere to test against, with a standard library of varied
    messages.

    Perhaps Python.org is running an IMAP server?

    @vstinner
    Copy link
    Member

    The server can send raw 8 bits email in any charset (charset is
    specified in the email headers). That's why I think that it's better
    to keep bytes instead of the unicode conversion using a fixed charset.
    Each email can use a different charset.

    Types used in my new patch:

    • unicode:
      • IMAP commands (charset=ASCII)
      • untagged_responses keys (charset=ASCII)
    • bytes:
      • answer
      • regex
      • tagre attribute
      • untagged_responses values

    I chooosed to keep unicode for some variables to minimize the changes
    in imaplib library and to keep readable code.

    Patch TODO:

    • Remove the assert (added for quicker debugging)
    • Test more functions
    • Restore _checkquote() in _command() method or use
      _quote()/_checkquote() in method which need it. login() already quote
      the password (but why not the login?)

    I also wrote a patch for a "pure bytes string" version, but the patch
    is complex, long and the resulting module source code is hard to read.

    @vstinner
    Copy link
    Member

    New version of my bytes patch:

    • fix IMAP4_stream: use subprocess.Popen() as my previous
      imap_stream.patch but use bytes instead of characters
    • fix IMAP4_SSL: sslobj wasn't set in IMAP4_SSL.open() but used, for
      example, in read() method; remove duplicate method (simplify the code)
    • IMAP4.read(): call file.read() multiple times if the result is
      smaller than size (needed especially for the SSL version); FIXME: does
      this function raise an error of EOF or just loop forever? should we
      stop the loop if data is b''?

    @vstinner
    Copy link
    Member

    Oops, my previous patch didn't include changes to the documentation.
    New patch changes:

    • fix the documentation: os.popen2() => subprocess.Popen(); no more
      ssl() method: use socket()
    • use a buffer of 4096 bytes in read() method (as suggested in socket
      documentation)
    • break read() loop if read() returns an empty bytes string

    @vstinner
    Copy link
    Member

    Can anyone review my last patch (imaplib_bytes-3.patch)?

    @warsaw
    Copy link
    Member

    warsaw commented Nov 3, 2008

    The assertion on line 813 is indented incorrectly. Please fix that.

    I'm concerned we really need better test coverage for this code, but
    it's doubtful we'll get that before 3.0 final is released. I think this
    is the best we're going to do, and nothing else about the code jumps out
    at me.

    Go ahead and land it after that minor fix.

    @benjaminp benjaminp self-assigned this Nov 4, 2008
    @vstinner
    Copy link
    Member

    vstinner commented Nov 4, 2008

    Le Tuesday 04 November 2008 00:59:02 Barry A. Warsaw, vous avez écrit :

    The assertion on line 813 is indented incorrectly. Please fix that.

    Ooops. I'm using the following command because my editor is configured to
    remove the trailing spaces:
    svn diff --diff-cmd="/usr/bin/diff" -x "-ub"

    The line 813 was an assertion. I added many assertions to check types (for
    easier debug) but there are not needed anymore (my code is bugfreee, haha, no
    it's a joke). The new attached patch has no more assertion.

    @tiran
    Copy link
    Member

    tiran commented Nov 5, 2008

    Committed in r67107

    @tiran tiran closed this as completed Nov 5, 2008
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    release-blocker stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants