Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

poplib module broken by str to unicode conversion #47977

Closed
hdima mannequin opened this issue Aug 29, 2008 · 18 comments
Closed

poplib module broken by str to unicode conversion #47977

hdima mannequin opened this issue Aug 29, 2008 · 18 comments
Assignees
Labels
release-blocker stdlib Python modules in the Lib dir type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@hdima
Copy link
Mannequin

hdima mannequin commented Aug 29, 2008

BPO 3727
Nosy @warsaw, @vstinner, @giampaolo, @tiran, @benjaminp
Files
  • poplib-bytes-3.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/benjaminp'
    closed_at = <Date 2008-11-05.19:48:53.929>
    created_at = <Date 2008-08-29.13:59:49.757>
    labels = ['library', 'type-crash', 'release-blocker']
    title = 'poplib module broken by str to unicode conversion'
    updated_at = <Date 2008-11-05.19:48:53.928>
    user = 'https://bugs.python.org/hdima'

    bugs.python.org fields:

    activity = <Date 2008-11-05.19:48:53.928>
    actor = 'christian.heimes'
    assignee = 'benjamin.peterson'
    closed = True
    closed_date = <Date 2008-11-05.19:48:53.929>
    closer = 'christian.heimes'
    components = ['Library (Lib)']
    creation = <Date 2008-08-29.13:59:49.757>
    creator = 'hdima'
    dependencies = []
    files = ['11940']
    hgrepos = []
    issue_num = 3727
    keywords = ['patch']
    message_count = 18.0
    messages = ['72136', '74699', '74711', '74715', '74716', '74717', '74718', '74870', '74873', '74878', '74883', '74884', '74885', '75283', '75394', '75480', '75499', '75529']
    nosy_count = 6.0
    nosy_names = ['barry', 'hdima', 'vstinner', 'giampaolo.rodola', 'christian.heimes', 'benjamin.peterson']
    pr_nums = []
    priority = 'release blocker'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue3727'
    versions = ['Python 3.0']

    @hdima
    Copy link
    Mannequin Author

    hdima mannequin commented Aug 29, 2008

    Example:

    >>> from poplib import POP3
    >>> p = POP3("localhost")
    >>> p.user("user")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/py3k/Lib/poplib.py", line 179, in user
        return self._shortcmd('USER %s' % user)
      File "/py3k/Lib/poplib.py", line 151, in _shortcmd
        self._putcmd(line)
      File "/py3k/Lib/poplib.py", line 98, in _putcmd
        self._putline(line)
      File "/py3k/Lib/poplib.py", line 91, in _putline
        self.sock.sendall('%s%s' % (line, CRLF))
    TypeError: sendall() argument 1 must be string or buffer, not str
    >>> p.user(b"user")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/py3k/Lib/poplib.py", line 179, in user
        return self._shortcmd('USER %s' % user)
      File "/py3k/Lib/poplib.py", line 151, in _shortcmd
        self._putcmd(line)
      File "/py3k/Lib/poplib.py", line 98, in _putcmd
        self._putline(line)
      File "/py3k/Lib/poplib.py", line 91, in _putline
        self.sock.sendall('%s%s' % (line, CRLF))
    TypeError: sendall() argument 1 must be string or buffer, not str

    @hdima hdima mannequin added stdlib Python modules in the Lib dir type-crash A hard crash of the interpreter, possibly with a core dump labels Aug 29, 2008
    @benjaminp benjaminp changed the title poplib module broken by str to unicode conversion poplib module broken by str to unicode conversionhttp://bugs.python.org/issue3727 Oct 8, 2008
    @benjaminp benjaminp changed the title poplib module broken by str to unicode conversionhttp://bugs.python.org/issue3727 poplib module broken by str to unicode conversion Oct 8, 2008
    @vstinner
    Copy link
    Member

    Here is a patch proposition:

    • a socket uses bytes
    • makefile() creates an unicode file using 'r' mode
    • default encoding ISO-8859-1 because I guess that most servers use
      this encoding, but you can change the encoding using "encoding"
      constructor optioan argument
    • read unicode and write unicode: convert convert from/to bytes at
      the last moment (just after/before reading/writing the socket)
    • cosmetic: use .startswith() instead of for example b[:2] == '..'

    Test updates:

    • replace "localhost" by HOST
    • write a test for a logging (user + password)

    Missing: no SSL unit test. I tested SSL on my personal POP3 account,
    but only the login.

    @vstinner
    Copy link
    Member

    Ooops, my previous patch was wrong (startswith => not startswith).

    I tested python trunk test for poplib: with minor changes, all tests
    are ok except tests using SSL. I get a "select.error: (9, 'Bad file
    descriptor')" from asyncore. So I tried to synchronize python3 ssl
    with python2 trunk, but it depends on python2 trunk version of the
    socket module and this module is very complex and hard to port to
    python3.

    About EBADF error from select(), it may comes from missing makefile()
    method of the ssl socket wrapper.

    @vstinner
    Copy link
    Member

    New version:

    • fix SSL: self.file contains the SSL socket and not the raw socket!
    • upgrade test_poplib.py from Python trunk

    poplib should be refactored to reuse the new IO library. But I don't
    know how to build a TextIO wrapper for a socket. Using TextIO, it
    would be possible to remove newline (CR/LF) and unicode
    encoding/decoding code.

    @giampaolo
    Copy link
    Contributor

    I haven't tried the patch but I think that "encoding" should be a class
    attribute as it is in ftplib and similar py3k network related modules.

    @vstinner
    Copy link
    Member

    New version:

    • remove duplicate methods of POP3_SSL()
    • use makefile('r', encoding=self.encoding) to get a nice text
      wrapper with universal newline
    • remove newline conversion (done by TextIOWrapper)

    Finally my patch removes more code in poplib.py than it adds :-D I
    like such patch.

    Python3 new I/O library rocks!

    @vstinner
    Copy link
    Member

    @giampaolo.rodola: Right, I also prefer encoding as a class attribute.
    So I wrote a new patch:

    • encoding is now a class attribute
    • continue SSL code factorization: SSL code is now around 10 lines
      instead of 70 lines!

    @benjaminp
    Copy link
    Contributor

    I like this patch.

    @vstinner
    Copy link
    Member

    Oooops, I removed the message74562 from giampaolo.rodola, sorry:
    "As for issue bpo-3911 this is another module for which an actual test
    suite would be very necessary."

    @vstinner
    Copy link
    Member

    After testing real world message, my patch using pure unicode doesn't
    work. The problem comes with message encoding with "8-bit" encoding. If
    the email charset is different than POP3.encoding, the message in not
    decoded correctly.

    @vstinner
    Copy link
    Member

    New patch: resp() returns bytes

    • self.file is now a binary file
    • encode commands using POP3.encoding charset, default is UTF-8
    • use md5.hexdigest()
    • factorize POP3_SSL code: code specific for SSL is just the creation
      of the socket

    The default charset is UTF-8, but most servers only accept pure ASCII
    login/password (eg. gmail.com) or a smaller subset of ASCII (eg. only
    A-Z, a-z, 0-9 and some ponctuation signs :-/). If you user non-ASCII
    login/password and your server doesn't use UTF-8, change POP3.encoding
    or <your pop object>.encoding (encoding is not used in the
    constructor).

    welcome attribute (and getwelcome() results) is a bytes string.

    You have to parse the message headers to get the right charset to
    decode bytes to unicode characters. A multipart message may contains
    two or more charsets and different encoding. But poplib is not
    responsible to decode messages, just to download a message as bytes.

    Arguments (username (login), password, a message identifier) are
    unicode strings. For a message identifier, you can also use an integer
    (nothing new, it was already possible).

    I hope that apop() works. No Python error is raised but no server does
    support his authentication method. I tested 3 servers
    (pop3.haypocalc.com, pop.laposte.net and pop.gmail.com) and none
    supports APOP. I tested POP3 and POP3_SSL (gmail requires SSL).

    @vstinner
    Copy link
    Member

    About apop(): the second argument is the user password, not a "shared
    password" which is the local variable "timestamp" read from welcome
    attribute.

    @vstinner
    Copy link
    Member

    I forgot the new unit tests. New patch:

    • port python trunk unit tests

    @vstinner
    Copy link
    Member

    Can anyone review my last patch (poplib-bytes-2.patch)?

    @benjaminp
    Copy link
    Contributor

    I'm happy with Victor's patch.

    @warsaw
    Copy link
    Member

    warsaw commented Nov 4, 2008

    Benjamin's reviewed this and the only thing that jumps out at me is some
    funky indentation at about line 331. If you fix that, you can land this
    patch.

    @benjaminp benjaminp self-assigned this Nov 4, 2008
    @vstinner
    Copy link
    Member

    vstinner commented Nov 4, 2008

    Le Tuesday 04 November 2008 01:03:42 Barry A. Warsaw, vous avez écrit :

    Benjamin's reviewed this and the only thing that jumps out at me is some
    funky indentation at about line 331

    It's not related to my patch (I did'nt change POP3_SSL comment). But well, as
    you want: a new patch re-indent the "See the methods of the parent (...)"
    line (331).

    @tiran
    Copy link
    Member

    tiran commented Nov 5, 2008

    Patch applied in r67109

    @tiran tiran closed this as completed Nov 5, 2008
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    release-blocker stdlib Python modules in the Lib dir type-crash A hard crash of the interpreter, possibly with a core dump
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants