classification
Title: poplib module broken by str to unicode conversion
Type: crash Stage:
Components: Library (Lib) Versions: Python 3.0
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: benjamin.peterson Nosy List: barry, benjamin.peterson, christian.heimes, giampaolo.rodola, hdima, vstinner
Priority: release blocker Keywords: patch

Created on 2008-08-29 13:59 by hdima, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
poplib-bytes-3.patch vstinner, 2008-11-04 18:24
Messages (18)
msg72136 - (view) Author: Dmitry Vasiliev (hdima) Date: 2008-08-29 13:59
Example:

>>> from poplib import POP3
>>> p = POP3("localhost")
>>> p.user("user")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/py3k/Lib/poplib.py", line 179, in user
    return self._shortcmd('USER %s' % user)
  File "/py3k/Lib/poplib.py", line 151, in _shortcmd
    self._putcmd(line)
  File "/py3k/Lib/poplib.py", line 98, in _putcmd
    self._putline(line)
  File "/py3k/Lib/poplib.py", line 91, in _putline
    self.sock.sendall('%s%s' % (line, CRLF))
TypeError: sendall() argument 1 must be string or buffer, not str
>>> p.user(b"user")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/py3k/Lib/poplib.py", line 179, in user
    return self._shortcmd('USER %s' % user)
  File "/py3k/Lib/poplib.py", line 151, in _shortcmd
    self._putcmd(line)
  File "/py3k/Lib/poplib.py", line 98, in _putcmd
    self._putline(line)
  File "/py3k/Lib/poplib.py", line 91, in _putline
    self.sock.sendall('%s%s' % (line, CRLF))
TypeError: sendall() argument 1 must be string or buffer, not str
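Both calls fail for the same reason: `_putline` builds the command with str formatting, and in Python 3 `sendall()` accepts only bytes-like objects. Passing `b"user"` does not help, because formatting bytes into a str template still yields a str. A minimal sketch of the failure and the obvious remedy follows, using a local `socket.socketpair()` in place of a POP3 server. The `put_line` helper is hypothetical and is not poplib's code.

```python
import socket

CRLF = b"\r\n"


def put_line(sock, line, encoding="utf-8"):
    """Hypothetical helper: encode a text command, then send it as bytes."""
    sock.sendall(line.encode(encoding) + CRLF)


client, server = socket.socketpair()
error = None
try:
    try:
        client.sendall("USER user\r\n")  # str is rejected in Python 3
    except TypeError as exc:
        error = exc
    put_line(client, "USER user")        # bytes go through
    received = server.recv(1024)
finally:
    client.close()
    server.close()
```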
msg74699 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-10-13 21:57
Here is a proposed patch:
 - a socket uses bytes
 - makefile() creates a unicode file using 'r' mode
 - the default encoding is ISO-8859-1 because I guess that most servers use 
this encoding, but you can change the encoding using the optional "encoding" 
constructor argument
 - read unicode and write unicode: convert from/to bytes at 
the last moment (just after/before reading/writing the socket)
 - cosmetic: use .startswith() instead of, for example, b[:2] == '..'

Test updates:
 - replace "localhost" by HOST
 - write a test for logging in (user + password)

Missing: no SSL unit test. I tested SSL on my personal POP3 account, 
but only the login.
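The cosmetic `.startswith()` point concerns POP3 byte-stuffing: in a multi-line response, a line that begins with a dot is sent with an extra leading dot, and a lone `.` ends the response. The sketch below illustrates that check under the assumption that lines arrive as bytes with the line ending already stripped. `unstuff` and `read_multiline` are hypothetical names, not functions from poplib.

```python
def unstuff(line):
    """Undo byte-stuffing: a leading '..' stands for a single literal '.'."""
    if line.startswith(b".."):
        return line[1:]
    return line


def read_multiline(lines):
    """Collect response lines up to (not including) the terminating '.' line."""
    body = []
    for line in lines:
        if line == b".":
            break
        body.append(unstuff(line))
    return body


sample = [b"Subject: test", b"", b"..hidden dot", b".", b"ignored"]
result = read_multiline(sample)
```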
msg74711 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-10-13 23:07
Ooops, my previous patch was wrong (startswith => not startswith).

I tested python trunk test for poplib: with minor changes, all tests 
are ok except tests using SSL. I get a "select.error: (9, 'Bad file 
descriptor')" from asyncore. So I tried to synchronize python3 ssl 
with python2 trunk, but it depends on python2 trunk version of the 
socket module and this module is very complex and hard to port to 
python3.

As for the EBADF error from select(), it may come from the missing makefile() 
method of the SSL socket wrapper.
msg74715 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-10-14 00:02
New version:
 - fix SSL: self.file contains the SSL socket and not the raw socket!
 - upgrade test_poplib.py from Python trunk

poplib should be refactored to reuse the new IO library. But I don't 
know how to build a TextIO wrapper for a socket. Using TextIO, it 
would be possible to remove newline (CR/LF) and unicode 
encoding/decoding code.
msg74716 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2008-10-14 00:20
I haven't tried the patch but I think that "encoding" should be a class
attribute as it is in ftplib and similar py3k network related modules.
msg74717 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-10-14 00:29
New version:
 - remove duplicate methods of POP3_SSL()
 - use makefile('r', encoding=self.encoding) to get a nice text 
wrapper with universal newline
 - remove newline conversion (done by TextIOWrapper)

In the end, my patch removes more code from poplib.py than it adds :-D I 
like such patches.

Python3 new I/O library rocks!
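For illustration, here is a small sketch of what `makefile('r', encoding=...)` provides: a text wrapper over the socket that decodes bytes and translates CRLF to `\n`. It uses `socket.socketpair()` as a stand-in for a server connection, and the greeting text is made up.

```python
import socket

client, server = socket.socketpair()
try:
    # The fake server sends one ISO-8859-1 encoded line ending in CRLF.
    server.sendall(b"+OK caf\xe9 ready\r\n")

    # Text mode: bytes are decoded and universal newlines are enabled.
    reader = client.makefile("r", encoding="iso-8859-1")
    line = reader.readline()
    reader.close()
finally:
    client.close()
    server.close()
```

The line comes back as str with a single `\n` terminator, so the caller needs no decoding or CR/LF handling of its own.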
msg74718 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-10-14 00:54
@giampaolo.rodola: Right, I also prefer encoding as a class attribute. 
So I wrote a new patch:
 - encoding is now a class attribute
 - continue SSL code factorization: SSL code is now around 10 lines 
instead of 70 lines!
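A class attribute gives every instance the same default while still allowing a per-instance override. The sketch below shows the pattern on a hypothetical `Client` class; it is not the poplib implementation.

```python
class Client:
    """Hypothetical stand-in for a protocol client such as POP3."""

    encoding = "utf-8"  # class-wide default

    def encode_command(self, line):
        # Looks up the instance attribute first, then the class attribute.
        return line.encode(self.encoding) + b"\r\n"


default = Client()
legacy = Client()
legacy.encoding = "iso-8859-1"  # override for this instance only

utf8_bytes = default.encode_command("USER andr\u00e9")
latin1_bytes = legacy.encode_command("USER andr\u00e9")
```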
msg74870 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-10-16 21:59
I like this patch.
msg74873 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-10-16 22:19
Oooops, I removed the message74562 from giampaolo.rodola, sorry:
"As for issue #3911 this is another module for which an actual test 
suite would be very necessary."
msg74878 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-10-16 22:58
After testing with real-world messages, my patch using pure unicode doesn't 
work. The problem arises with messages that use the "8-bit" encoding. If 
the email charset is different from POP3.encoding, the message is not 
decoded correctly.
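The failure mode can be reproduced without a server. In this sketch the sample text and the KOI8-R charset are arbitrary choices for illustration: a body decoded with a connection-wide charset that differs from the message's own charset either turns into mojibake or fails outright.

```python
original = "\u041f\u0440\u0438\u0432\u0435\u0442"  # Russian "Privet"
body = original.encode("koi8-r")                    # 8-bit body as received

# ISO-8859-1 maps every byte to a character, so this never fails,
# but the result is not the original text.
wrong = body.decode("iso-8859-1")

# UTF-8 is stricter: these bytes are not valid UTF-8 at all.
try:
    body.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Only the charset named by the message itself gives the right text.
right = body.decode("koi8-r")
```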
msg74883 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-10-17 00:05
New patch: resp() returns bytes
 - self.file is now a binary file
 - encode commands using POP3.encoding charset, default is UTF-8
 - use md5.hexdigest()
 - factorize POP3_SSL code: code specific for SSL is just the creation 
of the socket

The default charset is UTF-8, but most servers only accept pure ASCII 
login/password (e.g. gmail.com) or a smaller subset of ASCII (e.g. only 
A-Z, a-z, 0-9 and some punctuation signs :-/). If you use a non-ASCII 
login/password and your server doesn't use UTF-8, change POP3.encoding 
or <your pop object>.encoding (encoding is not used in the 
constructor).

welcome attribute (and getwelcome() results) is a bytes string.

You have to parse the message headers to get the right charset to 
decode bytes to unicode characters. A multipart message may contain 
two or more charsets and different encodings. But poplib is not 
responsible for decoding messages, just for downloading a message as bytes.
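One way to do that parsing is the standard `email` package. The sketch below assumes the message lines arrive as a list of bytes objects without line endings, and it hard-codes a sample single-part message instead of contacting a server. A multipart message would need the same treatment for each part.

```python
import email

# Sample message lines: bytes, line endings already stripped.
lines = [
    b"Subject: test",
    b"MIME-Version: 1.0",
    b"Content-Type: text/plain; charset=koi8-r",
    b"Content-Transfer-Encoding: 8bit",
    b"",
    "\u041f\u0440\u0438\u0432\u0435\u0442".encode("koi8-r"),
]

msg = email.message_from_bytes(b"\r\n".join(lines))

# The charset comes from the Content-Type header, not from the connection.
charset = msg.get_content_charset() or "ascii"

# decode=True undoes the transfer encoding and returns the raw body bytes.
text = msg.get_payload(decode=True).decode(charset)
```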

Arguments (username (login), password, a message identifier) are 
unicode strings. For a message identifier, you can also use an integer 
(nothing new, it was already possible).

I hope that apop() works. No Python error is raised, but none of the servers 
I tried supports this authentication method. I tested 3 servers 
(pop3.haypocalc.com, pop.laposte.net and pop.gmail.com) and none 
supports APOP. I tested POP3 and POP3_SSL (gmail requires SSL).
msg74884 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-10-17 00:07
About apop(): the second argument is the user password, not a "shared 
password" which is the local variable "timestamp" read from welcome 
attribute.
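As a rough sketch of the computation described here and of the `md5.hexdigest()` item above: the APOP digest is the MD5 hex digest of the timestamp found in the server greeting followed by the password. `apop_digest` is a hypothetical helper and its regular expression is a simplification. The greeting and password are the sample values from the APOP example in RFC 1939.

```python
import hashlib
import re


def apop_digest(welcome, password, encoding="utf-8"):
    """Hypothetical helper: MD5 of the greeting timestamp plus the password."""
    match = re.search(rb"<[^<>]+@[^<>]+>", welcome)
    if match is None:
        raise ValueError("server greeting carries no APOP timestamp")
    timestamp = match.group(0)  # includes the angle brackets
    return hashlib.md5(timestamp + password.encode(encoding)).hexdigest()


welcome = b"+OK POP3 server ready <1896.697170952@dbc.mtview.ca.us>"
digest = apop_digest(welcome, "tanstaaf")
```

The client would then send `APOP <user> <digest>`. A greeting without a timestamp means the server does not offer APOP, which matches the three servers tested above.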
msg74885 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-10-17 00:21
I forgot the new unit tests. New patch:
 - port python trunk unit tests
msg75283 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-10-28 15:02
Can anyone review my last patch (poplib-bytes-2.patch)?
msg75394 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-10-30 22:52
I'm happy with Victor's patch.
msg75480 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2008-11-04 00:03
Benjamin's reviewed this and the only thing that jumps out at me is some
funky indentation at about line 331.  If you fix that, you can land this
patch.
msg75499 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-11-04 18:24
On Tuesday 04 November 2008 01:03:42, Barry A. Warsaw wrote:
> Benjamin's reviewed this and the only thing that jumps out at me is some
> funky indentation at about line 331

It's not related to my patch (I didn't change the POP3_SSL comment). But well, as 
you want: a new patch re-indents the "See the methods of the parent (...)" 
line (331).
msg75529 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-11-05 19:48
Patch applied in r67109
History
Date User Action Args
2022-04-11 14:56:38 admin set github: 47977
2008-11-05 19:48:53 christian.heimes set status: open -> closed
    nosy: + christian.heimes
    resolution: accepted -> fixed
    messages: + msg75529
2008-11-04 18:24:49 vstinner set files: + poplib-bytes-3.patch
    messages: + msg75499
2008-11-04 18:23:36 vstinner set files: - poplib-bytes-2.patch
2008-11-04 02:48:37 benjamin.peterson set assignee: benjamin.peterson
2008-11-04 00:03:40 barry set keywords: - needs review
    resolution: accepted
    messages: + msg75480
    nosy: + barry
2008-10-30 22:52:10 benjamin.peterson set messages: + msg75394
2008-10-28 15:02:24 vstinner set keywords: + needs review
    messages: + msg75283
2008-10-17 00:21:30 vstinner set files: + poplib-bytes-2.patch
    messages: + msg74885
2008-10-17 00:20:42 vstinner set files: - poplib-bytes.patch
2008-10-17 00:07:26 vstinner set messages: + msg74884
2008-10-17 00:05:34 vstinner set files: + poplib-bytes.patch
    messages: + msg74883
2008-10-16 22:58:43 vstinner set messages: + msg74878
2008-10-16 22:57:29 vstinner set files: - poplib_unicode-5.patch
2008-10-16 22:19:49 vstinner set messages: + msg74873
2008-10-16 22:18:15 vstinner set messages: - msg74562
2008-10-16 21:59:20 benjamin.peterson set nosy: + benjamin.peterson
    messages: + msg74870
2008-10-14 00:54:24 vstinner set files: - poplib_unicode-4.patch
2008-10-14 00:54:19 vstinner set files: + poplib_unicode-5.patch
    messages: + msg74718
2008-10-14 00:30:12 vstinner set files: - poplib_unicode-3.patch
2008-10-14 00:30:09 vstinner set files: - poplib_unicode-2.patch
2008-10-14 00:29:59 vstinner set files: + poplib_unicode-4.patch
    messages: + msg74717
2008-10-14 00:20:26 giampaolo.rodola set messages: + msg74716
2008-10-14 00:02:24 vstinner set files: + poplib_unicode-3.patch
    messages: + msg74715
2008-10-13 23:07:28 vstinner set files: + poplib_unicode-2.patch
    messages: + msg74711
2008-10-13 23:01:59 vstinner set files: - poplib_unicode.patch
2008-10-13 21:57:45 vstinner set files: + poplib_unicode.patch
    keywords: + patch
    messages: + msg74699
    nosy: + vstinner
2008-10-09 09:59:25 giampaolo.rodola set nosy: + giampaolo.rodola
    messages: + msg74562
2008-10-08 20:42:48 benjamin.peterson set title: poplib module broken by str to unicode conversionhttp://bugs.python.org/issue3727 -> poplib module broken by str to unicode conversion
2008-10-08 20:42:34 benjamin.peterson set priority: release blocker
    title: poplib module broken by str to unicode conversion -> poplib module broken by str to unicode conversionhttp://bugs.python.org/issue3727
2008-08-29 13:59:49 hdima create