classification
Title: smtplib doesn't handle unicode passwords
Type: enhancement Stage: patch review
Components: email, Library (Lib) Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Gabriele Tornetta, JustAnother1, Vadim Pushtaev, Windson Yang, barry, david__, giampaolo.rodola, r.david.murray, seblu, taleinat
Priority: normal Keywords: patch

Created on 2017-03-07 20:40 by david__, last changed 2019-12-26 23:46 by seblu.

Pull Requests
URL Status Linked Edit
PR 8938 closed Windson Yang, 2018-08-26 07:43
PR 15064 open Windson Yang, 2019-08-01 04:12
Messages (30)
msg289184 - (view) Author: david (david__) Date: 2017-03-07 20:40
Trying to use unicode passwords on smtplib fails miserably on python3.
My particular issue arises on line 643 of said library:

(code, resp) = self.docmd(encode_base64(password.encode('ascii'), eol=''))

which obviously dies when trying to handle unicode chars.
msg289185 - (view) Author: david (david__) Date: 2017-03-07 20:42
I'm sorry I rushed my comment. Same thing happens on line 604


return encode_base64(s.encode('ascii'), eol='')


changing both from 'ascii' to 'utf-8' works for me.
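
A minimal sketch of the failure described above, not the smtplib source itself. The helper `encode_password` is hypothetical and only mimics the encode-then-base64 step from the quoted lines; the sample password is illustrative.

```python
import base64


def encode_password(password, encoding="ascii"):
    """Encode a str password to bytes, then base64 it (hypothetical helper)."""
    return base64.b64encode(password.encode(encoding)).decode("ascii")


sample = "contrase\u00f1a"  # illustrative password with one non-ASCII character

try:
    encode_password(sample)  # same 'ascii' codec as the quoted lines
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc  # the ascii codec cannot represent U+00F1

# Switching the codec to UTF-8 lets the same password through.
utf8_token = encode_password(sample, encoding="utf-8")
```

Whether UTF-8 is what a given server expects is a separate question; this only shows why the 'ascii' codec raises.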
msg289186 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-03-07 21:12
See msg253287.  Someone should check the RFC.  It is not obvious that just encoding using utf8 is correct; fundamentally passwords are binary data.  But the auth methods don't currently accept binary data.  UTF8 is a reasonable default these days, I think, but if we support more than ascii I think we need to support binary, with utf8 as the default encoding.
msg319222 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-06-10 14:17
Note: it is definitely the case, regardless of what the RFC says, that binary passwords need to be supported.  utf-8 should probably be used as the default encoding for string passwords, rather than ascii.  See also #33741.
msg319497 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-06-14 06:36
It would be extremely helpful to have some test cases that actually work for users but fail with smtplib.  So far we have no actual examples, likely due to these being passwords.

> Note: it is definitely the case, regardless of what the RFC says, that binary passwords need to be supported.

I'm not sure what you mean by "binary".  Do you mean 8-bit characters, a.k.a. bytes?

> utf-8 should probably be used as the default encoding for string passwords, rather than ascii.

It is also possible that the appropriate encoding here is "latin1" a.k.a. ISO-8859-1 encoding.  This specifically includes many specialized versions of latin characters, e.g. those with German umlauts as mentioned in the duplicate issue #33741.  And it could even be the very common Windows-1252 encoding: "It is probably the most-used 8-bit character encoding in the world." (Wikipedia)
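
To make the encoding question concrete, here is a small sketch comparing the bytes that each candidate encoding produces. The sample characters and the helper name `encode_all` are assumptions chosen for illustration.

```python
samples = ["\u00fc", "\u20ac"]  # u with umlaut, euro sign
encodings = ["latin-1", "cp1252", "utf-8"]


def encode_all(text):
    """Return {encoding name: bytes, or None if the codec cannot encode text}."""
    result = {}
    for name in encodings:
        try:
            result[name] = text.encode(name)
        except UnicodeEncodeError:
            result[name] = None
    return result


table = {text: encode_all(text) for text in samples}

for text, row in table.items():
    print(repr(text), row)
```

The three codecs do not agree on the byte sequence, so a server that stored the password under one encoding would see a different password if the client used another.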
msg319499 - (view) Author: david (david__) Date: 2018-06-14 07:08
In my case I was testing with "contraseña" (Spanish for "password"), and it failed.

msg319514 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-06-14 13:54
While you are correct that latin1 may be common in this situation, I think it may still be better to have utf-8 be the default, since that is the (still emerging? :) standard.  However, you are correct to call for examples: if in the *majority* of the real-world cases it turns out latin1 is what is used, then we could default to that (or not have a default, but instead document our observations).

I don't know how we accumulate enough information to make that decision, though.  Maybe we could look at what other mail programs do?  Thunderbird, etc?  David, which mail program(s) did you use that were able to successfully send that password?

And yes, by binary passwords I mean that the module needs to support being passed a bytes-like object as the password, since clearly there are servers "in the wild" that support non-ascii passwords and the only way to be sure one can send the server the correct password is by treating it as a series of bytes.  The library caller will have to be responsible for picking the correct encoding based on local knowledge.
msg319515 - (view) Author: david (david__) Date: 2018-06-14 13:56
Thunderbird, SOGo (web), and Gmail (web).

msg319521 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-06-14 15:14
For the web cases I presume you also set the password using the web interface, so that doesn't really tell us anything useful.  Did you use thunderbird to access the mailbox that you set up via gmail and/or sogo?  That would make what thunderbird does the interesting question.
msg319522 - (view) Author: david (david__) Date: 2018-06-14 15:15
Yes, I used Thunderbird for both.

msg319827 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-06-17 19:03
> And yes, by binary passwords I mean that the module needs to support being passed a bytes-like object as the password, since clearly there are servers "in the wild" that support non-ascii passwords and the only way to be sure one can send the server the correct password is by treating it as a series of bytes.  The library caller will have to be responsible for picking the correct encoding based on local knowledge.

Perhaps we should make smtplib accept only bytes, passing on the responsibility of using an appropriate encoding to its users?  This seems like the most straightforward and transparent choice. It would not be backwards-compatible, though.

Alternatively, we could change smtplib to accept passwords as bytes or strings, but raise an informative exception when given strings with non-ASCII characters.  As now, users could be surprised if they have been passing passwords as string and hadn't tested their use of smtplib with non-ASCII passwords.  We'd just improve the exception and documentation to clarify the situation.
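
A rough sketch of this second option, assuming a hypothetical helper `password_to_bytes` (not an smtplib function): bytes-like passwords pass through untouched, ASCII strings keep working, and non-ASCII strings fail with a message that tells the caller what to do.

```python
def password_to_bytes(password):
    """Hypothetical helper: accept bytes-like or ASCII-only str passwords."""
    if isinstance(password, (bytes, bytearray, memoryview)):
        return bytes(password)  # caller already chose the encoding
    try:
        return password.encode("ascii")
    except UnicodeEncodeError as exc:
        raise ValueError(
            "non-ASCII str passwords are ambiguous; encode the password "
            "to bytes with the encoding your server expects"
        ) from exc


print(password_to_bytes("secret"))
print(password_to_bytes("contrase\u00f1a".encode("utf-8")))
```

The choice of ValueError and the wording of the message are assumptions; the point is only that the caller learns why the call failed and how to fix it.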
msg319828 - (view) Author: david (david__) Date: 2018-06-17 19:11
I would like to see the second option (allow both, warning on non-ascii)

msg319830 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-06-17 21:46
We must continue to support at least ascii strings, for backward compatibility reasons.  We can certainly improve the error messages, but the goal of this issue is to add support for bytes passwords.  I lean toward continuing to only support ascii strings, and making it the responsibility of the program to do the encoding to bytes when dealing with non-ascii.  However, I'd like to also be able to recommend in the docs what encoding is most likely to work, if someone can find out what encoding Thunderbird uses...however, it occurs to me that it may be using whatever encoding the OS is using (LC_LANG, oem codepage, etc), and that David's experiments worked because the same encoding was used for the same reason when the password was set.  I'm not sure how browsers/webmail works in that regard, honestly.

That's less important than just adding support for bytes passwords, though.
msg319862 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-06-18 06:13
I found the Thunderbird Bugzilla issue where they appear to have dealt with precisely this problem (for a variety of protocols, including SMTP):

https://bugzilla.mozilla.org/show_bug.cgi?id=312593
msg319863 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-06-18 06:15
This specifically seems relevant:

> In order for Thunderbird to be standards-compliant-enough to interoperate with standards-compliant servers, it should use UTF-8 for the SASL PLAIN mechanism regardless of the underlying protocol (IMAP, POP and SMTP). That includes the POP3 "AUTH PLAIN" command and the SMTP "AUTH PLAIN" command.
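
For reference, a sketch of what a UTF-8 SASL PLAIN initial response looks like, following the message layout in RFC 4616 (authzid, NUL, authcid, NUL, password). The function name `sasl_plain_response` and the sample credentials are assumptions for illustration; this is not how smtplib builds the response today.

```python
import base64


def sasl_plain_response(user, password, authzid=""):
    """Build a base64 SASL PLAIN response with UTF-8 encoding (sketch)."""
    raw = "\0".join([authzid, user, password]).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


token = sasl_plain_response("user", "contrase\u00f1a")
print(token)
```

Decoding the token with base64 gives back the NUL-separated UTF-8 bytes, which is what a standards-compliant server would compare against.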
msg319864 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-06-18 06:45
There's also some discussion there (from 3 years ago) of possibly needing to fall back to ISO-8859-1 to work with MS Exchange, despite the standards saying UTF-8 should be used.  It's unclear to me whether that's actually the case.
msg319866 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-06-18 06:55
From reading the aforementioned discussion on Thunderbird's issue tracker, ISTM that encoding with UTF-8 is the way to go.
msg319891 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-06-18 15:35
I didn't think to look at the standards for the auth mechanisms, I only looked at the smtp standards.  So, if the standard says utf-8, then we should use that.  But we should still support bytes passwords so that an application could work around issues like the possible ms-exchange one, if they need to.  Those could be two separate PRs, though.  In fact, they probably should be.  As a standards-compliance issue, we would be within our rules to backport the utf-8 standards-compliance fix.
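
One way the two pieces could fit together, as a sketch only: str passwords are encoded with UTF-8 by default, while bytes-like passwords are sent as given so an application can work around a server that expects something else. `normalize_password` is a hypothetical name, not part of smtplib or of either PR.

```python
def normalize_password(password, encoding="utf-8"):
    """Hypothetical helper: str is encoded (UTF-8 default), bytes pass through."""
    if isinstance(password, str):
        return password.encode(encoding)
    return bytes(password)  # bytes, bytearray, memoryview: used unchanged


# Default path: a str password becomes UTF-8 bytes.
default_bytes = normalize_password("contrase\u00f1a")

# Workaround path: the application pre-encodes for a server it knows is Latin-1.
latin1_bytes = normalize_password("contrase\u00f1a".encode("latin-1"))

print(default_bytes, latin1_bytes)
```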
msg321172 - (view) Author: Gabriele N Tornetta (Gabriele Tornetta) * Date: 2018-07-06 13:28
Are there any PRs already for this issue? I couldn't find any on GitHub. Also, is the plan to branch the fix down to at least 3.6?
msg321174 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-07-06 14:07
I have worked on this, almost ready for a PR.
msg322431 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-07-26 14:24
Never mind, I won't have time for this any time soon, better if someone else can do it.
msg322447 - (view) Author: Vadim Pushtaev (Vadim Pushtaev) * Date: 2018-07-26 20:11
Hello. I would like to work on this. Should the issue be assigned to me, or is this comment enough?
msg322450 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-07-26 21:01
A comment here is all that is needed.
msg322457 - (view) Author: Windson Yang (Windson Yang) * Date: 2018-07-27 03:26
@Vadim Pushtaev I also want to work on it. If you want to work together, maybe we can talk on Zulip chat. :D
msg322468 - (view) Author: Vadim Pushtaev (Vadim Pushtaev) * Date: 2018-07-27 06:47
That's OK, you can do it.
msg324480 - (view) Author: Windson Yang (Windson Yang) * Date: 2018-09-02 16:22
I added a patch to support utf-8.
msg348831 - (view) Author: Sebastien Luttringer (seblu) Date: 2019-08-01 01:31
I hit the same issue.
Do you have news about the patch review and its inclusion?
msg348832 - (view) Author: Windson Yang (Windson Yang) * Date: 2019-08-01 02:15
Sorry, I forgot about this PR. I will update the patch based on the review soon :D
msg348837 - (view) Author: Windson Yang (Windson Yang) * Date: 2019-08-01 04:14
I just updated the PR
msg358892 - (view) Author: Sebastien Luttringer (seblu) Date: 2019-12-26 23:46
Utf8 passwords are still broken on python 3.8.

Patch works great on 3.8.
History
Date User Action Args
2019-12-26 23:46:13 seblu set messages: + msg358892
2019-08-01 04:14:29 Windson Yang set messages: + msg348837
2019-08-01 04:12:00 Windson Yang set pull_requests: + pull_request14813
2019-08-01 02:15:47 Windson Yang set messages: + msg348832
2019-08-01 01:31:13 seblu set nosy: + seblu
    messages: + msg348831
2018-09-02 16:22:35 Windson Yang set messages: + msg324480
2018-08-26 07:43:15 Windson Yang set keywords: + patch
    stage: needs patch -> patch review
    pull_requests: + pull_request8411
2018-07-27 06:47:48 Vadim Pushtaev set messages: + msg322468
2018-07-27 03:26:37 Windson Yang set nosy: + Windson Yang
    messages: + msg322457
2018-07-26 21:01:17 taleinat set messages: + msg322450
2018-07-26 20:11:44 Vadim Pushtaev set nosy: + Vadim Pushtaev
    messages: + msg322447
2018-07-26 14:24:19 taleinat set messages: + msg322431
2018-07-06 14:07:52 taleinat set messages: + msg321174
2018-07-06 13:28:16 Gabriele Tornetta set nosy: + Gabriele Tornetta
    messages: + msg321172
2018-06-18 15:35:03 r.david.murray set messages: + msg319891
2018-06-18 06:55:27 taleinat set messages: + msg319866
2018-06-18 06:45:56 taleinat set messages: + msg319864
2018-06-18 06:15:47 taleinat set messages: + msg319863
2018-06-18 06:13:22 taleinat set messages: + msg319862
2018-06-17 21:46:20 r.david.murray set messages: + msg319830
2018-06-17 19:11:31 david__ set messages: + msg319828
2018-06-17 19:03:29 taleinat set messages: + msg319827
2018-06-14 15:15:29 david__ set messages: + msg319522
2018-06-14 15:14:31 r.david.murray set messages: + msg319521
2018-06-14 13:56:23 david__ set messages: + msg319515
2018-06-14 13:54:31 r.david.murray set messages: + msg319514
2018-06-14 07:08:37 david__ set messages: + msg319499
2018-06-14 06:36:30 taleinat set messages: + msg319497
2018-06-10 14:18:11 r.david.murray set stage: needs patch
    versions: + Python 3.8, - Python 3.7
2018-06-10 14:17:55 r.david.murray set messages: + msg319222
2018-06-10 14:15:01 r.david.murray set nosy: + taleinat, giampaolo.rodola, JustAnother1
2018-06-10 14:14:35 r.david.murray link issue33741 superseder
2017-03-07 21:12:12 r.david.murray set versions: + Python 3.7, - Python 3.4
    nosy: + barry, r.david.murray
    messages: + msg289186
    components: + email
    type: enhancement
2017-03-07 20:42:26 david__ set messages: + msg289185
2017-03-07 20:40:24 david__ create