classification
Title: urllib incorrectly quotes username and password in https basic auth
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.2, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: carljm, ezio.melotti, jcea, joneskoo, maker, orsenthil, python-dev
Priority: normal Keywords: patch

Created on 2011-12-20 15:02 by joneskoo, last changed 2012-01-14 12:10 by joneskoo. This issue is now closed.

Files
File name Uploaded Description Edit
tests-and-fakehttp-request-storing.diff joneskoo, 2012-01-06 02:27 FakeHTTPConnection stores request, added tests
tests-and-fakehttp-request-storing-2.diff joneskoo, 2012-01-06 03:06
issue13642.patch maker, 2012-01-10 10:47 review
issue13642_with_map.patch maker, 2012-01-10 10:48 review
urllib-userpass-w-spaces.patch joneskoo, 2012-01-14 07:44 test_userpass_inurl_w_spaces for python3.2 review
urllib-userpass-w-spaces-fix.patch joneskoo, 2012-01-14 08:09 same as previous but with fix
Messages (18)
msg149915 - (view) Author: Joonas Kuorilehto (joneskoo) Date: 2011-12-20 15:02
Reproduction:

>>> import urllib
>>> urllib.urlopen("https://example.com/")
Enter username for Test Site at example.com: user
Enter password for user in Test Site at example.com: top secret
Enter username for Test Site at example.com:
# If the correct password contains spaces, nothing will be accepted.

The problem is that the password in basic auth is URI quoted and then base64 encoded. The password should not be quoted.

RFC 2617:
      userid      = *<TEXT excluding ":">
      password    = *TEXT
      base64-user-pass  = <base64 [4] encoding of user-pass,
                       except not limited to 76 char/line>

I traced the problem with Pydev to urllib retry_https_basic_auth where I can see that
  user = "user"
  passwd = "my secret password"

After that, the path is like this:
self.retry_https_basic_auth:
  self.open(fullurl="https://user:my%20%secret%20password@example.com/")
  self.open_https(url="://user:my%20%secret%20password@example.com/")
  => in open_https:
    host, selector = splithost(url)
    user_passwd, host = splituser(host)
    host = unquote(host)

user_passwd is not unquoted, host is.

I found closely related Issue2244 - but did not confirm where this bug has been introduced. I added some people from 2244 to this issue. I hope that is ok.

I think a test should be added that covers usernames and passwords with spaces to avoid further regressions. The reproduction code given works with Python 2.4.3 urllib. This probably also affects python3, did not try.
msg150245 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2011-12-25 03:50
Joonas, this issue seems easy to solve. Do you want to try to post a patch?. Extra credits for patching testsuite too :).

If you work in 2.7, I promise to up-port the patch to 3.x.
msg150378 - (view) Author: Michele Orrù (maker) * Date: 2011-12-30 14:21
> Joonas, this issue seems easy to solve. Do you want to try to post a 
> patch?. Extra credits for patching testsuite too :).
As far as I see, it would be sufficient to add unquote(passed) to _open_generic_http. Regarding unittests instead, there is already a method called test_userpass_inurl which could be extended with some tests on a password containing spaces ( Lib/test/test_urllib.py:263). But what I haven't yet understood is: does it really exists a user:pass in python.org?
msg150710 - (view) Author: Joonas Kuorilehto (joneskoo) Date: 2012-01-06 02:27
> Regarding unittests instead, there is already a method called
> test_userpass_inurl which could be extended with some tests on a
> password containing spaces ( Lib/test/test_urllib.py:263). But what
> I haven't yet understood is: does it really exists a user:pass in
> python.org?

Note Lib/test/test_urllib.py:261 ; there is a fake HTTP wrapper in the test. So the request is not really sent.

I modified FakeHTTPConnection to store the sent HTTP request. I also copied the test you mentioned from python3 to 2.7. The second test I add in the patch fails. The test should pass with python2.5 from OS X (did not run the test but checked headers against netcat).

Please take a look at the tests I added. I'm not sure if geturl() should return the quoted version or not. But certainly the quoted version must not be used in the base64. If you think geturl() should return the quoted version, I'm fine with that - in principle characters like \n in the password could be bad in an URL unless quoted.

Maybe the tests could ALSO be added to some other places, but I think this full path makes sense to check like this.
msg150714 - (view) Author: Joonas Kuorilehto (joneskoo) Date: 2012-01-06 03:06
Updated patch for 2.7 hg tip attached. Please review, test and if ok, port to 3.x.

I guess the URL needs to be quoted so commented out the assertion for the URL being equal. I added unquote in the base64 encoding of the password, which makes the test pass. Seems to work for me and no urllib tests were broken. Did not run others.

http://test.webdav.org/ has some basic auth test accounts configured if you want to try it out. You can use wireshark to grab the base64 from the unencrypted http. Fancy opener works for me with this now, too.
msg150720 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-01-06 10:53
Some review comments. Instead of doing the inline unquote like this -

-            auth = base64.b64encode(user_passwd).strip()
+            auth = base64.b64encode(unquote(user_passwd)).strip()

It is better to do the explicitly above the b64 encoding step.
Just as host has been unquoted.

                    user_passwd, host = splituser(host)
                    host = unquote(host)

Also, you have done this only for https_open, the same would need be replicated for http_open and also for proxy_passwd.  Also on tests, Modifying sendall with

             def sendall(self, str):
-                pass
+                FakeHTTPConnection.request += str

seems a bit odd to me, you are using a class level object and adding a str. I think, there should be better way to do. (I shall provide an example). Also str term can replaced, even if it was coming from old code.
msg150734 - (view) Author: Joonas Kuorilehto (joneskoo) Date: 2012-01-06 14:53
> It is better to do the explicitly above the b64 encoding step.
> Just as host has been unquoted.
> 
>                     user_passwd, host = splituser(host)
>                     host = unquote(host)

Ok. So it needs to be done on the line after import base64.

> Also, you have done this only for https_open, the same would need be
> replicated for http_open and also for proxy_passwd.

So four cases where this may need to be fixed and my test only covers one of them:

* http without proxy
* http with proxy
* https without proxy
* https with proxy

Copypasted code :(

Can the https and the proxy auth be tested with the same fake http connection, when the request is stored?

>              def sendall(self, str):
> -                pass
> +                FakeHTTPConnection.request += str
> 
> seems a bit odd to me

Agreed, not clean and needs to be fixed. Works here, but could cause the test code to become unreadable later on. Wrote it at 5 am, not getting sleep. Please provide a cleaner alternative. :)

One question: how come the fake http is given HTTP headers in some tests and payload only in others? Is it emulating the TCP stream or the payload stream handle? It can't do both, can it? Are the headers in some tests actually doing anything?

I would not mind if someone else finished the patch :)
msg150913 - (view) Author: Michele Orrù (maker) * Date: 2012-01-08 21:43
There's no need to port your patch over python3k, since urllib behaves differently with http passwords - as you can see in the doc http://docs.python.org/dev/py3k/library/urllib.request.html#examples 

I would be glad to finish your password on 2.7 as soon as possible, joneskoo. Thanks.
msg150939 - (view) Author: Michele Orrù (maker) * Date: 2012-01-09 15:05
Patch attached. Note that now unquote is called with host using map(), and b64 encoded strings are no more hardcoded. Please tell me if those changes are acceptable - anyway they don't break any other unit tests.
msg151005 - (view) Author: Joonas Kuorilehto (joneskoo) Date: 2012-01-10 07:01
Michele, in your patch:

+            authorization = ("Authorization: Basic %s\r\n" %
+                             b64encode('a%20b:c%20d'))

This is wrong. See the original report by me and RFC 2617. The username and password MUST NOT be url encoded before base64. That is the original problem. The point is that this test should fail with urllib before the change and not fail with the fix.

Secondly, I think unquote will fail when given a None. For me, some other unit tests caught this when I had the unquote where the splituser is called. I didn't run your code but are other urllib tests ok for you?

I like your change of having the base64 explicitly there and not as a magic string is a good idea.

Senthil, could you provide the better alternative for the class field abuse, please?
msg151010 - (view) Author: Michele Orrù (maker) * Date: 2012-01-10 10:47
Whoops, probably I tested using $ python instead of $ ./python.exe - 
Attaching two patches, one keeps using map(), but definitely changes unquote() behavior; the other simply asserts user_passwd exists before using unquote().

Well, concerning the class field abuse, HTTPConnection._buffer attribute might help? The point is that I can't find an easy way to get the HTTPConnection instance from an urlopen().
msg151018 - (view) Author: Roundup Robot (python-dev) Date: 2012-01-10 16:09
New changeset 01ef13e9e225 by Senthil Kumaran in branch '2.7':
- Issue #13642: Unquote before b64encoding user:password during Basic
http://hg.python.org/cpython/rev/01ef13e9e225
msg151019 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-01-10 16:47
Joonas and Michele - The fix along with the tests is in for 2.7 line. W.r.t to tests having it in the class level buf seems to be only easy way, for other it seemed to be that FakeSocket and FakeConnection stuff need some major change (and resulted in breaking of some tests). I thought it is better to push it in the current form as we use the buf for temporary storage and verification.
I think, those tests can be brought into 3.x line as well.
msg151239 - (view) Author: Joonas Kuorilehto (joneskoo) Date: 2012-01-14 07:44
Senthil, I ported the tests to 3.2. The quoting problem seems to be the same in 3.2 and the new test fails. I don't know how the password managers handle the usernames and passwords in python3 urllib so I did not look at that.

Could you please also port the fix to relevant python3 branches? I think you can do it much quicker since you are already familiar with both code bases and the fix.
msg151240 - (view) Author: Joonas Kuorilehto (joneskoo) Date: 2012-01-14 08:09
Also adding a patch that may be enough to fix the problem in python3.2. Review needed, did not test more than passing the previously failed unit test.
msg151245 - (view) Author: Roundup Robot (python-dev) Date: 2012-01-14 11:13
New changeset 80e3b8de4edd by Senthil Kumaran in branch '3.2':
Fix Issue #13642: Unquote before b64encoding user:password during Basic Authentication.
http://hg.python.org/cpython/rev/80e3b8de4edd

New changeset 4b4029fc8cf2 by Senthil Kumaran in branch 'default':
merge from 3.2 - Fix Issue #13642: Unquote before b64encoding user:password during Basic Authentication.
http://hg.python.org/cpython/rev/4b4029fc8cf2
msg151246 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-01-14 11:15
Here we go! I thought the problem did not exist in py3k, but good that the tests caught them and we have a fix now.

Thanks for the complete patch, Joonas. I hope it was easy to port the patch to 3k. The encoding part may perhaps be the only thing to careful with and use of the new string format feature was good one.  Thanks!
msg151248 - (view) Author: Joonas Kuorilehto (joneskoo) Date: 2012-01-14 12:10
Updating the issue with version 3.2 tag since it was fixed there as well. Still fixed, of course.

You are correct that the encodings can be tricky. Luckily I only added coding to tests. But you're right, I would consider very carefully before using similar code outside tests.

I just realized what's the impact of this change on python3.2 really. Since urllib.request.urlopen (for some reason) does not allow username and password on the URI, it is not possible to hit this with that, I think. But FancyURLopener does allow using user-pass in url, so this bug was reachable. I just verified that and the fix :)

Fix does the trick for FancyURLopener when the username and password are passed in the URL.
History
Date User Action Args
2012-01-14 12:10:41joneskoosetmessages: + msg151248
versions: + Python 3.2
2012-01-14 11:15:55orsenthilsetstatus: open -> closed

messages: + msg151246
2012-01-14 11:13:21python-devsetmessages: + msg151245
2012-01-14 08:09:32joneskoosetfiles: + urllib-userpass-w-spaces-fix.patch

messages: + msg151240
2012-01-14 07:44:26joneskoosetfiles: + urllib-userpass-w-spaces.patch

messages: + msg151239
2012-01-10 16:47:17orsenthilsetresolution: fixed
messages: + msg151019
stage: patch review -> resolved
2012-01-10 16:09:53python-devsetnosy: + python-dev
messages: + msg151018
2012-01-10 10:48:07makersetfiles: + issue13642_with_map.patch
2012-01-10 10:47:52makersetfiles: - issue13642.patch
2012-01-10 10:47:37makersetfiles: + issue13642.patch

messages: + msg151010
2012-01-10 07:01:47joneskoosetmessages: + msg151005
2012-01-09 15:05:11makersetfiles: + issue13642.patch
nosy: + ezio.melotti
messages: + msg150939

2012-01-08 21:43:53makersetmessages: + msg150913
2012-01-06 14:53:48joneskoosetmessages: + msg150734
2012-01-06 10:53:51orsenthilsetassignee: orsenthil
messages: + msg150720
stage: patch review
2012-01-06 03:06:03joneskoosetfiles: + tests-and-fakehttp-request-storing-2.diff

messages: + msg150714
2012-01-06 02:27:29joneskoosetfiles: + tests-and-fakehttp-request-storing.diff
keywords: + patch
messages: + msg150710
2011-12-30 14:21:27makersetnosy: + maker
messages: + msg150378
2011-12-25 03:50:47jceasetnosy: + jcea
messages: + msg150245
2011-12-20 15:02:45joneskoocreate