classification
Title: cookielib doesn't handle URLs with / in parameters
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.1, Python 2.7
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: gregory.p.smith Nosy List: BreamoreBoy, andyk, gregory.p.smith, jjlee, orsenthil, r.david.murray, tseaver
Priority: normal Keywords: needs review, patch

Created on 2008-08-27 17:01 by andyk, last changed 2010-07-25 20:00 by gregory.p.smith. This issue is now closed.

Files
File name Uploaded Description Edit
cookielib.py andyk, 2008-08-27 17:01 cookielib.py from 2.5.2 with my suggested fix
cookielib.diff gregory.p.smith, 2008-08-28 06:10 a diff generated from the above file
cookielib-querystring-fix.patch gregory.p.smith, 2008-08-28 06:39 patch with the fix and a unit test
issue3704-better_request_path.patch tseaver, 2010-05-02 20:37 Make request_path give us what we really want
issue3704.patch jjlee, 2010-05-10 19:44 request_path return value never includes query
Messages (25)
msg72035 - (view) Author: Andy Kilpatrick (andyk) Date: 2008-08-27 17:01
cookielib doesn't handle URLs like "http://server/script?
err=/base/error.html&ok=/base/ok.html", as 
CookieJar::_cookie_from_cookie_tuple uses rfind("/") to strip off the 
end of the URL, returning "http://server/script?
err=/base/error.html&okc=/base" instead of "http://server/script".

My suggested fix (attached, line 1465-1468) is to first strip off 
anything after "?" if present, then continue as with existing code.
msg72066 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-08-28 06:39
attached is a patch with the suggested fix along with a unit test.
msg72724 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2008-09-07 00:38
The patch and tests look fine to me, Gregory. I verified it against the
trunk. Should not we have it for py3k as well?
msg72726 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-09-07 00:57
yep it applies to all releases.  anyways, it won't make 2.6/3.0 but it
can be put into 2.5.3/2.6.1/3.0.1.
msg74824 - (view) Author: John J Lee (jjlee) Date: 2008-10-15 22:48
Do we have an RFC 3986 URI parser in the stdlib now?  It would be better
to use that if so, but I don't see one.  Failing that, an implementation
of the relevant part of that RFC is only about four lines of code, so
that would be better than naively looking for "?" (request_path should
probably be changed at the same time).

I'll try and add a patch that does that and check what Firefox does to
see if I agree it's a bug.
msg74827 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2008-10-16 03:20
John, issue3647 tries relative url parsing and joins to be RFC3986
compliance.
msg104803 - (view) Author: Tres Seaver (tseaver) * Date: 2010-05-02 20:37
I can confirm that the patch applies cleanly to the release26-maint
branch, and that the updated test fails without the updated
implementation.

However, the entire approach seems wrong to me:  the patched method
has just called 'request_path', which already cracked the URL using
urlparse.urlparse and put it back together again with
urlparse.urlunparse.

A better fix would be either to call urlparse.urlparse directly, or
else to add an argument to 'request_path' which kept it from doing the
unwanted reassembly.

Attaching a patch which implements the latter strategy.  Greg's new
test passes with this patch applied in place of the part of his patch
which touches Lib/cookielib.py.
msg105407 - (view) Author: John J Lee (jjlee) Date: 2010-05-09 18:58
It looks to me that it's just request_path that's wrong, so no need to add extra arguments to that function.  It should discard the query and fragment (still keeping the "parameters" -- using urlparse.urlsplit instead of urlparse.urlparse would make that simpler).

request_path is only called in three places:

 * We're agreed that the default cookie path should omit the query (and fragment)
 * Netscape cookies aren't checked for path on setting cookies (.set_ok_path()), so the value of request_path isn't checked in that case
 * Netscape cookies are checked for path on returning cookies, but including the query & fragment will never make a difference to the .startswith check in .path_return_ok()

Finally, even RFC 2965, which nobody cares about, and which does include the path check on setting the cookie, refers to RFC 2396 for the definition of request-URI, and both RFC 2396 and RFC 3986, which obsoletes it, agree that the path doesn't include the query (nor the fragment).

Incidentally: the request_path function docstring claims to return the request-URI, but obviously the docstring should say it returns the path component of the request-URI.
msg105424 - (view) Author: Tres Seaver (tseaver) * Date: 2010-05-10 04:26
As long as we don't care about preserving backward compatibility, we
could indeed just change the behavior of 'request_path'.  It isn't
documented as an API of the cookielib module, but it does have a
docstring which promises certain semantics.
msg105433 - (view) Author: John J Lee (jjlee) Date: 2010-05-10 12:57
I'll upload a patch when I'm back home (bugs.python.org went down yesterday).

Will turn docstring into comment.
msg105458 - (view) Author: John J Lee (jjlee) Date: 2010-05-10 19:39
Just re-read your comment, Tres.  Since when do docstrings determine whether a stdlib function is public?  If it's documented in the docs, it's public.  If not, it's not.  This function isn't, so it's not public.  It's also not in __all__, FWLTW.
msg105459 - (view) Author: John J Lee (jjlee) Date: 2010-05-10 19:45
Didn't bother changing docstring to comment, since that would be inconsistent with rest of module.
msg105460 - (view) Author: John J Lee (jjlee) Date: 2010-05-10 19:54
FWIW, the "certain semantics" that request_path "promises" are 1. that it returns the RFC 2965 request-URI (which has never been true -- it returns the path component of the request-URI instead) and 2. that that request-URI is as defined in RFC 2965, and this bug is about fixing the function so that that's true for the case where the URI has a query component.  So there's absolutely no reason for not changing the function.
msg105500 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-05-11 11:40
There is a reason, and that is that it may break existing code in the field relying on the current behavior.  This is (unfortunately) true regardless of whether the function is public or private, though the fact that it is ostensibly private is likely to reduce the amount of breakage.  The fix needs to be evaluated to try to guess whether breakage is likely, and if it is, it would not be a candidate for backport to 2.6 or 3.1.
msg105501 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-05-11 11:52
Hmm.  I didn't read your comment carefully enough before I replied.  I think you are saying that the bug fix is confined to the routine in question and doesn't change even its API, in which case the nature of the function doesn't come in to it at all.  The fix still needs to be evaluated for backport, though, since it is a behavior change.  (For example, in another issue Senthil fixed parsing of unknown schemes to be RFC compliant, but we may need to back that fix out of 2.6 since it changes behavior and will most likely break currently working 2.6 code).
msg105535 - (view) Author: John J Lee (jjlee) Date: 2010-05-11 20:05
What specific breakage do you expect resulting from my patch being backported?

There is no behaviour change here, except to the minimal extent that all bug fixes involve behaviour change.  This seems a clear-cut backport candidate.

It's not a surprise to me that changing module urlparse to be compliant with RFC 3986 breaks code -- I suggested adding a new module instead for that reason.
msg105549 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-05-11 21:18
I don't expect anything; I had written that it looked OK to me but apparently I accidentally deleted that text before posting.  But I'm not someone who has ever programmed using cookielib so I wouldn't expect my opinion to count for too much.
msg110684 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-07-18 20:31
Could we please have an update from people who have been involved on this issue as to whether it can be taken forward, closed as to no longer relevant, or whatever.
msg110799 - (view) Author: John J Lee (jjlee) Date: 2010-07-19 19:06
My patch should be applied.
msg110806 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-07-19 20:25
Patched the unit test, then ran the test before applying the fix which failed, after applying the fix the test ran successfully.  Tested on Windows Vista 32 bit against 2.7 maintainance release.  The patches are short and sweet, I see no reason why they can't go forward.
msg110837 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2010-07-19 23:19
jjlee's issue3704.patch has been committed to py3k (3.2) in r82985.  It could still use backporting to 2.6, 2.7 and 3.1.
msg110944 - (view) Author: John J Lee (jjlee) Date: 2010-07-20 18:31
Hmm, I never tested with Python 3, though I assume the forward-port was straightforward.  The patch was created against (2.x) trunk, so indeed it should be committed there also.

Deselecting 2.6 since I assume no more maintenance backports will be made to 2.6 aside from security fixes -- right?
msg110984 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-07-21 01:16
Until 2.6.6 is released bug fixes can still be backported to 2.6 but it is at the committer's option.  Most likely this one won't be.
msg111463 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-07-24 11:56
Can we have the patch committed to 2.7 and 3.1 please.
msg111556 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2010-07-25 20:00
release27-maint r83145
release31-maint r83146
release26-maint r83147
History
Date User Action Args
2010-07-25 20:00:15gregory.p.smithsetstatus: open -> closed

messages: + msg111556
2010-07-24 11:56:11BreamoreBoysetmessages: + msg111463
2010-07-21 01:16:33r.david.murraysetmessages: + msg110984
2010-07-20 18:31:48jjleesetmessages: + msg110944
versions: - Python 2.6
2010-07-19 23:19:55gregory.p.smithsetresolution: accepted
messages: + msg110837
versions: - Python 3.2
2010-07-19 20:25:56BreamoreBoysetmessages: + msg110806
2010-07-19 19:06:53jjleesetmessages: + msg110799
2010-07-18 20:31:41BreamoreBoysetnosy: + BreamoreBoy
messages: + msg110684
2010-05-11 21:18:08r.david.murraysetmessages: + msg105549
2010-05-11 20:05:22jjleesetmessages: + msg105535
2010-05-11 11:52:04r.david.murraysetmessages: + msg105501
2010-05-11 11:40:09r.david.murraysetnosy: + r.david.murray

messages: + msg105500
versions: + Python 3.1, Python 2.7, Python 3.2, - Python 3.0
2010-05-10 19:54:02jjleesetmessages: + msg105460
2010-05-10 19:45:29jjleesetmessages: + msg105459
2010-05-10 19:44:50jjleesetfiles: + issue3704.patch
2010-05-10 19:39:55jjleesetmessages: + msg105458
2010-05-10 12:57:14jjleesetmessages: + msg105433
2010-05-10 04:26:16tseaversetmessages: + msg105424
2010-05-09 18:58:33jjleesetmessages: + msg105407
2010-05-02 20:37:34tseaversetfiles: + issue3704-better_request_path.patch
nosy: + tseaver
messages: + msg104803

2009-02-13 01:30:16ajaksu2setstage: patch review
components: + Library (Lib), - None
versions: - Python 2.5
2008-10-16 03:20:13orsenthilsetmessages: + msg74827
2008-10-15 22:48:10jjleesetnosy: + jjlee
messages: + msg74824
2008-09-07 00:57:24gregory.p.smithsetmessages: + msg72726
versions: + Python 3.0
2008-09-07 00:38:11orsenthilsetnosy: + orsenthil
messages: + msg72724
2008-09-02 05:38:31gregory.p.smithsetversions: + Python 2.6
2008-08-28 06:39:54gregory.p.smithsetfiles: + cookielib-querystring-fix.patch
nosy: + gregory.p.smith
messages: + msg72066
priority: normal
assignee: gregory.p.smith
keywords: + needs review
2008-08-28 06:10:26gregory.p.smithsetfiles: + cookielib.diff
keywords: + patch
2008-08-27 17:01:40andykcreate