-
-
Notifications
You must be signed in to change notification settings - Fork 29.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cookie Colon Name Bug #46446
Comments
According to David M. Kristol, only comma, space and semi-colon are The fix would be to add a colon symbol into _LegalChars variable. |
Hm. Your bug doesn't agree with my interpretation of the RFC's. RFC2109 section 4.1 states that the the cookie name (attr) is of type ... so the ":" in a cookie name should not be allowed. RFC2965 and RFC2616 (which obsolete the above RFC's) state the same thing. Either I'm wrong or the other languages you cite are differing from the |
OK, I see and agree there are no actually that standard
As you can see, there is no 0x3a to be excluded. The snippet is: private boolean isToken(String value) {
int len = value.length();
for (int i = 0; i < len; i++) {
char c = value.charAt(i);
if (c < 0x20 || c >= 0x7f || tspecials.indexOf(c) != -1)
return false;
}
return true;
} I agree, Java is not a standard, but yet another (buggy) language. :-)
The difference between old format of cookies and new one is,
Hm... What do you think? :) |
Heh, I think I should not have gotten involved in this bug. :) I have a In response to 2.: David M. Kristol in that article is referring to the In response to 6: Yes, I was referring to the NAME part of NAME=VALUE... So it looks to me like the Python code implements the specification as Can some Python regulars comment on Python's policy in situations like this? I can throw together a patch if this change would likely be accepted. |
Well, as D.M.Kristol says: there are no any standard for this particular Personally I've been added a colon in Cookie.py for let Trac and other You also can find a lots of cookies that consists "[" and "]", slash, So I see another problem here: there is not just a colon thing, but Here is also an example how Ruby does the thing I still have doubts that Cookie NAME is the same token, because Would be nice to hear some comments on policies... Tim? |
Is this a bug or isn't it, so should it be behaviour or feature request or what? |
Looks like a bug. Here's the trac bug that this caused -- trac fixed their bug by working around this bug in a really ugly way: http://trac.edgewall.org/ticket/2256 It would be nice to notify the trac developers if/when this is fixed. This bug is probably not specific to colons (e.g. commas used to be valid in HTTP cookie strings, and still are as far as I know -- somebody should test what current browsers do to make sure). The set of characters regarded as legal is less important than the fact that parsing a Cookie header should *never* raise CookieError -- it should just ignore any invalid cookies. Still, IIRC there isn't any need to treat any of them as invalid, since more or less anything is a valid cookie (or was in the past -- as I say, maybe browsers have cleaned up since then, but I'd be surprised). |
I'll have a look. |
My Java may be a bit rusty, but it seems that it would filter out the colon. tspecials contains a colon and thus having a colon in the cookie name would make in invalid. I glanced at the Perl code and couldn't find where it filtered out any characters. Would it be better to file bugs against buggy implementations instead of changing Python's implementation to be more lenient? |
dstanek> Would it be better to file bugs against buggy implementations instead of changing Python's implementation to be more lenient? No. Another app running on the same domain that knows nothing about RFC 2109 (and why should it?) shouldn't break your Cookie.py-using app -- did you look at the trac bug? |
The various attempts by RFCs to codify HTTP cookies are useless and bear no resemblance to what browsers actually do. In the real world, every byte in the range 0x20-0x7E is allowed, except for the semicolon, the equals (in names), and in Opera, in some places, the double-quote. Many browsers even allow most of the control codes! The question of non-ASCII Unicode characters is tricky, but none of them cause a token break. Contrary to RFC2109 and its successors, no browser takes any notice of quoted-string cookies or backslash-escaping, so the effort Cookie.py puts into producing an encoded string and 'parsing' input cookies is completely wasted. It should do what everyone else does: split on semicolon, left-strip the whitespace, split each cookie on first equals. (In reality cookie names and values have no inherent encoding scheme, so if you want to include out-of-band characters like semicolon, control characters or non-ASCII characters you have to use an ad-hoc encoding scheme, often URL-encoding.) |
Seems like this really needs a strict and a lax mode. Perhaps a BrowserCookie class that implements the relaxed rules? That would make this a feature request, though, and so nothing would happen until 3.3, which would be unfortunate. It is certainly possible to create a more relaxed version for your own use. It seems to me that (untested): class BrowserCookie(BaseCookie):
def set(self, key, val, coded_val, LegalChars=_LegalChars+':'):
super().set(key, val, coded_val, LegalChars) would do most of what And Clover wants (not adding or stripping quotes or doing backslash quoting or encoding, accepting : in key names (and more characters could be added; the regex in the parsing step is fairly liberal)). Making further relaxations is difficult without poking in to module internals, though. |
The rules for parsing and setting the cookies are different. Server should always produce strict cookies only. So the production rules are to be done accordingly to the specification. Adam Barth is working right now on an update of the "HTTP State Management Mechanism" specification. See http://tools.ietf.org/html/draft-ietf-httpstate-cookie The name production rules are still defined in RFC2696 What browsers ignores or not in characters depends from browsers to browsers. (IETF server is down right now, and I can't link to the appropriate section for parsing the values.) |
Ah the server is back the rules for the User Agents are defined here |
karl: I'm not clear precisely what it is that you want to draw our attention to. Note this bug is about parsing of Cookie headers by servers, not production of Set-Cookie headers by servers. |
John: Ah sorry, if I misunderstood. The bug seems to say that it is about the Cookie Name and legal chars for this cookie name. What I was trying to say is that the processing of the Cookie Name is different depending if you are a client or a server *and* that there is a specification being developed by Adam Barth (part of browser vendors) to obsolete RFC 2109. In the case of Server sending to the Client The rules for production of the cookies must be strict. Always. aka the module is used for creating a cookie and indeed the "colon" character is forbidden. The "token" syntax for valid chars and invalid chars are defined now in RFC2696. It means that any US-ASCII characters EXCEPT those are authorized: control characters (octets 0-31) and DEL (octet 127) and, the following characters “(“, “)”, “<”, “>”, “@”, “,”, “;”, “:”, “", “/”, “[“, “]”, “?”, “=”, “{“, “}”, the double quote character itself, US-ASCII SP (octet 32) or the tabulation (octet 9) Then if you use the Cookie Module for a client it is not anymore the same story. In the case of Client storing the value of the cookie sent by a server. quote:
/quote then the algorithm is described. Which means that what the server will parse will not be necessary what the server have generated. Section 5.4 says how the Cookie Header should be sent to the server with an algorithm for what will receive the server. John, do you think there is a missing algorithm for parsing the value of cookie header when sent by the client? |
Again, I don't think this is relevant, because the bug is about servers parsing Cookie: headers. Note that that string (the value of the Cookie: header) may be generated by a different server than the server that parses it (see the trac example mentioned in the bug comments). |
agreed. :) Then my question about parsing rules for libraries. Is interoperability a plus here. |
Yes, interoperability is good. Do you have a specific concern about the change that I proposed? If not, and you're instead just trying to ensure conformance, by all means read the draft specification that you pointed out and look for reasons why my suggested change would be the wrong thing to do -- that would certainly be useful. Otherwise, it's hard to respond to non-specific "are you doing the right thing" questions with anything other than "yes" ;-) |
1 similar comment
Yes, interoperability is good. Do you have a specific concern about the change that I proposed? If not, and you're instead just trying to ensure conformance, by all means read the draft specification that you pointed out and look for reasons why my suggested change would be the wrong thing to do -- that would certainly be useful. Otherwise, it's hard to respond to non-specific "are you doing the right thing" questions with anything other than "yes" ;-) |
Personally I believe that this is WONTFIX. Why? Because, the original RFC states that the colon is part of the unwanted characters, regardless of whether Perl or other similar implementations ignore the standard. Besides that, and most important: The cookies are always set by the server or application thereof. Therefore, the server must behave just like what is stated in the original RFC. And it must also expect existing browsers to behave just like what is requested in the RFC. IMO, the original requester and supporters, both here and over on the trac issue tracker are simply not able to figure out a proper cookie tracking mechanism for marketing or whatever purposes. Or, perhaps, they want to exploit some unwanted behaviour in existing user agents, who knows. Besides that, if the original poster and follow up requesters supporting the issue persist on using a non standard implementation of the cookie library, hey, who cares. Let them figure out how to patch or rewrite the existing library, and how to include it with their favourite server/user agent exploiting implementation. And the same is true for the request on the trac issue tracker. Since the cookies are set by the server, there is no need to actually weaken the existing pseudo standard by incorporating ways to hierarchically define a cookie name other than what is already present in the scheme or could be accomplished using different characters other than those blacklisted. |
One more: if you look closer at the accepted patch by CMLENZ over @ t.e.o., you will find: if self.req.headers_in.has_key('Cookie'):
- self.incookie.load(self.req.headers_in['Cookie'])
+ #self.incookie.load(self.req.headers_in['Cookie'])
+ cookie = self.req.headers_in['Cookie']
+ old_set = self.incookie._BaseCookie__set
+ bad_cookies = []
+ def safe_set(key, real_value, coded_value):
+ try:
+ old_set(key, real_value, coded_value)
+ except CookieError:
+ bad_cookies.append(key)
+ dict.__setitem__(self.incookie, key, None)
+ # override Cookie.set to ignore cookies with parse errors
+ self.incookie._BaseCookie__set = safe_set
+ # load the cookie values
+ self.incookie.load(cookie)
+ # clean up the Cookie.set overriding
+ self.incookie._BaseCookie__set = old_set
+ for key in bad_cookies:
+ del self.incookie[key]
+ which will eventually delete all cookies that do not match the original production rule. Besides that, the original poster of the issue forgot to properly limit the cookies set by the other site to just a single host path, so these invalid cookies got routed to the trac instance running on some different host. |
@carsten.klein: there is no such thing as an “original RFC”. The RFCs that have been produced on the subject of cookies, 2109 and 2965, were drawn up long after user-agents implemented cookies. Their attempts to clean up the warts of cookies and implement new features have completely failed. Their strictures are not obeyed by user agents; they are irrelevant and should not be used as the basis for any server-side cookie implementation. The nearest to an original standard for cookies is the Netscape cookie-spec (see eg http://curl.haxx.se/rfc/cookie_spec.html), which is far too woolly to really count as a real specification, but which allows all but |
@aclover If you have comments about this RFC you are welcome to add comment on freenode at #whatwg. |
I agree with And Clover that Carsten Klein's comments in #msg127366 are not correct, for the reason that And stated. Also, Carsten repeats again the idea that the trac issue is about the trac server failing to generate appropriate cookies -- but that issue was in fact about trac's processing of cookies generated by servers other than trac. In that particular case, the other server was probably under the control of the same group of people, but that isn't true in general. Please read all the comments before adding more comments. Re #msg127368: I don't think the patch applied to trac is relevant here. "Somebody" should write a patch for module Cookie, and then it can be reviewed. |
if you'd take a close look at the following lines accepted as part of the patch for stripping out unwanted/non standard cookies over trac: + try: then you will find that trac actually behaves just like what is requested in the "original" RFC, namely that the colon is part of the reserved or special characters not meant for inclusion with the cookie name or, as it was stated in the referred rfc, the token. Please do not fix. |
Besides that, BM is wrong in the assumption that *who ever he is* Davi M. Kristol states that the colon is a valid character. |
To Carsten Klein: ---------------------------- NAME=VALUE
NAME is the cookie’s name, and VALUE is its value. Thus the header Set-Cookie: id=waldo sets a cookie with name id and value waldo. Both the cookie NAME and its VALUE may be any sequence of characters except semi-colon, comma, or whitespace. In the above it says "any sequence of characters" EXCEPT a three characters:
In English this means that "any sequence of characters" INCLUDES a colon and thus colon IS a valid character. BTW, this stupid bug is three years old, while the rest of the world implemented it right (Java, Ruby etc). Also Python implementation of this part is at least... strange (being polite here). Because instead of excluding illegal chars, they actually going opposite by including the entire world and then going mad in the whole code inside... :-( |
Guess you are right... I did overlook the quoted-string reference in the RFC: av-pair = attr ["=" value] ; optional value The actual production rules for quoted-string are not specified, so I guess that anything resembling unicode data would be allowed in that string provided that: [...] from RFC 2109 In Netscape's original proposal, the values in attribute-value pairs in RFC 2965 there is no such notice... However, and this is important, the cookie values not matching token must be quoted by the origin server upon setting and the client must return these as quoted strings as well. |
Ups forgot to also mention the production rule for token, which is defined in the HTTP RFC RFC2616: token = 1*<any CHAR except CTLs or separators>
separators = "(" | ")" | "<" | ">" | "@"
| "," | ";" | ":" | "\" | <">
| "/" | "[" | "]" | "?" | "="
| "{" | "}" | SP | HT and here it clearly states that a value that is not a quoted string |
Perhaps the best solution would be for the Python cookie module to |
First, I agree with others who say that RFCs are basically irrelevant for cookies. For Django we've discovered this in various ways e.g. bpo-9824 - http://bugs.python.org/issue9824 - which has now been applied. We have also had to work around the stdlib behaviour here. Second, I have implemented a patch for this, with tests, against trunk - please review. After looking at the implementation, this seems like the best way to make Python conservative in what is produces and liberal in what it accepts, which seems to be what the thread converged on. BaseCookie will now silently discard cookie 'morsels' with a colon in their name (and all other irregularities) when loading from a string, rather than raise an exception. This allows cookie parsing to continue, so that other cookies in the HTTP header will be found. However, if in Python code you attempt to directly set a morsel with an illegal name, you will still get the error. There is a more lax strategy: Simply add ':' to the _LegalChars variable. This would allow morsels to be *read* that have a colon in their name. However, from the current implementation, it would be very hard to add that ability without also allowing the BaseCookie class to produce such cookies. This would also raise other issues about at what point an error should be raised for setting invalid cookies etc. Also, allowing these illegal cookies to be read is a corner case that is much less important - it isn't needed either for Trac or for our needs in Django. For these reasons, I decided against the more lax strategy. |
Same patch backported to python 2.7 branch |
Found a bug with patch - this supersedes old one. |
Same against Python 2.7 |
did you have the opportunity to look at http://greenbytes.de/tech/webdav/rfc6265.html If there is something which doesn't match reality in that document that would be cool to have feedback about it. |
Thanks for taking a crack at this. IMO the thing that needs to be fixed here is that receiving an invalid cookie makes it difficult to receive the valid cookies. I'd love to accept your patch, but "silently ignore" sounds like a bad idea and is something we try to avoid in Python where practical. In the email package we have a mechanism for reporting RFC violations (the defects attribute). And we make as much sense of the input as we can, even if it is invalid, and preserve it. We even allow generation of some RFC-invalid stuff, though we have plans to make applications work harder to do that in the future. So I think your harder fix (accept the RFC-invalid cookies as long as there is some sensible way to parse them, but reject creating them) is the right approach. I could also see the possibility of accepting a feature request for adding the ability to explicitly create RFC-invalid cookies, if someone can demonstrate a use case for doing so. Other possible follow-on feature requests would be a 'defects' facility and/or a way to explicitly request that non-compliant cookies be ignored. Note that in suggesting we reject creating RFC-invalid cookies by default I am not expressing an opinion on the relevance of the RFCs to cookie processing "in the wild". Given that there is a standard and we are talking about what to do on generation, the obvious answer (by Postel's law) is that we ought to generate standards-compliant cookies. Also note that I have not looked at the referenced rfc (here is what I believe is the official link: http://tools.ietf.org/html/rfc6265). |
I had a quick look, and there are these relevant bits: << There are two audiences for this specification: developers of cookie-generating servers and developers of cookie-consuming user agents. >> And: << To maximize interoperability with user agents, servers should limit themselves to the well-behaved profile defined in Section 4 when generating cookies. >> So, the document doesn't tell servers how to parse cookies, only how to generate them. With regards to generation, there is basically no change - we still disallow programmers to set cookie names that are not a 'token', as defined by section 4 of that document, which is the same as RFC 2109 in terms of valid cookie names if you look at it. It is not obvious to me that Python's BaseCookie implementation obeys RFC 2109 (due to the way character lists are defined in the opposite way), but if you believe the comments in the module then it does. I haven't read the rest of RFC 6265 and checked BaseCookie against it - that would be a much bigger job. But with respect to the change in my patch, it looks like we are all OK. |
@ David Murray: Thanks for taking the time to look at this - can I trouble you to keep going and read my response? Thanks. You wrote:
I agree absolutely, and my patch implements exactly that aim. So I don't understand why the rest of your reply goes on to assume a very different goal - handling RFC-invalid cookies such that we can read their values.
"silently ignore" is what the current BaseCookie implementation does for **every other** type of invalid input, with the only exception (I can find) being the case where everything is correct apart from the name: >>> from Cookie import SimpleCookie
>>> c = SimpleCookie()
>>> c.load('rubbish')
>>> c
<SimpleCookie: >
>>> c.output()
''
>>> c.load('more:rubbish')
>>> c
<SimpleCookie: >
>>> c.load('name=value')
>>> c
<SimpleCookie: name='value'>
>>> c.load('name=value; invalidattribute;')
>>> c.output()
'Set-Cookie: name=value'
>>> c.load('xyz123"sfs;;=-abc')
>>> c
<SimpleCookie: name='value'>
>>> c.load('namewith:colon=value')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/Cookie.py", line 632, in load
self.__ParseString(rawdata)
File "/usr/lib/python2.7/Cookie.py", line 665, in __ParseString
self.__set(K, rval, cval)
File "/usr/lib/python2.7/Cookie.py", line 585, in __set
M.set(key, real_value, coded_value)
File "/usr/lib/python2.7/Cookie.py", line 460, in set
raise CookieError("Illegal key value: %s" % key)
Cookie.CookieError: Illegal key value: namewith:colon As you said, I think this ticket is about fixing that last one, which is the gross exception to the rule, and the only thing that is causing major problems for people. I agree that handling cookies with invalid names so that we can read them would be nice, but I think it is **very** difficult to find a rationale for saying that this bug fix should have to wait for that change. In addition, I do not think there is any sensible way to implement that suggestion given the API of BaseCookie at the moment - it's not just the implementation I baulked at, it is the API of BaseCookie that works against you every step of the way. The API of BaseCookie works like this in the typical case: Consuming: input -> load() -> store in BaseCookie -> __getitem__() Generating: input -> __setitem__() -> store in BaseCookie -> output() (Of course you don't have to do it that way, but looking at the docs and examples, http://docs.python.org/library/cookie.html, it's very clear that it's meant to be used that way). The fact that both modes involves storing in BaseCookie really messes up any attempt to make these two work differently, and means you would have a really surprising and strange API whichever way you do it. So, option (1): you could allow BaseCookie to store invalid cookie names if they come via load, but not via __setitem__(). This means that you've got an easy work-around for BaseCookie refusing to store your value - just use 'load()' instead. It also means that simple code like this fails:
which can be a problem if you are trying to do some generalised processing of the contents of a BaseCookie. This leads us to option (2): allow RFC-invalid cookies to be stored, but then quietly sanitise (i.e. discard) in output(). But this becomes a much worse example of "silently ignoring" - the former is tolerant handling of data that is outside the programmers control - Postel's law. But this would be accepting incorrect data from a programmer, and then silently discarding it, is what we should do our utmost to avoid. Neither of these options is any good, and it is because the API of BaseCookie was just not designed for it. The only sensible things to do, given our API, is sanitise on input. We could have an 'RFC-invalid' mode, but it would have to be turned on for the whole Cookie instance, and it would be a feature addition. An alternative implementation would be a 'badmorsels' attribute which preserves the rubbish where possible, in case you want it - but again, it's a feature addition. Finally, I think the behaviour you are aiming at is unreasonable to ask for, especially in this ticket. You are essentially asking for a tolerant kind of cookie parsing which does its best to do 'do-what-I-mean'. But:
Do we really have to wait for all of that to be sorted out before we fix the glaring inconsistency in the way that BaseCookie.load() deals with some kinds of invalid data? Thanks! Luke |
Well, it's been a while since I looked at the cookie code, and I didn't review it when I reviewed your patch. But some time after I hit submit on my message the issue about some stuff getting ignored anyway occurred to me. So given that we are *already* in that situation, applying your patch to avoid bailing on only some invalid cookies makes sense to me. |
David, Thanks again for the time on this. Can I push to get the patches included, or is there work that still needs to be done on the patches now that the idea is accepted in principle? I did experiment with a few approaches to implement, and it seemed like the most sensible. Thanks, Luke |
I'm flat out right now on other projects. But if no one else gets around to doing a final review and commit I should be able to get to it by the end of October. If I don't, please ping me by posting here again. |
Ping. |
I tested setting cookies with ":" in the cookie name in both firefox and google-chrome. They both seem to allow and store the cookie with ":" in them. Set-Cookie test:value=solution:is:he the cookie with name containing ; or , failed. The patched attached tries to silence the error with SimpleCookie. I am not sure how far it is going to help. I believe, we could just allow it from 3.3 onwards or create a new BrowserCookie class which is more lenient than SimpleCookie. As far I see, the allowing ":" in the Legalchars seem to have met with negative votes because it was not in RFC, but well it seems some kind of de-facto acceptable behaviour with browsers, so having from a new version may not cause any harm. +1 to that. I can modify the tests from the attached patch. |
New changeset 57ec2e6cd70a by Senthil Kumaran in branch 'default': |
I tested with apache to set ":" in names and then verified the behavior in browsers and it looks like it fine to allow ":" as legal character in cookie name ( though RFC originally does say that). My guess is previously it could have been thought that ":" might hinder parsing, but does not seem so as how Cookie name=value;name2=values2 have evolved. So in 3.3, I have made the change to just allow ":" in Cookie Name. But as previous versions raised CookieError error, I see this is a change in behavior and it should not be back-ported. In Docs 2.7,3.2 etc, I see the mention that users should look for and catch CookieError if they are capturing cookies from unknown sources, in case if it contains any illegal characters. That still applies to other illegal characters. I think, the docs could just be updated with mention of illegal characters in python versions to be little more helpful and with that, this issue can be closed. |
New changeset d3b0845a9253 by Senthil Kumaran in branch '3.2': New changeset 8cae3ee7f691 by Senthil Kumaran in branch 'default': |
just wanted to note that I agree with BM's comment and that I had to change LegalChars myself to include the slash '/' in order for my application to work properly. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: