classification
Title: Cookie Colon Name Bug
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.3, Python 3.2, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: georg.brandl Nosy List: BM, BreamoreBoy, Eyal.Lewisohn, aclover, akuchling, carsten.klein, dstanek, georg.brandl, jerry.seutter, jjlee, karlcow, orsenthil, python-dev, r.david.murray, serhiy.storchaka, spookylukey, tim.peters
Priority: critical Keywords: patch

Created on 2008-02-26 02:27 by BM, last changed 2012-12-04 11:55 by Eyal.Lewisohn. This issue is now closed.

Files
File name Uploaded Description Edit
issue2193_patch_2_trunk.diff spookylukey, 2011-06-29 14:31 patch against trunk review
issue2193_patch_2_python27.diff spookylukey, 2011-06-29 14:31 patch against 2.7 review
Messages (48)
msg63023 - (view) Author: BM (BM) Date: 2008-02-26 02:27
According to David M. Kristol, only comma, space and semi-colon are 
forbidden in the cookie Name. However, Python's Cookie.py rejects a 
colon too. At the same time, Java Cookie in the servlet implementation 
allows a colon and Perl too.

The fix would be to add a colon symbol into _LegalChars variable.
msg63037 - (view) Author: Jerry Seutter (jerry.seutter) * (Python committer) Date: 2008-02-26 09:46
Hm.  Your bug doesn't agree with my interpretation of the RFC's.

RFC2109 section 4.1 states that the the cookie name (attr) is of type
"token", which in RFC2068 section 2.2 is defined as any chars excluding
control characters and special characters.  RFC2068 lists special
characters as 
tspecials      = "(" | ")" | "<" | ">" | "@"
                         | "," | ";" | ":" | "\" | <">
                         | "/" | "[" | "]" | "?" | "="
                         | "{" | "}" | SP | HT

... so the ":" in a cookie name should not be allowed.

RFC2965 and RFC2616 (which obsolete the above RFC's) state the same thing.

Either I'm wrong or the other languages you cite are differing from the
RFC.  Do you have any idea why they might be doing that?
msg63038 - (view) Author: BM (BM) Date: 2008-02-26 12:18
OK, I see and agree there are no actually that standard 
that we can call as a standard. But let me try to put in 
the other way again:

1. This documentation refers to the same RFC2109:
http://docs.python.org/lib/module-Cookie.html
But the RFC is slightly older than next David's edition.


2. David M. Kristol's cookie overview also says that only comma, 
semi-column and a space is not allowed. 
Here you go: http://arxiv.org/abs/cs.SE/0105018


3. Java implements the *same* RFC2109 but supports a colon too, 
as oppose to Python version. Here is the link to the source of
Tomcat 6 (the latest one):
http://www.google.com/codesearch?
hl=en&q=show:okuSsOjruck:iKnUOb7eVzc:kvBYp8tS5ms&sa=N&ct=rd&cs_p=ftp://apache.mirrors.pair.com/tomcat/tomcat-
6/v6.0.10/src/apache-tomcat-6.0.10-src.zip&cs_f=apache-tomcat-6.0.10-
src/java/javax/servlet/http/Cookie.java&start=1

As you can see, there is no 0x3a to be excluded. The snippet is:
-----------------------------------------------
private static final String tspecials = ",; ";

private boolean isToken(String value) {
    int len = value.length();
    for (int i = 0; i < len; i++) {
        char c = value.charAt(i);
        if (c < 0x20 || c >= 0x7f || tspecials.indexOf(c) != -1)
            return false;
        }
    return true;
}
-----------------------------------------------
I agree, Java is not a standard, but yet another (buggy) language. :-)
Still it means something...


4. Perl module from CPAN does the same and allows a colon.
http://search.cpan.org/~gaas/libwww-perl-5.808/lib/HTTP/Cookies.pm

5. You probably refer to the old Netscape specs 
(http://wp.netscape.com/newsref/std/cookie_spec.html) that 
for instance allows to contain an unquoted "," in the expires 
field, so usually new parser have to use special ad-hoc way to 
get it right. 

The difference between old format of cookies and new one is,
that cookie name begins with a $. So the old format expects 
these cookies to be separated by semi-colon, not comma. 


6. I am not very sure that tokens you are talking about are
referring to NAME of Set-Cookie NAME=VALUE pair. Because the
same section allows a white space between tokens, while it is
not very true. Moreover, braces etc *are* allowed. The reason
why comma, space and semi-colon are disallowed, because of
parser should know where it what. Other symbols parsers does
not care...


7. Maybe we should ask D.Kristol for this after all. :-)


Hm... What do you think? :)
msg63075 - (view) Author: Jerry Seutter (jerry.seutter) * (Python committer) Date: 2008-02-27 19:20
Heh, I think I should not have gotten involved in this bug. :)  I have a
few comments:

In response to 2.: David M. Kristol in that article is referring to the
original Netscape cookie implementation which is somewhat different from
what is set out in the RFC's.   Now I understand where the difference is.

In response to 6: Yes, I was referring to the NAME part of NAME=VALUE...


So it looks to me like the Python code implements the specification as
set out in the RFC.  I agree with you that there are multiple
implementations (Java and Perl) in wide use that allow behavior not set
out in the RFC.  The question is, do we want to stick the bahavior in
the RFC, or accomodate existing implementations?  My opinion:  I agree
with you.

Can some Python regulars comment on Python's policy in situations like this?

I can throw together a patch if this change would likely be accepted.
msg63086 - (view) Author: BM (BM) Date: 2008-02-28 02:10
Well, as D.M.Kristol says: there are no any standard for this particular 
topic. And RFC is not any standard but a request for comments...

Personally I've been added a colon in Cookie.py for let Trac and other 
Python-based software stop crashing, because such sort of cookies are quite 
often appears. For some reason people treat a colon as a namespace separator, 
like in XML. Thus there are plenty of cookies like "section:key=value" you 
can meet quite often. But this is not a fix, but just quick local fix.

You also can find a lots of cookies that consists "[" and "]", slash, 
even a space that has been quoted as "%20", which means a "%" inside the
token -- just look what godaddy.com gives to you. :-)

So I see another problem here: there is not just a colon thing, but
implementation should be slightly different. Currently Python code
lists allowed chars. I am not sure it is correct way to implement.
Rather I would list disallowed chars (see example in Java).

Here is also an example how Ruby does the thing 
(see lib/webrick/cookie.rb):
------------------------------------------
def self.parse(str)
      if str
        ret = []
        cookie = nil
        ver = 0
        str.split(/[;,]\s+/).each{|x|  # <--- Here you go.
          key, val = x.split(/=/,2)
          val = val ? HTTPUtils::dequote(val) : ""
          case key
          when "$Version"; ver = val.to_i
          when "$Path";    cookie.path = val
          when "$Domain";  cookie.domain = val
          when "$Port";    cookie.port = val
          else
            ret << cookie if cookie
            cookie = self.new(key, val)
            cookie.version = ver
          end
        }
        ret << cookie if cookie
        ret
      end
    end
------------------------------------------

I still have doubts that Cookie NAME is the same token, because
specs still disallows only comma, semi-colon and a space. Well, 
I don't know, but we can either stick to Request For Comments or 
we can behave as the rest of the world, avoiding future interference.

Would be nice to hear some comments on policies... Tim?
msg116979 - (view) Author: Mark Lawrence (BreamoreBoy) Date: 2010-09-20 21:29
Is this a bug or isn't it, so should it be behaviour or feature request or what?
msg117117 - (view) Author: John J Lee (jjlee) Date: 2010-09-21 22:32
Looks like a bug.  Here's the trac bug that this caused -- trac fixed their bug by working around this bug in a really ugly way:

http://trac.edgewall.org/ticket/2256

It would be nice to notify the trac developers if/when this is fixed.

This bug is probably not specific to colons (e.g. commas used to be valid in HTTP cookie strings, and still are as far as I know -- somebody should test what current browsers do to make sure).  The set of characters regarded as legal is less important than the fact that parsing a Cookie header should *never* raise CookieError -- it should just ignore any invalid cookies.  Still, IIRC there isn't any need to treat any of them as invalid, since more or less anything is a valid cookie (or was in the past -- as I say, maybe browsers have cleaned up since then, but I'd be surprised).
msg117171 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2010-09-23 06:45
I'll have a look.
msg119104 - (view) Author: David Stanek (dstanek) Date: 2010-10-19 02:13
My Java may be a bit rusty, but it seems that it would filter out the colon. tspecials contains a colon and thus having a colon in the cookie name would make in invalid.

I glanced at the Perl code and couldn't find where it filtered out any characters.

Would it be better to file bugs against buggy implementations instead of changing Python's implementation to be more lenient?
msg119167 - (view) Author: John J Lee (jjlee) Date: 2010-10-19 21:12
dstanek> Would it be better to file bugs against buggy implementations instead of changing Python's implementation to be more lenient?

No.  Another app running on the same domain that knows nothing about RFC 2109 (and why should it?) shouldn't break your Cookie.py-using app -- did you look at the trac bug?
msg119312 - (view) Author: And Clover (aclover) Date: 2010-10-21 15:55
The various attempts by RFCs to codify HTTP cookies are useless and bear no resemblance to what browsers actually do.

In the real world, every byte in the range 0x20-0x7E is allowed, except for the semicolon, the equals (in names), and in Opera, in some places, the double-quote. Many browsers even allow most of the control codes! The question of non-ASCII Unicode characters is tricky, but none of them cause a token break.

Contrary to RFC2109 and its successors, no browser takes any notice of quoted-string cookies or backslash-escaping, so the effort Cookie.py puts into producing an encoded string and 'parsing' input cookies is completely wasted. It should do what everyone else does: split on semicolon, left-strip the whitespace, split each cookie on first equals.

(In reality cookie names and values have no inherent encoding scheme, so if you want to include out-of-band characters like semicolon, control characters or non-ASCII characters you have to use an ad-hoc encoding scheme, often URL-encoding.)
msg124178 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-12-17 03:33
Seems like this really needs a strict and a lax mode.  Perhaps a BrowserCookie class that implements the relaxed rules?

That would make this a feature request, though, and so nothing would happen until 3.3, which would be unfortunate.

It is certainly possible to create a more relaxed version for your own use.  It seems to me that (untested):

class BrowserCookie(BaseCookie):

    def set(self, key, val, coded_val, LegalChars=_LegalChars+':'):
        super().set(key, val, coded_val, LegalChars)

would do most of what And Clover wants (not adding or stripping quotes or doing backslash quoting or encoding, accepting : in key names (and more characters could be added; the regex in the parsing step is fairly liberal)).

Making further relaxations is difficult without poking in to module internals, though.
msg125412 - (view) Author: karl (karlcow) Date: 2011-01-05 06:06
The rules for parsing and setting the cookies are different. Server should always produce strict cookies only. So the production rules are to be done accordingly to the specification.

Adam Barth is working right now on an update of the  "HTTP State Management Mechanism" specification. See http://tools.ietf.org/html/draft-ietf-httpstate-cookie

The name production rules are still defined in RFC2696

What browsers ignores or not in characters depends from browsers to browsers. (IETF server is down right now, and I can't link to the appropriate section for parsing the values.)
msg125413 - (view) Author: karl (karlcow) Date: 2011-01-05 06:08
Ah the server is back the rules for the User Agents are defined here 
http://tools.ietf.org/html/draft-ietf-httpstate-cookie#section-5
msg125417 - (view) Author: John J Lee (jjlee) Date: 2011-01-05 12:14
karl: I'm not clear precisely what it is that you want to draw our attention to.  Note this bug is about parsing of Cookie headers by servers, not production of Set-Cookie headers by servers.
msg125423 - (view) Author: karl (karlcow) Date: 2011-01-05 13:23
John: Ah sorry, if I misunderstood. The bug seems to say that it is about the Cookie Name and legal chars for this cookie name. What I was trying to say is that the processing of the Cookie Name is different depending if you are a client or a server *and* that there is a specification being developed by Adam Barth (part of browser vendors) to obsolete RFC 2109. 

In the case of Server sending to the Client
    Set-Cookie: Name=Value

The rules for production of the cookies must be strict. Always. aka the module is used for creating a cookie and indeed the "colon" character is forbidden. The "token" syntax for valid chars and invalid chars are defined now in RFC2696. It means that any US-ASCII characters EXCEPT those are authorized: 

control characters (octets 0-31) and DEL (octet 127) and, the following characters “(“, “)”, “<”, “>”, “@”, “,”, “;”, “:”, “", “/”, “[“, “]”, “?”, “=”, “{“, “}”, the double quote character itself, US-ASCII SP (octet 32) or the tabulation (octet 9)

Then if you use the Cookie Module for a client it is not anymore the same story.


In the case of Client storing the value of the cookie sent by a server.
See the section "5.2.  The Set-Cookie Header", 
http://tools.ietf.org/html/draft-ietf-httpstate-cookie-20#section-5.2

quote:
  
    If the user agent does not ignore the Set-Cookie header
    field in its entirety, the user agent MUST parse the
    field-value of the Set-Cookie header field as a
    set-cookie-string (defined below).
    
    NOTE: The algorithm below is more permissive than the
    grammar in Section 4.1. For example, the algorithm strips
    leading and trailing whitespace from the cookie name and
    value (but maintains internal whitespace), whereas the
    grammar in Section 4.1 forbids whitespace in these
    positions. User agents use this algorithm so as to
    interoperate with servers that do not follow the
    recommendations in Section 4."

/quote

then the algorithm is described. Which means that what the server will parse will not be necessary what the server have generated. 

Section 5.4 says how the Cookie Header should be sent to the server with an algorithm for what will receive the server.

John, do you think there is a missing algorithm for parsing the value of cookie header when sent by the client?
msg125427 - (view) Author: John J Lee (jjlee) Date: 2011-01-05 14:42
Again, I don't think this is relevant, because the bug is about servers parsing Cookie: headers.  Note that that string (the value of the Cookie: header) may be generated by a different server than the server that parses it (see the trac example mentioned in the bug comments).
msg125438 - (view) Author: karl (karlcow) Date: 2011-01-05 17:08
agreed. :)

Then my question about parsing rules for libraries. Is interoperability a plus here.
msg125439 - (view) Author: John J Lee (jjlee) Date: 2011-01-05 17:49
Yes, interoperability is good.  Do you have a specific concern about the change that I proposed?

If not, and you're instead just trying to ensure conformance, by all means read the draft specification that you pointed out and look for reasons why my suggested change would be the wrong thing to do -- that would certainly be useful.  Otherwise, it's hard to respond to non-specific "are you doing the right thing" questions with anything other than "yes" ;-)
msg125440 - (view) Author: John J Lee (jjlee) Date: 2011-01-05 17:51
Yes, interoperability is good.  Do you have a specific concern about the change that I proposed?

If not, and you're instead just trying to ensure conformance, by all means read the draft specification that you pointed out and look for reasons why my suggested change would be the wrong thing to do -- that would certainly be useful.  Otherwise, it's hard to respond to non-specific "are you doing the right thing" questions with anything other than "yes" ;-)
msg127366 - (view) Author: Carsten Klein (carsten.klein) Date: 2011-01-29 01:37
Personally I believe that this is WONTFIX.

Why?

Because, the original RFC states that the colon is part of the unwanted characters, regardless of whether Perl or other similar implementations ignore the standard.

Besides that, and most important: The cookies are always set by the server or application thereof. Therefore, the server must behave just like what is stated in the original RFC. And it must also expect existing browsers to behave just like what is requested in the RFC.

IMO, the original requester and supporters, both here and over on the trac issue tracker are simply not able to figure out a proper cookie tracking mechanism for marketing or whatever purposes. Or, perhaps, they want to exploit some unwanted behaviour in existing user agents, who knows.

Besides that, if the original poster and follow up requesters supporting the issue persist on using a non standard implementation of the cookie library, hey, who cares. Let them figure out how to patch or rewrite the existing library, and how to include it with their favourite server/user agent exploiting implementation.

And the same is true for the request on the trac issue tracker. Since the cookies are set by the server, there is no need to actually weaken the existing pseudo standard by incorporating ways to hierarchically define a cookie name other than what is already present in the scheme or could be accomplished using different characters other than those blacklisted.
msg127368 - (view) Author: Carsten Klein (carsten.klein) Date: 2011-01-29 01:47
One more: if you look closer at the accepted patch by CMLENZ over @ t.e.o., you will find:


         if self.req.headers_in.has_key('Cookie'):
-            self.incookie.load(self.req.headers_in['Cookie'])
+            #self.incookie.load(self.req.headers_in['Cookie'])
+            cookie = self.req.headers_in['Cookie']
+            old_set = self.incookie._BaseCookie__set
+            bad_cookies = []
+            def safe_set(key, real_value, coded_value):
+                try:
+                    old_set(key, real_value, coded_value)
+                except CookieError:
+                    bad_cookies.append(key)
+                    dict.__setitem__(self.incookie, key, None)
+            # override Cookie.set to ignore cookies with parse errors
+            self.incookie._BaseCookie__set = safe_set
+           # load the cookie values
+            self.incookie.load(cookie)
+            # clean up the Cookie.set overriding
+            self.incookie._BaseCookie__set = old_set
+            for key in bad_cookies:
+                del self.incookie[key]
+


which will eventually delete all cookies that do not match the original production rule.

Besides that, the original poster of the issue forgot to properly limit the cookies set by the other site to just a single host path, so these invalid cookies got routed to the trac instance running on some different host.
msg127441 - (view) Author: And Clover (aclover) Date: 2011-01-29 16:53
@carsten.klein: there is no such thing as an “original RFC”. The RFCs that have been produced on the subject of cookies, 2109 and 2965, were drawn up long after user-agents implemented cookies. Their attempts to clean up the warts of cookies and implement new features have completely failed. Their strictures are not obeyed by user agents; they are irrelevant and should not be used as the basis for any server-side cookie implementation.

The nearest to an original standard for cookies is the Netscape cookie-spec (see eg http://curl.haxx.se/rfc/cookie_spec.html), which is far too woolly to really count as a real specification, but which allows all but `;`, `,` and space, and in practice browsers do typically allow all characters that do not already serve as delimiters. Python should be liberal in what it accepts.
msg127465 - (view) Author: karl (karlcow) Date: 2011-01-29 19:10
@aclover
see my comment http://bugs.python.org/issue2193#msg125423
Adam Barth is working for Google on Chrome. 
The RFC being written is made in cooperation with other browser developers.

If you have comments about this RFC you are welcome to add comment on freenode at #whatwg.
msg127557 - (view) Author: John J Lee (jjlee) Date: 2011-01-30 22:07
I agree with And Clover that Carsten Klein's comments in #msg127366 are not correct, for the reason that And stated.

Also, Carsten repeats again the idea that the trac issue is about the trac server failing to generate appropriate cookies -- but that issue was in fact about trac's processing of cookies generated by servers other than trac.  In that particular case, the other server was probably under the control of the same group of people, but that isn't true in general.  Please read all the comments before adding more comments.

Re #msg127368: I don't think the patch applied to trac is relevant here.  "Somebody" should write a patch for module Cookie, and then it can be reviewed.
msg127832 - (view) Author: Carsten Klein (carsten.klein@axn-software.de) Date: 2011-02-04 00:05
if you'd take a close look at the following lines accepted as part of the patch for stripping out unwanted/non standard cookies over trac:

+                try:
+                    old_set(key, real_value, coded_value)
+                except CookieError:
+                    bad_cookies.append(key)

then you will find that trac actually behaves just like what is requested in the "original" RFC, namely that the colon is part of the reserved or special characters not meant for inclusion with the cookie name or, as it was stated in the referred rfc, the token.

Please do not fix.
msg127837 - (view) Author: Carsten Klein (carsten.klein@axn-software.de) Date: 2011-02-04 00:24
Besides that, BM is wrong in the assumption that *who ever he is* Davi M. Kristol states that the colon is a valid character.
There is no such notion in the article. In fact, DMK repeats the definition found in the original RFC on cookies, which also was referred to in the follow up RFC and then again referred to in the current RFC which seeks to get rid of the set-cookie2 directive, combining the two RFCs into a single RFC/pseudo standard.
msg132973 - (view) Author: BM (BM) Date: 2011-04-04 21:30
To Carsten Klein:
It would be great if you turn your eyes on and try to read more carefully before posting something here.

----------------------------
NAME=VALUE 
NAME is the cookie’s name, and VALUE is its value. Thus the header Set-Cookie: id=waldo sets a cookie with name id and value waldo. Both the cookie NAME  and its VALUE may be any sequence of characters except semi-colon, comma, or whitespace. 
----------------------------

In the above it says "any sequence of characters" EXCEPT a three characters: 
1. semi-colon
2. comma
3. whitespace

In English this means that "any sequence of characters" INCLUDES a colon and thus colon IS a valid character. BTW, this stupid bug is three years old, while the rest of the world implemented it right (Java, Ruby etc).

Also Python implementation of this part is at least... strange (being polite here). Because instead of excluding illegal chars, they actually going opposite by including the entire world and then going mad in the whole code inside... :-(
msg133057 - (view) Author: Carsten Klein (carsten.klein@axn-software.de) Date: 2011-04-05 18:02
Guess you are right...

I did overlook the quoted-string reference in the RFC:

   av-pair         =       attr ["=" value]        ; optional value
   attr            =       token
   value           =       word
   word            =       token | quoted-string

The actual production rules for quoted-string are not specified, so I guess that anything resembling unicode data would be allowed in that string provided that:

[...] from RFC 2109
10.1.3  Punctuation

   In Netscape's original proposal, the values in attribute-value pairs
   did not accept "-quoted strings.  Origin servers should be cautious
   about sending values that require quotes unless they know the
   receiving user agent understands them (i.e., "new" cookies).  A
   ("new") user agent should only use quotes around values in Cookie
   headers when the cookie's version(s) is (are) all compliant with this
   specification or later.

in RFC 2965 there is no such notice...

However, and this is important, the cookie values not matching token must be quoted by the origin server upon setting and the client must return these as quoted strings as well.
msg133059 - (view) Author: Carsten Klein (carsten.klein@axn-software.de) Date: 2011-04-05 18:08
Ups forgot to also mention the production rule for token, which is defined in the HTTP RFC RFC2616:

       token          = 1*<any CHAR except CTLs or separators>
       separators     = "(" | ")" | "<" | ">" | "@"
                      | "," | ";" | ":" | "\" | <">
                      | "/" | "[" | "]" | "?" | "="
                      | "{" | "}" | SP | HT

and here it clearly states that a value that is not a quoted string
must not contain colons, and other characters as is specified by separators.
msg133061 - (view) Author: Carsten Klein (carsten.klein@axn-software.de) Date: 2011-04-05 18:10
Perhaps the best solution would be for the Python cookie module to
gracefully adapt to servers not quoting cookie values as is required
by the RFCs and make these quoted-strings instead?
msg139417 - (view) Author: Luke Plant (spookylukey) Date: 2011-06-29 14:02
First, I agree with others who say that RFCs are basically irrelevant for cookies. For Django we've discovered this in various ways e.g. issue 9824 - http://bugs.python.org/issue9824 - which has now been applied. We have also had to work around the stdlib behaviour here.

Second, I have implemented a patch for this, with tests, against trunk - please review.

After looking at the implementation, this seems like the best way to make Python conservative in what is produces and liberal in what it accepts, which seems to be what the thread converged on. BaseCookie will now silently discard cookie 'morsels' with a colon in their name (and all other irregularities) when loading from a string, rather than raise an exception.

This allows cookie parsing to continue, so that other cookies in the HTTP header will be found.

However, if in Python code you attempt to directly set a morsel with an illegal name, you will still get the error.

There is a more lax strategy: Simply add ':' to the _LegalChars variable.

This would allow morsels to be *read* that have a colon in their name. However, from the current implementation, it would be very hard to add that ability without also allowing the BaseCookie class to produce such cookies. This would also raise other issues about at what point an error should be raised for setting invalid cookies etc.

Also, allowing these illegal cookies to be read is a corner case that is much less important - it isn't needed either for Trac or for our needs in Django.

For these reasons, I decided against the more lax strategy.
msg139418 - (view) Author: Luke Plant (spookylukey) Date: 2011-06-29 14:03
Same patch backported to python 2.7 branch
msg139421 - (view) Author: Luke Plant (spookylukey) Date: 2011-06-29 14:31
Found a bug with patch - this supersedes old one.
msg139422 - (view) Author: Luke Plant (spookylukey) Date: 2011-06-29 14:31
Same against Python 2.7
msg139424 - (view) Author: karl (karlcow) Date: 2011-06-29 15:07
@Luke

did you have the opportunity to look at http://greenbytes.de/tech/webdav/rfc6265.html 

If there is something which doesn't match reality in that document that would be cool to have feedback about it.
msg139430 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-06-29 15:32
Thanks for taking a crack at this.  IMO the thing that needs to be fixed here is that receiving an invalid cookie makes it difficult to receive the valid cookies.  I'd love to accept your patch, but "silently ignore" sounds like a bad idea and is something we try to avoid in Python where practical.  In the email package we have a mechanism for reporting RFC violations (the defects attribute).  And we make as much sense of the input as we can, even if it is invalid, and preserve it.  We even allow generation of some RFC-invalid stuff, though we have plans to make applications work harder to do that in the future.

So I think your harder fix (accept the RFC-invalid cookies as long as there is some sensible way to parse them, but reject creating them) is the right approach.  I could also see the possibility of accepting a feature request for adding the ability to explicitly create RFC-invalid cookies, if someone can demonstrate a use case for doing so.  Other possible follow-on feature requests would be a 'defects' facility and/or a way to explicitly request that non-compliant cookies be ignored.

Note that in suggesting we reject creating RFC-invalid cookies by default I am not expressing an opinion on the relevance of the RFCs to cookie processing "in the wild".  Given that there is a standard and we are talking about what to do on generation, the obvious answer (by Postel's law) is that we ought to generate standards-compliant cookies.

Also note that I have not looked at the referenced rfc (here is what I believe is the official link: http://tools.ietf.org/html/rfc6265).
msg139432 - (view) Author: Luke Plant (spookylukey) Date: 2011-06-29 15:33
I had a quick look, and there are these relevant bits:

<< There are two audiences for this specification: developers of cookie-generating servers and developers of cookie-consuming user agents. >>

And:

<< To maximize interoperability with user agents, servers should limit themselves to the well-behaved profile defined in Section 4 when generating cookies. >>

So, the document doesn't tell servers how to parse cookies, only how to generate them.

With regards to generation, there is basically no change - we still disallow programmers to set cookie names that are not a 'token', as defined by section 4 of that document, which is the same as RFC 2109 in terms of valid cookie names if you look at it. It is not obvious to me that Python's BaseCookie implementation obeys RFC 2109 (due to the way character lists are defined in the opposite way), but if you believe the comments in the module then it does.

I haven't read the rest of RFC 6265 and checked BaseCookie against it - that would be a much bigger job. But with respect to the change in my patch, it looks like we are all OK.
msg139452 - (view) Author: Luke Plant (spookylukey) Date: 2011-06-29 21:58
@ David Murray:

Thanks for taking the time to look at this - can I trouble you to keep going and read my response?

Thanks. 

You wrote:

> IMO the thing that needs to be fixed here is that receiving an invalid cookie 
> makes it difficult to receive the valid cookies.

I agree absolutely, and my patch implements exactly that aim. So I don't understand why the rest of your reply goes on to assume a very different goal - handling RFC-invalid cookies such that we can read their values. 

> I'd love to accept your patch, but "silently ignore" sounds like a bad 
> idea and is something we try to avoid in Python where practical.

"silently ignore" is what the current BaseCookie implementation does for **every other** type of invalid input, with the only exception (I can find) being the case where everything is correct apart from the name:

>>> from Cookie import SimpleCookie
>>> c = SimpleCookie()
>>> c.load('rubbish')
>>> c
<SimpleCookie: >
>>> c.output()
''
>>> c.load('more:rubbish')
>>> c
<SimpleCookie: >
>>> c.load('name=value')
>>> c
<SimpleCookie: name='value'>
>>> c.load('name=value; invalidattribute;')
>>> c.output()
'Set-Cookie: name=value'
>>> c.load('xyz123"sfs;;=-abc')
>>> c
<SimpleCookie: name='value'>
>>> c.load('namewith:colon=value')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/Cookie.py", line 632, in load
    self.__ParseString(rawdata)
  File "/usr/lib/python2.7/Cookie.py", line 665, in __ParseString
    self.__set(K, rval, cval)
  File "/usr/lib/python2.7/Cookie.py", line 585, in __set
    M.set(key, real_value, coded_value)
  File "/usr/lib/python2.7/Cookie.py", line 460, in set
    raise CookieError("Illegal key value: %s" % key)
Cookie.CookieError: Illegal key value: namewith:colon

As you said, I think this ticket is about fixing that last one, which is the gross exception to the rule, and the only thing that is causing major problems for people.

I agree that handling cookies with invalid names so that we can read them would be nice, but I think it is **very** difficult to find a rationale for saying that this bug fix should have to wait for that change.

In addition, I do not think there is any sensible way to implement that suggestion given the API of BaseCookie at the moment - it's not just the implementation I baulked at, it is the API of BaseCookie that works against you every step of the way.

The API of BaseCookie works like this in the typical case:

Consuming:

input -> load()        -> store in BaseCookie -> __getitem__()

Generating:

input -> __setitem__() -> store in BaseCookie -> output()

(Of course you don't have to do it that way, but looking at the docs and examples, http://docs.python.org/library/cookie.html, it's very clear that it's meant to be used that way).

The fact that both modes involves storing in BaseCookie really messes up any attempt to make these two work differently, and means you would have a really surprising and strange API whichever way you do it.

So, option (1): you could allow BaseCookie to store invalid cookie names if they come via load, but not via __setitem__(). This means that you've got an easy work-around for BaseCookie refusing to store your value - just use 'load()' instead. It also means that simple code like this fails:

>>> c['name:val'] = c['name:val']

which can be a problem if you are trying to do some generalised processing of the contents of a BaseCookie.

This leads us to option (2): allow RFC-invalid cookies to be stored, but then quietly sanitise (i.e. discard) in output().

But this becomes a much worse example of "silently ignoring" - the former is tolerant handling of data that is outside the programmers control - Postel's law. But this would be accepting incorrect data from a programmer, and then silently discarding it, is what we should do our utmost to avoid.

Neither of these options is any good, and it is because the API of BaseCookie was just not designed for it. The only sensible things to do, given our API, is sanitise on input. We could have an 'RFC-invalid' mode, but it would have to be turned on for the whole Cookie instance, and it would be a feature addition. An alternative implementation would be a 'badmorsels' attribute which preserves the rubbish where possible, in case you want it - but again, it's a feature addition.

Finally, I think the behaviour you are aiming at is unreasonable to ask for, especially in this ticket. You are essentially asking for a tolerant kind of cookie parsing which does its best to do 'do-what-I-mean'. But:

1) You haven't specified further than that (and there are no RFCs of use)
2) in general that kind of thing is notoriously hard to get right
3) the job is never finished - there are always more cases of invalid cookies that you *could* handle
4) and in fact it is impossible to provide an implementation that pleases everyone - there will always be invalid cookies that were 'meant' to be one thing, but the implementation interprets in a different way.

Do we really have to wait for all of that to be sorted out before we fix the glaring inconsistency in the way that BaseCookie.load() deals with some kinds of invalid data?

Thanks!

Luke
msg139482 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-06-30 14:52
Well, it's been a while since I looked at the cookie code, and I didn't review it when I reviewed your patch.  But some time after I hit submit on my message the issue about some stuff getting ignored anyway occurred to me.  So given that we are *already* in that situation, applying your patch to avoid bailing on only some invalid cookies makes sense to me.
msg144502 - (view) Author: Luke Plant (spookylukey) Date: 2011-09-24 16:01
David,

Thanks again for the time on this. Can I push to get the patches included, or is there work that still needs to be done on the patches now that the idea is accepted in principle? I did experiment with a few approaches to implement, and it seemed like the most sensible.

Thanks,

Luke
msg144509 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-09-24 22:25
I'm flat out right now on other projects.  But if no one else gets around to doing a final review and commit I should be able to get to it by the end of October.  If I don't, please ping me by posting here again.
msg157780 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-04-08 07:47
Ping.
msg158468 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-04-16 15:29
I tested setting cookies with ":" in the cookie name in both firefox and google-chrome. They both seem to allow and store the cookie with ":" in them.
Firefox sent a request header like this:

Set-Cookie	test:value=solution:is:he

the cookie with name containing ; or , failed.

The patched attached tries to silence the error with SimpleCookie. I am not sure how far it is going to help. I believe, we could just allow it from 3.3 onwards or create a new BrowserCookie class which is more lenient than SimpleCookie.

As far I see, the allowing ":" in the Legalchars seem to have met with negative votes because it was not in RFC, but well it seems some kind of de-facto acceptable behaviour with browsers, so having from a new version may not cause any harm. +1 to that. I can modify the tests from the attached patch.
msg158942 - (view) Author: Roundup Robot (python-dev) Date: 2012-04-22 01:20
New changeset 57ec2e6cd70a by Senthil Kumaran in branch 'default':
Fix Issue2193 - Allow ":" character in Cookie NAME values
http://hg.python.org/cpython/rev/57ec2e6cd70a
msg158943 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-04-22 01:28
I tested with apache to set ":" in names and then verified the behavior in browsers and it looks like it fine to allow ":" as legal character in cookie name ( though RFC originally does say that). My guess is previously it could have been thought that ":" might hinder parsing, but does not seem so as how Cookie name=value;name2=values2 have evolved.

So in 3.3, I have made the change to just allow ":" in Cookie Name. But as previous versions raised CookieError error, I see this is a change in behavior and it should not be back-ported.

In Docs 2.7,3.2 etc, I see the mention that users should look for and catch CookieError if they are capturing cookies from unknown sources, in case if it contains any illegal characters. That still applies to other illegal characters. I think, the docs could just be updated with mention of illegal characters in python versions to be little more helpful and with that, this issue can be closed.
msg158945 - (view) Author: Roundup Robot (python-dev) Date: 2012-04-22 02:32
New changeset d3b0845a9253 by Senthil Kumaran in branch '3.2':
issue2193 - Update 3.2 docs about legal characters allowed in Cookie name
http://hg.python.org/cpython/rev/d3b0845a9253

New changeset 8cae3ee7f691 by Senthil Kumaran in branch 'default':
issue2193 - Update docs about the legal characters allowed in Cookie name
http://hg.python.org/cpython/rev/8cae3ee7f691
msg176899 - (view) Author: Eyal Lewisohn (Eyal.Lewisohn) Date: 2012-12-04 11:55
just wanted to note that I agree with BM's comment and that I had to change LegalChars myself to include the slash '/' in order for my application to work properly.
History
Date User Action Args
2012-12-04 11:55:46Eyal.Lewisohnsetnosy: + Eyal.Lewisohn
messages: + msg176899
2012-04-22 02:39:29orsenthilsetstatus: open -> closed
resolution: fixed
stage: test needed -> resolved
2012-04-22 02:32:15python-devsetmessages: + msg158945
2012-04-22 01:28:56orsenthilsetmessages: + msg158943
2012-04-22 01:20:02python-devsetnosy: + python-dev
messages: + msg158942
2012-04-16 15:29:15orsenthilsetnosy: + orsenthil
messages: + msg158468
2012-04-08 07:47:50serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg157780
2011-09-24 22:25:52r.david.murraysetmessages: + msg144509
2011-09-24 16:01:46spookylukeysetmessages: + msg144502
2011-06-30 14:52:23r.david.murraysetmessages: + msg139482
2011-06-29 21:58:20spookylukeysetmessages: + msg139452
2011-06-29 15:33:59spookylukeysetmessages: + msg139432
2011-06-29 15:32:54r.david.murraysetmessages: + msg139430
2011-06-29 15:07:58karlcowsetmessages: + msg139424
2011-06-29 14:32:51spookylukeysetfiles: - issue2193_patch_python27.diff
2011-06-29 14:32:46spookylukeysetfiles: - issue2193_patch_trunk.diff
2011-06-29 14:31:29spookylukeysetfiles: + issue2193_patch_2_python27.diff

messages: + msg139422
2011-06-29 14:31:09spookylukeysetfiles: + issue2193_patch_2_trunk.diff

messages: + msg139421
2011-06-29 14:03:29spookylukeysetfiles: + issue2193_patch_python27.diff

messages: + msg139418
2011-06-29 14:02:34spookylukeysetfiles: + issue2193_patch_trunk.diff

nosy: + spookylukey
messages: + msg139417

keywords: + patch
2011-06-12 18:35:27terry.reedysetversions: + Python 3.3, - Python 3.1
2011-04-05 18:23:42carsten.kleinsetnosy: + carsten.klein, - carsten.klein@axn-software.de
2011-04-05 18:19:05ezio.melottisetnosy: - carsten.klein
2011-04-05 18:10:53carsten.klein@axn-software.desetmessages: + msg133061
2011-04-05 18:08:07carsten.klein@axn-software.desetmessages: + msg133059
2011-04-05 18:02:14carsten.klein@axn-software.desetmessages: + msg133057
2011-04-04 21:30:04BMsetmessages: + msg132973
2011-02-04 00:24:07carsten.klein@axn-software.desetnosy: tim.peters, akuchling, georg.brandl, jjlee, dstanek, jerry.seutter, BM, aclover, r.david.murray, karlcow, BreamoreBoy, carsten.klein@axn-software.de, carsten.klein
messages: + msg127837
2011-02-04 00:05:41carsten.klein@axn-software.desetnosy: + carsten.klein@axn-software.de
messages: + msg127832
2011-01-30 22:07:57jjleesetnosy: tim.peters, akuchling, georg.brandl, jjlee, dstanek, jerry.seutter, BM, aclover, r.david.murray, karlcow, BreamoreBoy, carsten.klein
messages: + msg127557
2011-01-29 19:10:37karlcowsetnosy: tim.peters, akuchling, georg.brandl, jjlee, dstanek, jerry.seutter, BM, aclover, r.david.murray, karlcow, BreamoreBoy, carsten.klein
messages: + msg127465
2011-01-29 16:53:47acloversetnosy: tim.peters, akuchling, georg.brandl, jjlee, dstanek, jerry.seutter, BM, aclover, r.david.murray, karlcow, BreamoreBoy, carsten.klein
messages: + msg127441
2011-01-29 01:47:51carsten.kleinsetnosy: tim.peters, akuchling, georg.brandl, jjlee, dstanek, jerry.seutter, BM, aclover, r.david.murray, karlcow, BreamoreBoy, carsten.klein
messages: + msg127368
2011-01-29 01:37:03carsten.kleinsetnosy: + carsten.klein
messages: + msg127366
2011-01-05 17:51:25jjleesetnosy: tim.peters, akuchling, georg.brandl, jjlee, dstanek, jerry.seutter, BM, aclover, r.david.murray, karlcow, BreamoreBoy
messages: + msg125440
2011-01-05 17:49:52jjleesetnosy: tim.peters, akuchling, georg.brandl, jjlee, dstanek, jerry.seutter, BM, aclover, r.david.murray, karlcow, BreamoreBoy
messages: + msg125439
2011-01-05 17:08:47karlcowsetnosy: tim.peters, akuchling, georg.brandl, jjlee, dstanek, jerry.seutter, BM, aclover, r.david.murray, karlcow, BreamoreBoy
messages: + msg125438
2011-01-05 14:42:41jjleesetnosy: tim.peters, akuchling, georg.brandl, jjlee, dstanek, jerry.seutter, BM, aclover, r.david.murray, karlcow, BreamoreBoy
messages: + msg125427
2011-01-05 13:23:36karlcowsetnosy: tim.peters, akuchling, georg.brandl, jjlee, dstanek, jerry.seutter, BM, aclover, r.david.murray, karlcow, BreamoreBoy
messages: + msg125423
2011-01-05 12:14:17jjleesetnosy: tim.peters, akuchling, georg.brandl, jjlee, dstanek, jerry.seutter, BM, aclover, r.david.murray, karlcow, BreamoreBoy
messages: + msg125417
2011-01-05 06:08:14karlcowsetnosy: tim.peters, akuchling, georg.brandl, jjlee, dstanek, jerry.seutter, BM, aclover, r.david.murray, karlcow, BreamoreBoy
messages: + msg125413
2011-01-05 06:06:27karlcowsetnosy: + karlcow
messages: + msg125412
2010-12-17 03:33:49r.david.murraysetnosy: + r.david.murray
messages: + msg124178
2010-10-21 15:55:58acloversetnosy: + aclover
messages: + msg119312
2010-10-19 21:12:25jjleesetmessages: + msg119167
2010-10-19 02:13:20dstaneksetnosy: + dstanek
messages: + msg119104
2010-09-23 06:45:50georg.brandlsetpriority: normal -> critical

nosy: + georg.brandl
messages: + msg117171

assignee: akuchling -> georg.brandl
2010-09-21 22:32:11jjleesetmessages: + msg117117
2010-09-20 21:29:49BreamoreBoysetnosy: + BreamoreBoy

messages: + msg116979
versions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.6
2009-02-13 01:22:16ajaksu2setnosy: + jjlee
stage: test needed
versions: - Python 2.5, Python 2.4
2008-03-20 04:18:21jafosetpriority: normal
assignee: akuchling
nosy: + akuchling
2008-02-28 02:10:28BMsetnosy: + tim.peters
messages: + msg63086
2008-02-27 19:22:08jerry.seuttersettype: behavior
2008-02-27 19:20:21jerry.seuttersetmessages: + msg63075
2008-02-26 12:18:44BMsetmessages: + msg63038
2008-02-26 09:46:35jerry.seuttersetnosy: + jerry.seutter
messages: + msg63037
2008-02-26 02:27:22BMcreate