msg334338 - (view) |
Author: MeiK (MeiK) * |
Date: 2019-01-25 03:11 |
http.cookies.BaseCookie[1] can't parse an Expires attribute written like Expires=Thu,31 Jan 2019 05:56:00 GMT; (no space after "Thu,").
I ran into this problem in real use. Chrome, IE and Firefox all parse this string normally, and many languages, such as JavaScript, can also parse this date automatically.
I built a test site using Flask: https://paste.ubuntu.com/p/K7Z4K4KH7Z/. curl and requests get the cookies correctly, but aiohttp does not (because it uses http.cookies.BaseCookie).
Looking at MDN[2] and the RFC[3] (thanks tirkarthi), this doesn't seem to be canonical behavior, but some Java web frameworks produce it (such as the one that led me to find the problem).
This problem can be solved by modifying a regular expression[4], but I don't know whether the module should accept this non-standard format.
English is not my native language; please excuse typing errors.
[1] https://github.com/python/cpython/blob/master/Lib/http/cookies.py#L457
[2] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie#Directives
[3] https://tools.ietf.org/html/rfc6265#section-4.1.1
[4] https://github.com/python/cpython/blob/master/Lib/http/cookies.py#L444
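To illustrate the kind of regular expression change being discussed, here is a minimal sketch. The two patterns are simplified stand-ins modeled on the "expires" special case in http/cookies.py; they are an assumption for illustration, not the module's actual code, and `accepts` is a hypothetical helper.

```python
import re

# Assumption: simplified stand-ins for the "expires" value pattern,
# not copied from http/cookies.py.
STRICT_EXPIRES = re.compile(r"\w{3},\s[\w\d\s-]{9,11}\s[\d:]{8}\sGMT")
# Same pattern, but the whitespace after the weekday comma is optional.
RELAXED_EXPIRES = re.compile(r"\w{3},\s?[\w\d\s-]{9,11}\s[\d:]{8}\sGMT")


def accepts(pattern, value):
    """Return True if the whole value matches the given pattern."""
    return pattern.fullmatch(value) is not None


if __name__ == "__main__":
    for value in ("Thu, 31 Jan 2019 05:56:00 GMT",
                  "Thu,31 Jan 2019 05:56:00 GMT"):
        print(value, accepts(STRICT_EXPIRES, value),
              accepts(RELAXED_EXPIRES, value))
```

Making the space optional only covers this one variant; any other deviation from the fixed layout would still be rejected by a pattern of this shape.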
|
msg334339 - (view) |
Author: Karthikeyan Singaravelan (xtreak) * |
Date: 2019-01-25 03:29 |
Thanks for the MDN cookie directive link. I didn't know it links to the Date link from the GitHub PR. I don't see the space marked as optional in the sane-date format specified for the Expires attribute, but I could be reading the grammar wrong. I will wait for others' thoughts on this.
|
msg334392 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2019-01-26 13:03 |
I presume MeiK wants to use BaseCookie to parse the Set-Cookie header field, as in
>>> BaseCookie('Hello=World; Expires=Thu, 31 Jan 2019 05:56:00 GMT;')
<BaseCookie: Hello='World'>
>>> BaseCookie('Hello=World; Expires=Thu,31 Jan 2019 05:56:00 GMT;')
<BaseCookie: >
Karthikeyan, if you meant the “sane-cookie-date” format (https://tools.ietf.org/html/rfc6265#page-9), that is just the IETF’s recommended date format. I suspect MeiK is trying to _parse_ the date rather than generate it, in which case the procedure in <https://tools.ietf.org/html/rfc6265#section-5.1.1> may be more relevant. Spaces and commas are both treated as delimiters, so the problematic Expires attribute should parse fine.
BTW, this special handling of Set-Cookie attributes like Expires is not documented, though it does seem intentional. According to the documentation they should be treated as new Morsels.
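The delimiter-based procedure from RFC 6265 section 5.1.1 can be sketched roughly as follows. This is a simplified illustration, not code from CPython or the PR: `parse_cookie_date` is a hypothetical helper, and it assumes the result should be reported as a UTC datetime (the RFC algorithm ignores any time zone token).

```python
import re
from datetime import datetime, timezone

# Delimiter octets from the RFC 6265 section 5.1.1 grammar
# (tab, space, comma, "-", ";" and so on; ":" is not a delimiter).
_DELIMITERS = re.compile(r"[\x09\x20-\x2f\x3b-\x40\x5b-\x60\x7b-\x7e]+")
_TIME = re.compile(r"(\d{1,2}):(\d{1,2}):(\d{1,2})(?:\D.*)?\Z", re.S)
_DAY = re.compile(r"(\d{1,2})(?:\D.*)?\Z", re.S)
_YEAR = re.compile(r"(\d{2,4})(?:\D.*)?\Z", re.S)
_MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
           "jul", "aug", "sep", "oct", "nov", "dec")


def parse_cookie_date(text):
    """Hypothetical helper: return an aware UTC datetime, or None on failure."""
    hms = day = month = year = None
    for token in _DELIMITERS.split(text):
        if not token:
            continue
        # Each token fills the first still-empty field it matches, in RFC order.
        if hms is None and (m := _TIME.match(token)):
            hms = tuple(int(g) for g in m.groups())
        elif day is None and (m := _DAY.match(token)):
            day = int(m.group(1))
        elif month is None and token[:3].lower() in _MONTHS:
            month = _MONTHS.index(token[:3].lower()) + 1
        elif year is None and (m := _YEAR.match(token)):
            year = int(m.group(1))
    if hms is None or day is None or month is None or year is None:
        return None
    # Two-digit years are mapped into 1970-2069.
    if 70 <= year <= 99:
        year += 1900
    elif 0 <= year <= 69:
        year += 2000
    hour, minute, second = hms
    if year < 1601 or not 1 <= day <= 31:
        return None
    if hour > 23 or minute > 59 or second > 59:
        return None
    try:
        return datetime(year, month, day, hour, minute, second,
                        tzinfo=timezone.utc)
    except ValueError:  # e.g. 31 Feb
        return None


if __name__ == "__main__":
    print(parse_cookie_date("Thu,31 Jan 2019 05:56:00 GMT"))
    print(parse_cookie_date("Thu, 31 Jan 2019 05:56:00 GMT"))
```

Because the comma and the space are both delimiters, "Thu,31" and "Thu, 31" split into the same tokens, so both spellings give the same result in this sketch.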
|
msg334393 - (view) |
Author: Karthikeyan Singaravelan (xtreak) * |
Date: 2019-01-26 13:43 |
Yes, sorry, I thought that format was used for parsing too. Thanks for the example, Martin. I am linking @MeiK's PR to this issue; I had asked them to open an issue for this.
|
msg340683 - (view) |
Author: daniel hahler (blueyed) * |
Date: 2019-04-22 22:45 |
Another example of a value that fails to parse is if "-0000" is used instead of "GMT", which is the case with GitHub:
> Set-Cookie: has_recent_activity=1; path=/; expires=Mon, 22 Apr 2019 23:27:18 -0000
So using a regular expression here to only parse the sane-cookie-date format (that is recommended for output) is wrong.
The regular expression was last changed back in 2012 (https://github.com/python/cpython/commit/aeeba2629aa52e4e73e19a1502b3d3133ea68dec).
|
msg340684 - (view) |
Author: daniel hahler (blueyed) * |
Date: 2019-04-22 23:08 |
http.cookiejar parses this correctly, using http2time:
>>> import http.cookiejar
>>> http.cookiejar.parse_ns_headers(["has_recent_activity=1; path=/; expires=Mon, 22 Apr 2019 23:27:18 -0000"])
[[('has_recent_activity', '1'), ('path', '/'), ('expires', 1555975638), ('version', '0')]]
Ref: https://github.com/python/cpython/blob/9f316bd9684d27b7e21fbf43ca86dc5e65dac4af/Lib/http/cookiejar.py#L204-L249
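As a small sketch of the same point, http2time can be called directly on the two Expires values mentioned in this issue. `expires_to_epoch` is a hypothetical wrapper added only for illustration; http2time itself lives in http.cookiejar but is not part of the documented API, so relying on it is an assumption.

```python
from http.cookiejar import http2time


def expires_to_epoch(value):
    """Hypothetical wrapper: seconds since the epoch (UTC), or None if unparsable."""
    return http2time(value)


if __name__ == "__main__":
    for value in ("Thu,31 Jan 2019 05:56:00 GMT",
                  "Mon, 22 Apr 2019 23:27:18 -0000"):
        print(value, "->", expires_to_epoch(value))
```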
|
msg340688 - (view) |
Author: MeiK (MeiK) * |
Date: 2019-04-23 02:19 |
You are right. I looked at the parsing algorithm specified in RFC 6265[1], and it seems that regular expressions should not be used here.
I updated the code to use http.cookiejar, but it fails this test: https://github.com/python/cpython/blob/master/Lib/test/test_http_cookies.py#L19. However, other languages and libraries (JavaScript, Requests, http.cookiejar, etc.) cannot parse that test string either. It seems that the contents of the brackets should be escaped. Is this a wrong test case?
I updated the code[2] using http.cookiejar. Is this a good idea?
English is not my native language; please excuse typing errors.
[1] https://tools.ietf.org/html/rfc6265
[2] https://github.com/python/cpython/pull/11665/commits/a03bc75348a4041c7411da3175689c087a98789f
|
msg340689 - (view) |
Author: MeiK (MeiK) * |
Date: 2019-04-23 02:54 |
I found that using http.cookiejar.parse_ns_headers causes some of the existing tests to fail. If you think this approach is workable, I can write a new parser along the same lines that passes all the tests.
|
msg340831 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2019-04-25 10:09 |
Test_http_cookies line 19 has the following test case:
{'data': 'keebler="E=mc2; L=\\"Loves\\"; fudge=\\012;"',
'dict': {'keebler' : 'E=mc2; L="Loves"; fudge=\012;'},
'repr': '''<SimpleCookie: keebler='E=mc2; L="Loves"; fudge=\\n;'>''',
'output': 'Set-Cookie: keebler="E=mc2; L=\\"Loves\\"; fudge=\\012;"'}
This is similar to an example in the documentation:
>>> C.load('keebler="E=everybody; L=\\"Loves\\"; fudge=\\012;";')
>>> print(C)
Set-Cookie: keebler="E=everybody; L=\"Loves\"; fudge=\012;"
If you break parsing of this string in the “load” method, you break documented behaviour. The “http.cookies” module is documented to follow RFC 2109. I believe the strings are valid by RFC 2109, in which the value is allowed to use the HTTP “quoted-string” format.
|
msg341321 - (view) |
Author: daniel hahler (blueyed) * |
Date: 2019-05-03 02:01 |
It seems like http.cookiejar, which includes more relaxed parsing of cookies, is what should be used for clients. This is mentioned in the docs at https://github.com/python/cpython/blame/443fe5a52a3d6a101795380227ced38b4b5e0a8b/Doc/library/http.cookies.rst#L63-L65.
|
Date | User | Action | Args
2022-04-11 14:59:10 | admin | set | github: 80005
2019-05-03 02:01:30 | blueyed | set | messages: + msg341321
2019-04-25 10:09:58 | martin.panter | set | messages: + msg340831
2019-04-23 06:35:03 | SilentGhost | set | nosy: + martin.panter, xtreak
2019-04-23 02:54:05 | MeiK | set | nosy: - martin.panter, xtreak; messages: + msg340689
2019-04-23 02:19:22 | MeiK | set | messages: + msg340688
2019-04-22 23:08:19 | blueyed | set | messages: + msg340684
2019-04-22 22:45:31 | blueyed | set | nosy: + blueyed; messages: + msg340683
2019-01-26 13:43:36 | xtreak | set | keywords: + patch; stage: patch review; messages: + msg334393; pull_requests: + pull_request11517
2019-01-26 13:03:14 | martin.panter | set | nosy: + martin.panter; messages: + msg334392
2019-01-25 03:29:47 | xtreak | set | nosy: + xtreak; messages: + msg334339; versions: + Python 3.8
2019-01-25 03:11:05 | MeiK | set | messages: + msg334338
2019-01-25 03:08:24 | MeiK | create |