classification
Title: http.cookies._CookiePattern modifying regular expressions
Type: enhancement Stage: patch review
Components: Extension Modules Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: MeiK, blueyed, martin.panter, xtreak
Priority: normal Keywords: patch

Created on 2019-01-25 03:08 by MeiK, last changed 2019-05-03 02:01 by blueyed.

Pull Requests
URL       Status  Linked
PR 11665  open    xtreak, 2019-01-26 13:43
Messages (10)
msg334338 - Author: MeiK (MeiK) * (Python committer) Date: 2019-01-25 03:11
http.cookies.BaseCookie[1] can't parse an Expires attribute written like Expires=Thu,31 Jan 2019 05:56:00 GMT; (that is, with no space after "Thu,").

I ran into this problem in practice. Chrome, IE and Firefox all parse this string normally, and many languages, such as JavaScript, can also parse it automatically.

I built a test site using Flask: https://paste.ubuntu.com/p/K7Z4K4KH7Z/. curl and requests get the cookies correctly, but aiohttp does not (because it uses http.cookies.BaseCookie).

Judging from MDN[2] and the RFC[3] (thanks tirkarthi), this doesn't seem to be canonical behavior, but some Java web frameworks produce it (such as the one that led me to find the problem).

This problem can be solved by modifying a regular expression[4], but I don't know whether Python should accept this non-standard format.
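The expires special case can be reproduced in isolation. The sketch below assumes that the expires alternative in `_CookiePattern` at [4] is `\w{3},\s[\w\d\s-]{9,11}\s[\d:]{8}\sGMT`. `STRICT_EXPIRES` and `RELAXED_EXPIRES` are hypothetical names. Making the whitespace after the comma optional (`\s*`) is only one possible change and not necessarily the one in the PR.

```python
import re

# Assumed copy of the "expires" alternative in http.cookies._CookiePattern [4].
STRICT_EXPIRES = re.compile(r"\w{3},\s[\w\d\s-]{9,11}\s[\d:]{8}\sGMT", re.ASCII)

# Hypothetical relaxed variant: zero or more whitespace characters after the comma.
RELAXED_EXPIRES = re.compile(r"\w{3},\s*[\w\d\s-]{9,11}\s[\d:]{8}\sGMT", re.ASCII)

for value in ("Thu, 31 Jan 2019 05:56:00 GMT", "Thu,31 Jan 2019 05:56:00 GMT"):
    strict = STRICT_EXPIRES.fullmatch(value) is not None
    relaxed = RELAXED_EXPIRES.fullmatch(value) is not None
    print(f"{value!r}: strict={strict} relaxed={relaxed}")
# 'Thu, 31 Jan 2019 05:56:00 GMT': strict=True relaxed=True
# 'Thu,31 Jan 2019 05:56:00 GMT': strict=False relaxed=True
```

When the expires alternative does not match, the full pattern falls back to its plain-token alternative. That alternative cannot contain spaces, so it consumes only part of the date.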

English is not my native language; please excuse typing errors.


[1] https://github.com/python/cpython/blob/master/Lib/http/cookies.py#L457
[2] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie#Directives
[3] https://tools.ietf.org/html/rfc6265#section-4.1.1
[4] https://github.com/python/cpython/blob/master/Lib/http/cookies.py#L444
msg334339 - Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2019-01-25 03:29
Thanks for the MDN cookie directive link. I didn't know it linked to the Date page referenced in the GitHub PR. I don't see the space being optional in the sane-date format specified for the expires attribute, but I could be reading the grammar wrong. I will wait for others' thoughts on this.
msg334392 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2019-01-26 13:03
I presume MeiK wants to use BaseCookie to parse the Set-Cookie header field, as in

>>> BaseCookie('Hello=World; Expires=Thu, 31 Jan 2019 05:56:00 GMT;')
<BaseCookie: Hello='World'>
>>> BaseCookie('Hello=World; Expires=Thu,31 Jan 2019 05:56:00 GMT;')
<BaseCookie: >

Karthikeyan, if you meant the “sane-cookie-date” format (https://tools.ietf.org/html/rfc6265#page-9), that is just the IETF’s recommended date format. I suspect MeiK is trying to _parse_ the date rather than generate it, in which case the procedure in <https://tools.ietf.org/html/rfc6265#section-5.1.1> may be more relevant. Spaces and commas are both treated as delimiters, so the problematic Expires attribute should parse fine.
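The first step of the RFC 6265 section 5.1.1 procedure splits the cookie-date into date-tokens, and that step alone shows why the missing space does no harm there. The following is a minimal sketch of the tokenization step only. `date_tokens` is a hypothetical name, and the later steps that recognise the time, day-of-month, month and year tokens are left out.

```python
import re

# Delimiter set from RFC 6265, section 5.1.1:
#   delimiter = %x09 / %x20-2F / %x3B-40 / %x5B-60 / %x7B-7E
# Space (0x20) and comma (0x2C) both fall in %x20-2F; colon (0x3A) does not.
DELIMITERS = re.compile(r"[\x09\x20-\x2F\x3B-\x40\x5B-\x60\x7B-\x7E]+")


def date_tokens(cookie_date):
    """Split a cookie-date into its date-tokens (runs of non-delimiters)."""
    return [token for token in DELIMITERS.split(cookie_date) if token]


print(date_tokens("Thu, 31 Jan 2019 05:56:00 GMT"))
print(date_tokens("Thu,31 Jan 2019 05:56:00 GMT"))
# Both print ['Thu', '31', 'Jan', '2019', '05:56:00', 'GMT']
```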

BTW, this special handling of Set-Cookie attributes like Expires is not documented, though it does seem intentional. According to the documentation they should be treated as new Morsels.
msg334393 - Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2019-01-26 13:43
Yes, sorry, I thought it was the format used for parsing too. Thanks for the example, Martin. I am linking @MeiK's PR to this issue; on that PR I had asked them to open an issue for this.
msg340683 - Author: daniel hahler (blueyed) * Date: 2019-04-22 22:45
Another value that fails to parse is one that uses "-0000" instead of "GMT", as GitHub does:

> Set-Cookie: has_recent_activity=1; path=/; expires=Mon, 22 Apr 2019 23:27:18 -0000

So using a regular expression here that only accepts the sane-cookie-date format (which is recommended for output) is wrong.

It was last changed back in 2012 (https://github.com/python/cpython/commit/aeeba2629aa52e4e73e19a1502b3d3133ea68dec).
msg340684 - Author: daniel hahler (blueyed) * Date: 2019-04-22 23:08
http.cookiejar parses this correctly, using http2time:

    >>> import http.cookiejar
    >>> http.cookiejar.parse_ns_headers(["has_recent_activity=1; path=/; expires=Mon, 22 Apr 2019 23:27:18 -0000"])
    [[('has_recent_activity', '1'), ('path', '/'), ('expires', 1555975638), ('version', '0')]]

Ref: https://github.com/python/cpython/blob/9f316bd9684d27b7e21fbf43ca86dc5e65dac4af/Lib/http/cookiejar.py#L204-L249
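The same helper can also be called directly on the dates reported in this issue. This is a minimal sketch. It assumes that `http.cookiejar.http2time` (the module-level function at the link above, which is not part of the documented API) keeps returning seconds since the epoch, or `None` for a string it cannot parse.

```python
from http.cookiejar import http2time  # undocumented helper in http.cookiejar

# Dates from this issue: the canonical form, the form without a space after
# the comma, and GitHub's form with a numeric "-0000" zone.
for value in (
    "Thu, 31 Jan 2019 05:56:00 GMT",
    "Thu,31 Jan 2019 05:56:00 GMT",
    "Mon, 22 Apr 2019 23:27:18 -0000",
):
    # Seconds since the epoch, or None if the string cannot be parsed.
    print(repr(value), "->", http2time(value))
```

The first two strings produce the same instant (1548914160 seconds). The GitHub-style date gives 1555975638, which matches the expires value in the parse_ns_headers output above.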
msg340688 - Author: MeiK (MeiK) * (Python committer) Date: 2019-04-23 02:19
You are right. I read the parsing procedure specified in RFC 6265[1], and it seems that a regular expression should not be used.

I updated the code to use http.cookiejar, but it fails this test: https://github.com/python/cpython/blob/master/Lib/test/test_http_cookies.py#L19. However, other languages and libraries (JavaScript, Requests, http.cookiejar, etc.) cannot parse that value either. It seems that the contents of the quotes should be escaped. Is this test case wrong?

The updated code is here[2]. Is this a good idea?

English is not my native language; please excuse typing errors.

[1] https://tools.ietf.org/html/rfc6265
[2] https://github.com/python/cpython/pull/11665/commits/a03bc75348a4041c7411da3175689c087a98789f
msg340689 - Author: MeiK (MeiK) * (Python committer) Date: 2019-04-23 02:54
I found that using http.cookiejar.parse_ns_headers causes some of the existing tests to fail. If you think this approach is workable, I can write a new parser along the same lines that passes all the tests.
msg340831 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2019-04-25 10:09
Test_http_cookies line 19 has the following test case:

{'data': 'keebler="E=mc2; L=\\"Loves\\"; fudge=\\012;"',
 'dict': {'keebler' : 'E=mc2; L="Loves"; fudge=\012;'},
 'repr': '''<SimpleCookie: keebler='E=mc2; L="Loves"; fudge=\\n;'>''',
 'output': 'Set-Cookie: keebler="E=mc2; L=\\"Loves\\"; fudge=\\012;"'}

This is similar to an example in the documentation:

>>> C.load('keebler="E=everybody; L=\\"Loves\\"; fudge=\\012;";')
>>> print(C)
Set-Cookie: keebler="E=everybody; L=\"Loves\"; fudge=\012;"

If you break parsing of this string in the “load” method, you break documented behaviour. The “http.cookies” module is documented to follow RFC 2109. I believe the strings are valid under RFC 2109, which allows the value to use the HTTP “quoted-string” format.
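The documented behaviour can be checked directly through the public `SimpleCookie.load()` and `Morsel` API. The following sketch shows that the whole RFC 2109 quoted-string, including the semicolons inside the quotes, ends up in a single cookie value. A parser that simply splits the header on every `;` would not preserve that value.

```python
from http.cookies import SimpleCookie

C = SimpleCookie()
# The documented example: a quoted-string value containing escaped double
# quotes, an octal escape (\012 is a newline) and literal semicolons.
C.load('keebler="E=everybody; L=\\"Loves\\"; fudge=\\012;";')

morsel = C["keebler"]
print(repr(morsel.value))  # 'E=everybody; L="Loves"; fudge=\n;'
print(C)                   # Set-Cookie: keebler="E=everybody; L=\"Loves\"; fudge=\012;"
```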
msg341321 - Author: daniel hahler (blueyed) * Date: 2019-05-03 02:01
It seems that http.cookiejar, which parses cookies more leniently, should be used for clients. This is mentioned in the docs at https://github.com/python/cpython/blame/443fe5a52a3d6a101795380227ced38b4b5e0a8b/Doc/library/http.cookies.rst#L63-L65.
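For comparison, this sketch shows how the client-side module handles the header from Martin's example. It calls `http.cookiejar.parse_ns_headers`, the helper shown in msg340684. As far as I know that helper is not part of the documented API, and a real client would normally go through `http.cookiejar.CookieJar` instead.

```python
from http.cookiejar import parse_ns_headers

# The Set-Cookie value that BaseCookie rejects (no space after "Thu,").
header = "Hello=World; Expires=Thu,31 Jan 2019 05:56:00 GMT;"

pairs = parse_ns_headers([header])[0]
print(pairs)
# Expires is lower-cased and converted to seconds since the epoch:
# [('Hello', 'World'), ('expires', 1548914160), ('version', '0')]
```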
History
Date                 User           Action  Args
2019-05-03 02:01:30  blueyed        set     messages: + msg341321
2019-04-25 10:09:58  martin.panter  set     messages: + msg340831
2019-04-23 06:35:03  SilentGhost    set     nosy: + martin.panter, xtreak
2019-04-23 02:54:05  MeiK           set     nosy: - martin.panter, xtreak
                                            messages: + msg340689
2019-04-23 02:19:22  MeiK           set     messages: + msg340688
2019-04-22 23:08:19  blueyed        set     messages: + msg340684
2019-04-22 22:45:31  blueyed        set     nosy: + blueyed
                                            messages: + msg340683
2019-01-26 13:43:36  xtreak         set     keywords: + patch
                                            stage: patch review
                                            messages: + msg334393
                                            pull_requests: + pull_request11517
2019-01-26 13:03:14  martin.panter  set     nosy: + martin.panter
                                            messages: + msg334392
2019-01-25 03:29:47  xtreak         set     nosy: + xtreak
                                            messages: + msg334339
                                            versions: + Python 3.8
2019-01-25 03:11:05  MeiK           set     messages: + msg334338
2019-01-25 03:08:24  MeiK           create