classification
Title: cookies module allows commas in keys
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.6, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Kunal Grover, anish.shah, demian.brecht, jaraco, martin.panter, python-dev, serhiy.storchaka
Priority: normal Keywords: 3.5regression, easy, patch

Created on 2016-02-06 05:18 by jaraco, last changed 2016-02-24 13:54 by jaraco. This issue is now closed.

Files
File name Uploaded Description Edit
issue26302_20160206.patch anish.shah, 2016-02-06 17:33 review
issue26302_20160206-2.patch anish.shah, 2016-02-06 18:09 review
issue26302_20160207.patch anish.shah, 2016-02-07 10:36 review
Messages (20)
msg259718 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2016-02-06 05:18
Commas aren't legal characters in cookie keys, yet in Python 3.5, they're allowed:

>>> bool(http.cookies._is_legal_key(','))
True

The issue lies in the use of _LegalChars constructing a regular expression.

"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."

The issue arises in this line:

_is_legal_key = re.compile('[%s]+' % _LegalChars).fullmatch

Which was added in 88e1151e8e0242 referencing issue2211.

The problem is that in a regular expression, and in a character class in particular, the '-' character has a special meaning if not the first character in the class, which is "span all characters between the leading and following characters". As a result, the pattern has the unintended effect of including the comma in the pattern:

>>> http.cookies._is_legal_key.__self__
re.compile("[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!#$%&'*+-.^_`|~:]+")
>>> pattern = _
>>> pattern.fullmatch(',')
<_sre.SRE_Match object; span=(0, 1), match=','>
>>> ord('+')
43
>>> ord('.')
46
>>> ''.join(map(chr, range(43,47)))
'+,-.'

That's how the comma creeped in.

This issue is the underlying cause of https://bitbucket.org/cherrypy/cherrypy/issues/1405/testcookies-fails-on-python-35 and possible other cookie-related bugs in Python.

While I jest about regular expressions, I like the implementation. It just needs to account for the extraneous comma, perhaps by escaping the dash:

_is_legal_key = re.compile('[%s]+' % _LegalChars.replace('-', '\\-')).fullmatch

Also, regression tests for keys containing invalid characters should be added as well.
msg259721 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-06 08:35
Ah, how can I missed this catch?

The simplest way is just move "-" to the start or the end of character list. The most error-proof way is to use re.escape().
msg259726 - (view) Author: Kunal Grover (Kunal Grover) Date: 2016-02-06 12:19
Hi, I am a newcomer here, I am interested to work on this issue.
msg259742 - (view) Author: Anish Shah (anish.shah) * Date: 2016-02-06 17:33
We just need to use '\-' instead of '-'.

```
>>> regex = re.compile("[a-z]")
>>> bool(regex.match('b'))
True
>>> regex = re.compile("[a\-z]")
>>> bool(regex.match('b'))
False
```

I have uploaded a patch.
Let me know if this needs some tests too?
msg259744 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2016-02-06 17:45
A test would be much appreciated. It's worthwhile capturing the rejection of at least one invalid character, if not several common ones.
msg259745 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-06 17:54
No, _LegalChars shouldn't include "\".
msg259746 - (view) Author: Anish Shah (anish.shah) * Date: 2016-02-06 18:09
@serhiy.storchaka OK, I have used re.escape instead of '\'. And I have added a test too.

Also, may I know why '\' can not be in _LegalChars, so that I can remember this for future purpose?
msg259750 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-06 20:24
_LegalChars contained only characters which don't require quoting, as documented in the comment above. If _LegalChars was only used to create _is_legal_key, we would just wrote the regular expression. But it is used also in other places. In this particular case adding "\" to _LegalChars doesn't lead to visible bug (except inconsistency with the comment), but we can't be sure.

_is_legal_key() is implementation detail. It would be better to test public API.
msg259762 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-02-07 05:12
Another option might be to do away with the regular expression (personally I like to avoid REs and code generation where practical):

def _is_legal_key(key):
    return key and set(_LegalChars).issuperset(key)
msg259765 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-07 06:28
The regular expression is used for performance.
msg259778 - (view) Author: Anish Shah (anish.shah) * Date: 2016-02-07 10:36
I ran regex and issuperset version on a random string. The regex one gives better performance. So, I have included the re.escape in the patch. 

>>> random_str = ''.join(random.choice(_LegalChars) for _ in range(10 ** 8))
>>> is_legal_key = re.compile('[%s]+' % re.escape(_LegalChars)).fullmatch
>>> Timer("is_legal_key(random_str)", setup="from __main__ import random_str, is_legal_key").timeit(1)
0.3168252399998437
>>> def is_legal_key(key):
...     return key and set(_LegalChars).issuperset(key)
... 
>>> Timer("is_legal_key(random_str)", setup="from __main__ import random_str, is_legal_key").timeit(1)
4.3335622880001665


Also, I have updated the patch. Can you please review it? :)
msg259780 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-02-07 11:00
The same bug is in the _CookiePattern regular expression. For illegal characters other than a comma, the CookieError does not actually seem to be raised.
msg259781 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-02-07 11:03
I take that back about _CookiePattern having the same bug; it uses a different input variable. But it is strange that _LegalKeyChars lists a comma, but _LegalChars omits it.
msg259782 - (view) Author: Anish Shah (anish.shah) * Date: 2016-02-07 11:20
_LegalKeyChars contains "\-" whereas _LegalChars just contains "-".

On Sun, Feb 7, 2016 at 4:33 PM, Martin Panter <report@bugs.python.org>
wrote:

>
> Martin Panter added the comment:
>
> I take that back about _CookiePattern having the same bug; it uses a
> different input variable. But it is strange that _LegalKeyChars lists a
> comma, but _LegalChars omits it.
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue26302>
> _______________________________________
>
msg259817 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-02-08 05:15
The patch looks okay to me.

The inconsistency between silently rejecting cookie “morsels” and raising an exception from load() also exists in 2.7, so maybe it is not a big deal.
msg260399 - (view) Author: Anish Shah (anish.shah) * Date: 2016-02-17 14:05
Is this patch ready to merge?
msg260443 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2016-02-18 09:09
Looks good to me.
msg260766 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-24 08:19
LGTM. Do you want to commit the patch Jason?
msg260801 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-02-24 13:51
New changeset 758cb13aaa2c by Anish Shah <anish.shah> in branch '3.5':
Issue #26302: Correctly identify comma as an invalid character for a cookie (correcting regression in Python 3.5).
https://hg.python.org/cpython/rev/758cb13aaa2c

New changeset 91eb7ae951a1 by Jason R. Coombs in branch 'default':
Issue #26302: merge from 3.5
https://hg.python.org/cpython/rev/91eb7ae951a1
msg260802 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2016-02-24 13:54
Thanks Anish for the patch, now slated for Python 3.5.2 and Python 3.6.
History
Date User Action Args
2016-02-24 13:54:01jaracosetstatus: open -> closed
resolution: fixed
messages: + msg260802
2016-02-24 13:51:43python-devsetnosy: + python-dev
messages: + msg260801
2016-02-24 08:19:11serhiy.storchakasetmessages: + msg260766
2016-02-18 09:09:22jaracosetmessages: + msg260443
2016-02-17 14:05:24anish.shahsetmessages: + msg260399
2016-02-08 05:15:11martin.pantersetmessages: + msg259817
2016-02-07 11:20:48anish.shahsetmessages: + msg259782
2016-02-07 11:03:25martin.pantersetmessages: + msg259781
2016-02-07 11:00:12martin.pantersetmessages: + msg259780
2016-02-07 10:36:05anish.shahsetfiles: + issue26302_20160207.patch

messages: + msg259778
2016-02-07 06:28:21serhiy.storchakasetmessages: + msg259765
2016-02-07 05:12:57martin.pantersetnosy: + martin.panter

messages: + msg259762
stage: needs patch -> patch review
2016-02-06 20:24:48serhiy.storchakasetmessages: + msg259750
2016-02-06 18:09:03anish.shahsetfiles: + issue26302_20160206-2.patch

messages: + msg259746
2016-02-06 17:54:52serhiy.storchakasetmessages: + msg259745
2016-02-06 17:45:52jaracosetmessages: + msg259744
2016-02-06 17:33:15anish.shahsetfiles: + issue26302_20160206.patch
keywords: + patch
messages: + msg259742
2016-02-06 16:37:57anish.shahsetnosy: + anish.shah
2016-02-06 12:19:54Kunal Groversetnosy: + Kunal Grover
messages: + msg259726
2016-02-06 08:36:07serhiy.storchakasetkeywords: + 3.5regression
2016-02-06 08:35:47serhiy.storchakasettype: behavior
components: + Library (Lib)

keywords: + easy, - 3.5regression
nosy: + demian.brecht
messages: + msg259721
stage: needs patch
2016-02-06 05:18:50jaracocreate