Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cookies module allows commas in keys #70490

Closed
jaraco opened this issue Feb 6, 2016 · 20 comments
Closed

cookies module allows commas in keys #70490

jaraco opened this issue Feb 6, 2016 · 20 comments
Labels
easy stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@jaraco
Copy link
Member

jaraco commented Feb 6, 2016

BPO 26302
Nosy @jaraco, @vadmium, @serhiy-storchaka, @demianbrecht, @AnishShah
Files
  • issue26302_20160206.patch
  • issue26302_20160206-2.patch
  • issue26302_20160207.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2016-02-24.13:54:01.325>
    created_at = <Date 2016-02-06.05:18:50.253>
    labels = ['easy', 'type-bug', 'library']
    title = 'cookies module allows commas in keys'
    updated_at = <Date 2016-02-24.13:54:01.323>
    user = 'https://github.com/jaraco'

    bugs.python.org fields:

    activity = <Date 2016-02-24.13:54:01.323>
    actor = 'jaraco'
    assignee = 'none'
    closed = True
    closed_date = <Date 2016-02-24.13:54:01.325>
    closer = 'jaraco'
    components = ['Library (Lib)']
    creation = <Date 2016-02-06.05:18:50.253>
    creator = 'jaraco'
    dependencies = []
    files = ['41836', '41837', '41841']
    hgrepos = []
    issue_num = 26302
    keywords = ['patch', 'easy', '3.5regression']
    message_count = 20.0
    messages = ['259718', '259721', '259726', '259742', '259744', '259745', '259746', '259750', '259762', '259765', '259778', '259780', '259781', '259782', '259817', '260399', '260443', '260766', '260801', '260802']
    nosy_count = 7.0
    nosy_names = ['jaraco', 'python-dev', 'martin.panter', 'serhiy.storchaka', 'demian.brecht', 'anish.shah', 'Kunal Grover']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue26302'
    versions = ['Python 3.5', 'Python 3.6']

    @jaraco
    Copy link
    Member Author

    jaraco commented Feb 6, 2016

    Commas aren't legal characters in cookie keys, yet in Python 3.5, they're allowed:

    >>> bool(http.cookies._is_legal_key(','))
    True

    The issue lies in the use of _LegalChars constructing a regular expression.

    "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."

    The issue arises in this line:

    _is_legal_key = re.compile('[%s]+' % _LegalChars).fullmatch

    Which was added in 88e1151e8e0242 referencing bpo-2211.

    The problem is that in a regular expression, and in a character class in particular, the '-' character has a special meaning if not the first character in the class, which is "span all characters between the leading and following characters". As a result, the pattern has the unintended effect of including the comma in the pattern:

    >>> http.cookies._is_legal_key.__self__
    re.compile("[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!#$%&'*+-.^_`|~:]+")
    >>> pattern = _
    >>> pattern.fullmatch(',')
    <_sre.SRE_Match object; span=(0, 1), match=','>
    >>> ord('+')
    43
    >>> ord('.')
    46
    >>> ''.join(map(chr, range(43,47)))
    '+,-.'

    That's how the comma creeped in.

    This issue is the underlying cause of https://bitbucket.org/cherrypy/cherrypy/issues/1405/testcookies-fails-on-python-35 and possible other cookie-related bugs in Python.

    While I jest about regular expressions, I like the implementation. It just needs to account for the extraneous comma, perhaps by escaping the dash:

    _is_legal_key = re.compile('[%s]+' % _LegalChars.replace('-', '\\-')).fullmatch

    Also, regression tests for keys containing invalid characters should be added as well.

    @serhiy-storchaka
    Copy link
    Member

    Ah, how can I missed this catch?

    The simplest way is just move "-" to the start or the end of character list. The most error-proof way is to use re.escape().

    @serhiy-storchaka serhiy-storchaka added easy stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Feb 6, 2016
    @KunalGrover
    Copy link
    Mannequin

    KunalGrover mannequin commented Feb 6, 2016

    Hi, I am a newcomer here, I am interested to work on this issue.

    @AnishShah
    Copy link
    Mannequin

    AnishShah mannequin commented Feb 6, 2016

    We just need to use '\-' instead of '-'.

    >>> regex = re.compile("[a-z]")
    >>> bool(regex.match('b'))
    True
    >>> regex = re.compile("[a\-z]")
    >>> bool(regex.match('b'))
    False
    

    I have uploaded a patch.
    Let me know if this needs some tests too?

    @jaraco
    Copy link
    Member Author

    jaraco commented Feb 6, 2016

    A test would be much appreciated. It's worthwhile capturing the rejection of at least one invalid character, if not several common ones.

    @serhiy-storchaka
    Copy link
    Member

    No, _LegalChars shouldn't include "\".

    @AnishShah
    Copy link
    Mannequin

    AnishShah mannequin commented Feb 6, 2016

    @serhiy.storchaka OK, I have used re.escape instead of '\'. And I have added a test too.

    Also, may I know why '\' can not be in _LegalChars, so that I can remember this for future purpose?

    @serhiy-storchaka
    Copy link
    Member

    _LegalChars contained only characters which don't require quoting, as documented in the comment above. If _LegalChars was only used to create _is_legal_key, we would just wrote the regular expression. But it is used also in other places. In this particular case adding "\" to _LegalChars doesn't lead to visible bug (except inconsistency with the comment), but we can't be sure.

    _is_legal_key() is implementation detail. It would be better to test public API.

    @vadmium
    Copy link
    Member

    vadmium commented Feb 7, 2016

    Another option might be to do away with the regular expression (personally I like to avoid REs and code generation where practical):

    def _is_legal_key(key):
        return key and set(_LegalChars).issuperset(key)

    @serhiy-storchaka
    Copy link
    Member

    The regular expression is used for performance.

    @AnishShah
    Copy link
    Mannequin

    AnishShah mannequin commented Feb 7, 2016

    I ran regex and issuperset version on a random string. The regex one gives better performance. So, I have included the re.escape in the patch.

    >>> random_str = ''.join(random.choice(_LegalChars) for _ in range(10 ** 8))
    >>> is_legal_key = re.compile('[%s]+' % re.escape(_LegalChars)).fullmatch
    >>> Timer("is_legal_key(random_str)", setup="from __main__ import random_str, is_legal_key").timeit(1)
    0.3168252399998437
    >>> def is_legal_key(key):
    ...     return key and set(_LegalChars).issuperset(key)
    ... 
    >>> Timer("is_legal_key(random_str)", setup="from __main__ import random_str, is_legal_key").timeit(1)
    4.3335622880001665

    Also, I have updated the patch. Can you please review it? :)

    @vadmium
    Copy link
    Member

    vadmium commented Feb 7, 2016

    The same bug is in the _CookiePattern regular expression. For illegal characters other than a comma, the CookieError does not actually seem to be raised.

    @vadmium
    Copy link
    Member

    vadmium commented Feb 7, 2016

    I take that back about _CookiePattern having the same bug; it uses a different input variable. But it is strange that _LegalKeyChars lists a comma, but _LegalChars omits it.

    @AnishShah
    Copy link
    Mannequin

    AnishShah mannequin commented Feb 7, 2016

    _LegalKeyChars contains "\-" whereas _LegalChars just contains "-".

    On Sun, Feb 7, 2016 at 4:33 PM, Martin Panter <report@bugs.python.org>
    wrote:

    Martin Panter added the comment:

    I take that back about _CookiePattern having the same bug; it uses a
    different input variable. But it is strange that _LegalKeyChars lists a
    comma, but _LegalChars omits it.

    ----------


    Python tracker <report@bugs.python.org>
    <http://bugs.python.org/issue26302\>


    @vadmium
    Copy link
    Member

    vadmium commented Feb 8, 2016

    The patch looks okay to me.

    The inconsistency between silently rejecting cookie “morsels” and raising an exception from load() also exists in 2.7, so maybe it is not a big deal.

    @AnishShah
    Copy link
    Mannequin

    AnishShah mannequin commented Feb 17, 2016

    Is this patch ready to merge?

    @jaraco
    Copy link
    Member Author

    jaraco commented Feb 18, 2016

    Looks good to me.

    @serhiy-storchaka
    Copy link
    Member

    LGTM. Do you want to commit the patch Jason?

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Feb 24, 2016

    New changeset 758cb13aaa2c by Anish Shah <anish.shah> in branch '3.5':
    Issue bpo-26302: Correctly identify comma as an invalid character for a cookie (correcting regression in Python 3.5).
    https://hg.python.org/cpython/rev/758cb13aaa2c

    New changeset 91eb7ae951a1 by Jason R. Coombs in branch 'default':
    Issue bpo-26302: merge from 3.5
    https://hg.python.org/cpython/rev/91eb7ae951a1

    @jaraco
    Copy link
    Member Author

    jaraco commented Feb 24, 2016

    Thanks Anish for the patch, now slated for Python 3.5.2 and Python 3.6.

    @jaraco jaraco closed this as completed Feb 24, 2016
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    easy stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants