Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

email.Message.as_string infinite loop #81945

Closed
mytran mannequin opened this issue Aug 5, 2019 · 16 comments
Closed

email.Message.as_string infinite loop #81945

mytran mannequin opened this issue Aug 5, 2019 · 16 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes topic-email type-security A security issue

Comments

@mytran
Copy link
Mannequin

mytran mannequin commented Aug 5, 2019

BPO 37764
Nosy @warsaw, @bitdancer, @maxking, @miss-islington, @epicfaace
PRs
  • bpo-37764: Fix infinite loop when parsing unstructured email headers. #15239
  • [3.7] bpo-37764: Fix infinite loop when parsing unstructured email headers. (GH-15239) #15654
  • [3.8] bpo-37764: Fix infinite loop when parsing unstructured email headers. (GH-15239) #15686
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-09-05.01:21:40.178>
    created_at = <Date 2019-08-05.16:23:23.391>
    labels = ['type-security', '3.7', '3.8', 'expert-email', '3.9']
    title = 'email.Message.as_string infinite loop'
    updated_at = <Date 2019-09-05.05:18:05.873>
    user = 'https://bugs.python.org/mytran'

    bugs.python.org fields:

    activity = <Date 2019-09-05.05:18:05.873>
    actor = 'epicfaace'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-09-05.01:21:40.178>
    closer = 'maxking'
    components = ['email']
    creation = <Date 2019-08-05.16:23:23.391>
    creator = 'mytran'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 37764
    keywords = ['patch']
    message_count = 16.0
    messages = ['349055', '349196', '349199', '349200', '349202', '349277', '349358', '349505', '349847', '349848', '349849', '349962', '350919', '351089', '351160', '351177']
    nosy_count = 6.0
    nosy_names = ['barry', 'r.david.murray', 'maxking', 'miss-islington', 'epicfaace', 'mytran']
    pr_nums = ['15239', '15654', '15686']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'security'
    url = 'https://bugs.python.org/issue37764'
    versions = ['Python 3.7', 'Python 3.8', 'Python 3.9']

    @mytran
    Copy link
    Mannequin Author

    mytran mannequin commented Aug 5, 2019

    The following will hang the system until it runs out of memory.

    import email
    import email.policy
    
    text = """From: user@host.com
    To: user@host.com
    Bad-Header:
     =?us-ascii?Q?LCSwrV11+IB0rSbSker+M9vWR7wEDSuGqmHD89Gt=ea0nJFSaiz4vX3XMJPT4vrE?=
     =?us-ascii?Q?xGUZeOnp0o22pLBB7CYLH74Js=wOlK6Tfru2U47qR?=
     =?us-ascii?Q?72OfyEY2p2=2FrA9xNFyvH+fBTCmazxwzF8nGkK6D?=

    Hello!
    """

    eml = email.message_from_string(text, policy=email.policy.SMTPUTF8)
    eml.as_string()

    @mytran mytran mannequin added 3.7 (EOL) end of life topic-email labels Aug 5, 2019
    @epicfaace
    Copy link
    Mannequin

    epicfaace mannequin commented Aug 7, 2019

    I can't reproduce this on 3.9: https://github.com/epicfaace/cpython/runs/187997615

    @epicfaace
    Copy link
    Mannequin

    epicfaace mannequin commented Aug 7, 2019

    I also can't reproduce this on 3.7: https://github.com/epicfaace/cpython/runs/188005822

    @mytran
    Copy link
    Mannequin Author

    mytran mannequin commented Aug 7, 2019

    Reproduced on 3.7.4

    Looks like this started happening after this commit: dc20fc4#diff-19171ae20182f6759204a3436475ddd1

    @mytran
    Copy link
    Mannequin Author

    mytran mannequin commented Aug 7, 2019

    I looked at the job at https://travis-ci.com/epicfaace/cpython/jobs/223345147 and its running py3.6.

    @maxking
    Copy link
    Contributor

    maxking commented Aug 9, 2019

    This does look like a side-effect of the commit mentioned by mytran.

    The issues seems to be that email._header_value_parser.get_unstructured wrongfully assumes that anything leading with '=?' would be a valid rfc 2047 encoded word.

    This is a smaller test case to reproduce this bug:

    from email._header_value_parser import get_unstructured
    get_unstructured('=?utf-8?q?FSaiz4vX3XMJPT4vrExGUZeOnp0o22pLBB7CYLH74Js=3DwOlK6Tfru2U47qR?=72OfyEY2p2/rA9xNFyvH+fBTCmazxwzF8nGkK6D')

    @maxking maxking added the 3.8 only security fixes label Aug 9, 2019
    @maxking
    Copy link
    Contributor

    maxking commented Aug 10, 2019

    Adding security label since this can cause DOS.

    @maxking maxking added the type-security A security issue label Aug 10, 2019
    @epicfaace
    Copy link
    Mannequin

    epicfaace mannequin commented Aug 12, 2019

    Oh, both the Travis links I sent actually ended up reproducing the bug.

    I've made a PR that fixes with an even smaller test case:

    get_unstructured('=?utf-8?q?somevalue?=aa')

    It looks like this is caused because "aa" is thought to be an encoded word escape in

    if len(remstr) > 1 and remstr[0] in hexdigits and remstr[1] in hexdigits:
    -- thus, get_encoded_word fails, which ends up making get_unstructured go in an infinite loop.

    My PR makes the parser parse "=?utf-8?q?somevalue?=aa" as "=?utf-8?q?somevalue?=aa". However, the existing test cases make sure it parses "=?utf-8?q?somevalue?=nowhitespace" as "somevaluenowhitespace". I'm not too familiar with RFC 2047, but why are "aa" and "nowhitespace" treated differently? Should they be?

    @epicfaace epicfaace mannequin added the 3.9 only security fixes label Aug 15, 2019
    @maxking
    Copy link
    Contributor

    maxking commented Aug 16, 2019

    You have correctly identified that "=aa" is detected as a encoded word and causes the get_encoded_word to fail.

    However, "=?utf-8?q?somevalue?=aa" should ideally get parsed as "somevalueaa" and not "=?utf-8?q?somevalue?=aa". This is because "=?utf-8?q?somevalue?=" is a valid encoded word, it is just not followed by an empty whitespace.

    modified   Lib/email/_header_value_parser.py
    @@ -1037,7 +1037,10 @@ def get_encoded_word(value):
             raise errors.HeaderParseError(
                 "expected encoded word but found {}".format(value))
         remstr = ''.join(remainder)
    -    if len(remstr) > 1 and remstr[0] in hexdigits and remstr[1] in hexdigits:
    +    if (len(remstr) > 1 and
    +        remstr[0] in hexdigits and
    +        remstr[1] in hexdigits and
    +        tok.count('?') < 2):
             # The ? after the CTE was followed by an encoded word escape (=XX).
             rest, *remainder = remstr.split('?=', 1)

    This can be avoided by checking ? occurs twice in the tok.

    The 2nd bug, which needs a better test case, is that if the encoded_word is invalid, you will keep running into infinite loop, which you correctly fixed in your PR. However, the test case you used is more appropriate for the first issue.

    You can fix both the issues, for which, you need to add a test case for 2nd issue and fix for the first issue.

    Looking into the PR now.

    @maxking
    Copy link
    Contributor

    maxking commented Aug 16, 2019

    I meant, =aa is identified as encoded word escape

    @maxking
    Copy link
    Contributor

    maxking commented Aug 16, 2019

    Although, the 2nd bug I spoke of is kind of speculative, I haven't been able to find a test case which matches rfc2047_matcher but raises exception with get_encoded_word (after, ofcourse, the first bug is fixed), which the only way to cause an infinite loop.

    @epicfaace
    Copy link
    Mannequin

    epicfaace mannequin commented Aug 19, 2019

    Thanks, I've fixed the first case as you suggested.

    I found an example of the 2nd case -- '=?utf-8?q?=somevalue?=' -- which causes the method to hang. I've added a fix, though I'm not sure if it treats the string properly -- it parses it as '=?utf-8?q?=somevalue?=' and doesn't raise any defects. Is that the behavior we would want?

    @miss-islington
    Copy link
    Contributor

    New changeset c5b242f by Miss Islington (bot) (Ashwin Ramaswami) in branch 'master':
    bpo-37764: Fix infinite loop when parsing unstructured email headers. (GH-15239)
    c5b242f

    @miss-islington
    Copy link
    Contributor

    New changeset ea21389 by Miss Islington (bot) (Ashwin Ramaswami) in branch '3.7':
    [3.7] bpo-37764: Fix infinite loop when parsing unstructured email headers. (GH-15239) (GH-15654)
    ea21389

    @maxking
    Copy link
    Contributor

    maxking commented Sep 5, 2019

    New changeset 6ad0a2c by Abhilash Raj in branch '3.8':
    [3.8] bpo-37764: Fix infinite loop when parsing unstructured email headers. (GH-15239) (GH-15686)
    6ad0a2c

    @maxking maxking closed this as completed Sep 5, 2019
    @epicfaace
    Copy link
    Mannequin

    epicfaace mannequin commented Sep 5, 2019

    Should we get a CVE for this because this is a security issue?

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes topic-email type-security A security issue
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants