Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

invalid content-transfer-encoding in encoded-word causes KeyError #82513

Closed
aft90 mannequin opened this issue Sep 30, 2019 · 7 comments
Closed

invalid content-transfer-encoding in encoded-word causes KeyError #82513

aft90 mannequin opened this issue Sep 30, 2019 · 7 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes topic-email type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@aft90
Copy link
Mannequin

aft90 mannequin commented Sep 30, 2019

BPO 38332
Nosy @warsaw, @bitdancer, @maxking, @eamanu, @miss-islington, @aft90
PRs
  • bpo-38332: catch KeyError from unknown cte in encoded-word #16503
  • [3.8] bpo-38332: Catch KeyError from unknown cte in encoded-word. (GH-16503) #16596
  • [3.7] bpo-38332: Catch KeyError from unknown cte in encoded-word. (GH-16503) #16597
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-10-12.17:04:19.496>
    created_at = <Date 2019-09-30.20:57:33.378>
    labels = ['3.7', '3.8', 'expert-email', 'type-crash', '3.9']
    title = 'invalid content-transfer-encoding in encoded-word causes KeyError'
    updated_at = <Date 2019-10-12.17:04:19.494>
    user = 'https://github.com/aft90'

    bugs.python.org fields:

    activity = <Date 2019-10-12.17:04:19.494>
    actor = 'maxking'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-10-12.17:04:19.496>
    closer = 'maxking'
    components = ['email']
    creation = <Date 2019-09-30.20:57:33.378>
    creator = 'aft90'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 38332
    keywords = ['patch']
    message_count = 7.0
    messages = ['353624', '353629', '353641', '353719', '354019', '354540', '354541']
    nosy_count = 6.0
    nosy_names = ['barry', 'r.david.murray', 'maxking', 'eamanu', 'miss-islington', 'aft90']
    pr_nums = ['16503', '16596', '16597']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue38332'
    versions = ['Python 3.7', 'Python 3.8', 'Python 3.9']

    @aft90
    Copy link
    Mannequin Author

    aft90 mannequin commented Sep 30, 2019

    The following will cause a KeyError on email.message.get()

    import email
    import email.policy
    
    text = "Subject: =?us-ascii?X?somevalue?="
    eml = email.message_from_string(text, policy=email.policy.default)
    eml.get('Subject')

    This is caused by the fact that the code in _encoded_words.py assumes the content-transfer-encoding of an encoded-word is always 'q' or 'b' (after lowercasing):

    bstring, defects = _cte_decoders[cte](bstring)

    I realise it's probably a silly edge case and I haven't (yet) encountered something like this in the wild, but it does seem contrary to the spirit of the email library to raise an exception like this that can propagate all the way to email.message.get().

    @aft90 aft90 mannequin added 3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes topic-email type-crash A hard crash of the interpreter, possibly with a core dump labels Sep 30, 2019
    @eamanu
    Copy link
    Mannequin

    eamanu mannequin commented Oct 1, 2019

    Hello,

    I am not a email expert, but according to RFC 1342 the enconding can be either "B" or "Q". So, I think is reasonable that when a not correct enconding is set, should be raise an exception

    I think that we can improve the message raising a more specific Exception

    @aft90
    Copy link
    Mannequin Author

    aft90 mannequin commented Oct 1, 2019

    I agree with you that according to the RFC, the cte can of course only be "B" or "Q". My point is that, in my example, if you try to do that you get a KeyError propagating all the way down to email.message.get(), which I believe is incorrect.

    Consider an encoded word which is syntactically incorrect in a different way, like if for instance it's missing the terminating '?=':

    '=?UTF-8?Q?somevalue'

    Currently, this case will cause _encoded_words.py to throw a ValueError on this line:

    _, charset, cte, cte_string, _ = ew.split('?')

    Which is then caught by _header_value_parser.get_encoded_word() and handled appropriately.

    To me this is the same kind of thing. I agree that an exception should be thrown, I just don't think it should propagate all the way back to the caller of email.message.get().

    On a separate note, I agree with you that perhaps _encoded_words.decode() should throw more specific exceptions instead of ValueError and KeyError but that's a separate thing. I can fix that if you prefer.

    @eamanu
    Copy link
    Mannequin

    eamanu mannequin commented Oct 2, 2019

    Hi Andrei sorry for my last message. Now I understand perfectly your idea and your PR. IMO this is a correct patch.

    @maxking
    Copy link
    Contributor

    maxking commented Oct 5, 2019

    New changeset 65dcc8a by Abhilash Raj (Andrei Troie) in branch 'master':
    bpo-38332: Catch KeyError from unknown cte in encoded-word. (GH-16503)
    65dcc8a

    @miss-islington
    Copy link
    Contributor

    New changeset febe359 by Miss Islington (bot) in branch '3.7':
    bpo-38332: Catch KeyError from unknown cte in encoded-word. (GH-16503)
    febe359

    @miss-islington
    Copy link
    Contributor

    New changeset e540bb5 by Miss Islington (bot) in branch '3.8':
    bpo-38332: Catch KeyError from unknown cte in encoded-word. (GH-16503)
    e540bb5

    @maxking maxking closed this as completed Oct 12, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes topic-email type-crash A hard crash of the interpreter, possibly with a core dump
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants