classification
Title: invalid content-transfer-encoding in encoded-word causes KeyError
Type: crash Stage: resolved
Components: email Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: aft90, barry, eamanu, maxking, miss-islington, r.david.murray
Priority: normal Keywords: patch

Created on 2019-09-30 20:57 by aft90, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 16503 merged aft90, 2019-10-01 07:26
PR 16596 merged miss-islington, 2019-10-05 16:19
PR 16597 merged miss-islington, 2019-10-05 16:19
Messages (7)
msg353624 - (view) Author: Andrei Troie (aft90) * Date: 2019-09-30 20:57
The following will cause a KeyError on email.message.get():

import email
import email.policy

text = "Subject: =?us-ascii?X?somevalue?="
eml = email.message_from_string(text, policy=email.policy.default)
eml.get('Subject')

This is caused by the fact that the code in _encoded_words.py assumes the content-transfer-encoding of an encoded-word is always 'q' or 'b' (after lowercasing): https://github.com/python/cpython/blob/aca8c406ada3bb547765b262bed3ac0cc6be8dd3/Lib/email/_encoded_words.py#L178

I realise it's probably a silly edge case and I haven't (yet) encountered something like this in the wild, but it does seem contrary to the spirit of the email library to raise an exception like this that can propagate all the way to email.message.get().
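
A minimal sketch of the failure mode described above, assuming the decoder is chosen by a plain dictionary lookup keyed on the lowercased cte. The names CTE_DECODERS, decode_q, decode_b and decode_payload are hypothetical stand-ins for illustration, and the decoders are heavily simplified compared with the real module:

```python
import base64


def decode_q(payload: bytes) -> bytes:
    # Simplified stand-in: only handles the '_' -> space rule of Q encoding.
    return payload.replace(b"_", b" ")


def decode_b(payload: bytes) -> bytes:
    # Simplified stand-in: plain base64 decoding.
    return base64.b64decode(payload)


# Hypothetical dispatch table: only 'q' and 'b' are known.
CTE_DECODERS = {"q": decode_q, "b": decode_b}


def decode_payload(cte: str, payload: bytes) -> bytes:
    # A bare dict lookup raises KeyError for any cte other than 'q' or 'b'.
    return CTE_DECODERS[cte.lower()](payload)


print(decode_payload("Q", b"some_value"))
try:
    decode_payload("X", b"somevalue")
except KeyError as exc:
    print("KeyError for unknown cte:", exc)
```

With a lookup like this, any caller that only expects ValueError lets the KeyError escape.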
msg353629 - (view) Author: Emmanuel Arias (eamanu) * Date: 2019-10-01 02:42
Hello,

I am not an email expert, but according to RFC 1342 the encoding can be either "B" or "Q". So I think it is reasonable that an exception is raised when an incorrect encoding is set.


I think we can improve the message by raising a more specific exception.
msg353641 - (view) Author: Andrei Troie (aft90) * Date: 2019-10-01 07:47
I agree with you that according to the RFC, the cte can of course only be "B" or "Q". My point is that, in my example, where the cte is something else, you get a KeyError propagating all the way down to email.message.get(), which I believe is incorrect.

Consider an encoded word which is syntactically incorrect in a different way, for instance one that is missing the terminating '?=':

'=?UTF-8?Q?somevalue'

Currently, this case will cause _encoded_words.py to throw a ValueError on this line:

_, charset, cte, cte_string, _ = ew.split('?')

Which is then caught by _header_value_parser.get_encoded_word() and handled appropriately.

To me this is the same kind of thing. I agree that an exception should be thrown, I just don't think it should propagate all the way back to the caller of email.message.get().

On a separate note, I agree with you that perhaps _encoded_words.decode() should throw more specific exceptions instead of ValueError and KeyError but that's a separate thing. I can fix that if you prefer.
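
To illustrate the argument, here is a rough sketch of treating both malformed cases the same way. It is not the actual patch: InvalidEncodedWord, _DECODERS, decode_word and parse_word are hypothetical names, and the decoders are simplified. The only parts taken from the discussion are the five-way split on '?' and the idea of catching KeyError alongside ValueError:

```python
import base64


class InvalidEncodedWord(Exception):
    """Hypothetical parse error, standing in for the parser's own error type."""


# Hypothetical, simplified decoders keyed on the lowercased cte.
_DECODERS = {
    "q": lambda s: s.replace("_", " "),
    "b": lambda s: base64.b64decode(s).decode("ascii"),
}


def decode_word(ew: str) -> str:
    # Raises ValueError unless the word has exactly five '?'-separated fields.
    _, charset, cte, cte_string, _ = ew.split("?")
    # Raises KeyError when the cte is neither 'q' nor 'b'.
    return _DECODERS[cte.lower()](cte_string)


def parse_word(ew: str) -> str:
    try:
        return decode_word(ew)
    except (ValueError, KeyError) as exc:
        # Both kinds of malformed input surface as the same parse error.
        raise InvalidEncodedWord(f"encoded word format invalid: {ew!r}") from exc


for candidate in ("=?UTF-8?Q?somevalue", "=?us-ascii?X?somevalue?="):
    try:
        parse_word(candidate)
    except InvalidEncodedWord as exc:
        print(exc)
```

The caller then only has to handle one exception type, whichever way the encoded word is broken.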
msg353719 - (view) Author: Emmanuel Arias (eamanu) * Date: 2019-10-02 00:39
Hi Andrei, sorry for my last message. Now I understand your idea and your PR perfectly. IMO this is a correct patch.
msg354019 - (view) Author: Abhilash Raj (maxking) * (Python committer) Date: 2019-10-05 16:19
New changeset 65dcc8a8dc41d3453fd6b987073a5f1b30c5c0fd by Abhilash Raj (Andrei Troie) in branch 'master':
bpo-38332: Catch KeyError from unknown cte in encoded-word. (GH-16503)
https://github.com/python/cpython/commit/65dcc8a8dc41d3453fd6b987073a5f1b30c5c0fd
msg354540 - (view) Author: miss-islington (miss-islington) Date: 2019-10-12 17:02
New changeset febe359559781019c0c8432a2f768809d00af6af by Miss Islington (bot) in branch '3.7':
bpo-38332: Catch KeyError from unknown cte in encoded-word. (GH-16503)
https://github.com/python/cpython/commit/febe359559781019c0c8432a2f768809d00af6af
msg354541 - (view) Author: miss-islington (miss-islington) Date: 2019-10-12 17:03
New changeset e540bb546163f108c7c304f2e6865efaa78cd4c2 by Miss Islington (bot) in branch '3.8':
bpo-38332: Catch KeyError from unknown cte in encoded-word. (GH-16503)
https://github.com/python/cpython/commit/e540bb546163f108c7c304f2e6865efaa78cd4c2
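
As a quick check, the reproduction from the first message can be rerun on an interpreter that includes one of the changesets above. This sketch assumes such a patched Python; on an unpatched one the lookup would still raise KeyError:

```python
import email
import email.policy

text = "Subject: =?us-ascii?X?somevalue?="
eml = email.message_from_string(text, policy=email.policy.default)

# Assumption: the interpreter includes the fix, so this lookup returns a
# header value instead of raising KeyError.
subject = eml.get("Subject")
print(repr(str(subject)))
```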
History
Date User Action Args
2022-04-11 14:59:21 admin set github: 82513
2019-10-12 17:04:19 maxking set status: open -> closed
stage: patch review -> resolved
resolution: fixed
versions: - Python 3.5, Python 3.6
2019-10-12 17:03:27 miss-islington set messages: + msg354541
2019-10-12 17:02:28 miss-islington set nosy: + miss-islington
messages: + msg354540
2019-10-05 16:19:45 miss-islington set pull_requests: + pull_request16185
2019-10-05 16:19:39 miss-islington set pull_requests: + pull_request16184
2019-10-05 16:19:21 maxking set nosy: + maxking
messages: + msg354019
2019-10-02 00:39:03 eamanu set messages: + msg353719
2019-10-01 07:47:20 aft90 set messages: + msg353641
2019-10-01 07:26:19 aft90 set keywords: + patch
stage: patch review
pull_requests: + pull_request16094
2019-10-01 02:42:49 eamanu set messages: + msg353629
2019-10-01 01:41:53 eamanu set nosy: + eamanu
2019-09-30 20:57:33 aft90 create