This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: EmailMessage should support Header objects
Type: Stage:
Components: email Versions: Python 3.4
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: barry, brandon-rhodes, r.david.murray
Priority: normal Keywords:

Created on 2014-03-29 03:33 by brandon-rhodes, last changed 2022-04-11 14:58 by admin.

Messages (3)
msg215112 - Author: Brandon Rhodes (brandon-rhodes) * Date: 2014-03-29 03:33
Currently, the wonderful new EmailMessage class does not honor the encoding specified in Header objects provided to it. In fact, as the example below shows, assigning a Header to an EmailMessage raises a TypeError.

import email.message, email.header
m = email.message.Message()
m['Subject'] = email.header.Header('Böðvarr'.encode('latin-1'), 'latin-1')
print(m.as_string())

Subject: =?iso-8859-1?q?B=F6=F0varr?=

m = email.message.EmailMessage()
m['Subject'] = email.header.Header('Böðvarr'.encode('latin-1'), 'latin-1')
print(m.as_string())

Traceback (most recent call last):
  ...
TypeError: 'Header' object does not support indexing

If EmailMessage recognized and supported Header objects, Python programmers working under specific constraints about which encodings their customers' email clients recognize and support could hand-pick the correct encoding, instead of being limited to either ASCII or UTF-8 with binary, the two predominant choices that EmailMessage makes on its own.
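For comparison, here is a minimal sketch of what the new API does when it is given a plain str instead of a Header, assuming a recent Python 3 where EmailMessage uses email.policy.default. The library chooses the encoding itself (an RFC 2047 encoded word labelled utf-8), which is exactly the choice the request above wants to be able to override. Whether it picks the q or b form is left to the library, so only the charset label is checked here.

```python
import email
import email.message
import email.policy

# New-style API: assign a plain str and let the library pick the encoding.
m = email.message.EmailMessage()
m['Subject'] = 'Böðvarr'
wire = m.as_string()
print(wire)

# The serialized Subject is an encoded word using UTF-8; no
# caller-chosen charset (such as latin-1) is involved.
assert '=?utf-8?' in wire.lower()

# Parsing it back with the modern policy yields the decoded text again.
parsed = email.message_from_string(wire, policy=email.policy.default)
assert parsed['Subject'] == 'Böðvarr'
```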
msg220637 - Author: Mark Lawrence (BreamoreBoy) * Date: 2014-06-15 13:36
@David, can we have your comments, please?
msg220646 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-06-15 15:11
I have to look at the implementation to remind myself how hard this would be to implement. The goal was to leave Header as a legacy API: if you need that level of control, you use the old API. But I can see the functionality argument, and Header *is* a reasonable API for building such a custom header. It may be a while before I have time to take a look at it, though, so if anyone else wants to take a look, feel free :)

One problem is that while the parser does retain the CTE of each encoded word, if the header is refolded for any reason, the CTE is (often? always? I don't remember) ignored, because encoded words may be recombined during folding. And if you are creating the header inside a program, that header is going to get refolded on serialization unless max_line_length is set to 0/None or the header fits on one line.
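The max_line_length part of this comment can be illustrated with a short sketch, assuming a recent Python 3 (the long subject text is made up for the example). It only shows line wrapping under email.policy.default (max_line_length 78) versus a clone with max_line_length=None; it does not show what happens to the CTE of individual encoded words.

```python
import email.message
import email.policy

# A long, program-created Subject header (made-up content).
m = email.message.EmailMessage()
m['Subject'] = ' '.join(['word'] * 30)

def header_block(text):
    # Everything before the first blank line is the header section.
    return text.split('\n\n', 1)[0]

# Default policy: max_line_length is 78, so the header is folded
# onto continuation lines (lines that begin with whitespace).
folded = header_block(m.as_string())

# With max_line_length=None, no line wrapping is done at all.
no_wrap = email.policy.default.clone(max_line_length=None)
unfolded = header_block(m.as_string(policy=no_wrap))

print(folded)
print(unfolded)
```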

So it's not obvious to me that this can work at all.  What *could* work would be to have a policy setting to use something other than utf-8 for the CTE for encoding headers, but that would be a global setting (applying to all headers that are refolded during serialization).

Basically, controlling the CTE of encoded words on an individual basis goes directly against the model used by the new Email API: in that model, the "model" of the email message is the *decoded* version of the message, and serialization is responsible for doing whatever CTE encoding is appropriate. The goal is to *hide* the details of the RFCs from the library user. So, if you want control at that level, you have to go back to the old API, which required you to understand what you were doing at the RFC level...
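A minimal sketch of the two models side by side, assuming a recent Python 3: the old compat32 API lets the caller pick a charset for one header via Header, while parsing the result with email.policy.default hands the program only the decoded text, with the charset and CTE hidden.

```python
import email
import email.header
import email.message
import email.policy

# Old (compat32) API: the caller picks the charset for this one header.
legacy = email.message.Message()
legacy['Subject'] = email.header.Header('Böðvarr', 'latin-1')
wire = legacy.as_string()
print(wire)  # Subject is an encoded word labelled iso-8859-1

# New API: parsing yields the *decoded* text; the original charset
# and CTE are not part of the value the program works with.
parsed = email.message_from_string(wire, policy=email.policy.default)
print(parsed['Subject'])
```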
History
Date                 User            Action  Args
2022-04-11 14:58:00  admin           set     github: 65294
2019-03-15 23:14:34  BreamoreBoy     set     nosy: - BreamoreBoy
2014-06-15 15:11:59  r.david.murray  set     messages: + msg220646
2014-06-15 13:36:19  BreamoreBoy     set     nosy: + BreamoreBoy; messages: + msg220637
2014-03-29 03:33:26  brandon-rhodes  create