email.Header (via add_header) encodes non-ASCII content incorrectly #41280
Comments
I'm generating a MIME message with an attachment whose …

    msg.add_header('Content-Disposition', 'attachment',
                   filename=u'Fu\xdfballer_sind_klug.ppt')

The Python-generated header looks like this:

    Content-disposition: …

I sent messages with this header to Gmail, Evolution, …

    Content-disposition: attachment; …

Is there a way to make Python's email module generate a …
The fact that neither Gmail, Evolution, nor Thunderbird can …
So I recommend you report this as a bug to the authors of …
The proposed output has the virtue of being easier to read.
I don't believe that either the example that other mailers reject or the one that they accept is in fact RFC compliant. Encoded words (RFC 2047) are not supposed to occur in structured MIME headers such as Content-Disposition. The behavior observed is a consequence of all headers, whether structured or unstructured, being treated by Header as if they were unstructured. (There's a different problem in Python 3 with this example, but I'll deal with that in a separate issue.)

What we have here is primarily a documentation bug. The way to generate the correct (RFC compliant) header is to pass the parameter value as a (charset, language, value) tuple, which is then encoded according to RFC 2231:

    >>> m.add_header('Content-Disposition', 'attachment',
    ...              filename=('iso-8859-1', '', 'Fußballer_sind_klug.ppt'))
    >>> str(m)
    'Content-Disposition: attachment; filename*="iso-8859-1\'\'Fu%DFballer_sind_klug.ppt"\n\n'

I will add the explanation and this example to the docs. In addition, in 3.2 I will disallow non-ASCII parameter values unless they are specified in a three-element tuple as in the example above. That will still leave some other places where structured headers are inappropriately encoded by Header (e.g. addresses with non-ASCII names), but dealing with that is a somewhat deeper problem.
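A runnable version of the session above, as a sketch assuming Python 3 and the stdlib `email.message.Message` class. The transcript does not show how `m` was created, so a bare `Message()` is assumed here. The sketch also calls `email.utils.encode_rfc2231` directly to show the `charset'language'percent-encoded` form, and parses the header back with `get_filename()`. Depending on the Python version, the extended value may or may not be wrapped in double quotes as it is in the transcript.

```python
from email.message import Message
from email.utils import encode_rfc2231

# Assumption: m starts out as an empty (compat32) Message; the original
# transcript does not show how it was created.
m = Message()

# Pass the parameter as a (charset, language, value) tuple so it is
# encoded per RFC 2231 instead of as an RFC 2047 encoded word.
m.add_header('Content-Disposition', 'attachment',
             filename=('iso-8859-1', '', 'Fußballer_sind_klug.ppt'))

# The low-level helper shows the charset'language'percent-encoded form.
print(encode_rfc2231('Fußballer_sind_klug.ppt', 'iso-8859-1', ''))
# -> iso-8859-1''Fu%DFballer_sind_klug.ppt

header = m['Content-Disposition']
print(header)

# Parsing the parameter back recovers the original Unicode filename.
print(m.get_filename())
```

Round-tripping through `get_filename()` is a quick way to check that a receiving parser can reconstruct the original name from the RFC 2231 form.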
Here is a patch.
Why would the caller be required to choose an encoding when you could simply default to utf-8? There doesn't seem to be much value in forcing the use of, e.g., iso-8859-15.
The compatibility argument is a fair point, and yes, we could default to utf-8 and no language. So that is probably a better solution than raising the error.
RDM, I wonder if it wouldn't be better (in email6) to use an instance to represent the 3-tuple instead? It might make for clearer client code, and would allow you to default things you might generally not care about. E.g.

    class NonASCIIParameter:  # XXX come up with better name
        def __init__(self, text, charset='utf-8', language=''):

It's unfortunate that you have to reorder the arguments from the 3-tuple form of (charset, language, text) but I think you could play games with keyword arguments to make them consistent. In general the patch looks fine to me, though I suggest splitting test_add_header() into separate tests for each of the three conditions you're testing there.
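A minimal sketch of the suggested wrapper, assuming it simply converts back to the existing (charset, language, text) tuple before calling `add_header()`. `NonASCIIParameter` and its `as_tuple()` method are hypothetical illustrations of the proposal, not part of the email package.

```python
from email.message import Message


class NonASCIIParameter:  # hypothetical name, taken from the proposal
    """Hypothetical wrapper for an RFC 2231 parameter value.

    Keyword-friendly alternative to the (charset, language, text) tuple
    accepted by Message.add_header(); charset and language get defaults.
    """

    def __init__(self, text, charset='utf-8', language=''):
        self.text = text
        self.charset = charset
        self.language = language

    def as_tuple(self):
        # Reorder into the (charset, language, text) form that the
        # existing email API understands.
        return (self.charset, self.language, self.text)


m = Message()
param = NonASCIIParameter('Fußballer_sind_klug.ppt')
m.add_header('Content-Disposition', 'attachment', filename=param.as_tuple())
print(m['Content-Disposition'])
```

With the defaults, callers only supply the text, and the argument reordering is confined to `as_tuple()`.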
Committed the default-to-utf8 fix in r87217, splitting up the tests as suggested by Barry. Backported to 3.1 in r87218. Updated the documentation for 2.7 in r87219.
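The committed behavior (a plain non-ASCII string defaulting to utf-8 with an empty language tag) can be checked with a sketch like the following, assuming a Python 3 release that includes this fix and the stdlib `email.message.Message` class.

```python
from email.message import Message

m = Message()
# No tuple this time: a plain str containing a non-ASCII character.
# Assumption: the running Python includes the default-to-utf8 fix.
m.add_header('Content-Disposition', 'attachment',
             filename='Fußballer_sind_klug.ppt')

header = m['Content-Disposition']
# Expected to contain an RFC 2231 value using utf-8 and no language.
print(header)

# The filename still round-trips to the original string.
print(m.get_filename())
```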