classification
Title: email.Header (via add_header) encodes non-ASCII content incorrectly
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: r.david.murray Nosy List: ajaksu2, barry, loewis, pitrou, r.david.murray, tlau
Priority: normal Keywords: easy, patch

Created on 2004-12-04 15:47 by tlau, last changed 2010-12-14 00:30 by r.david.murray. This issue is now closed.

Files
File name Uploaded Description
add_header.patch r.david.murray, 2010-10-03 03:37
Messages (9)
msg23536 - Author: Tessa Lau (tlau) Date: 2004-12-04 15:47
I'm generating a MIME message with an attachment whose
filename includes non-ASCII characters.  I create the
MIME header as follows:

msg.add_header('Content-Disposition', 'attachment',
filename=u'Fu\xdfballer_sind_klug.ppt')

The Python-generated header looks like this:

Content-disposition:
=?utf-8?b?YXR0YWNobWVudDsgZmlsZW5hbWU9IkZ1w59iYWxsZXJf?=
        =?utf-8?q?sind=5Fklug=2Eppt=22?=

I sent messages with this header to Gmail, Evolution,
and Thunderbird, and none of them correctly decode that
header to suggest the correct default filename.
However, I've found that those three mailers do behave
correctly when the header looks like this instead:

Content-disposition: attachment;
filename="=?iso-8859-1?q?Fu=DFballer=5Fsind=5Fklug=2Eppt?="

Is there a way to make Python's email module generate a
Content-disposition header that works with common MUAs?
 I know I can manually encode the filename before
passing it to add_header(), but it seems that Python
should be doing this for me.
msg23537 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2004-12-05 19:42
The fact that neither Gmail, Evolution, nor Thunderbird can
decode this string properly does not mean that Python
encodes it incorrectly. I cannot see an error in this header
- although I can sympathize with the developers of the MUAs
that this is a non-obvious usage of the standards.

So I recommend you report this as a bug to the authors of
the MUAs.
msg82125 - Author: Daniel Diniz (ajaksu2) (Python triager) Date: 2009-02-14 22:21
The proposed output has the virtue of being easier to read.
msg117903 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-10-03 02:16
I don't believe that either the example that other mailers reject or the one that they accept is in fact RFC compliant.  Encoded words are not supposed to occur in (structured) MIME headers.  The behavior observed is a consequence of all headers, whether structured or unstructured, being treated as if they were unstructured by Header.
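
A rough illustration of that last point, run against current Python 3 (an assumption; the 2004 report was made on Python 2, whose exact output differs). When the whole structured value is passed through email.header.Header, it is handled as free-form text, so the `attachment;` token and the parameter name end up inside RFC 2047 encoded words along with the filename:

```python
from email.header import Header

# The full Content-Disposition value, treated as if it were free-form text.
value = 'attachment; filename="Fußballer_sind_klug.ppt"'

# Header has no notion of parameters, so it encodes the entire string.
encoded = Header(value, charset="utf-8").encode()
print(encoded)
```

A receiving mailer that parses Content-Disposition as a structured header has no plain `filename=` parameter left to find in this output.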

(There's a different problem in Python3 with this example, but I'll deal with that in a separate issue.)

What we have here is primarily a documentation bug.  The way to generate the correct (RFC compliant) header is as follows:

>>> m.add_header('Content-Disposition', 'attachment',
... filename=('iso-8859-1', '', 'Fußballer_sind_klug.ppt'))
>>> str(m)
'Content-Disposition: attachment; filename*="iso-8859-1\'\'Fu%DFballer_sind_klug.ppt"\n\n'

I will add the explanation and this example to the docs.  In addition, in 3.2 I will disallow non-ASCII parameter values unless they are specified in a three-element tuple as in the example above.  That will still leave some other places where structured headers are inappropriately encoded by Header (e.g., addresses with non-ASCII names), but dealing with that is a somewhat deeper problem.
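
As a self-contained sketch of the three-element tuple form on a current Python 3 (the `msg`, `header`, and `filename` names are only illustrative, and whether the encoded value is wrapped in double quotes varies between Python versions, so the example does not depend on it):

```python
from email.message import Message

msg = Message()
# The tuple is (charset, language, value); an empty language tag is allowed.
msg.add_header(
    "Content-Disposition",
    "attachment",
    filename=("iso-8859-1", "", "Fußballer_sind_klug.ppt"),
)

header = msg["Content-Disposition"]
print(header)

# Reading the parameter back decodes the RFC 2231 form again.
filename = msg.get_filename()
print(filename)
```

The parameter name gains a trailing `*` and the value is percent-encoded in the named charset, which is the RFC 2231 form rather than an RFC 2047 encoded word.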
msg117905 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-10-03 03:37
Here is a patch.
msg117924 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-10-03 19:25
> In addition, in 3.2 I will disallow non-ASCII parameter values unless
> they are specified in a three element tuple as in the example above.

Why would the caller be required to choose an encoding when you could simply default to utf-8? There doesn't seem to be much value in forcing the use of e.g. iso-8859-15.
Also, I'm not sure I understand what the goal of email6 is if you're breaking compatibility in email5 anyway :)
msg117935 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-10-04 00:07
The compatibility argument is a fair point, and yes we could default to utf8 and no language.  So that is probably a better solution than raising the error.
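
A minimal sketch of what the utf-8 default looks like from the caller's side, assuming a Python version that includes this behavior (3.2 or later):

```python
from email.message import Message

msg = Message()
# A plain str with a non-ASCII character; no charset tuple is supplied.
msg.add_header("Content-Disposition", "attachment",
               filename="Fußballer_sind_klug.ppt")

header = msg["Content-Disposition"]
print(header)
```

The value is encoded as RFC 2231 with charset utf-8 and an empty language tag, so the caller does not have to pick an encoding.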
msg117955 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2010-10-04 15:01
RDM, I wonder if it wouldn't be better (in email6) to use an instance to represent the 3-tuple instead?  It might make for clearer client code, and would allow you to default things you might generally not care about.  E.g.

class NonASCIIParameter: # XXX come up with better name
  def __init__(self, text, charset='utf-8', language=''):

It's unfortunate that you have to reorder the arguments from the 3-tuple form of (charset, language, text) but I think you could play games with keyword arguments to make them consistent.
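
One way the idea could be fleshed out, purely as a sketch: the class below is hypothetical, it is not part of the email package, and the `as_tuple` helper is an assumption added here to bridge to the existing 3-tuple interface.

```python
from email.message import Message


class NonASCIIParameter:
    """Hypothetical wrapper for an RFC 2231 parameter value."""

    def __init__(self, text, charset="utf-8", language=""):
        self.text = text
        self.charset = charset
        self.language = language

    def as_tuple(self):
        # Reorder into the (charset, language, text) form add_header accepts.
        return (self.charset, self.language, self.text)


msg = Message()
param = NonASCIIParameter("Fußballer_sind_klug.ppt")
# add_header does not know about this class, so pass the tuple form.
msg.add_header("Content-Disposition", "attachment",
               filename=param.as_tuple())
header = msg["Content-Disposition"]
print(header)
```

Because charset and language have defaults, client code only spells out the text in the common case, which is the convenience the suggestion is after.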

In general the patch looks fine to me, though I suggest splitting test_add_header() into separate tests for each of the three conditions you're testing there.
msg123912 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-12-14 00:30
Committed the default-to-utf8 fix in r87217, splitting up the tests as suggested by Barry.  Backported to 3.1 in r87218.  Updated the documentation for 2.7 in r87219.
History
Date User Action Args
2010-12-27 17:04:58 r.david.murray unlink issue1685453 dependencies
2010-12-14 00:30:57 r.david.murray set status: open -> closed; resolution: fixed; messages: + msg123912; stage: patch review -> resolved
2010-10-04 15:01:19 barry set messages: + msg117955
2010-10-04 00:07:45 r.david.murray set messages: + msg117935
2010-10-03 19:25:32 pitrou set nosy: + pitrou; messages: + msg117924
2010-10-03 03:37:48 r.david.murray set files: + add_header.patch; keywords: + patch; messages: + msg117905; stage: test needed -> patch review
2010-10-03 02:16:09 r.david.murray set type: enhancement -> behavior; messages: + msg117903; title: Email.Header encodes non-ASCII content incorrectly -> email.Header (via add_header) encodes non-ASCII content incorrectly
2010-08-26 15:24:21 BreamoreBoy set versions: + Python 3.2, - Python 2.7
2010-05-05 13:43:52 barry set assignee: barry -> r.david.murray; nosy: + r.david.murray
2009-04-22 16:03:19 ajaksu2 set keywords: + easy
2009-03-30 22:56:23 ajaksu2 link issue1685453 dependencies
2009-02-14 22:21:53 ajaksu2 set nosy: + ajaksu2; stage: test needed; type: enhancement; messages: + msg82125; versions: + Python 2.7, - Python 2.4
2004-12-04 15:47:34 tlau create