email.Header (via add_header) encodes non-ASCII content incorrectly #41280

Closed
tlau mannequin opened this issue Dec 4, 2004 · 9 comments
Labels: easy, stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)

Comments

@tlau
Mannequin

tlau mannequin commented Dec 4, 2004

BPO 1078919
Nosy @loewis, @warsaw, @pitrou, @devdanzin, @bitdancer
Files
  • add_header.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/bitdancer'
    closed_at = <Date 2010-12-14.00:30:57.665>
    created_at = <Date 2004-12-04.15:47:34.000>
    labels = ['easy', 'type-bug', 'library']
    title = 'email.Header (via add_header) encodes non-ASCII content incorrectly'
    updated_at = <Date 2010-12-14.00:30:57.664>
    user = 'https://bugs.python.org/tlau'

    bugs.python.org fields:

    activity = <Date 2010-12-14.00:30:57.664>
    actor = 'r.david.murray'
    assignee = 'r.david.murray'
    closed = True
    closed_date = <Date 2010-12-14.00:30:57.665>
    closer = 'r.david.murray'
    components = ['Library (Lib)']
    creation = <Date 2004-12-04.15:47:34.000>
    creator = 'tlau'
    dependencies = []
    files = ['19114']
    hgrepos = []
    issue_num = 1078919
    keywords = ['patch', 'easy']
    message_count = 9.0
    messages = ['23536', '23537', '82125', '117903', '117905', '117924', '117935', '117955', '123912']
    nosy_count = 6.0
    nosy_names = ['loewis', 'barry', 'tlau', 'pitrou', 'ajaksu2', 'r.david.murray']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue1078919'
    versions = ['Python 3.2']

    @tlau
    Mannequin Author

    tlau mannequin commented Dec 4, 2004

    I'm generating a MIME message with an attachment whose
    filename includes non-ASCII characters. I create the
    MIME header as follows:

    msg.add_header('Content-Disposition', 'attachment',
    filename=u'Fu\xdfballer_sind_klug.ppt')

    The Python-generated header looks like this:

    Content-disposition:
    =?utf-8?b?YXR0YWNobWVudDsgZmlsZW5hbWU9IkZ1w59iYWxsZXJf?=
    =?utf-8?q?sind=5Fklug=2Eppt=22?=

    I sent messages with this header to Gmail, Evolution,
    and Thunderbird, and none of them correctly decodes that
    header to suggest the correct default filename.
    However, I've found that those three mailers do behave
    correctly when the header looks like this instead:

    Content-disposition: attachment;
    filename="=?iso-8859-1?q?Fu=DFballer=5Fsind=5Fklug=2Eppt?="

    Is there a way to make Python's email module generate a
    Content-disposition header that works with common MUAs?
    I know I can manually encode the filename before
    passing it to add_header(), but it seems that Python
    should be doing this for me.
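To see why mail clients stumble on the generated header, it can help to decode the two encoded words quoted in the report. The following is a minimal sketch using the standard library's `email.header.decode_header`; the variable names are illustrative only and are not taken from the report.

```python
from email.header import decode_header

# Header value exactly as quoted in the report (two RFC 2047 encoded words).
raw_value = (
    "=?utf-8?b?YXR0YWNobWVudDsgZmlsZW5hbWU9IkZ1w59iYWxsZXJf?= "
    "=?utf-8?q?sind=5Fklug=2Eppt=22?="
)

# decode_header() returns (value, charset) pairs; value may be bytes or str.
decoded = "".join(
    part.decode(charset or "ascii") if isinstance(part, bytes) else part
    for part, charset in decode_header(raw_value)
)
print(decoded)
```

The decoded text is `attachment; filename="Fußballer_sind_klug.ppt"`: the disposition type and the parameter name were wrapped inside the encoded words along with the filename, so a client that parses the field as a structured header sees neither an `attachment` token nor a `filename=` parameter.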

    @tlau tlau mannequin assigned warsaw Dec 4, 2004
    @tlau tlau mannequin added the stdlib Python modules in the Lib dir label Dec 4, 2004
    @loewis
    Mannequin

    loewis mannequin commented Dec 5, 2004


    The fact that none of Gmail, Evolution, or Thunderbird can
    decode this string properly does not mean that Python
    encodes it incorrectly. I cannot see an error in this header,
    although I can sympathize with the developers of the MUAs
    that this is a non-obvious usage of the standards.

    So I recommend you report this as a bug to the authors of
    the MUAs.

    @devdanzin
    Mannequin

    devdanzin mannequin commented Feb 14, 2009

    The proposed output has the virtue of being easier to read.

    @devdanzin devdanzin mannequin added type-feature A feature request or enhancement labels Feb 14, 2009
    @devdanzin devdanzin mannequin added easy labels Apr 22, 2009
    @warsaw warsaw assigned bitdancer and unassigned warsaw May 5, 2010
    @bitdancer
    Member

    I don't believe that either the example that other mailers reject or the one that they accept is in fact RFC compliant. Encoded words are not supposed to occur in (structured) MIME headers. The behavior observed is a consequence of all headers, whether structured or unstructured, being treated as if they were unstructured by Header.

    (There's a different problem in Python3 with this example, but I'll deal with that in a separate issue.)

    What we have here is primarily a documentation bug. The way to generate the correct (RFC compliant) header is as follows:

    >>> m.add_header('Content-Disposition', 'attachment',
    ... filename=('iso-8859-1', '', 'Fußballer_sind_klug.ppt'))
    >>> str(m)
    'Content-Disposition: attachment; filename*="iso-8859-1\'\'Fu%DFballer_sind_klug.ppt"\n\n'

    I will add the explanation and this example to the docs. In addition, in 3.2 I will disallow non-ASCII parameter values unless they are specified in a three element tuple as in the example above. That will still leave some other places where structured headers are inappropriately encoded by Header (eg: addresses with non-ASCII names), but dealing with that is a somewhat deeper problem.
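For readers on a current Python 3, the three-element tuple form described above can be tried as follows. This is a minimal sketch using `email.message.Message`; the exact quoting of the generated value may differ from the 2010 output quoted in the comment, so the example reads the filename back rather than relying on the raw header text.

```python
from email.message import Message

msg = Message()
# (charset, language, value): the three-element form described above.
msg.add_header(
    "Content-Disposition",
    "attachment",
    filename=("iso-8859-1", "", "Fußballer_sind_klug.ppt"),
)

# The stored header uses the RFC 2231 "filename*" parameter with
# percent-encoded bytes instead of RFC 2047 encoded words.
header_value = msg["Content-Disposition"]
print(header_value)

# Reading the parameter back decodes the RFC 2231 value again.
filename = msg.get_filename()
print(filename)
```

The point of the tuple is that only the parameter value is encoded; the `attachment` token and the parameter name stay plain ASCII, so the header remains parseable as a structured field.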

    @bitdancer bitdancer changed the title Email.Header encodes non-ASCII content incorrectly email.Header (via add_header) encodes non-ASCII content incorrectly Oct 3, 2010
    @bitdancer bitdancer added type-bug An unexpected behavior, bug, or error and removed type-feature A feature request or enhancement labels Oct 3, 2010
    @bitdancer
    Member

    Here is a patch.

    @pitrou
    Member

    pitrou commented Oct 3, 2010

    > In addition, in 3.2 I will disallow non-ASCII parameter values unless
    > they are specified in a three element tuple as in the example above.

    Why would the caller be required to choose an encoding while you could simply default to utf-8? There doesn't seem to be much value in forcing the use of e.g. iso-8859-15.
    Also, I'm not sure I understand what the goal of email6 is if you're breaking compatibility in email5 anyway :)

    @bitdancer
    Member

    The compatibility argument is a fair point, and yes we could default to utf8 and no language. So that is probably a better solution than raising the error.

    @warsaw
    Member

    warsaw commented Oct 4, 2010

    RDM, I wonder if it wouldn't be better (in email6) to use an instance to represent the 3-tuple instead? It might make for clearer client code, and would allow you to default things you might generally not care about. E.g.

    class NonASCIIParameter:  # XXX come up with better name
        def __init__(self, text, charset='utf-8', language=''):
            self.text, self.charset, self.language = text, charset, language

    It's unfortunate that you have to reorder the arguments from the 3-tuple form of (charset, language, text) but I think you could play games with keyword arguments to make them consistent.

    In general the patch looks fine to me, though I suggest splitting test_add_header() into separate tests for each of the three conditions you're testing there.
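The instance-based idea above could look roughly like the following. This is only a sketch: `NonASCIIParameter` and its `as_tuple` method are hypothetical, are not part of the `email` package, and the conversion back to the existing tuple form is an assumption about how such a class might plug into `add_header`.

```python
from email.message import Message


class NonASCIIParameter:  # hypothetical name; not part of the email package
    """Sketch of a value object wrapping a (charset, language, text) triple."""

    def __init__(self, text, charset="utf-8", language=""):
        self.text = text
        self.charset = charset
        self.language = language

    def as_tuple(self):
        # add_header() expects the order (charset, language, value).
        return (self.charset, self.language, self.text)


# Only the text is required; charset and language fall back to defaults.
param = NonASCIIParameter("Fußballer_sind_klug.ppt")

msg = Message()
msg.add_header("Content-Disposition", "attachment", filename=param.as_tuple())
print(msg["Content-Disposition"])
```

Putting `text` first lets callers omit the fields they rarely care about, at the cost of the argument order differing from the tuple form, which is the trade-off noted in the comment.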

    @bitdancer
    Member

    Committed the default-to-utf8 fix in r87217, splitting up the tests as suggested by Barry. Backported to 3.1 in r87218. Updated the documentation for 2.7 in r87219.
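The default-to-utf-8 behavior can be checked on a current Python 3 with a plain string argument. This is a minimal sketch assuming the behavior described in this thread is still what `Message.add_header` does; the variable names are illustrative.

```python
from email.message import Message

msg = Message()
# A plain str with a non-ASCII character, no (charset, language, value) tuple.
msg.add_header("Content-Disposition", "attachment",
               filename="Fußballer_sind_klug.ppt")

# The value is expected to be RFC 2231 encoded with utf-8 and an empty language.
header_value = msg["Content-Disposition"]
print(header_value)

# Reading the parameter back should return the original text.
round_tripped = msg.get_filename()
print(round_tripped)
```

Callers who need a specific charset or a language tag can still pass the three-element tuple explicitly.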

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022