classification
Title: email.MIMEText.MIMEText.as_string incorrectly folding long subject header
Type: Stage:
Components: Library (Lib) Versions: Python 3.1, Python 2.7
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: barry Nosy List: aalbrecht, barry, cjw296, djc, gagern, nicdumz, salty-horse, splorgdar
Priority: normal Keywords:

Created on 2008-01-30 14:48 by cjw296, last changed 2010-07-06 09:40 by nicdumz. This issue is now closed.

Files
File name Uploaded Description Edit
test.py splorgdar, 2008-05-23 21:49 reproduction case
py_issue1974.diff aalbrecht, 2008-06-22 12:06 Init Header class in test case without continuation_ws keyword
Messages (16)
msg61866 - (view) Author: Chris Withers (cjw296) * (Python committer) Date: 2008-01-30 14:48
Somewhere in email.MIMEText.MIMEText.as_string (I'm not sure where) long
subject headers are folded using a newline followed by a tab:

>>> from email.MIMEText import MIMEText
>>> m = MIMEText('foo')
>>> m['Subject']='AA '*40
>>> m.as_string()
'Content-Type: text/plain; charset="us-ascii"\nMIME-Version:
1.0\nContent-Transfer-Encoding: 7bit\nSubject: AA AA AA AA AA AA AA AA
AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA\n\tAA AA AA AA AA AA AA AA
AA AA AA AA AA AA AA AA AA \n\nfoo'

While RFC 822 section 3.1.1 doesn't forbid this type of folding, the
current behaviour mis-renders in both Thunderbird and Outlook.

Messages generated by Thunderbird and Outlook represent the above
subject as follows:

'Content-Type: text/plain; charset="us-ascii"\nMIME-Version:
1.0\nContent-Transfer-Encoding: 7bit\nSubject: AA AA AA AA AA AA AA AA
AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA\n AA AA AA AA AA AA AA AA
AA AA AA AA AA AA AA AA AA \n\nfoo'

Could the email library be adjusted to do the same? Years of wondering
why mails re-sent by mailman, mails from any number of python web
systems and recent complaints from customers of mine of mis-rendered
emails could be solved by doing so.
msg67270 - (view) Author: Steve Baker (splorgdar) Date: 2008-05-23 21:33
I'm also hitting this problem. I'm including a demo script. The minimal
condition for me to reproduce this bug is that the subject contain at
least 76 characters of any kind, followed by a space, followed by at
least 3 characters of any kind.
msg68562 - (view) Author: Andi Albrecht (aalbrecht) * Date: 2008-06-22 12:06
For me this issue seems to be a duplicate of issue1645148.

In my opinion the test case that checks if headers created by strings or
Header instances are equal is incorrect. It shouldn't set the
continuation whitespace explicitly when creating a Headers instance. I
would expect that

  A) msg['aheader'] = 'a long string that breaks, but shortened here'

and

  B) msg['aheader'] = Header('a long string that breaks, but shortened
here')

result in the same output. But for A) a Header instance is initialized
in Generator._write_headers() with continuation_ws set to `\t` and for
B) the default ' ' is used (see http://bugs.python.org/msg31102).

I'm uploading a patch that modifies test_string_headerinst_eq() to what
I think it should look like with this message. Of course, this test will
fail at the moment, but maybe you agree that the Header instance in this
test should be initialized without explicitly setting continuation_ws.
msg68568 - (view) Author: Chris Withers (cjw296) * (Python committer) Date: 2008-06-22 16:06
Andi, I'm in total agreement with you :-)
(so if this bug could get fixed, both issues could get closed)
msg68630 - (view) Author: Andi Albrecht (aalbrecht) * Date: 2008-06-23 14:45
IMO, the best way to resolve this is to use the default continuation
whitespace when a Header class is created from a string. But I'm pretty
unsure about the default whitespace character. After having a look at a
random set of mails in my inbox it seems that in most cases a tab is
used for all headers except for the subject (excluding mails sent from
Python scripts of course...). I would prefer '\t'; it's allowed and
results in a cleaner output - but it won't solve the problem with some
email clients which was the starting point for this issue (a short note
in the docs should resolve this anyway... ;-)

But I wonder if for some reason setting headers from strings
intentionally differs from using the Header class? There are some tests
that explicitly test both cases and expect different output. In
TestLongHeaders it's

 test_long_unbreakable_lines_with_continuation()
 test_long_lines_with_different_header(self)
 test_long_received_header() [continuation_ws set to tab here]

So, what I would do is
 * remove continuation_ws keyword from Generator._write_headers()
 * make '\t' the default cont. whitespace for headers (Header class)
 * update test cases and the documentation accordingly

Do you think that this is the right way to go? Or would it break too
much code outside stdlib if the default ws changes?
msg68634 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2008-06-23 14:59
I'm pretty convinced that this stuff is broken and needs serious
attention.  Unfortunately, I won't have time to address the email
package before 2.6 and 3.0 get released.  My current work lives at

bzr+ssh://pythonbzr@code.python.org/python/users/barry/30email

but it hasn't been updated for a while.  I highly suggest taking this to
the email-sig to see if anybody else has spare cycles to work in this
for now.
msg68658 - (view) Author: Chris Withers (cjw296) * (Python committer) Date: 2008-06-23 23:06
Again, in total agreement with Andi *EXCEPT*:

Please use ' ' instead of '\t' for the continuation character.

It's the \t that gets mis-rendered by Outlook and Thunderbird (at the
very least!) and since ' ' is also valid according to the RFC, we make
life much easier for ourselves if we just stick to that...
msg68665 - (view) Author: Andi Albrecht (aalbrecht) * Date: 2008-06-24 04:33
FWIW, I've uploaded a patch to codereview: 

http://codereview.appspot.com/2407

It uses a space character as Chris suggested.
msg68750 - (view) Author: Ori Avtalion (salty-horse) Date: 2008-06-25 19:42
I think there's been a little misinterpretation of the standard in the
comments above.

It's important to note that RFC 2822 basically defines folding as
"adding a CRLF before an existing whitespace in the original message". 

See http://tools.ietf.org/html/rfc2822#section-2.2.3

It does *not* allow prepending folded lines with extra characters that
were not in the original message such as '\t' or ' '.

This is exactly what _encode_chunks does in header.py:
    joiner = NL + self._continuation_ws

(Note that the email package docs and Header docstring use the word
'prepend' which is reflects the error in the code).

With a correct implementation, why would I want to choice of which type
of character to line-break on when folding?
The whole notion of controlling the value of continuation_ws seems wrong.

However, changing the default continuation_ws to ' ', as the patch
suggests, will output syntactically correct headers in the majority of
cases (due to other bugs that remove trailing whitespace and merge
consecutive whitespace into one character).


All in all, I agree with the change of the default continuation_ws due
to its lucky side-effects, but as Barry hinted, the algorithm needs some
serious work to really output valid headers.

Some examples of the good and bad behaviors:

>>> from email.Header import Header
>>> l = ['<%d@dom.ain>' % i for i in range(8)]

>>> # this turns out fine
>>> Header(' '.join(l), continuation_ws=' ').encode()
'<0@dom.ain> <1@dom.ain> <2@dom.ain> <3@dom.ain> <4@dom.ain>
<5@dom.ain>\n <6@dom.ain> <7@dom.ain>'

# This does not fold even though it should
>>> Header('\t'.join(l), continuation_ws=' ').encode()
'<0@dom.ain>\t<1@dom.ain>\t<2@dom.ain>\t<3@dom.ain>\t<4@dom.ain>\t<5@dom.ain>\t<6@dom.ain>\t<7@dom.ain>'

# And here the 4-char whitespace is shrinked into one
>>> Header('    '.join(l), continuation_ws=' ').encode()
'<0@dom.ain> <1@dom.ain> <2@dom.ain> <3@dom.ain> <4@dom.ain>
<5@dom.ain>\n <6@dom.ain> <7@dom.ain>'
msg68767 - (view) Author: Chris Withers (cjw296) * (Python committer) Date: 2008-06-26 08:01
Ori,

I do agree with both you and Barry but is there any chance someone could
make the one-character change to make the /t a space so we can stop
seeing weirdness in common mail clients?

Perhaps a separate issue could be raised to refactor this all correctly?
msg84601 - (view) Author: Chris Withers (cjw296) * (Python committer) Date: 2009-03-30 18:10
Barry and I talked about this and he's is now working on it.
We're literally going to remove the \t folding substitution and have it
do the default space folding substitution instead.
msg84698 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2009-03-30 22:43
So astoundingly, this change has deep implications.  The upshot is that
it's difficult to fix this so that headers look nice for e.g. Subject
headers, but so that splitting and wrapping work as expected for e.g.
machine readability of Received headers.

After discussion with other sprinters, I'm committing a change to Python
2.7 to fix this, but I am not back porting the change to 2.6.  I think
we should Do It Right for Python 3.1 but this requires (IMO) API changes.

r70772
msg84755 - (view) Author: Chris Withers (cjw296) * (Python committer) Date: 2009-03-31 07:16
It's probably worth noting that changing:

from email.mime.text import MIMEText

m = MIMEText('foo')
m['subject']='something long'

...to:

from email.header import Header
m = MIMEText('foo')
m['subject']=Header('something long')

...will do folding without the \t problem, even in Python 2.6

I guess the moral of the story is that all headers should really be
header objects. I think Barry has some ideas on that ;-)
msg109370 - (view) Author: Nicolas Dumazet (nicdumz) Date: 2010-07-06 08:22
Hello folks.

(stumbling on this bug with Python2.7 release, noting that a few Mercurial tests broke with 2.7)

I have no problem whatsoever with the fix itself (you know emails better than me), but you broke backwards compatibility for email.generator.Generator here, without propagating the changes to previous versions.

1) It would have been nice to flag this somewhere in the release notes, and/or the documentation for email.generator.Generator. Finding the reason of the Mercurial failures was... a bit trickier than necessary :)
2) What durable solutions can you propose for those of us that have to support several Python versions and share a similar behaviour?

In a perfect world, I would simply do this:

class CompatibleHeader(email.Header.Header.__class__):
    """
    Python2.7 introduces a backwards incompatible change
    (Python issue1974, r70772) in email.generaor.Generator code:
    pre-2.7 code passed "continuation_ws='\t'" to the Header
    constructor, and 2.7 removed this parameter.

    Default argument is continuation_ws=' ', which means that
    the behaviour is different in <2.7 and 2.7

    We consider the 2.7 behaviour to be preferable, but need
    to have an unified behaviour for versions 2.4 to 2.7
    """
    def __init__(self, **kw):
        # override continuation_ws
        kw['continuation_ws'] = ' '
        email.Header.Header.__init__(self, **kw)

email.Header.Header.__class__ = CompatibleHeader

and get over it, but you'll notice that Header is still an old-style class... so that's not possible.

Any suggestions?

Thanks!
msg109371 - (view) Author: Chris Withers (cjw296) * (Python committer) Date: 2010-07-06 08:30
Maybe an old-fashioned monkey-patch would be the way to go?
msg109376 - (view) Author: Nicolas Dumazet (nicdumz) Date: 2010-07-06 09:40
Sure, where was my head.

So, a simple patch like this one:

_oldheaderinit = email.Header.Header.__init__
def _unifiedheaderinit(self, *args, **kw):
    # override continuation_ws
    kw['continuation_ws'] = ' '
    _oldheaderinit(self, *args, **kw)
email.Header.Header.__dict__['__init__'] = _unifiedheaderinit


fixes the issue for us, and might be helpful to others.
History
Date User Action Args
2010-07-06 09:40:57nicdumzsetmessages: + msg109376
2010-07-06 08:30:03cjw296setmessages: + msg109371
2010-07-06 08:22:45nicdumzsetnosy: + djc, nicdumz
messages: + msg109370
2009-03-31 07:16:01cjw296setkeywords: - patch

messages: + msg84755
2009-03-30 22:43:41barrysetstatus: open -> closed

messages: + msg84698
versions: - Python 2.6
2009-03-30 18:10:05cjw296setresolution: accepted
messages: + msg84601
versions: + Python 2.6, Python 3.1, Python 2.7, - Python 2.4
2009-03-22 15:14:18gagernsetnosy: + gagern
2008-06-26 08:01:52cjw296setmessages: + msg68767
2008-06-25 19:42:05salty-horsesetnosy: + salty-horse
messages: + msg68750
2008-06-24 04:34:00aalbrechtsetmessages: + msg68665
2008-06-23 23:06:35cjw296setmessages: + msg68658
2008-06-23 14:59:24barrysetmessages: + msg68634
2008-06-23 14:45:33aalbrechtsetmessages: + msg68630
2008-06-22 16:06:05cjw296setmessages: + msg68568
2008-06-22 12:06:53aalbrechtsetfiles: + py_issue1974.diff
keywords: + patch
messages: + msg68562
nosy: + aalbrecht
2008-05-23 21:49:11splorgdarsetfiles: - test
2008-05-23 21:49:02splorgdarsetfiles: + test.py
2008-05-23 21:33:33splorgdarsetfiles: + test
nosy: + splorgdar
messages: + msg67270
2008-02-01 15:52:13gvanrossumsetassignee: barry
nosy: + barry
2008-01-30 14:48:04cjw296create