
Author salty-horse
Recipients aalbrecht, barry, cjw296, salty-horse, splorgdar
Date 2008-06-25.19:42:04
Message-id <1214422926.06.0.764906749021.issue1974@psf.upfronthosting.co.za>
Content
I think there's been a little misinterpretation of the standard in the
comments above.

It's important to note that RFC 2822 basically defines folding as
"adding a CRLF before an existing whitespace in the original message". 

See http://tools.ietf.org/html/rfc2822#section-2.2.3

It does *not* allow prepending folded lines with extra characters that
were not in the original message such as '\t' or ' '.
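To make the rule concrete, here is a minimal sketch of folding as RFC 2822 describes it. It is not the email package's code: the names fold and unfold and the 78-character limit are assumptions chosen for illustration. The only characters it ever adds are CRLFs, each placed directly before whitespace that was already in the value, so unfolding restores the original exactly.

```python
import re

CRLF = "\r\n"
_WSP = re.compile(r"[ \t]")


def fold(value, maxlen=78):
    """Fold by inserting CRLF immediately before existing whitespace.

    Illustrative sketch only: nothing except CRLF is ever added.
    """
    lines = []
    start = 0  # index where the current output line begins
    while len(value) - start > maxlen:
        # Last space or tab at index p with start < p <= start + maxlen,
        # so that the line value[start:p] is at most maxlen characters.
        p = max(value.rfind(" ", start + 1, start + maxlen + 1),
                value.rfind("\t", start + 1, start + maxlen + 1))
        if p == -1:
            # No whitespace in range: break at the next one, if any,
            # and accept an over-long line rather than invent characters.
            m = _WSP.search(value, start + maxlen + 1)
            if m is None:
                break
            p = m.start()
        lines.append(value[start:p])
        start = p  # the whitespace character stays, now leading the next line
    lines.append(value[start:])
    return CRLF.join(lines)


def unfold(folded):
    """Remove every CRLF that is immediately followed by a space or tab."""
    return re.sub(r"\r\n(?=[ \t])", "", folded)


ids = ['<%d@dom.ain>' % i for i in range(8)]
for sep in (" ", "\t", "    "):
    value = sep.join(ids)
    # Round trip: the separator, whatever it is, survives untouched.
    assert unfold(fold(value)) == value
```

The sketch ignores finer points such as avoiding lines that consist only of whitespace, and it makes no attempt to break at the most readable position.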

Yet prepending extra whitespace is exactly what _encode_chunks does in header.py:
    joiner = NL + self._continuation_ws

(Note that the email package docs and the Header docstring use the word
'prepend', which reflects the error in the code.)
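The effect of such a joiner can be shown with a toy model. This is not the actual header.py code: prepend_fold, unfold and the way the chunks are produced are assumptions made up for illustration, and only the joiner expression mirrors the line quoted above. Whether the split step discards the original whitespace or keeps it, joining with an extra character changes the unfolded value.

```python
import re

NL = "\n"


def prepend_fold(chunks, continuation_ws):
    """Join chunks the way the quoted joiner does (hypothetical helper)."""
    joiner = NL + continuation_ws
    return joiner.join(chunks)


def unfold(folded):
    """Remove each line break that is followed by a space or tab."""
    return re.sub(r"\r?\n(?=[ \t])", "", folded)


original = "<0@dom.ain>\t<1@dom.ain>"

# Case A (assumed): splitting consumed the tab, so the joiner's space replaces it.
dropped = original.split("\t")
result_a = unfold(prepend_fold(dropped, " "))
assert result_a == "<0@dom.ain> <1@dom.ain>"

# Case B (assumed): the chunk kept its tab, so the joiner's space is an extra character.
kept = ["<0@dom.ain>", "\t<1@dom.ain>"]
result_b = unfold(prepend_fold(kept, " "))
assert result_b == "<0@dom.ain> \t<1@dom.ain>"

# Folding per the RFC: only the line break is added, so the value round-trips.
result_c = unfold(NL.join(kept))
assert result_c == original
```

In both prepending cases the receiver sees a header value that differs from the one the sender supplied, which is the core of the objection.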

With a correct implementation, why would I want a choice of which type
of character to break lines on when folding?
The whole notion of controlling the value of continuation_ws seems wrong.

However, changing the default continuation_ws to ' ', as the patch
suggests, will output syntactically correct headers in the majority of
cases (due to other bugs that remove trailing whitespace and merge
consecutive whitespace into one character).


All in all, I agree with the change of the default continuation_ws due
to its lucky side-effects, but as Barry hinted, the algorithm needs some
serious work to really output valid headers.

Some examples of the good and bad behaviors:

>>> from email.Header import Header
>>> l = ['<%d@dom.ain>' % i for i in range(8)]

>>> # this turns out fine
>>> Header(' '.join(l), continuation_ws=' ').encode()
'<0@dom.ain> <1@dom.ain> <2@dom.ain> <3@dom.ain> <4@dom.ain> <5@dom.ain>\n <6@dom.ain> <7@dom.ain>'

# This does not fold even though it should
>>> Header('\t'.join(l), continuation_ws=' ').encode()
'<0@dom.ain>\t<1@dom.ain>\t<2@dom.ain>\t<3@dom.ain>\t<4@dom.ain>\t<5@dom.ain>\t<6@dom.ain>\t<7@dom.ain>'

# And here each 4-character run of whitespace is collapsed into one space
>>> Header('    '.join(l), continuation_ws=' ').encode()
'<0@dom.ain> <1@dom.ain> <2@dom.ain> <3@dom.ain> <4@dom.ain> <5@dom.ain>\n <6@dom.ain> <7@dom.ain>'