classification
Title: email.header.Header doesn't fold headers correctly
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.2, Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: r.david.murray Nosy List: barry, kitterma, python-dev, r.david.murray, srikanths
Priority: normal Keywords: patch

Created on 2011-03-14 07:32 by kitterma, last changed 2012-01-03 07:37 by srikanths. This issue is now closed.

Files
File name Uploaded Description
header_folding_tests.patch r.david.murray, 2011-04-07 22:09
better_header_spliter.patch r.david.murray, 2011-04-10 20:12
Messages (13)
msg130793 - Author: Scott Kitterman (kitterma) Date: 2011-03-14 07:32
Header folding is very different (non-existent as far as I've found so far) in Python3.  Here's a short example:

#!/usr/bin/python
# -*- coding: ISO-8859-1

from email.header import Header

hdrin = 'Received: from mailout00.controlledmail.com (mailout00.controlledmail.com [72.81.252.19]) by mailwash7.pair.com (Postfix) with ESMTP id 16BB5BAD5 for <bcc@kitterman.com>; Sun, 13 Mar 2011 23:46:05 -0400 (EDT)'

print(Header(hdrin))

With python2.6 the output is:

Received: from mailout00.controlledmail.com (mailout00.controlledmail.com
 [72.81.252.19]) by mailwash7.pair.com (Postfix) with ESMTP id 16BB5BAD5 for
 <bcc@kitterman.com>; Sun, 13 Mar 2011 23:46:05 -0400 (EDT)

With python3.1 or 3.2 the output is one line:

Received: from mailout00.controlledmail.com (mailout00.controlledmail.com [72.81.252.19]) by mailwash7.pair.com (Postfix) with ESMTP id 16BB5BAD5 for <bcc@kitterman.com>; Sun, 13 Mar 2011 23:46:05 -0400 (EDT)

This makes it very difficult to write header processing code that works for both Python 2 and Python 3, even assuming headers can be folded at all in Python 3.
msg130920 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-03-14 22:47
It exists, but clearly it is broken.  I'll look into it.
msg133101 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-04-06 00:35
Ah, it isn't broken; it's just that the default changed.  In 2.x the default was maxlinelen=78; in 3.x the default is maxlinelen=None (unlimited), but the generator passes in an override of 78 when formatting output.  So you can specify an explicit maxlinelen=78 and that will wrap the headers in both 2.x and 3.x.  (There are differences in the wrapping algorithms, though!)
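
A minimal sketch of the generator behaviour described above, written against the current Python 3 email package. It assumes the default compat32 policy; in current Python the 78 comes from that policy's max_line_length rather than from a hard-coded override, so the 3.2-era internals may differ. The header value is the one from the original report, minus the "Received: " name.

```python
import io
from email.generator import Generator
from email.message import Message

# Value of the Received header from the report, without the "Received: " name.
value = ('from mailout00.controlledmail.com '
         '(mailout00.controlledmail.com [72.81.252.19]) '
         'by mailwash7.pair.com (Postfix) with ESMTP id 16BB5BAD5 '
         'for <bcc@kitterman.com>; Sun, 13 Mar 2011 23:46:05 -0400 (EDT)')

msg = Message()              # assumption: default compat32 policy (max_line_length=78)
msg['Received'] = value      # stored as a plain string; folding happens on output
msg.set_payload('')

buf = io.StringIO()
Generator(buf).flatten(msg)  # no maxheaderlen given, so the policy's 78 applies
out = buf.getvalue()
print(out)
```

Under compat32, Message.as_string() defaults to maxheaderlen=0, which disables folding, so the sketch drives a Generator directly instead.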
msg133169 - Author: Scott Kitterman (kitterma) Date: 2011-04-06 21:21
Not so fast ...  I may have done this wrong, but I get:

print(Header(hdrin, maxlinelen=78))
Received: from mailout00.controlledmail.com (mailout00.controlledmail.com [72.81.252.19]) by mailwash7.pair.com (Postfix) with ESMTP id 16BB5BAD5 for <bcc@kitterman.com>; Sun, 13 Mar 2011 23:46:05 -0400 (EDT)

all on one line with python3.2, so maxlinelen doesn't appear to do anything.  With python2.7 it does seem to work when invoked that way:

Python 2.7.1+ (r271:86832, Mar 24 2011, 00:39:14) 
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from email.header import Header
>>> hdrin = 'Received: from mailout00.controlledmail.com (mailout00.controlledmail.com [72.81.252.19]) by mailwash7.pair.com (Postfix) with ESMTP id 16BB5BAD5 for <bcc@kitterman.com>; Sun, 13 Mar 2011 23:46:05 -0400 (EDT)'
>>> print(Header(hdrin))
Received: from mailout00.controlledmail.com (mailout00.controlledmail.com
 [72.81.252.19]) by mailwash7.pair.com (Postfix) with ESMTP id 16BB5BAD5 for
 <bcc@kitterman.com>; Sun, 13 Mar 2011 23:46:05 -0400 (EDT)
>>> print(Header(hdrin, maxlinelen=30))
Received: from
 mailout00.controlledmail.com
 (mailout00.controlledmail.com
 [72.81.252.19]) by
 mailwash7.pair.com (Postfix)
 with ESMTP id 16BB5BAD5 for
 <bcc@kitterman.com>;
 Sun, 13 Mar 2011 23:46:05
 -0400 (EDT)
msg133170 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-04-06 21:43
You have to call encode() to get the wrapped header; __str__ uses maxlinelen=None.
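
A small sketch of the __str__ versus encode() split on current Python 3. It passes maxlinelen=78 explicitly so the result does not depend on the constructor default, which is exactly what changed between versions.

```python
from email.header import Header

hdrin = ('Received: from mailout00.controlledmail.com '
         '(mailout00.controlledmail.com [72.81.252.19]) '
         'by mailwash7.pair.com (Postfix) with ESMTP id 16BB5BAD5 '
         'for <bcc@kitterman.com>; Sun, 13 Mar 2011 23:46:05 -0400 (EDT)')

h = Header(hdrin, maxlinelen=78)
print(str(h))       # __str__ returns the value unfolded, on one line
print(h.encode())   # encode() folds at whitespace; continuation lines start with a space
```

Folded this way, the value should break at the same points as the Python 2.6 output in the original report.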

However, there does seem to be a problem with the line wrapping algorithm revealed by your example: it is only doing a line break at the ';', not at any spaces.  I will look into this.
msg133233 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-04-07 16:56
OK, it looks like the wrapping problem arises when the line contains runs of blank-delimited tokens longer than maxlinelen *and* the line also contains ';'s.  The line is then split at the ';' and the remaining overlong pieces are not split any further.

I'll work on a fix.
msg133242 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-04-07 17:51
Here is a patch containing three test cases that demonstrate three different failings of the header folding algorithm.  I'm working on the fix, but it is non-trivial.
msg133243 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-04-07 17:57
Note that 2.7 fails two of these tests as well, but for different reasons.  I'm not currently planning to fix 2.7, as its behavior at least (a) doesn't lose non-whitespace information and (b) doesn't exceed the maxheaderlen.
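
The two properties named here can be written down as a check. The helper fold_is_sound below is hypothetical (it is not part of the email package), and it relaxes (b) by one assumption: a line may exceed maxlen only if it holds a single token with no internal whitespace to break at.

```python
from email.header import Header

def fold_is_sound(original, folded, maxlen, linesep='\n'):
    """Hypothetical checker: (a) no non-whitespace lost, (b) no avoidable overlong line."""
    # (a) compare both strings with all whitespace removed
    if ''.join(original.split()) != ''.join(folded.split()):
        return False
    # (b) an overlong line is only acceptable if it is one unbreakable token
    for line in folded.split(linesep):
        if len(line) > maxlen and len(line.split()) > 1:
            return False
    return True

hdrin = ('Received: from mailout00.controlledmail.com '
         '(mailout00.controlledmail.com [72.81.252.19]) '
         'by mailwash7.pair.com (Postfix) with ESMTP id 16BB5BAD5 '
         'for <bcc@kitterman.com>; Sun, 13 Mar 2011 23:46:05 -0400 (EDT)')

# The report's value contains a ';' plus long runs of blank-delimited tokens.
for maxlen in (78, 40):
    print(maxlen, fold_is_sound(hdrin, Header(hdrin).encode(maxlinelen=maxlen), maxlen))
```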
msg133266 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-04-07 22:09
Here is an updated test patch that brings the test coverage of the relevant code much closer to 100%.  There are still three lines and one branch uncovered, but it appears as though one of the bugs is preventing the test case that would produce full coverage from getting to the relevant code path.  This gives me enough coverage to feel safer mucking about with the algorithm.
msg133283 - Author: Roundup Robot (python-dev) Date: 2011-04-08 01:01
New changeset 10725fc76e11 by R David Murray in branch '3.1':
#11492: fix header truncation on folding when there are runs of split chars.
http://hg.python.org/cpython/rev/10725fc76e11

New changeset 74ec64dc3538 by R David Murray in branch '3.2':
Merge #11492: fix header truncation on folding when there are runs of split chars.
http://hg.python.org/cpython/rev/74ec64dc3538

New changeset 5ec2695c9c15 by R David Murray in branch 'default':
Merge #11492: fix header truncation on folding when there are runs of split chars.
http://hg.python.org/cpython/rev/5ec2695c9c15
msg133477 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-04-10 20:12
This was quite the adventure.  The more I worked on fixing the tests, the more if/else cases the existing splitting algorithm grew.  When I reached the point where fixing one test broke two others, I thought maybe it was time to try a different approach.

Based on the knowledge gathered by banging my head on the old algorithm, I developed a new one.  This one is more RFC 2822/RFC 5322 compliant, I believe.  It breaks only at FWS (folding white space), but still gives preference to breaking after commas or semicolons by default.
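
A simplified sketch of the idea described above. This is not the stdlib's actual folding code; the function fold and its parameters are hypothetical. It inserts a line break only in front of existing whitespace (so removing the breaks restores the value), and when a line overflows it prefers to break right after a token ending in ';' or ',', provided the text kept on the line still fits.

```python
import re

def fold(value, maxlen=78, prefer=';,'):
    """Hypothetical folder: insert '\n' only in front of existing whitespace (FWS)."""
    # (whitespace, token) pairs. Keeping the whitespace means that deleting the
    # inserted newlines gives back the original text (trailing whitespace aside).
    chunks = re.findall(r'(\s*)(\S+)', value)

    def width(line):
        return sum(len(ws) + len(tok) for ws, tok in line)

    def break_point(line):
        # First choice: right after a token ending in a preferred character,
        # provided the text kept on the current line still fits.
        for ch in prefer:
            for i in range(len(line) - 1, 0, -1):
                if line[i - 1][1].endswith(ch) and width(line[:i]) <= maxlen:
                    return i
        # Otherwise: the last whitespace that keeps the current line within maxlen.
        for i in range(len(line) - 1, 0, -1):
            if width(line[:i]) <= maxlen:
                return i
        return 1  # the first token alone is too long; give it its own line

    lines, current = [], []
    for chunk in chunks:
        current.append(chunk)
        # Split off finished lines until what is left fits (or is one token).
        while width(current) > maxlen and len(current) > 1:
            i = break_point(current)
            lines.append(current[:i])
            current = current[i:]
    if current:
        lines.append(current)
    return '\n'.join(''.join(ws + tok for ws, tok in line) for line in lines)


print(fold('aaaa; bbbb cccc dddd', maxlen=12))
```

A purely greedy folder would produce "aaaa; bbbb" / " cccc dddd" here; the semicolon preference yields "aaaa;" / " bbbb cccc" / " dddd" instead, at the cost of a shorter first line.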

I had to adjust several tests that tested broken behavior: the "folded" lines were longer than maxlen even though there were suitable fold points.

I'm very happy with this patch because there are 70 fewer lines of code but the module passes more tests.

Even though the code changes are extensive, I plan to apply this to 3.2.  It fixes bugs, and the new code is at least somewhat easier to understand than the old code (if only because there is less of it!)  I don't plan to apply it to 3.1 because one older test fails if the patch is applied and I don't understand why (it appears to have nothing to do with line wrapping, and the same test works fine in 3.2).
msg133480 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-04-10 20:30
Note that this fix solves issue 11772, so I've closed that one as a duplicate.
msg133969 - Author: Roundup Robot (python-dev) Date: 2011-04-18 14:12
New changeset 51a551acb60d by R David Murray in branch '3.2':
#11492: rewrite header folding algorithm.  Less code, more passing tests.
http://hg.python.org/cpython/rev/51a551acb60d

New changeset fcd20a565b95 by R David Murray in branch 'default':
Merge: #11492: rewrite header folding algorithm.  Less code, more passing tests.
http://hg.python.org/cpython/rev/fcd20a565b95
History
Date                 User            Action  Args
2012-01-03 07:37:54  srikanths       set     nosy: + srikanths
2011-04-18 15:11:16  r.david.murray  link    issue5612 superseder
2011-04-18 15:05:41  r.david.murray  link    issue1372770 superseder
2011-04-18 14:45:33  r.david.murray  link    issue8769 superseder
2011-04-18 14:27:32  r.david.murray  set     status: open -> closed
                                             resolution: fixed
                                             stage: patch review -> resolved
2011-04-18 14:12:29  python-dev      set     messages: + msg133969
2011-04-10 20:30:53  r.david.murray  set     messages: + msg133480
2011-04-10 20:12:32  r.david.murray  set     files: + better_header_spliter.patch
                                             stage: needs patch -> patch review
                                             messages: + msg133477
                                             versions: - Python 3.1
2011-04-10 16:59:34  r.david.murray  link    issue11772 superseder
2011-04-08 01:01:28  python-dev      set     nosy: + python-dev
                                             messages: + msg133283
2011-04-07 22:09:38  r.david.murray  set     files: + header_folding_tests.patch
                                             messages: + msg133266
2011-04-07 22:09:02  r.david.murray  set     files: - header_folding_tests.patch
2011-04-07 17:57:03  r.david.murray  set     files: + header_folding_tests.patch
                                             messages: + msg133243
2011-04-07 17:53:40  r.david.murray  set     files: - header_folding_tests.patch
2011-04-07 17:51:17  r.david.murray  set     files: + header_folding_tests.patch
                                             title: email.header.Header doesn't fold headers at spaces if value contains ';'s -> email.header.Header doesn't fold headers correctly
                                             messages: + msg133242
                                             components: + Library (Lib), - None
                                             keywords: + patch
2011-04-07 16:56:23  r.david.murray  set     title: email.header.Header doesn't fold headers at spaces -> email.header.Header doesn't fold headers at spaces if value contains ';'s
                                             messages: + msg133233
                                             stage: resolved -> needs patch
2011-04-06 21:43:11  r.david.murray  set     messages: + msg133170
                                             title: email.header.Header doesn't fold headers -> email.header.Header doesn't fold headers at spaces
2011-04-06 21:21:46  kitterma        set     status: closed -> open
                                             resolution: not a bug -> (no value)
                                             messages: + msg133169
2011-04-06 00:35:12  r.david.murray  set     status: open -> closed
                                             resolution: not a bug
                                             messages: + msg133101
                                             stage: needs patch -> resolved
2011-03-14 22:47:39  r.david.murray  set     versions: + Python 3.1, Python 3.2, Python 3.3
                                             nosy: barry, r.david.murray, kitterma
                                             messages: + msg130920
                                             assignee: r.david.murray
                                             type: behavior
                                             stage: needs patch
2011-03-14 21:57:19  barry           set     nosy: + barry, r.david.murray
2011-03-14 07:32:57  kitterma        create