msg130793 - (view) |
Author: Scott Kitterman (kitterma) |
Date: 2011-03-14 07:32 |
Header folding is very different (non-existent as far as I've found so far) in Python3. Here's a short example:
#!/usr/bin/python
# -*- coding: ISO-8859-1 -*-
from email.header import Header
hdrin = 'Received: from mailout00.controlledmail.com (mailout00.controlledmail.com [72.81.252.19]) by mailwash7.pair.com (Postfix) with ESMTP id 16BB5BAD5 for <bcc@kitterman.com>; Sun, 13 Mar 2011 23:46:05 -0400 (EDT)'
print(Header(hdrin))
With python2.6 the output is:
Received: from mailout00.controlledmail.com (mailout00.controlledmail.com
[72.81.252.19]) by mailwash7.pair.com (Postfix) with ESMTP id 16BB5BAD5 for
<bcc@kitterman.com>; Sun, 13 Mar 2011 23:46:05 -0400 (EDT)
With python3.1 or 3.2 the output is one line:
Received: from mailout00.controlledmail.com (mailout00.controlledmail.com [72.81.252.19]) by mailwash7.pair.com (Postfix) with ESMTP id 16BB5BAD5 for <bcc@kitterman.com>; Sun, 13 Mar 2011 23:46:05 -0400 (EDT)
This makes it very difficult to write header processing code that works for both Python2 and Python3 even if one can fold headers at all in Python3.
|
msg130920 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2011-03-14 22:47 |
It exists, but clearly it is broken. I'll look into it.
|
msg133101 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2011-04-06 00:35 |
Ah, it isn't broken, it's just that the default changed. In 2.x the default was maxlinelen=78; in 3.x the default is maxlinelen=None (unlimited), but the generator passes in an override of 78 when formatting output. So you can specify an explicit maxlinelen=78, and that will wrap the headers in both 2.x and 3.x. (There are differences in the wrapping algorithms, though!)
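The generator path mentioned in this message can be sketched as follows. This is a minimal illustration, assuming Python 3's email.message.Message and email.generator.Generator with the documented maxheaderlen argument; the variable names are made up for the example, and the exact fold points may differ between Python versions.

```python
from io import StringIO
from email.message import Message
from email.generator import Generator

# Header value taken from the report, without the "Received: " prefix.
value = ('from mailout00.controlledmail.com (mailout00.controlledmail.com '
         '[72.81.252.19]) by mailwash7.pair.com (Postfix) with ESMTP id '
         '16BB5BAD5 for <bcc@kitterman.com>; Sun, 13 Mar 2011 23:46:05 -0400 (EDT)')

msg = Message()
msg['Received'] = value

# The generator folds headers on output; 78 is passed explicitly here
# to mirror the override described in the message above.
buf = StringIO()
Generator(buf, maxheaderlen=78).flatten(msg)
flattened = buf.getvalue()

# Everything before the first blank line is the header block.
header_lines = flattened.split('\n\n', 1)[0].splitlines()
print(flattened)
```

The header comes out on several physical lines because the folding happens when the message is serialized, not when the header object is printed.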
|
msg133169 - (view) |
Author: Scott Kitterman (kitterma) |
Date: 2011-04-06 21:21 |
Not so fast ... I may have done this wrong, but I get:
print(Header(hdrin,maxlinelen=78))
Received: from mailout00.controlledmail.com (mailout00.controlledmail.com [72.81.252.19]) by mailwash7.pair.com (Postfix) with ESMTP id 16BB5BAD5 for <bcc@kitterman.com>; Sun, 13 Mar 2011 23:46:05 -0400 (EDT)
all in one line with python3.2, so maxlinelen doesn't appear to do anything. With python2.7 it does seem to have an effect when invoked that way:
Python 2.7.1+ (r271:86832, Mar 24 2011, 00:39:14)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from email.header import Header
>>> hdrin = 'Received: from mailout00.controlledmail.com (mailout00.controlledmail.com [72.81.252.19]) by mailwash7.pair.com (Postfix) with ESMTP id 16BB5BAD5 for <bcc@kitterman.com>; Sun, 13 Mar 2011 23:46:05 -0400 (EDT)'
>>> print(Header(hdrin))
Received: from mailout00.controlledmail.com (mailout00.controlledmail.com
[72.81.252.19]) by mailwash7.pair.com (Postfix) with ESMTP id 16BB5BAD5 for
<bcc@kitterman.com>; Sun, 13 Mar 2011 23:46:05 -0400 (EDT)
>>> print(Header(hdrin, maxlinelen=30))
Received: from
mailout00.controlledmail.com
(mailout00.controlledmail.com
[72.81.252.19]) by
mailwash7.pair.com (Postfix)
with ESMTP id 16BB5BAD5 for
<bcc@kitterman.com>;
Sun, 13 Mar 2011 23:46:05
-0400 (EDT)
|
msg133170 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2011-04-06 21:43 |
You have to do an 'encode' to get the wrapped header. __str__ uses maxlinelen=None.
However, there does seem to be a problem with the line-wrapping algorithm revealed by your example: it is only doing a line break at the ';', not at any spaces. I will look into this.
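The difference between str() and encode() described above can be sketched like this. It is a minimal illustration that assumes a Python 3 interpreter containing the fix later applied in this issue; the names h, unfolded and folded are only for the example.

```python
from email.header import Header

hdrin = ('Received: from mailout00.controlledmail.com '
         '(mailout00.controlledmail.com [72.81.252.19]) by mailwash7.pair.com '
         '(Postfix) with ESMTP id 16BB5BAD5 for <bcc@kitterman.com>; '
         'Sun, 13 Mar 2011 23:46:05 -0400 (EDT)')

h = Header(hdrin, maxlinelen=78)

# str() (and therefore print()) returns the unfolded value.
unfolded = str(h)

# encode() applies the folding, using the maxlinelen given above.
folded = h.encode()

print(folded)
```

Code that has to behave the same on 2.x and 3.x can call encode() explicitly instead of relying on print(Header(...)).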
|
msg133233 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2011-04-07 16:56 |
OK, it looks like the wrapping problem arises when the line contains runs of blank-delimited tokens longer than maxlinelen *and* the line also contains ';'s. The line is then split at the ';' and the remaining overlong pieces are not split.
I'll work on a fix.
|
msg133242 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2011-04-07 17:51 |
Here is a patch containing three test cases that demonstrate three different failings of the header folding algorithm. I'm working on the fix, but it is non-trivial.
|
msg133243 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2011-04-07 17:57 |
Note that 2.7 fails two of these tests as well, but for different reasons. I'm not currently planning to fix 2.7, as its behavior at least (a) doesn't lose non-whitespace information and (b) doesn't exceed the maxheaderlen.
|
msg133266 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2011-04-07 22:09 |
Here is an updated test patch that brings the test coverage of the relevant code much closer to 100%. There are still three lines and one branch uncovered, but it appears as though one of the bugs is preventing the test case that would produce full coverage from getting to the relevant code path. This gives me enough coverage to feel safer mucking about with the algorithm.
|
msg133283 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2011-04-08 01:01 |
New changeset 10725fc76e11 by R David Murray in branch '3.1':
#11492: fix header truncation on folding when there are runs of split chars.
http://hg.python.org/cpython/rev/10725fc76e11
New changeset 74ec64dc3538 by R David Murray in branch '3.2':
Merge #11492: fix header truncation on folding when there are runs of split chars.
http://hg.python.org/cpython/rev/74ec64dc3538
New changeset 5ec2695c9c15 by R David Murray in branch 'default':
Merge #11492: fix header truncation on folding when there are runs of split chars.
http://hg.python.org/cpython/rev/5ec2695c9c15
|
msg133477 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2011-04-10 20:12 |
This was quite the adventure. The more I worked on fixing the tests, the more if/else cases the existing splitting algorithm grew. When I reached the point where fixing one test broke two others, I thought maybe it was time to try a different approach.
Based on the knowledge gathered by banging my head on the old algorithm, I developed a new one. This one is more RFC2822/RFC5322 compliant, I believe. It breaks only at FWS (folding whitespace), but still gives preference to breaking after commas or semicolons by default.
I had to adjust several tests that tested broken behavior: the "folded" lines were longer than maxlen even though there were suitable fold points.
I'm very happy with this patch because there are 70 fewer lines of code but the module passes more tests.
Even though the code changes are extensive, I plan to apply this to 3.2. It fixes bugs, and the new code is at least somewhat easier to understand than the old code (if only because there is less of it!). I don't plan to apply it to 3.1 because one older test fails if the patch is applied and I don't understand why (it appears to have nothing to do with line wrapping, and the same test works fine in 3.2).
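The preference for breaking after semicolons can be observed through the splitchars argument of Header.encode(). The following is only a sketch, assuming a Python 3 version that includes the rewritten algorithm; the 40-character limit and the variable names are arbitrary choices for the example, and the exact fold points may vary between versions.

```python
from email.header import Header

hdrin = ('Received: from mailout00.controlledmail.com '
         '(mailout00.controlledmail.com [72.81.252.19]) by mailwash7.pair.com '
         '(Postfix) with ESMTP id 16BB5BAD5 for <bcc@kitterman.com>; '
         'Sun, 13 Mar 2011 23:46:05 -0400 (EDT)')

h = Header(hdrin)

# Default splitchars (';, \t'): folds happen at whitespace, but a fold
# point that follows ';' or ',' is preferred when one is available.
default_fold = h.encode(maxlinelen=40)

# Restricting splitchars to a space removes that preference.
space_only_fold = h.encode(splitchars=' ', maxlinelen=40)

print(default_fold)
print('---')
print(space_only_fold)
```

In both cases the breaks fall only at whitespace, so joining the lines back together recovers the original value.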
|
msg133480 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2011-04-10 20:30 |
Note that this fix solves issue 11772, so I've closed that one as a duplicate.
|
msg133969 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2011-04-18 14:12 |
New changeset 51a551acb60d by R David Murray in branch '3.2':
#11492: rewrite header folding algorithm. Less code, more passing tests.
http://hg.python.org/cpython/rev/51a551acb60d
New changeset fcd20a565b95 by R David Murray in branch 'default':
Merge: #11492: rewrite header folding algorithm. Less code, more passing tests.
http://hg.python.org/cpython/rev/fcd20a565b95
|
msg329426 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2018-11-07 16:56 |
I wrote PR 10378 to show that I don't think that this bug must be fixed in Python 2: it would break any application relying on the current folding algorithm.
|
Date | User | Action | Args
2022-04-11 14:57:14 | admin | set | github: 55701
2018-11-07 16:56:13 | vstinner | set | nosy: + vstinner; messages: + msg329426
2018-11-07 16:50:52 | vstinner | set | pull_requests: + pull_request9680
2012-01-03 07:37:54 | srikanths | set | nosy: + srikanths
2011-04-18 15:11:16 | r.david.murray | link | issue5612 superseder
2011-04-18 15:05:41 | r.david.murray | link | issue1372770 superseder
2011-04-18 14:45:33 | r.david.murray | link | issue8769 superseder
2011-04-18 14:27:32 | r.david.murray | set | status: open -> closed; resolution: fixed; stage: patch review -> resolved
2011-04-18 14:12:29 | python-dev | set | messages: + msg133969
2011-04-10 20:30:53 | r.david.murray | set | messages: + msg133480
2011-04-10 20:12:32 | r.david.murray | set | files: + better_header_spliter.patch; stage: needs patch -> patch review; messages: + msg133477; versions: - Python 3.1
2011-04-10 16:59:34 | r.david.murray | link | issue11772 superseder
2011-04-08 01:01:28 | python-dev | set | nosy: + python-dev; messages: + msg133283
2011-04-07 22:09:38 | r.david.murray | set | files: + header_folding_tests.patch; messages: + msg133266
2011-04-07 22:09:02 | r.david.murray | set | files: - header_folding_tests.patch
2011-04-07 17:57:03 | r.david.murray | set | files: + header_folding_tests.patch; messages: + msg133243
2011-04-07 17:53:40 | r.david.murray | set | files: - header_folding_tests.patch
2011-04-07 17:51:17 | r.david.murray | set | files: + header_folding_tests.patch; title: email.header.Header doesn't fold headers at spaces if value contains ';'s -> email.header.Header doesn't fold headers correctly; messages: + msg133242; components: + Library (Lib), - None; keywords: + patch
2011-04-07 16:56:23 | r.david.murray | set | title: email.header.Header doesn't fold headers at spaces -> email.header.Header doesn't fold headers at spaces if value contains ';'s; messages: + msg133233; stage: resolved -> needs patch
2011-04-06 21:43:11 | r.david.murray | set | messages: + msg133170; title: email.header.Header doesn't fold headers -> email.header.Header doesn't fold headers at spaces
2011-04-06 21:21:46 | kitterma | set | status: closed -> open; resolution: not a bug -> (no value); messages: + msg133169
2011-04-06 00:35:12 | r.david.murray | set | status: open -> closed; resolution: not a bug; messages: + msg133101; stage: needs patch -> resolved
2011-03-14 22:47:39 | r.david.murray | set | versions: + Python 3.1, Python 3.2, Python 3.3; nosy: barry, r.david.murray, kitterma; messages: + msg130920; assignee: r.david.murray; type: behavior; stage: needs patch
2011-03-14 21:57:19 | barry | set | nosy: + barry, r.david.murray
2011-03-14 07:32:57 | kitterma | create |