classification
Title: Header.decode_header eats up spaces
Type: behavior Stage: test needed
Components: Library (Lib) Versions: Python 3.1, Python 3.0, Python 2.7, Python 2.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: barry Nosy List: alexanderweb, barry, georg.brandl, mgoutell (4)
Priority: high Keywords:

Created on 2006-04-10 10:33 by mgoutell, last changed 2009-03-21 03:25 by ajaksu2.

Files
File name Uploaded Description
emailheader.diff georg.brandl, 2007-05-16 12:51
Messages (5)
msg28181 - (view) Author: Mathieu Goutelle (mgoutell) Date: 2006-04-10 10:33
The Header.decode_header function eats up spaces in
the non-encoded part of a header.

See the following source:
# -*- coding: iso-8859-1 -*-
from email.Header import Header, decode_header
h = Header('Essai ', None)
h.append('éè', 'iso-8859-1')
print h
print decode_header(h)

This prints:
Essai =?iso-8859-1?q?=E9=E8?=
[('Essai', None), ('\xe9\xe8', 'iso-8859-1')]

This should print:
Essai =?iso-8859-1?q?=E9=E8?=
[('Essai ', None), ('\xe9\xe8', 'iso-8859-1')]
        ^ This space disappears
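
For anyone re-checking this on Python 3, here is a minimal sketch of the same test. It assumes the lowercase module name `email.header` and passes the already-encoded header string rather than a `Header` object; the helper names `first_chunk_text` and `trailing_space_preserved` are made up for illustration. Whether the space survives depends on the interpreter version, so the script reports the result instead of assuming one.

```python
from email.header import decode_header

# The encoded header from the report: plain text, a space, then an encoded-word.
RAW = "Essai =?iso-8859-1?q?=E9=E8?="


def first_chunk_text(raw):
    """Return the first decoded chunk as str (hypothetical helper)."""
    chunk, _charset = decode_header(raw)[0]
    # decode_header may hand back bytes or str depending on the input,
    # so normalise before comparing.
    return chunk.decode("ascii") if isinstance(chunk, bytes) else chunk


def trailing_space_preserved(raw=RAW):
    """True if the space after 'Essai' is still there after decoding."""
    return first_chunk_text(raw) == "Essai "


if __name__ == "__main__":
    print(decode_header(RAW))
    print("trailing space preserved:", trailing_space_preserved())
```

If `trailing_space_preserved()` returns False, the interpreter in use still shows the behaviour described above.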

This appears in Python 2.3, but the source code of the
function didn't change in 2.4, so the same problem
should still exist. Bug "[ 1372770 ] email.Header
should preserve original FWS" may be related to this one,
although I'm not sure it is exactly the same.

This patch (not extensively tested though) seems to
solve this problem:

--- /usr/lib/python2.3/email/Header.py  2005-09-05 00:20:03.000000000 +0200
+++ Header.py   2006-04-10 12:27:27.000000000 +0200
@@ -90,7 +90,7 @@
             continue
         parts = ecre.split(line)
         while parts:
-            unenc = parts.pop(0).strip()
+            unenc = parts.pop(0).rstrip()
             if unenc:
                 # Should we continue a long line?
                 if decoded and decoded[-1][1] is None:
msg28182 - (view) Author: Alexander Schremmer (alexanderweb) Date: 2006-05-12 22:28
I can confirm this bug and have been bitten by it as well.
msg28183 - (view) Author: Mathieu Goutelle (mgoutell) Date: 2007-05-16 09:25
Hello,
Any news about this bug? It still seems to be there in 2.5, more than a year after it was reported...
Regards,
msg28184 - (view) Author: Georg Brandl (georg.brandl) Date: 2007-05-16 12:51
I propose the attached patch. RFC 2047 says to ignore whitespace between encoded-words, but IMHO not between ordinary text and an encoded-word.
File Added: emailheader.diff
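
To make the distinction concrete, here is a rough sketch of that whitespace rule. It is not the contents of emailheader.diff: the `ENCODED_WORD` pattern is a simplified assumption, not the one the email package uses, and `split_preserving_text_whitespace` is a hypothetical helper name. It only tokenizes a header line and does not decode anything.

```python
import re

# Simplified encoded-word pattern (=?charset?encoding?text?=). This is an
# assumption for illustration, not the regex from the email package.
ENCODED_WORD = re.compile(r"(=\?[^?\s]+\?[qbQB]\?[^?\s]*\?=)")


def split_preserving_text_whitespace(line):
    """Split a header line into (token, is_encoded) pairs.

    Whitespace that only separates two encoded-words is dropped;
    whitespace that belongs to ordinary text is kept verbatim.
    """
    pieces = [p for p in ENCODED_WORD.split(line) if p]
    result = []
    for i, piece in enumerate(pieces):
        if ENCODED_WORD.fullmatch(piece):
            result.append((piece, True))
            continue
        between_encoded = (
            0 < i < len(pieces) - 1
            and ENCODED_WORD.fullmatch(pieces[i - 1])
            and ENCODED_WORD.fullmatch(pieces[i + 1])
        )
        if piece.isspace() and between_encoded:
            continue  # whitespace between adjacent encoded-words is ignored
        result.append((piece, False))
    return result


if __name__ == "__main__":
    print(split_preserving_text_whitespace("Essai =?iso-8859-1?q?=E9=E8?="))
    print(split_preserving_text_whitespace("=?utf-8?q?a?= =?utf-8?q?b?="))
```

With the header from the original report, the ordinary-text token keeps its trailing space, while the lone space between two encoded-words in the second call is dropped.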
msg28185 - (view) Author: Barry A. Warsaw (barry) Date: 2007-05-16 13:08
IIRC, I tried the OP's patch and it broke too many tests in the email package's test suite.  I made an attempt at fixing the problem in a much more RFC-compliant way, but couldn't get the test suite to pass completely.  This points to a much deeper problem with the email package's header management.  I don't think the problem is a bug; I think it's a design flaw.
History
Date User Action Args
2009-03-21 03:25:08 ajaksu2 set stage: test needed
type: behavior
versions: + Python 2.6, Python 3.0, Python 3.1, Python 2.7, - Python 2.3
2006-04-10 10:33:54 mgoutell create