classification
Title: email.Header should preserve original FWS
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.3
process
Status: closed Resolution: duplicate
Superseder: email.header.Header doesn't fold headers correctly (issue 11492)
Assigned To: r.david.murray Nosy List: ajaksu2, barry, nherring, r.david.murray, srikanths
Priority: normal Keywords: easy

Created on 2005-12-04 10:53 by nherring, last changed 2012-01-03 07:32 by srikanths. This issue is now closed.

Messages (6)
msg26980 - Author: Nathan Herring (nherring) Date: 2005-12-04 10:53
The Header class describes its operation as using the continuation_ws 
parameter, prepending its value to continuation lines. As a by-product, 
it may rewrite pre-existing folding white space (FWS) in a header, as 
demonstrated by Mailman 2.1.6, which exposes the problem.

If the Header class is passed pre-existing header lines (in the Mailman 
case, from the original message) and is then asked, without any other 
manipulation, for the encoded version, it replaces the original folding 
with the continuation_ws characters.

Given that RFC 2822 unfolding consists only of removing CRLFs, 
swapping out the FWS characters changes the logical content of a 
header value. Standard folding of us-ascii text should consist only of 
introducing line breaks in front of the original FWS in the header line. 
Where the encoding of the source string requires multiple adjacent 
RFC 2047 encoded-words (because of different encodings or text length), 
FWS can be inserted freely and is treated as an artifact of the encoding: 
it is ignored on reading, so it doesn't affect the logical content of the 
header value. It's in this latter case that the continuation_ws parameter 
should be used.

e.g., 

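#!/usr/bin/python
# Python 2 script (the module is email.header in Python 3)
from email.Header import Header
s = "Thread-Topic: Use of tabs when folding header lines -- increasing subject\n length as a test\n"
# Positional args after s: charset, maxlinelen, header_name, continuation_ws
print Header(s, 'us-ascii', None, None, '\t')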

This script will have replaced the space in front of the word "length" 
with a tab. It should retain that space and not convert it to the 
continuation_ws character.
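The report's argument can be checked with a small Python 3 sketch. The helpers `unfold` and `fold_preserving_fws` are hypothetical, not part of the email package: `unfold` removes each line break that is followed by a space or tab, as RFC 2822 section 2.2.3 describes (it also accepts a bare LF for convenience), and `fold_preserving_fws` breaks plain us-ascii text only in front of existing whitespace, leaving that whitespace at the start of the continuation line. Swapping the space before "length" for a tab changes the unfolded value, while the whitespace-preserving fold round-trips exactly.

```python
import re

# RFC 2822 section 2.2.3: unfolding removes a CRLF that is immediately
# followed by WSP (space or tab); the WSP character itself is kept.
# A bare LF is accepted too, for convenience.
_FOLD_POINT = re.compile(r"\r?\n(?=[ \t])")


def unfold(raw: str) -> str:
    """Hypothetical helper: RFC 2822 unfolding."""
    return _FOLD_POINT.sub("", raw)


def fold_preserving_fws(value: str, maxlen: int = 78, linesep: str = "\r\n") -> str:
    """Hypothetical folder for us-ascii text that never rewrites whitespace.

    Breaks go only in front of an existing run of spaces/tabs, and that run
    stays at the start of the continuation line. maxlen is a soft limit on
    the value alone (no header name); long words are never split.
    """
    lines = [""]
    # Chunks are an optional whitespace run plus a word, e.g.
    # "a b\tc" -> ["a", " b", "\tc"]; trailing whitespace is its own chunk.
    for chunk in re.findall(r"[ \t]*[^ \t]+|[ \t]+", value):
        if lines[-1] and len(lines[-1]) + len(chunk) > maxlen:
            lines.append(chunk)  # continuation starts with the original WSP
        else:
            lines[-1] += chunk
    return linesep.join(lines)


subject = "Use of tabs when folding header lines -- increasing subject length as a test"

# What the report describes: the space before "length" replaced by a tab.
retabbed = subject.replace(" length", "\r\n\tlength")
print(unfold(retabbed) == subject)  # False: the logical value changed

# Folding that keeps the original whitespace round-trips exactly.
folded = fold_preserving_fws(subject, maxlen=40)
print(repr(folded))
print(unfold(folded) == subject)  # True
```

The folder ignores the header name when measuring line length, so it is a simplification of what a real generator has to do.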
msg113045 - Author: Terry J. Reedy (terry.reedy) (Python committer) Date: 2010-08-05 20:38
The email module has several problems. RDM is working on overhauling the email module for 3.2. Existing issues may not get individual attention.
msg113101 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2010-08-06 14:06
Header parsing and formatting is one of the major changes in email6, and it should handle this much more sensibly.  email6 won't land until 3.3, though it will be available on PyPI for testing before that.  If you want to propose a patch for fixing this in the current email module I will consider it (but see below).  Absent that, I'm marking this for email6/3.3.

Note that in email6 the primary interface for creating headers will be a factory function, and in the current design it will reject header values that contain \n (and/or \r).  The parser will deal with unfolding values when parsing existing messages.
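The division of labor described here (values created by application code may not contain line breaks, while the parser unfolds values read from existing messages) could be sketched roughly as follows. `create_header` and `parse_header` are hypothetical names chosen for illustration and are not the email6 API; returning a plain `(name, value)` tuple is likewise an assumption made to keep the sketch small.

```python
import re

# Hypothetical names, for illustration only (not the email6 API).
# A fold point: a line break followed by a space or tab.
_FOLD_POINT = re.compile(r"\r?\n(?=[ \t])")


def create_header(name: str, value: str) -> tuple:
    """Factory for headers built by application code: no embedded line breaks."""
    if "\r" in value or "\n" in value:
        raise ValueError(f"{name}: header values may not contain CR or LF")
    return (name, value)


def parse_header(name: str, raw_value: str) -> tuple:
    """Parser path: folding found in an existing message is unfolded here."""
    return (name, _FOLD_POINT.sub("", raw_value))


print(create_header("Subject", "increasing subject length as a test"))
print(parse_header("Subject", "increasing subject\r\n length as a test"))
try:
    create_header("Subject", "increasing subject\n length as a test")
except ValueError as exc:
    print("rejected:", exc)
```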

Also, while I agree with you about what the RFC *says*, what email programs actually *do* seems to be a bit different.  Email generators in general use a single leading tab as folding whitespace, but if you unfold the resulting value, it is clear that the tabs are noise and should be replaced by single spaces on unfolding.  This becomes obvious when you consider things like an unstructured Subject header that has been wrapped.  It will be wrapped with tabs by any mailer I've so far encountered, but if you unwrap it, add a 'Re:', and rewrap it, preserving the tab is clearly the wrong thing to do.  This is in fact what email used to do, and it has annoyed many, many people over the years (including me) because the header in the reply message has this tab stuck in the middle of the subject...

So currently my plan is to special-case tabs on folding and unfolding.  When unfolding, a single leading tab will become a blank; when folding, a single tab will be used as folding white space, replacing the single blank at the point of folding.  I haven't tested this algorithm on any other mailers yet, because I haven't finished enough of the code to generate parseable messages.  (Maybe I'll do some by hand.)

This folding policy will be a controllable policy setting, so it will be possible to produce the strictly-RFC-conformant folding and unfolding on a per-message or even (when creating them) a per-header basis.
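A rough Python 3 sketch of the planned tab special case, with a switch standing in for the controllable policy setting, might look like this. The names `unfold`, `fold_with_tab` and `tab_as_space` are hypothetical. The sketch assumes a "single leading tab" means a continuation line that starts with exactly one tab followed by a non-whitespace character, and it folds only at whitespace runs consisting of one blank; `tab_as_space=False` gives plain RFC 2822 unfolding, where the tab is kept.

```python
import re

# Hypothetical sketch of the tab special case; names are illustrative only.
# A fold point whose continuation line starts with exactly one TAB that is
# not followed by more whitespace.
_SINGLE_TAB_FOLD = re.compile(r"\r?\n\t(?![ \t])")
# Any fold point: a line break followed by a space or tab.
_ANY_FOLD = re.compile(r"\r?\n(?=[ \t])")


def unfold(raw: str, tab_as_space: bool = True) -> str:
    """Unfold a header value.

    tab_as_space=True:  a single leading tab becomes one blank (proposed rule).
    tab_as_space=False: plain RFC 2822 unfolding, the tab is kept.
    """
    if tab_as_space:
        raw = _SINGLE_TAB_FOLD.sub(" ", raw)
    return _ANY_FOLD.sub("", raw)


def fold_with_tab(value: str, maxlen: int = 78, linesep: str = "\r\n") -> str:
    """Fold only at single blanks, replacing each such blank with linesep + TAB."""
    # Alternating [text, whitespace, text, ...]; odd indexes hold whitespace runs.
    parts = re.split(r"([ \t]+)", value)
    lines = [parts[0]]
    for i in range(1, len(parts), 2):
        ws, word = parts[i], parts[i + 1]
        # The +1 accounts for the blank (or tab) in front of the word.
        if ws == " " and lines[-1] and len(lines[-1]) + 1 + len(word) > maxlen:
            lines.append("\t" + word)  # the single blank becomes the fold
        else:
            lines[-1] += ws + word
    return linesep.join(lines)


received = ("Use of tabs when folding header lines -- increasing"
            "\r\n\tsubject length as a test")
subject = unfold(received)  # the tab turns back into a blank
reply = fold_with_tab("Re: " + subject, maxlen=40)
print(repr(reply))
print(repr(unfold(reply)))  # no stray tab after refolding
print(repr(unfold(received, tab_as_space=False)))  # strict RFC 2822: tab kept
```

Applied to the 'Re:' scenario from the previous paragraphs, the tab-normalizing rule lets the subject be unfolded, prefixed and refolded without a tab ending up in the middle of the logical value.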

I welcome your thoughts on this subject (and, if you are so moved, your participation in the email-sig, which, while pretty quiet right now, will probably get busier soon when I post the next API iteration for email6).
msg125735 - Author: Nathan Herring (nherring) Date: 2011-01-08 00:17
You are certainly correct about (some, perhaps many) e-mail generators using tabs when folding, which, AFAICT, is much more likely an incorrect implementation of RFC 2822 than an intentional transformation of the user's specified Subject line. Some*, however, dutifully unfold those tabs back into the Subject line, causing all sorts of strangeness: an explosion of conversations that are logically identical, except that in one it's all spaces, and in the others some of the spaces have, seemingly arbitrarily, turned into tabs.

Fixing all of the e-mail generators to Do The Right Thing wasn't what I had in mind, but making sure that Mailman, a rather commonly used mail service, and its reliance on Python's Header class would no longer permute the messages made by e-mail generators that do Do The Right Thing was one step.

Were I a contributor, I would be inclined to make it conform to RFC 2822 unfolding and preserve the FWS accordingly. I don't know if there'd be enough demand for your alternative, but if there were, I'd make it a non-default option to de-tabify Subject lines (or other headers).

As it stands, I am newly freed from any restrictions on contributing, so I might try to find out what Mailman's intentions are regarding taking updates to Python before proposing a patch that they might never use (i.e., they may just take the version you're working on now).

*Microsoft Entourage (which I worked on while at Microsoft) and Microsoft Outlook both behave correctly with regard to folding/unfolding headers, as far as I could determine in '05.
msg125784 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2011-01-08 15:42
I agree that when dealing with preexisting folding it is better to preserve it.  The case I was talking about is, say, prepending 'Re:' to a subject and refolding it.  It is in that transformation step that I think turning FWS into a single space makes sense.  But I absolutely want this to be controllable, and I hope that experience can inform the chosen defaults.  I also expect to get feedback from the Mailman folks well before anything is set in stone.
msg133974 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2011-04-18 15:05
As of the fix for issue 11492, the email package only uses continuation_ws when folding RFC2047 encoded words.  So I consider this issue fixed.  (I have, by the way, come around to the view that we should never be introducing or deleting whitespace except when RFC 2047 encoding/decoding...we are still deleting it in a couple places, and I will address that by and by).
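Why continuation_ws is harmless between RFC 2047 encoded words can be illustrated with the Python 3 `email.header` functions `decode_header` and `make_header`. The two encoded words below are made up for the example, and the sketch assumes a current Python 3. Joined by a blank, by CRLF plus a blank, or by CRLF plus a tab, they all decode to the same text, because whitespace separating adjacent encoded words is not part of the value (RFC 2047 section 6.2).

```python
from email.header import decode_header, make_header

# Two adjacent RFC 2047 encoded words in Q encoding: "caf" and "é"
# (the UTF-8 bytes C3 A9). Chosen only for illustration.
words = ("=?utf-8?q?caf?=", "=?utf-8?q?=C3=A9?=")

for fws in (" ", "\r\n ", "\r\n\t"):
    folded = fws.join(words)
    # Whitespace between adjacent encoded words is dropped on decoding,
    # so the choice of continuation whitespace never reaches the value.
    decoded = str(make_header(decode_header(folded)))
    print(repr(fws), "->", decoded)
```

Between ordinary us-ascii words, by contrast, the whitespace is part of the value, which is why the original report asks that continuation_ws be reserved for the encoded-word case.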
History
Date                 User            Action  Args
2012-01-03 07:32:06  srikanths       set     nosy: + srikanths
2011-04-18 15:05:41  r.david.murray  set     status: open -> closed
                                             superseder: email.header.Header doesn't fold headers correctly
                                             messages: + msg133974
                                             components: + Library (Lib), - Extension Modules
                                             resolution: duplicate
                                             stage: test needed -> resolved
2011-01-08 15:42:34  r.david.murray  set     nosy: barry, nherring, ajaksu2, r.david.murray
                                             messages: + msg125784
2011-01-08 01:17:31  terry.reedy     set     nosy: - terry.reedy
2011-01-08 00:17:47  nherring        set     nosy: barry, terry.reedy, nherring, ajaksu2, r.david.murray
                                             messages: + msg125735
2010-08-06 14:06:44  r.david.murray  set     messages: + msg113101
                                             versions: + Python 3.3, - Python 3.2
2010-08-05 20:38:00  terry.reedy     set     nosy: + terry.reedy
                                             messages: + msg113045
                                             versions: + Python 3.2, - Python 2.6, Python 3.0
2010-05-05 13:45:25  barry           set     assignee: barry -> r.david.murray
                                             nosy: + r.david.murray
2009-04-22 14:36:11  ajaksu2         set     keywords: + easy
2009-03-20 23:31:15  ajaksu2         set     nosy: + ajaksu2
                                             versions: + Python 2.6, Python 3.0
                                             type: behavior
                                             stage: test needed
2005-12-04 10:53:10  nherring        create