Author rnortman
Date 2006-02-28.17:11:45
Content
The documentation for the email.Generator module claims
that the flatten() method is idempotent (i.e., output
identical to the input, if email.Parser.Parser was used
on the input), but it is not in all cases.  The most
obvious example is that you need to set
mangle_from_=False and maxheaderlen=0 to disable From
mangling and header wrapping.
This could be considered common sense, but the
documentation should mention it, as both are enabled by
default.  (unixfrom can also create differences between
input and output, but is disabled by default.)  More
importantly, whitespace is not preserved in headers: if
there are extra spaces between the header name and the
header contents, they are collapsed to a single space.

This little snippet will demonstrate the problem:

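    import sys
    import email.Parser
    import email.Generator

    parser = email.Parser.Parser()
    msg = parser.parse(sys.stdin)
    print msg
    # Flatten with From-mangling and header wrapping disabled.
    gen = email.Generator.Generator(sys.stdout,
                                    mangle_from_=False, maxheaderlen=0)
    gen.flatten(msg, unixfrom=False)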

Feed it a single message with extra spaces between field
name and field contents in one or more fields, and diff
the input and the output.
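
The same check can be made without piping a file through stdin.
The following is only a minimal sketch, assuming a current
Python 3 interpreter, where the modules are named email.parser
and email.generator rather than email.Parser and
email.Generator. The roundtrip() helper and the sample message
are made up for illustration and are not part of the email
package.

```python
import io
from email.generator import Generator
from email.parser import Parser


def roundtrip(raw, **generator_kwargs):
    """Parse raw message text and flatten it again (hypothetical helper)."""
    msg = Parser().parsestr(raw)
    buf = io.StringIO()
    Generator(buf, **generator_kwargs).flatten(msg, unixfrom=False)
    return buf.getvalue()


# Sample input: extra spaces after "Subject:" and a body line starting "From ".
raw = "Subject:    hello\n\nFrom the start\n"

# Flatten once with the default settings, once with the settings from the report.
default_out = roundtrip(raw)
strict_out = roundtrip(raw, mangle_from_=False, maxheaderlen=0)

print(repr(default_out))
print(repr(strict_out))
print(strict_out == raw)  # False when header whitespace was not preserved
```

If the library behaves as described in this report, neither
output equals the input: the default settings also escape the
body line to ">From the start", and even with mangle_from_=False
and maxheaderlen=0 the header comes back as "Subject: hello"
with a single space.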

It is probably not worth actually making these routines
idempotent, as preserving whitespace is not important
in most applications and would require extra
bookkeeping.  However, as long as the documentation
claims the routines are idempotent, it is a bug for them
not to be.  In my particular application, it was important to
be truly idempotent, so this was a problem.  Had the
documentation not made false claims, I would have known
from the start that I needed to write my own versions
of the routines in the email module.
History
Date                 User   Action  Args
2007-08-23 14:38:07  admin  link    issue1440472 messages
2007-08-23 14:38:07  admin  create