
Author runtux
Recipients barry, ggenellina, ishimoto, jafo, kael, leromarinvit, r.david.murray, runtux, tkikuchi, tlynn, tony_nelson
Date 2012-01-03.21:44:12
Message-id <20120103214410.GB32383@runtux.com>
In-reply-to <1325607431.79.0.487950824251.issue1079@psf.upfronthosting.co.za>
Content
Attached please find a patch that
- keeps all spaces between non-encoded and encoded parts
- doesn't insert spaces between non-encoded and encoded parts where
  they are already present or not needed (because the adjacent
  character is a non-ctext character of RFC822, like ')') in the
  methods "encode" and "__str__" of class Header.
In all other cases spaces are still inserted. This keeps many tests
happy and probably won't break too much existing code.
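
The spacing rules listed above can be sketched with the stock
email.header.Header class of a recent Python 3. That the stock module
behaves the same way as the attached patch is an assumption, not
something this message confirms; the variable names are illustrative.

```python
from email.header import Header

# Case 1: the neighbours of the encoded part are '(' and ')', which are
# not ctext characters, so no separating space should be needed.
comment = Header()
comment.append("(")                          # default charset: us-ascii
comment.append("b\u00fccher", "iso-8859-1")  # part that needs encoding
comment.append(")")
comment_text = str(comment)

# Case 2: plain text sits directly next to the encoded part, so a
# separating space is still inserted on both sides.
plain = Header()
plain.append("abc")
plain.append("b\u00fccher", "iso-8859-1")
plain.append("def")
plain_text = str(plain)

print(repr(comment_text))
print(repr(plain_text))
```

Only "__str__" is exercised here; "encode" goes through the line
folding code as well, so its exact output is not shown.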

I've re-read RFC2047 (and parts of 822) and now share your opinion that
it requires that encoded parts *must* be separated by
'linear-white-space' from the following (or preceding) token if that
token is text or ctext.
(p. 7, section 5, "Use of encoded-words in message headers")

With the special-casing of ctext characters mentioned above,
roundtripping is now possible: if you parse a normalized string
consisting of encoded and non-encoded parts, whitespace (even multiple
spaces) is preserved.
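
A minimal roundtrip sketch, assuming a recent Python 3 whose
email.header module preserves whitespace as described here (an
assumption; the sample header is made up for illustration):

```python
from email.header import decode_header, make_header

# One space before the encoded word and two spaces after it.
raw = "Re: =?iso-8859-1?q?b=FCcher?=  (urgent)"

chunks = decode_header(raw)              # list of (bytes, charset) pairs
roundtripped = str(make_header(chunks))  # back to a unicode string

print(chunks)
print(repr(roundtripped))
```

If whitespace is preserved, the two spaces after the encoded word
survive in "roundtripped" and no extra space is added before it.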

I still think we should do it like everyone else and *not* automatically
insert whitespace at boundaries between encoded and non-encoded words,
even if the RFC requires it. Someone wanting to create headers
consisting of mixed encoded/non-encoded parts without whitespace must
know what to do anyway -- the previous implementation also didn't check
for all border cases.

I've *not yet* tested this against the email6 branch you mentioned.

Note that I didn't have to make the regex non-greedy; it already
was. I've just removed the whitespace at the end of the regex.
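
The effect of dropping a trailing whitespace requirement can be
sketched with two simplified patterns. Both are hypothetical and are
not copied from the patch; in particular, writing the requirement as a
lookahead is an assumption about its form.

```python
import re

# Hypothetical, simplified encoded-word pattern; ".*?" is non-greedy.
CORE = r"=\?(?P<charset>[^?]*?)\?(?P<encoding>[qQbB])\?(?P<encoded>.*?)\?="

strict = re.compile(CORE + r"(?=[ \t]|$)")  # whitespace or end must follow
relaxed = re.compile(CORE)                  # no trailing requirement

# Encoded word inside a comment, directly followed by ')'.
sample = "(=?iso-8859-1?q?b=FCcher?=)"

strict_match = strict.search(sample)
relaxed_match = relaxed.search(sample)

print(strict_match)
print(relaxed_match.group("charset"), relaxed_match.group("encoded"))
```

With the trailing requirement the encoded word before ')' is not
recognized at all; without it the word is found and can be decoded.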

I've changed all the tests that test for removal of whitespace between
non-encoded and encoded parts. Obviously I've also changed a test that
relied on failing to parse adjacent encoded strings. Please review
my changes to the tests.

The rfc2047 tests now all pass.

The patch also fixes issue1467619, "Header.decode_header eats up spaces".

Ralf
-- 
Dr. Ralf Schlatterbeck                  Tel:   +43/2243/26465-16
Open Source Consulting                  www:   http://www.runtux.com
Reichergasse 131, A-3411 Weidling       email: office@runtux.com
osAlliance member                       email: rsc@osalliance.com
Files
File name Uploaded
python.patch3 runtux, 2012-01-03.21:44:11
History
Date                 User    Action  Args
2012-01-03 21:44:14  runtux  set     recipients: + runtux, barry, jafo, ishimoto, tlynn, ggenellina, tkikuchi, tony_nelson, kael, r.david.murray, leromarinvit
2012-01-03 21:44:13  runtux  link    issue1079 messages
2012-01-03 21:44:12  runtux  create