Message 150434 - Python tracker


Author	kxroberto
Recipients	kxroberto
Date	2012-01-01.17:24:58
Message-id	<1325438700.75.0.319524788564.issue13693@psf.upfronthosting.co.za>

Content

The email.* package seems to over-encode address header fields that contain international characters, which even leads to display errors in the receiver's mail reader, when the message header is composed as recommended in http://docs.python.org/library/email.header.html

Python 2.7.2
>>> e=email.Parser.Parser().parsestr(getcliptext())
>>> e['From']
'=?utf-8?q?Martin_v=2E_L=C3=B6wis?= <report@bugs.python.org>'
# note the par
>>> email.Header.decode_header(_)
[('Martin v. L\xc3\xb6wis', 'utf-8'), ('<report@bugs.python.org>', None)]
# unfortunately there is no comfortable function for this:
>>> u='Martin v. L\xc3\xb6wis'.decode('utf8') + ' <report@bugs.python.org>'
>>> u
u'Martin v. L\xf6wis <report@bugs.python.org>'
>>> msg=email.Message.Message()
>>> msg['From']=u
>>> msg.as_string()
'From: =?utf-8?b?TWFydGluIHYuIEzDtndpcyA8cmVwb3J0QGJ1Z3MucHl0aG9uLm9yZz4=?=\n\n'
>>> msg['From']=str(u)
>>> msg.as_string()
'From: =?utf-8?b?TWFydGluIHYuIEzDtndpcyA8cmVwb3J0QGJ1Z3MucHl0aG9uLm9yZz4=?=\nFrom: Martin v. L\xf6wis <report@bugs.python.org>\n\n'
>>> msg['From']=email.Header.Header(u)
>>> msg.as_string()
'From: =?utf-8?b?TWFydGluIHYuIEzDtndpcyA8cmVwb3J0QGJ1Z3MucHl0aG9uLm9yZz4=?=\nFrom: Martin v. L\xf6wis <report@bugs.python.org>\nFrom: =?utf-8?b?TWFydGluIHYuIEzDtndpcyA8cmVwb3J0QGJ1Z3MucHl0aG9uLm9yZz4=?=\n\n'
>>> 
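
For the decoding step in the session above, a small helper can be built from decode_header and make_header. This is only a sketch, written for Python 3 and the lowercase module name email.header (the session above uses Python 2.7.2); header_to_text is a hypothetical name, not a function of the email package.

```python
from email.header import decode_header, make_header


def header_to_text(raw_value):
    """Decode RFC 2047 encoded words in a raw header value to text.

    Hypothetical helper: decode_header splits the value into
    (bytes, charset) chunks, make_header joins them into a Header,
    and str() yields the decoded unicode text.
    """
    return str(make_header(decode_header(raw_value)))


# Header value taken from the session above.
raw = '=?utf-8?q?Martin_v=2E_L=C3=B6wis?= <report@bugs.python.org>'
text = header_to_text(raw)
```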

(BTW: it is strange that multiple msg['From']=... _assignments_ end up as multiple additions. Also, msg renders 8-bit header lines without a warning, an error, or auto-encoding, while it does auto-encode unicode.)
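
The repeated From: lines are consistent with how item assignment on Message is documented: it appends a header field and does not overwrite an existing one of the same name. A minimal sketch, written for Python 3 and using placeholder addresses, of replacing a header instead of appending to it:

```python
from email.message import Message

msg = Message()
msg['From'] = 'first@example.org'
msg['From'] = 'second@example.org'  # appended: there are now two From fields
count_after_assign = len(msg.get_all('From'))

# Option 1: delete every existing From field, then assign once.
del msg['From']
msg['From'] = 'third@example.org'

# Option 2: replace the first matching field in place.
# replace_header raises KeyError if no such field exists yet.
msg.replace_header('From', 'fourth@example.org')
final_from = msg.get_all('From')
```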

What finally arrives at the receiver typically looks like:

From: "=?utf-8?b?TWFydGluIHYuIEzDtndpcyA8cmVwb3J0QGJ1Z3MucHl0aG9uLm9yZz4=?=" <report@bugs.python.org>

Because the servers seem to want the address in the clear, they extract the address and _add_ it again (duplicating it) as ASCII. => error

I have not found any emails in my archives where address header fields are over-encoded the way Python does it. Even in non-address fields, mostly only those words or groups that need it are encoded.

I assume the sophisticated, high-level-looking email.* package doesn't expect the user to fiddle things together at a low level, with parseaddr, re.search, make_header, Header.encode, '.join ...? Or is that indeed the (undocumented) expectation? IMHO it should be smart enough to do this automatically.
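
One way to get the sparse encoding described above is to split the field with parseaddr and encode only the display name. The sketch below assumes Python 3.3 or later, where email.utils.formataddr accepts a charset argument. encode_address_header and decode_display_name are hypothetical helpers, and both handle a single address only.

```python
from email.header import decode_header, make_header
from email.utils import formataddr, parseaddr


def encode_address_header(value, charset='utf-8'):
    """Encode only the display name; leave the address itself in ASCII."""
    name, addr = parseaddr(value)
    # formataddr RFC 2047-encodes the name only if it is not pure ASCII.
    return formataddr((name, addr), charset)


def decode_display_name(value):
    """Return the decoded display name of a single-address header value."""
    name, _addr = parseaddr(value)
    return str(make_header(decode_header(name)))


encoded = encode_address_header('Martin v. L\xf6wis <report@bugs.python.org>')
plain = encode_address_header('Jane Doe <jane@example.org>')
```

The address part stays readable for mail servers, and a name that is already pure ASCII passes through unencoded.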

Note: there is an old, deprecated function mimify.mime_encode_header which seems to try to auto-encode cautiously and sparsely (but it actually fails too on all the examples I tried).

History
Date	User	Action	Args
2012-01-01 17:25:00	kxroberto	set	recipients: + kxroberto
2012-01-01 17:25:00	kxroberto	set	messageid: <1325438700.75.0.319524788564.issue13693@psf.upfronthosting.co.za>
2012-01-01 17:25:00	kxroberto	link	issue13693 messages
2012-01-01 17:24:58	kxroberto	create