Issue 13693: email.Header.Header incorrect/non-smart on international charset address fields

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/57902

classification

Title:	email.Header.Header incorrect/non-smart on international charset address fields
Type:	behavior	Stage:
Components:	Library (Lib)	Versions:	Python 3.1, Python 3.2, Python 3.3, Python 3.4, Python 2.7, Python 2.6

process

Status:	closed	Resolution:	not a bug
Dependencies:		Superseder:
Assigned To:		Nosy List:	kxroberto, r.david.murray
Priority:	normal	Keywords:

Created on 2012-01-01 17:25 by kxroberto, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (3)
msg150434	Author: kxroberto (kxroberto)	Date: 2012-01-01 17:24
The email.* package seems to over-encode international-charset address fields, which even results in display errors in the receiver's mail reader, when the message header is composed as recommended in http://docs.python.org/library/email.header.html

Python 2.7.2
>>> e=email.Parser.Parser().parsestr(getcliptext())
>>> e['From']
'=?utf-8?q?Martin_v=2E_L=C3=B6wis?= <report@bugs.python.org>' # note the par
>>> email.Header.decode_header(_)
[('Martin v. L\xc3\xb6wis', 'utf-8'), ('<report@bugs.python.org>', None)]
# unfortunately there is no comfortable function for this:
>>> u='Martin v. L\xc3\xb6wis'.decode('utf8') + ' <report@bugs.python.org>'
>>> u
u'Martin v. L\xf6wis <report@bugs.python.org>'
>>> msg=email.Message.Message()
>>> msg['From']=u
>>> msg.as_string()
'From: =?utf-8?b?TWFydGluIHYuIEzDtndpcyA8cmVwb3J0QGJ1Z3MucHl0aG9uLm9yZz4=?=\n\n'
>>> msg['From']=str(u)
>>> msg.as_string()
'From: =?utf-8?b?TWFydGluIHYuIEzDtndpcyA8cmVwb3J0QGJ1Z3MucHl0aG9uLm9yZz4=?=\nFrom: Martin v. L\xf6wis <report@bugs.python.org>\n\n'
>>> msg['From']=email.Header.Header(u)
>>> msg.as_string()
'From: =?utf-8?b?TWFydGluIHYuIEzDtndpcyA8cmVwb3J0QGJ1Z3MucHl0aG9uLm9yZz4=?=\nFrom: Martin v. L\xf6wis <report@bugs.python.org>\nFrom: =?utf-8?b?TWFydGluIHYuIEzDtndpcyA8cmVwb3J0QGJ1Z3MucHl0aG9uLm9yZz4=?=\n\n'
>>>

(BTW: it is strange that multiple msg['From']=... _assignments_ end up as multiple additions. Also, msg renders 8-bit header lines without a warning, an error, or auto-encoding, while it does auto-encode unicode.)

What finally arrives at the receiver typically looks like:

From: "=?utf-8?b?TWFydGluIHYuIEzDtndpcyA8cmVwb3J0QGJ1Z3MucHl0aG9uLm9yZz4=?=" <report@bugs.python.org>

because the servers seem to want the address unencoded: they extract the address and _add_ it (duplicating it) as ASCII. => error

I have not found any emails in my archives where address header fields are as over-encoded as Python makes them. Even in non-address fields, mostly only those words/groups that need it are encoded.

I assume the sophisticated, high-level-looking email.* package doesn't expect the user to fiddle things together at a low level with parseaddr, re.search, make_header, Header.encode, '.join ...? Or is that indeed (undocumented) the case? IMHO it should be auto-smart enough.

Note: there is an old deprecated function, mimify.mime_encode_header, which seemed to try to cautiously auto-encode correctly/sparsely (but it actually fails too on all examples tried).
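Editorial note on the repeated msg['From']=... assignments above: item assignment on a Message appends a header rather than replacing an existing one, which is why the From lines pile up. The following is a minimal sketch, assuming Python 3 module names (email.message rather than the Python 2 email.Message used in the report) and made-up example.com addresses; deleting the field first is one way to end up with a single header.

```python
from email.message import Message

msg = Message()
msg['From'] = 'first@example.com'
msg['From'] = 'second@example.com'   # appended as a second From header
both = msg.get_all('From')

# del removes every occurrence of the field, so the next
# assignment leaves exactly one From header.
del msg['From']
msg['From'] = 'third@example.com'
only = msg.get_all('From')

print(both)
print(only)
```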
msg150440	Author: kxroberto (kxroberto)	Date: 2012-01-01 18:57
Now I tried to render this address field header

u'Name <abc\u03a3@xy>, abc@ewf, "Nameß" <weofij@fjeio>'

with h = email.Header.Header(continuation_ws=''), h.append ... / email.Header.make_header via these chunks:

[('Name <', us-ascii), ('abc\xce\xa3', utf-8), ('@xy>, abc@ewf, "', us-ascii), ('Name\xc3\x9f', utf-8), ('" <weofij@fjeio>', us-ascii)]

The outcome is:

'Name < =?utf-8?b?YWJjzqM=?= @xy>, abc@ewf, " =?utf-8?b?TmFtZcOf?=\n " <weofij@fjeio>'

(note: the local part of an email address can be UTF too)

It seems to be impossible to avoid the erroneous extra spaces from outside within the email.Header framework. Thus I guess it has not been possible up to now to decently format a beyond-ASCII MIME message using the official email.Header mechanism, even when pre-digesting things?
msg150468	Author: R. David Murray (r.david.murray) *	Date: 2012-01-02 17:43
Actually, no, the local part cannot be in anything other than ASCII (see RFC 5335, which desires to address this problem among others). Also, an encoded word cannot occur inside quotation marks. If you correct those two bugs, you can generate an RFC-valid address using Header.append.

There is a project underway to make all of this header parsing and formatting stuff work better: see http://pypi.python.org/pypi/email.

By the way, this is already easier in Python 3.2. There you can do:

>>> formataddr(('Nameß', 'weofij@fjeio'))
'=?utf-8?b?TmFtZcOf?= <weofij@fjeio>'
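Editorial note: the formataddr approach from the last message can be extended to the whole address list from msg150440. This is a minimal sketch, assuming Python 3.2 or later; the recipients list is a hypothetical adaptation of the reporter's example, with the non-ASCII local part replaced by an ASCII one and the quotation marks around the encoded name dropped, as the reply requires.

```python
from email.header import decode_header, make_header
from email.utils import formataddr, parseaddr

# Hypothetical (display name, address) pairs; every local part is ASCII.
recipients = [
    ('Name', 'abc@xy'),
    ('', 'abc@ewf'),
    ('Nameß', 'weofij@fjeio'),
]

# formataddr encodes only a display name that needs it and
# leaves the address itself unencoded inside the angle brackets.
header_value = ', '.join(formataddr(pair) for pair in recipients)
print(header_value)

# Round trip: split off the address, then decode the display name.
encoded_name, address = parseaddr(formataddr(('Nameß', 'weofij@fjeio')))
decoded_name = str(make_header(decode_header(encoded_name)))
print(decoded_name, address)
```

Because each address is formatted separately and joined afterwards, the encoded words never swallow the angle-bracketed address, which is the over-encoding the original report complained about.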

History
Date	User	Action	Args
2022-04-11 14:57:25	admin	set	github: 57902
2012-01-02 17:43:26	r.david.murray	set	status: open -> closed nosy: + r.david.murray messages: + msg150468 resolution: not a bug
2012-01-01 18:57:07	kxroberto	set	messages: + msg150440
2012-01-01 17:25:00	kxroberto	create