
classification
Title: email.Header no encoding of unicode strings containing newlines
Type: behavior
Stage: resolved
Components: email
Versions: Python 2.7

process
Status: closed
Resolution: out of date
Dependencies:
Superseder:
Assigned To:
Nosy List: barry, flavio, r.david.murray, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2014-10-18 13:08 by flavio, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name: fix_email_header_encoding_uses_ascii_before_selected_charset.diff
Uploaded: flavio, 2014-10-18 13:08
Messages (5)
msg229640 - (view) Author: Flavio Grossi (flavio) Date: 2014-10-18 13:08
When encoding an email header that contains a newline, the value is correctly encoded only for byte strings (str), not for unicode strings.
For unicode strings, encoding is done only if the string contains a non-ASCII character.

The attached patch should fix the problem.

Simple example to reproduce the problem:
>>> from email.Header import Header as H

# correctly encoded
>>> H('two\r\nlines', 'utf-8').encode()
'=?utf-8?q?two=0D=0Alines?='

# unicode string not encoded
>>> H(u'two\r\nlines', 'utf-8').encode()
'two\r\nlines'

# unicode string with non ascii chars, correctly encoded
>>> H(u'two\r\nlines and \xe0', 'utf-8').encode()
'=?utf-8?b?dHdvDQpsaW5lcyBhbmQgw6A=?='
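
To see what each of the three outputs above actually carries, they can be run back through `decode_header`. This is only a sketch for Python 3 (the report itself is about 2.7); `decoded_parts` is a hypothetical helper name, and the input strings are copied from the session above.

```python
from email.header import decode_header

# Outputs copied verbatim from the report above.
outputs = [
    '=?utf-8?q?two=0D=0Alines?=',            # str input
    'two\r\nlines',                           # unicode input, ASCII only
    '=?utf-8?b?dHdvDQpsaW5lcyBhbmQgw6A=?=',  # unicode input, non-ASCII
]


def decoded_parts(value):
    """Return the (payload, charset) pairs that decode_header() finds."""
    return decode_header(value)


for value in outputs:
    print(repr(value), '->', decoded_parts(value))
```

A charset of `None` in a result means the value contained no encoded word and was passed through as plain text, which is the case for the second output.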
msg231714 - (view) Author: Flavio Grossi (flavio) Date: 2014-11-26 14:46
any news?
msg231773 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-11-27 19:08
I'd have to double check, but I think having \r, \n, etc. encoded in an encoded string is illegal per the RFCs.  It should be, anyway.  So IMO the bug is encoding them at all, but at this point we probably can't fix it for backward compatibility reasons.

I'm leaving this issue open for the moment because I do want to check the RFC, and also double check what the new API does in this situation (and make sure there are tests).
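
For reference, a minimal Python 3 sketch of the "new API" case, assuming that means `email.message.EmailMessage` with its default policy (an assumption; the message above does not name a class). `try_set_subject` is a hypothetical helper. With that policy, a header value containing a line break appears to be rejected rather than encoded.

```python
from email.message import EmailMessage


def try_set_subject(value):
    """Return None if the value is accepted, else the ValueError raised."""
    msg = EmailMessage()  # uses email.policy.default
    try:
        msg['Subject'] = value
    except ValueError as exc:
        return exc
    return None


print(try_set_subject('one line'))       # accepted
print(try_set_subject('two\r\nlines'))   # rejected by the policy
```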
msg231776 - (view) Author: Flavio Grossi (flavio) Date: 2014-11-27 19:50
Hi, and thank you for your answer.

However, this is not strictly related to the newline; there are also some small idiosyncrasies and differences in behavior between py2 and py3 (and even within py2, between Header() and Charset()):

# py2.7, non-unicode str
>>> from email.Charset import Charset
>>> H('test', 'utf-8').encode()
'=?utf-8?q?test?='

>>> Charset('utf-8').header_encode('test')
'=?utf-8?q?test?='


# py2.7, unicode str
>>> H(u'test', 'utf-8').encode()   # this is the only different result
'test'

>>> Charset('utf-8').header_encode(u'test')
u'=?utf-8?q?test?='



# py3.4, unicode
>>> H('test', 'utf-8').encode()
'=?utf-8?q?test?='

# py3.4, bytes
>>> H(b'test', 'utf-8').encode()
'=?utf-8?q?test?='


As you can see, only when using unicode strings in py2.7 is no header encoding done, and only if the unicode string contains nothing but ASCII chars.
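
The same comparison can be scripted for Python 3, where both paths are expected to agree. This is a sketch only: `encode_both` is a hypothetical helper, and the lowercase module names `email.header` and `email.charset` are the Python 3 spellings rather than the ones used in the 2.7 session above.

```python
from email.charset import Charset
from email.header import Header


def encode_both(value, charset='utf-8'):
    """Encode value via Header.encode() and via Charset.header_encode()."""
    via_header = Header(value, charset).encode()
    via_charset = Charset(charset).header_encode(value)
    return via_header, via_charset


print(encode_both('test'))
```

If the two elements of the returned tuple differ for some input, that input hits the kind of inconsistency described above.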
msg370474 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-05-31 14:51
Python 2.7 is no longer supported.
History
Date User Action Args
2022-04-11 14:58:09 admin set github: 66856
2020-05-31 14:51:52 serhiy.storchaka set status: open -> closed; nosy: + serhiy.storchaka; messages: + msg370474; resolution: out of date; stage: resolved
2014-11-27 19:50:11 flavio set messages: + msg231776
2014-11-27 19:08:19 r.david.murray set messages: + msg231773
2014-11-26 14:46:46 flavio set messages: + msg231714
2014-10-18 13:08:05 flavio create