classification
Title: email.Header encode() unicode P2.6
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: r.david.murray
Nosy List: barry, exarkun, ezio.melotti, r.david.murray, xnovakj
Priority: normal
Keywords: patch

Created on 2005-12-13 11:09 by xnovakj, last changed 2010-12-27 19:18 by r.david.murray. This issue is now closed.

Files
File name Uploaded Description
header_encode_test.diff r.david.murray, 2010-08-26 16:38
header_charset_fix.diff r.david.murray, 2010-08-26 16:39
Messages (6)
msg27063 - Author: Jan Novak (xnovakj) Date: 2005-12-13 11:09
Python: 2.4
Module: email.Header
Method: encode()
In some cases encode() returns a unicode object instead of a str (see example 5 below).

1>> from email.Header import Header

2>> Header(unicode('abcá','iso-8859-2'),'utf-8').encode()
'=?utf-8?b?YWJjw6E=?='

3>> Header('abc','utf-8').encode()
'=?utf-8?q?abc?='

4>> Header(u'abc','utf-8').encode()
'abc' ???

5>> Header('abc','iso-8859-2').encode()
u'=?iso-8859-2?q?abc?=' (P2.4)

6>> Header('abc','iso-8859-2').encode()
'=?iso-8859-2?q?abc?=' (P2.3)
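
The check implied by the examples above can be sketched as a small type assertion. This is only an illustration: the helper name encode_returns_native_str is hypothetical, and the snippet imports the lowercase email.header module name so that it also runs on Python 3. On Python 3 every case passes trivially; on an affected Python 2 the iso-8859-2 case would be expected to fail.

```python
from email.header import Header


def encode_returns_native_str(value, charset):
    """Return True if Header(value, charset).encode() is exactly a native str.

    Hypothetical helper, not part of the email package.
    """
    encoded = Header(value, charset).encode()
    return type(encoded) is str


# The same inputs as in examples 2 to 5 of the report.
cases = [
    (b'abc\xe1'.decode('iso-8859-2'), 'utf-8'),  # non-ASCII text, utf-8 charset
    ('abc', 'utf-8'),
    (u'abc', 'utf-8'),
    ('abc', 'iso-8859-2'),                       # the case reported as unicode
]

results = [encode_returns_native_str(value, charset) for value, charset in cases]
print(results)
```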
msg84003 - Author: Jan Novak (xnovakj) Date: 2009-03-23 10:12
I ran some new tests on Python 2.6.1:

>>> import email.charset

>>> c=email.charset.Charset('utf-8')
>>> print c.input_charset, type(c.input_charset)
utf-8 <type 'unicode'>
>>> print c.output_charset, type(c.output_charset)
utf-8 <type 'str'>

but

>>> c=email.charset.Charset('iso-8859-2')
>>> print c.input_charset, type(c.input_charset)
iso-8859-2 <type 'unicode'>
>>> print c.output_charset, type(c.output_charset)
iso-8859-2 <type 'unicode'>

but if you use the alias latin-2, both types are str as expected:

>>> c=email.charset.Charset('latin-2')
>>> print c.input_charset, type(c.input_charset)
iso-8859-2 <type 'str'>
>>> print c.output_charset, type(c.output_charset)
iso-8859-2 <type 'str'>
>>> 

The error occurs when input_charset is unicode, because the type is passed along:
self.input_charset -> conv -> self.output_charset

In module email/charset.py, line 219:

        if not conv:
            conv = self.input_charset

This affects the charsets that have no output conversion:

CHARSETS = {
    # input        header enc  body enc output conv
    'iso-8859-1':  (QP,        QP,      None),
    'iso-8859-2':  (QP,        QP,      None),

and only if the charset name is not given through an alias:

ALIASES = {
    'latin_1': 'iso-8859-1',
    'latin-1': 'iso-8859-1',
    'latin_2': 'iso-8859-2',
    'latin-2': 'iso-8859-2',

But the real source of this error is on line 208:
 input_charset = unicode(input_charset, 'ascii')

because this construction always returns unicode:

>>> print type(unicode('iso-8859-2','ascii'))
<type 'unicode'>
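
The propagation described above can be modelled without Python 2. The following is a minimal sketch under stated assumptions: PromotedStr is a hypothetical stand-in for the Python 2 unicode type (it compares and hashes like a plain str but keeps an observable type), the tables are cut-down excerpts, the utf-8 row with an explicit 'utf-8' output conversion is an assumption chosen to match the observed output, and resolve() only imitates the alias lookup and the "if not conv" fallback. resolve_coerced() shows one possible remedy (coercing back to a plain str before the lookups); it is not taken from any attached patch.

```python
class PromotedStr(str):
    """Hypothetical stand-in for Python 2's unicode: equal to and hashed like
    the plain str it wraps, but with a distinct, observable type."""


# Cut-down tables; the values stored in them are plain str objects.
ALIASES = {
    'latin_2': 'iso-8859-2',
    'latin-2': 'iso-8859-2',
}
CHARSETS = {
    # input        (header enc, body enc, output conv)
    'iso-8859-2': ('QP', 'QP', None),
    'utf-8': ('SHORTEST', 'BASE64', 'utf-8'),  # assumed row, see lead-in
}


def resolve(name):
    """Imitate the lookups: returns (input_charset, output_charset)."""
    input_charset = ALIASES.get(name, name)      # falls back to `name` itself
    conv = CHARSETS.get(input_charset, (None, None, None))[2]
    if not conv:
        conv = input_charset                     # promoted type leaks through
    return input_charset, conv


def resolve_coerced(name):
    """Same lookups after coercing the name to a plain str (one possible fix)."""
    return resolve(str(name))


def type_names(pair):
    return [type(item).__name__ for item in pair]


for charset_name in ('utf-8', 'iso-8859-2', 'latin-2'):
    promoted = PromotedStr(charset_name)
    print(charset_name, type_names(resolve(promoted)),
          type_names(resolve_coerced(promoted)))
```

In this model the three names give the same pattern as the 2.6.1 session: promoted/plain for utf-8, promoted/promoted for iso-8859-2, and plain/plain for latin-2, while the coerced variant yields plain str in every case.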
msg97359 - Author: Jean-Paul Calderone (exarkun) (Python committer) Date: 2010-01-07 17:35
Any hope of this being fixed?
msg114997 - Author: Mark Lawrence (BreamoreBoy) Date: 2010-08-26 16:12
I believe that RDM is working on this sort of issue as part of email6.
msg115009 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2010-08-26 16:52
I've attached a fix and test.  I've uploaded them separately since the fix only applies to 2.7, but I want to put the test into 3.x as well.
msg124724 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2010-12-27 19:18
Committed to 2.7 in r87515.  On second thought there's no reason to forward port the test because Python3 doesn't have the equivalent type-promotion issues.
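
The remark about Python 3 can be checked with a short sketch that repeats the Charset inspection from the 2.6.1 session. It assumes a Python 3 interpreter; the helper name charset_attr_types is hypothetical.

```python
from email.charset import Charset


def charset_attr_types(name):
    """Return the types of input_charset and output_charset for a charset name.

    Hypothetical helper, not part of the email package.
    """
    charset = Charset(name)
    return type(charset.input_charset), type(charset.output_charset)


# The three names examined in the earlier session.
checked = {
    name: charset_attr_types(name)
    for name in ('utf-8', 'iso-8859-2', 'latin-2')
}

for name, (input_type, output_type) in checked.items():
    print(name, input_type.__name__, output_type.__name__)
```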
History
Date User Action Args
2010-12-27 19:18:53  r.david.murray  set  status: open -> closed
    versions: - Python 3.1, Python 3.2
    nosy: - BreamoreBoy
    messages: + msg124724
    resolution: fixed
    stage: patch review -> resolved
2010-12-27 17:04:58  r.david.murray  unlink  issue1685453 dependencies
2010-08-26 16:52:37  r.david.murray  set  stage: test needed -> patch review
2010-08-26 16:52:16  r.david.murray  set  messages: + msg115009
2010-08-26 16:39:16  r.david.murray  set  files: + header_charset_fix.diff
2010-08-26 16:38:18  r.david.murray  set  files: + header_encode_test.diff
    keywords: + patch
2010-08-26 16:12:56  BreamoreBoy  set  nosy: + BreamoreBoy
    messages: + msg114997
    versions: + Python 3.1, Python 3.2, - Python 2.6
2010-05-05 13:33:33  barry  set  assignee: barry -> r.david.murray
2010-04-23 19:15:46  ezio.melotti  set  nosy: + r.david.murray
2010-01-07 17:42:11  ezio.melotti  set  nosy: + ezio.melotti
    versions: + Python 2.7
2010-01-07 17:35:27  exarkun  set  nosy: + exarkun
    messages: + msg97359
2009-03-30 22:56:23  ajaksu2  link  issue1685453 dependencies
2009-03-23 10:12:25  xnovakj  set  messages: + msg84003
    title: email.Header encode() unicode P2.3xP2.4 -> email.Header encode() unicode P2.6
2009-03-20 23:31:10  ajaksu2  set  stage: test needed
    type: behavior
    components: + Library (Lib), - None
    versions: + Python 2.6
2005-12-13 11:09:30  xnovakj  create